Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[patch] Weak format checking #5949

Closed
vicuna opened this issue Mar 13, 2013 · 6 comments
Closed

[patch] Weak format checking #5949

vicuna opened this issue Mar 13, 2013 · 6 comments
Assignees
Milestone

Comments

@vicuna
Copy link

vicuna commented Mar 13, 2013

Original bug ID: 5949
Reporter: bvaugon
Assigned to: @gasche
Status: closed (set by @xavierleroy on 2016-12-07T10:34:33Z)
Resolution: fixed
Priority: normal
Severity: minor
Version: 4.00.1
Target version: 4.02.0+dev
Fixed in version: 4.02.0+dev
Category: typing
Tags: patch
Related to: #4799 #6017
Monitored by: @gasche @jmeber

Bug description

I wrote a patch to complete typing tests of formats and fix some related bugs.
It includes:

  • checking of incompatible flags.
  • checking of [width] and [precision] validity.
  • fix error messages (use %S instead of %s).

For some examples that I put in "Steps to reproduce" about buffer overflows, they only "have an invalid behavior" on a 32 bits machine. You have just to increase numbers to obtain similar problems on a 64 bits machine.

Steps to reproduce

Printf.printf "%5000000000d" 42;;
-> Segmentation fault.

Printf.printf "%3.+f" 3.14;;
-> Prints "%3.0+f".

Printf.printf "%2147483548d" 42;;
-> Does not print anything.

Printf.printf "%1.1s" "abc";;
-> Typechecks but raises an exception at run time.

Printf.printf "\n%";;
-> The typechecker prints a typing error containing a line break.

Printf.printf "%0s";;
-> Invalid format but allowed by the typechecker.

Printf.printf "%+F";;
-> Invalid format but allowed by the typechecker.

etc.

Additional information

The attached patch fixes all of these problems.

Note that this patch also resolves the bug 0004799.

File attachments

@vicuna
Copy link
Author

vicuna commented Apr 23, 2013

Comment author: @pierreweis

Thank you for your bug report.

Will check your patch and incorporate it.

@vicuna
Copy link
Author

vicuna commented Apr 24, 2013

Comment author: @pierreweis

I checked your proposed modifications, and have a few questions:

  • I agree with you that we need to prevent crazy width and precision indications.
    However, I think we should restrict these values to be lesser than Sys.max_string_length, or may be lesser, but the proper bound is not easy to find.

  • What are we attempting to do ? Do we want to ensure that all typechecked formats will be treated correctly by the various format string processing functions ? This is hopeless, since there is no way to ensure that the material written by a given format string will not overflow Sys.max_string_length; in particular due to conversions %t and %a, but even %s can output as much as Sys.max_string_length characters in the buffer (for instance calling Printf.sprintf "%s"); we also have to deal with * indications for width, partial evaluation, format string concatenation and so on.

  • The way it worked before was: do not be fussy about flags, width, and precision annotations; just treat the meaningful ones and silently drop the rest. Do we want a much more strict treatment with the risk of breaking existing code ?

  • I would like to know why you consider convertion %1.1s as invalid ?

  • Same questions for %+F, %0s, etc. Which specification do you use to check the validity ?

  • I cannot reproduce your "\n%" example.

  • The Printf module has some deficiency about padding of character strings; this should be worked out but is not very sexy to do :(

  • Anyway, we need to revise the documentation for format strings to explicitely state what is the use and validity range of the various flags and width/precision annotations for the various relevant convertions. We also have to explicitely state if non-conformity to these rules is detected or not at compile time and/or fatal or not at runtime.

@vicuna
Copy link
Author

vicuna commented Apr 25, 2013

Comment author: bvaugon

Thanks for your answer and questions!

I agree it is impossible to avoid [ Invalid_argument "String.create" ] because of lots of reasons, but my modifications about limitations of "padding" is principally to prevent "Segmentation fault". These kind of "Segmentation faults" are due to buffer overflows in the runtime when size are not representables. Since Sys.max_string_length depends on architecture, it is also impossible to statically bound paddings.

I consider "%1.1s" as invalid because, with the current runtime implementation, when this format is used with Printf.printf, it always raises [ Invalid_argument "Printf: bad conversion %s, at char number 0 in format string ``%1.1s''" ] at run time. So, ".1" with 's' does not mean anything and should be refused statically, not dynamically.

The format "%F" is used to print a float with the OCaml lexical convention. The modifier '+' does not do anything with "%F" in the current stdlib/runtime implementation. An OCaml user which writes "%+F" probably want to force the sign printing and the current implementation simply ignore the '+'.

For "%0s", it is a bit different. I think in hindsight that it could be accepted. My first idea cames from the fact that it is equivalent to "%s", but this kind of syntax ("%0s") could be usefull for OCaml code generation, so you can forget my remark.

Sorry for "\n%", the ocamlc error message is correct. I meant "\n%k", for example, which generate a bad error message.

I also agree that it should be interresting to have a complete documentation and a clean semantic of OCaml formats.

With the oral (in principle) approval of some OCaml developpers, I am currently finished writing a first (but complete) trial version of a new implementation of Printf, Scanf and Format stdlib modules where formats are not represented (as currently) by strings but are statically compiled into GADTs trees. This modifications obviously impacts the OCaml compilers (typecore.ml and predef.ml), and also Pervasives for compatibility since format types are currently defined in this module. Unlike the current implementation, my Printf, Scanf and Format modules implementations are "well typed", they does not use Obj.magic, nor dynamic format strings parsing when printing and scanning, nor this kind of horrible things. It also has significantly better performances. I try very hard to be compatible with the current implementation, except for some error messages and errors wich are now detected statically rather than dynamically.

So it is not necessarily a good idea to completely preserve the current semantic, especially for examples likes above that no reasonable programmer writes. When finished, this huge patch (maybe a few thousand lines) will probably need to be more tested, adapted and maybe partially redesigned, I suggest to have profound discussions about what shoud be kept and what should be changed from the current semantic of formats.

@vicuna
Copy link
Author

vicuna commented Apr 29, 2013

Comment author: @alainfrisch

Setting Target Version to 4.01.0 (at least for the segfault part).

@vicuna
Copy link
Author

vicuna commented Jul 22, 2013

Comment author: @alainfrisch

And bumping to 'Crash' since we still have the segfault.

@vicuna
Copy link
Author

vicuna commented Aug 21, 2014

Comment author: @gasche

Now that Benoît's GADT-based implementation is merged, none of the example he gives have the problems initially reported -- some of them eg. \n escaping issues were fairly orthogonal but were fixed in his patch as well. In particular, all known-crash related to formats are fixed, so this issue is not a blocker anymore for 4.02.

(I am not fond of having a test on the padding length at the type-checker level, which is the way Benoît fixed those issues. It catches the problem, which is good, but maybe we'll solve the issue in another way in the future.)

Note however that one of the things Benoît did not change (mostly) is the runtime semantics of formats: how atomic formats are interpreted by the printf/format/scanf functions. In many case this was implemented by calling the equivalent C formatting function, which is the cause of many format bugs we've had. Keeping this unchanged was the right choice (Benoît's work was orthogonal as it was about format type-checking), in particular the only sane choice to try to preserve compatibility, but this means that the nasty C bits remain and that we will probably still find new bugs (even probably segfaults) due to broken libc on some systems or just bad interaction on our part with the C formatters.

I'm marking this issue resolved (all the reported problems were fixed... by the reporter himself), but we'll talk formats again.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants