Mantis Bug Tracker

View Issue Details Jump to Notes ] Issue History ] Print ]
IDProjectCategoryView StatusDate SubmittedLast Update
0005949OCamltypingpublic2013-03-13 01:322016-12-07 11:34
Assigned Togasche 
PlatformOSOS Version
Product Version4.00.1 
Target Version4.02.0+devFixed in Version4.02.0+dev 
Summary0005949: [patch] Weak format checking
DescriptionI wrote a patch to complete typing tests of formats and fix some related bugs.
It includes:
 * checking of incompatible flags.
 * checking of [width] and [precision] validity.
 * fix error messages (use %S instead of %s).

For some examples that I put in "Steps to reproduce" about buffer overflows, they only "have an invalid behavior" on a 32 bits machine. You have just to increase numbers to obtain similar problems on a 64 bits machine.
Steps To ReproducePrintf.printf "%5000000000d" 42;;
-> Segmentation fault.

Printf.printf "%3.+f" 3.14;;
-> Prints "%3.0+f".

Printf.printf "%2147483548d" 42;;
-> Does not print anything.

Printf.printf "%1.1s" "abc";;
-> Typechecks but raises an exception at run time.

Printf.printf "\n%";;
-> The typechecker prints a typing error containing a line break.

Printf.printf "%0s";;
-> Invalid format but allowed by the typechecker.

Printf.printf "%+F";;
-> Invalid format but allowed by the typechecker.

Additional InformationThe attached patch fixes all of these problems.

Note that this patch also resolves the bug 0004799.
Attached Filesdiff file icon ocaml-4.00.1-format-check.diff [^] (9,456 bytes) 2013-03-13 01:32 [Show Content]

- Relationships
related to 0004799closed Printf ignores width on %b %c %a %% (and maybe other) 
related to 0006017closedgasche A new format implementation based on GADTs 

-  Notes
weis (developer)
2013-04-24 00:09

Thank you for your bug report.

Will check your patch and incorporate it.
weis (developer)
2013-04-24 17:30

I checked your proposed modifications, and have a few questions:

- I agree with you that we need to prevent crazy width and precision indications.
  However, I think we should restrict these values to be lesser than Sys.max_string_length, or may be lesser, but the proper bound is not easy to find.

- What are we attempting to do ? Do we want to ensure that all typechecked formats will be treated correctly by the various format string processing functions ? This is hopeless, since there is no way to ensure that the material written by a given format string will not overflow Sys.max_string_length; in particular due to conversions %t and %a, but even %s can output as much as Sys.max_string_length characters in the buffer (for instance calling Printf.sprintf "%s"); we also have to deal with * indications for width, partial evaluation, format string concatenation and so on.

- The way it worked before was: do not be fussy about flags, width, and precision annotations; just treat the meaningful ones and silently drop the rest. Do we want a much more strict treatment with the risk of breaking existing code ?

- I would like to know why you consider convertion %1.1s as invalid ?

- Same questions for %+F, %0s, etc. Which specification do you use to check the validity ?

- I cannot reproduce your "\n%" example.

- The Printf module has some deficiency about padding of character strings; this should be worked out but is not very sexy to do :(

- Anyway, we need to revise the documentation for format strings to explicitely state what is the use and validity range of the various flags and width/precision annotations for the various relevant convertions. We also have to explicitely state if non-conformity to these rules is detected or not at compile time and/or fatal or not at runtime.
bvaugon (developer)
2013-04-25 12:28

Thanks for your answer and questions!

I agree it is impossible to avoid [ Invalid_argument "String.create" ] because of lots of reasons, but my modifications about limitations of "padding" is principally to prevent "Segmentation fault". These kind of "Segmentation faults" are due to buffer overflows in the runtime when size are not representables. Since Sys.max_string_length depends on architecture, it is also impossible to statically bound paddings.

I consider "%1.1s" as invalid because, with the current runtime implementation, when this format is used with Printf.printf, it always raises [ Invalid_argument "Printf: bad conversion %s, at char number 0 in format string ``%1.1s''" ] at run time. So, ".1" with 's' does not mean anything and should be refused statically, not dynamically.

The format "%F" is used to print a float with the OCaml lexical convention. The modifier '+' does not do anything with "%F" in the current stdlib/runtime implementation. An OCaml user which writes "%+F" probably want to force the sign printing and the current implementation simply ignore the '+'.

For "%0s", it is a bit different. I think in hindsight that it could be accepted. My first idea cames from the fact that it is equivalent to "%s", but this kind of syntax ("%0s") could be usefull for OCaml code generation, so you can forget my remark.

Sorry for "\n%", the ocamlc error message is correct. I meant "\n%k", for example, which generate a bad error message.

I also agree that it should be interresting to have a complete documentation and a clean semantic of OCaml formats.

With the oral (in principle) approval of some OCaml developpers, I am currently finished writing a first (but complete) trial version of a new implementation of Printf, Scanf and Format stdlib modules where formats are not represented (as currently) by strings but are statically compiled into GADTs trees. This modifications obviously impacts the OCaml compilers ( and, and also Pervasives for compatibility since format types are currently defined in this module. Unlike the current implementation, my Printf, Scanf and Format modules implementations are "well typed", they does not use Obj.magic, nor dynamic format strings parsing when printing and scanning, nor this kind of horrible things. It also has significantly better performances. I try very hard to be compatible with the current implementation, except for some error messages and errors wich are now detected statically rather than dynamically.

So it is not necessarily a good idea to completely preserve the current semantic, especially for examples likes above that no reasonable programmer writes. When finished, this huge patch (maybe a few thousand lines) will probably need to be more tested, adapted and maybe partially redesigned, I suggest to have profound discussions about what shoud be kept and what should be changed from the current semantic of formats.
frisch (developer)
2013-04-29 16:37

Setting Target Version to 4.01.0 (at least for the segfault part).
frisch (developer)
2013-07-22 13:16

And bumping to 'Crash' since we still have the segfault.
gasche (administrator)
2014-08-21 08:35

Now that Benoît's GADT-based implementation is merged, none of the example he gives have the problems initially reported -- some of them eg. \n escaping issues were fairly orthogonal but were fixed in his patch as well. In particular, all known-crash related to formats are fixed, so this issue is not a blocker anymore for 4.02.

(I am not fond of having a test on the padding length at the type-checker level, which is the way Benoît fixed those issues. It catches the problem, which is good, but maybe we'll solve the issue in another way in the future.)

Note however that one of the things Benoît *did not* change (mostly) is the runtime semantics of formats: how atomic formats are interpreted by the printf/format/scanf functions. In many case this was implemented by calling the equivalent C formatting function, which is the cause of many format bugs we've had. Keeping this unchanged was the right choice (Benoît's work was orthogonal as it was about format type-checking), in particular the only sane choice to try to preserve compatibility, but this means that the nasty C bits remain and that we will probably still find new bugs (even probably segfaults) due to broken libc on some systems or just bad interaction on our part with the C formatters.

I'm marking this issue resolved (all the reported problems were fixed... by the reporter himself), but we'll talk formats again.

- Issue History
Date Modified Username Field Change
2013-03-13 01:32 bvaugon New Issue
2013-03-13 01:32 bvaugon File Added: ocaml-4.00.1-format-check.diff
2013-03-17 17:20 gasche Status new => acknowledged
2013-04-24 00:09 weis Note Added: 0009200
2013-04-24 00:09 weis Assigned To => weis
2013-04-24 00:09 weis Status acknowledged => assigned
2013-04-24 17:30 weis Note Added: 0009203
2013-04-24 17:30 weis Status assigned => feedback
2013-04-25 12:28 bvaugon Note Added: 0009204
2013-04-25 12:28 bvaugon Status feedback => assigned
2013-04-29 16:37 frisch Target Version => 4.01.0+dev
2013-04-29 16:37 frisch Note Added: 0009228
2013-07-22 13:16 frisch Note Added: 0009825
2013-07-22 13:16 frisch Severity minor => crash
2013-08-19 16:08 doligez Relationship added related to 0004799
2013-08-19 16:20 doligez Target Version 4.01.0+dev => 4.01.1+dev
2013-12-16 14:34 doligez Tag Attached: patch
2014-04-22 11:50 frisch Relationship added related to 0006017
2014-05-25 20:20 doligez Target Version 4.01.1+dev => 4.02.0+dev
2014-07-17 10:21 doligez Assigned To weis => gasche
2014-07-17 10:21 doligez Severity crash => minor
2014-08-21 08:35 gasche Note Added: 0012035
2014-08-21 08:35 gasche Status assigned => resolved
2014-08-21 08:35 gasche Fixed in Version => 4.02.0+dev
2014-08-21 08:35 gasche Resolution open => fixed
2016-12-07 11:34 xleroy Status resolved => closed
2017-02-23 16:45 doligez Category OCaml typing => typing

Copyright © 2000 - 2011 MantisBT Group
Powered by Mantis Bugtracker