|Anonymous | Login | Signup for a new account||2013-12-09 04:23 CET|
|Main | My View | View Issues | Change Log | Roadmap|
|View Issue Details|
|ID||Project||Category||View Status||Date Submitted||Last Update|
|0005949||OCaml||OCaml typing||public||2013-03-13 01:32||2013-08-19 16:20|
|Target Version||4.01.1+dev||Fixed in Version|
|Summary||0005949: [patch] Weak format checking|
|Description||I wrote a patch to complete typing tests of formats and fix some related bugs.|
* checking of incompatible flags.
* checking of [width] and [precision] validity.
* fix error messages (use %S instead of %s).
For some examples that I put in "Steps to reproduce" about buffer overflows, they only "have an invalid behavior" on a 32 bits machine. You have just to increase numbers to obtain similar problems on a 64 bits machine.
|Steps To Reproduce||Printf.printf "%5000000000d" 42;;|
-> Segmentation fault.
Printf.printf "%3.+f" 3.14;;
-> Prints "%3.0+f".
Printf.printf "%2147483548d" 42;;
-> Does not print anything.
Printf.printf "%1.1s" "abc";;
-> Typechecks but raises an exception at run time.
-> The typechecker prints a typing error containing a line break.
-> Invalid format but allowed by the typechecker.
-> Invalid format but allowed by the typechecker.
|Additional Information||The attached patch fixes all of these problems.|
Note that this patch also resolves the bug 0004799.
|Tags||No tags attached.|
|Attached Files||ocaml-4.00.1-format-check.diff [^] (9,456 bytes) 2013-03-13 01:32 [Show Content]|
Thank you for your bug report.
Will check your patch and incorporate it.
I checked your proposed modifications, and have a few questions:
- I agree with you that we need to prevent crazy width and precision indications.
However, I think we should restrict these values to be lesser than Sys.max_string_length, or may be lesser, but the proper bound is not easy to find.
- What are we attempting to do ? Do we want to ensure that all typechecked formats will be treated correctly by the various format string processing functions ? This is hopeless, since there is no way to ensure that the material written by a given format string will not overflow Sys.max_string_length; in particular due to conversions %t and %a, but even %s can output as much as Sys.max_string_length characters in the buffer (for instance calling Printf.sprintf "%s"); we also have to deal with * indications for width, partial evaluation, format string concatenation and so on.
- The way it worked before was: do not be fussy about flags, width, and precision annotations; just treat the meaningful ones and silently drop the rest. Do we want a much more strict treatment with the risk of breaking existing code ?
- I would like to know why you consider convertion %1.1s as invalid ?
- Same questions for %+F, %0s, etc. Which specification do you use to check the validity ?
- I cannot reproduce your "\n%" example.
- The Printf module has some deficiency about padding of character strings; this should be worked out but is not very sexy to do :(
- Anyway, we need to revise the documentation for format strings to explicitely state what is the use and validity range of the various flags and width/precision annotations for the various relevant convertions. We also have to explicitely state if non-conformity to these rules is detected or not at compile time and/or fatal or not at runtime.
Thanks for your answer and questions!
I agree it is impossible to avoid [ Invalid_argument "String.create" ] because of lots of reasons, but my modifications about limitations of "padding" is principally to prevent "Segmentation fault". These kind of "Segmentation faults" are due to buffer overflows in the runtime when size are not representables. Since Sys.max_string_length depends on architecture, it is also impossible to statically bound paddings.
I consider "%1.1s" as invalid because, with the current runtime implementation, when this format is used with Printf.printf, it always raises [ Invalid_argument "Printf: bad conversion %s, at char number 0 in format string ``%1.1s''" ] at run time. So, ".1" with 's' does not mean anything and should be refused statically, not dynamically.
The format "%F" is used to print a float with the OCaml lexical convention. The modifier '+' does not do anything with "%F" in the current stdlib/runtime implementation. An OCaml user which writes "%+F" probably want to force the sign printing and the current implementation simply ignore the '+'.
For "%0s", it is a bit different. I think in hindsight that it could be accepted. My first idea cames from the fact that it is equivalent to "%s", but this kind of syntax ("%0s") could be usefull for OCaml code generation, so you can forget my remark.
Sorry for "\n%", the ocamlc error message is correct. I meant "\n%k", for example, which generate a bad error message.
I also agree that it should be interresting to have a complete documentation and a clean semantic of OCaml formats.
With the oral (in principle) approval of some OCaml developpers, I am currently finished writing a first (but complete) trial version of a new implementation of Printf, Scanf and Format stdlib modules where formats are not represented (as currently) by strings but are statically compiled into GADTs trees. This modifications obviously impacts the OCaml compilers (typecore.ml and predef.ml), and also Pervasives for compatibility since format types are currently defined in this module. Unlike the current implementation, my Printf, Scanf and Format modules implementations are "well typed", they does not use Obj.magic, nor dynamic format strings parsing when printing and scanning, nor this kind of horrible things. It also has significantly better performances. I try very hard to be compatible with the current implementation, except for some error messages and errors wich are now detected statically rather than dynamically.
So it is not necessarily a good idea to completely preserve the current semantic, especially for examples likes above that no reasonable programmer writes. When finished, this huge patch (maybe a few thousand lines) will probably need to be more tested, adapted and maybe partially redesigned, I suggest to have profound discussions about what shoud be kept and what should be changed from the current semantic of formats.
|Setting Target Version to 4.01.0 (at least for the segfault part).|
|And bumping to 'Crash' since we still have the segfault.|
|2013-03-13 01:32||bvaugon||New Issue|
|2013-03-13 01:32||bvaugon||File Added: ocaml-4.00.1-format-check.diff|
|2013-03-17 17:20||gasche||Status||new => acknowledged|
|2013-04-24 00:09||weis||Note Added: 0009200|
|2013-04-24 00:09||weis||Assigned To||=> weis|
|2013-04-24 00:09||weis||Status||acknowledged => assigned|
|2013-04-24 17:30||weis||Note Added: 0009203|
|2013-04-24 17:30||weis||Status||assigned => feedback|
|2013-04-25 12:28||bvaugon||Note Added: 0009204|
|2013-04-25 12:28||bvaugon||Status||feedback => assigned|
|2013-04-29 16:37||frisch||Target Version||=> 4.01.0+dev|
|2013-04-29 16:37||frisch||Note Added: 0009228|
|2013-07-22 13:16||frisch||Note Added: 0009825|
|2013-07-22 13:16||frisch||Severity||minor => crash|
|2013-08-19 16:08||doligez||Relationship added||related to 0004799|
|2013-08-19 16:20||doligez||Target Version||4.01.0+dev => 4.01.1+dev|
|Copyright © 2000 - 2011 MantisBT Group|