|View Issue Details|
|ID||Project||Category||View Status||Date Submitted||Last Update|
|0007376||OCaml||standard library||public||2016-09-27 19:16||2017-10-10 11:29|
|Target Version||4.07.0+dev/beta2/rc1/rc2||Fixed in Version|
|Summary||0007376: Format printf regression (%d in sizes of boxes and breaks)|
|Description||The arguments of Format.printf can no longer be used to specify the size of the breaks.|
Format.printf "@[<h>a@;<%d %d>b@]@." 4 2
a <4 2>b (in 4.03.0 and 4.02.1)
a b (in 3.12.1)
The size of the vertical box can still be specified though.
Format.printf "@[<v %d>a@;b@]@." 4
|Tags||No tags attached.|
|Note there should be 4 spaces between "a" and "b" in the second case of the first example.|
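One possible workaround while `@;<%d %d>` is not interpreted, given here as a sketch and not as part of the original report: keep the break out of the format string and emit it with `Format.pp_print_break` through a `%t` conversion. The helper name `break` is hypothetical; only `Format.asprintf` and `Format.pp_print_break` come from the standard library.

```ocaml
(* Hypothetical helper (not part of Format): a break whose width and
   offset are ordinary runtime values, meant to be used with %t. *)
let break width offset ppf = Format.pp_print_break ppf width offset

(* Same layout as the first example: a horizontal box with a break of
   width 4 and offset 2 between "a" and "b". *)
let s = Format.asprintf "@[<h>a%tb@]" (break 4 2)

let () = print_endline s
```

In a horizontal box the break is rendered as its width in spaces, so this should give four spaces between "a" and "b", which is the output the report expects from 3.12.1.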
I was not aware that "<n m>" was accepted as a box type specification. To my knowledge this format was only used for break hints. Is this feature documented anywhere? Format.mli merely says:
> - [@\: open a pretty-printing box. The type and offset of the
> box may be optionally specified with the following syntax:
> the [<] character, followed by an optional box type indication,
> then an optional integer offset, and the closing [>] character.
> Box type is one of [h], [v], [hv], [b], or [hov].
> '[h]' stands for an 'horizontal' box,
> '[v]' stands for a 'vertical' box,
> '[hv]' stands for an 'horizontal-vertical' box,
> '[b]' stands for an 'horizontal-or-vertical' box demonstrating indentation,
> '[hov]' stands a simple 'horizontal-or-vertical' box.
> For instance, [@\[<hov 2>] opens an 'horizontal-or-vertical'
> box with indentation 2 as obtained with [open_hovbox 2].
> For more details about boxes, see the various box opening
> functions [open_*box].
What is the semantics that you would expect of `<n m>` when used as a box specification?
|I do not understand your remark as the bug I am discussing is about break hints and not boxes. My second example illustrates that it works for boxes, and surprisingly not for break hints (while it used to work in 3.12.1).|
|Ah, indeed, I misread the example, sorry.|
I have an idea of how this could be re-implemented in the current Format implementation. It is not terribly hard, because one can mostly follow what has already been done to support %d formatters in box types, but it still requires some work and I won't be able to work on it soon. I hope that the details below will let someone else implement the feature and propose the patch for inclusion upstream.
The support for %d in box types was contributed by Benoît Vaugon (the main author of the format-GADTs code, but this feature was implemented after the bulk of the work was done) in
but I hope the description below will be more helpful to understand what needs to be changed than just looking at the patch.
The format-as-GADT general idea is that literal strings that are detected to be formats at type-checking time are translated (still at type-checking time) into GADT constructors of the type `_ format6`. GADTs completely represent the format strings, and the functions that manipulate format strings are defined by manipulating these GADTs. There are three important sorts of files in stdlib/ for this implementation:
- CamlinternalFormatBasics contains the bare minimum, namely the definition of the format GADTs (that is already quite a lot of code) and the format concatenation operation (used in pervasives)
- CamlinternalFormat contains the generic operations on format GADTs, for example the parsing code that turns arbitrary strings into formats (and is invoked at type-checking time by the compiler), various conversion functions (for example code transforming a format value into its format type representation, also represented as a GADT), and a generic printing function that takes a format string, consumes as many arguments as it requires, and outputs an "accumulator", a data structure representing the output after format substitution in an output-agnostic way.
- Printf, Scanf and Format contain the format-manipulating code that is specific to their logic. Format and Printf rely on the generic printing function of CamlinternalFormat, and interpret the "accumulator" in specific ways.
Note that "formatting hints" (the @ stuff in Format) are always parsed into the structure of Format GADT values, even for format strings that are passed to Printf: we have only one GADT structure for all format consumers. When we represent those formatting hints in the GADT, we are careful to always keep the textual representation somewhere; when Printf finds a formatting hint, it just treats it as a string literal, outputting its string representation.
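To make the format-as-GADT idea concrete, here is a deliberately tiny model of it. This is a sketch only: the type `fmt`, its constructors and the functions `kprint` and `sprint` are hypothetical and far simpler than the real definitions in CamlinternalFormatBasics. It shows a format represented as a GADT, a printer that consumes one argument per `%d`-like node, and a hint node that keeps its text so a Printf-like consumer can output it as a literal.

```ocaml
(* Toy model, NOT the stdlib types: 'a is the type of the printing function
   (one arrow per argument), 'r is the final result type. *)
type ('a, 'r) fmt =
  | End : ('r, 'r) fmt
  | Lit : string * ('a, 'r) fmt -> ('a, 'r) fmt
  | Int : ('a, 'r) fmt -> (int -> 'a, 'r) fmt
  (* A formatting hint; its textual form is kept alongside. *)
  | Hint : string * ('a, 'r) fmt -> ('a, 'r) fmt

(* Generic printer in continuation-passing style: accumulates a string
   and consumes exactly the arguments the format asks for. *)
let rec kprint : type a r. (string -> r) -> string -> (a, r) fmt -> a =
 fun k acc fmt ->
  match fmt with
  | End -> k acc
  | Lit (s, rest) -> kprint k (acc ^ s) rest
  | Int rest -> fun n -> kprint k (acc ^ string_of_int n) rest
  (* Printf-like behaviour: a hint is printed as its literal text. *)
  | Hint (text, rest) -> kprint k (acc ^ text) rest

let sprint fmt = kprint (fun s -> s) "" fmt

(* Roughly what "x = %d@." would look like in this toy model. *)
let example = sprint (Lit ("x = ", Int (Hint ("@.", End)))) 42
```

A Format-like consumer would differ only in the `Hint` case, where it would interpret the hint instead of copying its text.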
Now that the high-level map is drawn, here is how we support @[<hov %d> in the implementation:
- the GADT definitions in CamlinternalFormatBasics distinguish two kinds of formatting hints (the "@" stuff in Format): Formatting_lit that only takes constant parameters, and Formatting_gen that may contain %d and other formats and may thus consume arguments dynamically. Their definitions are in
- the parsing code in CamlinternalFormat (parse_tag), when it sees a "<" after a "@[", tries to find a closing ">" and parse the stuff in between as a generic format (that is, "hov" is considered as a literal string, "%d" is parsed, etc.). See the implementation at
- the generic printing functions in CamlinternalFormat, in the Formatting_gen case, will recursively "print" the format argument into a new accumulator list, and then go on printing the rest of the format.
- finally, at runtime (when the user program executes), Format will take this accumulator list (representing the content of the "<...>" hint), print it into a string (consuming %-arguments etc.), and re-parse that string to get the actual hint. The code to do that is at
it calls the `compute_tag` function to print the accumulator into a string
and the `open_box_of_string` function (defined in CamlinternalFormat) to do the parsing
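The last step above (print the hint content to a string, then re-parse it) can be sketched as follows. This is an illustration under assumptions, not the stdlib code: `box`, `box_of_word`, `parse_box_spec` and `spec_for` are hypothetical names, and the defaults chosen for an empty or integer-only specification are choices of this toy parser.

```ocaml
(* Toy stand-ins; the names below are hypothetical, not the stdlib's. *)
type box = H | V | HV | HOV | B

let box_of_word = function
  | "h" -> Some H
  | "v" -> Some V
  | "hv" -> Some HV
  | "hov" -> Some HOV
  | "b" -> Some B
  | _ -> None

(* Parse the rendered content of "<...>", e.g. "hov 2", into (box, indent). *)
let parse_box_spec (s : string) : (box * int) option =
  let words = List.filter (fun w -> w <> "") (String.split_on_char ' ' s) in
  match words with
  | [] -> Some (B, 0)
  | [ w ] -> (
      match box_of_word w with
      | Some b -> Some (b, 0)
      | None -> (
          match int_of_string_opt w with
          | Some n -> Some (B, n)
          | None -> None))
  | [ w; n ] -> (
      match (box_of_word w, int_of_string_opt n) with
      | Some b, Some n -> Some (b, n)
      | _ -> None)
  | _ -> None

(* Step 1: substitute the %d argument. Step 2: re-parse the result. *)
let spec_for indent = parse_box_spec (Printf.sprintf "hov %d" indent)
```

The point of the detour through a string is that the parser only ever sees literal text, so the same parsing logic serves both constant hints and hints containing `%d`.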
In contrast, break hints are parsed in the parse_good_break function of CamlinternalFormat
which only expects literal numbers, not arbitrary formats, and thus creates a Formatting_lit constructor.
To implement the required feature, we would thus need to change this parse_good_break function to be closer to parse_tag, by parsing an arbitrary format and using Formatting_gen to store it. (The current parsing logic would be moved to a break_hint_of_string function.) Then we need to implement the same logic in Format to print the sub-accumulator into a string and call break_hint_of_string on it.
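As a rough illustration of the runtime half of this plan, the sketch below renders a break hint with its arguments, re-parses the resulting string, and emits the break. Everything here is an assumption made for illustration: `break_hint_of_rendered` stands in for the proposed break_hint_of_string (which does not exist yet), `print_break_from_args` is a hypothetical name, and the real change would live inside CamlinternalFormat and Format and not in user code.

```ocaml
(* Hypothetical stand-in for the proposed break_hint_of_string: parse a
   rendered hint such as "<4 2>" into (width, offset). *)
let break_hint_of_rendered (s : string) : (int * int) option =
  try Scanf.sscanf s "<%d %d>" (fun width offset -> Some (width, offset))
  with Scanf.Scan_failure _ | Failure _ | End_of_file -> None

(* Render the hint with its arguments, re-parse it, then emit the break. *)
let print_break_from_args ppf width offset =
  match break_hint_of_rendered (Printf.sprintf "<%d %d>" width offset) with
  | Some (w, o) -> Format.pp_print_break ppf w o
  (* If the hint is malformed, fall back to a plain space break (1, 0). *)
  | None -> Format.pp_print_space ppf ()

let s = Format.asprintf "@[<h>a%tb@]" (fun ppf -> print_break_from_args ppf 4 2)
```

The fallback to a default break on a malformed hint is a design choice of this sketch; the actual implementation might prefer to raise an error.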
I don't expect it to be a *lot* of work, but it could still take half-a-day.
Also, the testsuite should be improved with examples of use of this feature. Currently there is almost no coverage of complex formats in the testsuite, and any tests would be welcome. For this specific regression having a test called pr7376.ml in testsuite/tests/lib-format would probably be enough -- but having a more complete testsuite which includes this would be even better, of course.
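A first version of such a test could look like the sketch below. The file layout and the use of `Format.asprintf` are assumptions; the two format strings are the ones from the report. The break case is only printed, because its output is exactly what differs between versions, so no expected value is given for it here.

```ocaml
(* Sketch of a possible testsuite/tests/lib-format/pr7376.ml. *)

(* %d as the indentation of a box: reported as still working. *)
let box = Format.asprintf "@[<v %d>a@;b@]" 4

(* %d as the width and offset of a break: the behaviour under discussion.
   Its output differs between versions, so it is only printed here. *)
let break = Format.asprintf "@[<h>a@;<%d %d>b@]" 4 2

let () =
  Printf.printf "box:   %S\n" box;
  Printf.printf "break: %S\n" break
```

The reference output would then record whichever behaviour is finally decided on for the break case.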
I note that this feature (%d in sizes of boxes and breaks) has never been documented since its inception in 2002, commit 9a43942. If it were up to me, we would silently drop it.
|Postponing to 4.07, but Xavier simply suggested to drop the feature. Gabriel: what's your opinion?|
|2016-09-27 19:16||eponier||New Issue|
|2016-09-27 19:18||eponier||Note Added: 0016340|
|2016-09-27 21:00||gasche||Note Added: 0016341|
|2016-09-27 21:00||gasche||Status||new => feedback|
|2016-09-28 11:07||eponier||Note Added: 0016345|
|2016-09-28 11:07||eponier||Status||feedback => new|
|2016-09-28 12:48||gasche||Note Added: 0016352|
|2016-09-28 17:15||doligez||Status||new => acknowledged|
|2016-09-28 17:15||doligez||Target Version||=> 4.04.0 +dev / +beta1 / +beta2|
|2016-09-28 17:44||gasche||Note Added: 0016364|
|2016-10-26 17:25||doligez||Target Version||4.04.0 +dev / +beta1 / +beta2 => 4.05.0 +dev/beta1/beta2/beta3/rc1|
|2017-02-18 16:17||xleroy||Note Added: 0017329|
|2017-02-18 16:17||xleroy||Target Version||4.05.0 +dev/beta1/beta2/beta3/rc1 => 4.06.0 +dev/beta1/beta2/rc1|
|2017-02-20 12:05||frisch||Summary||Format printf regression => Format printf regression (%d in sizes of boxes and breaks)|
|2017-02-23 16:43||doligez||Category||OCaml standard library => standard library|
|2017-10-10 11:29||frisch||Note Added: 0018528|
|2017-10-10 11:29||frisch||Target Version||4.06.0 +dev/beta1/beta2/rc1 => 4.07.0+dev/beta2/rc1/rc2|