Mantis Bug Tracker

ID: 0007376
Project: OCaml
Category: standard library
View Status: public
Date Submitted: 2016-09-27 19:16
Last Update: 2017-10-10 11:29
Reporter: eponier
Assigned To:
Priority: normal
Severity: minor
Reproducibility: always
Status: acknowledged
Resolution: open
Platform: Linux
OS: Ubuntu
OS Version: 16.04
Product Version: 4.03.0
Target Version: 4.07.0+dev
Fixed in Version:
Summary: 0007376: Format printf regression (%d in sizes of boxes and breaks)
Description: The arguments of Format.printf can no longer be used to specify the size of the breaks.

Format.printf "@[<h>a@;<%d %d>b@]@." 4 2

returns

a <4 2>b (in 4.03.0 and 4.02.1)

instead of

a    b (in 3.12.1)

The size of the vertical box can still be specified though.

Format.printf "@[<v %d>a@;b@]@." 4

always returns

a
    b
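
One possible way to get an argument-sized break without relying on `%d` inside the hint is to emit the break through `%t` and call `Format.pp_print_break` directly. This is only a minimal sketch and not part of the original report; the helper name `break` is hypothetical.

```ocaml
(* Hypothetical helper: a break whose width and offset are runtime values,
   shaped so that it can be passed to the %t conversion. *)
let break (nspaces : int) (offset : int) (ppf : Format.formatter) : unit =
  Format.pp_print_break ppf nspaces offset

(* In a horizontal box a break never splits the line, so it is rendered
   as [nspaces] spaces. *)
let rendered : string = Format.asprintf "@[<h>a%tb@]" (break 4 2)

let () = print_endline rendered
```

The format string stays a literal, and only the break itself is computed at run time.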
Tags: No tags attached.

-  Notes
(0016340)
eponier (reporter)
2016-09-27 19:18

Note there should be 4 spaces between "a" and "b" in the second case of the first example.
(0016341)
gasche (developer)
2016-09-27 21:00

I was not aware that "<n m>" was accepted as a box type specification. To my knowledge this format was only used for break hints. Is this feature documented anywhere? Format.mli merely says:

> - [@\[]: open a pretty-printing box. The type and offset of the
> box may be optionally specified with the following syntax:
> the [<] character, followed by an optional box type indication,
> then an optional integer offset, and the closing [>] character.
> Box type is one of [h], [v], [hv], [b], or [hov].
> '[h]' stands for an 'horizontal' box,
> '[v]' stands for a 'vertical' box,
> '[hv]' stands for an 'horizontal-vertical' box,
> '[b]' stands for an 'horizontal-or-vertical' box demonstrating indentation,
> '[hov]' stands a simple 'horizontal-or-vertical' box.
> For instance, [@\[<hov 2>] opens an 'horizontal-or-vertical'
> box with indentation 2 as obtained with [open_hovbox 2].
> For more details about boxes, see the various box opening
> functions [open_*box].
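
To illustrate the documented correspondence between a box hint and the box opening functions, here is a small sketch. It uses a vertical box rather than the `hov` box of the quoted text, only so that the line break is unconditional; the names `via_format` and `via_functions` are made up for the example.

```ocaml
(* Box opened through the format string hint. *)
let via_format : string = Format.asprintf "@[<v 2>a@,b@]"

(* The same box opened through the explicit functions. *)
let via_functions : string =
  Format.asprintf "%t" (fun ppf ->
      Format.pp_open_vbox ppf 2;
      Format.pp_print_string ppf "a";
      Format.pp_print_cut ppf ();
      Format.pp_print_string ppf "b";
      Format.pp_close_box ppf ())

let () = print_endline via_format
```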

What is the semantics that you would expect of `<n m>` when used as a box specification?
(0016345)
eponier (reporter)
2016-09-28 11:07

I do not understand your remark as the bug I am discussing is about break hints and not boxes. My second example illustrates that it works for boxes, and surprisingly not for break hints (while it used to work in 3.12.1).
(0016352)
gasche (developer)
2016-09-28 12:48

Ah, indeed, I misread the example, sorry.
(0016364)
gasche (developer)
2016-09-28 17:44

I have an idea of how this could be re-implemented in the current Format implementation. It is not terribly hard, because one can mostly follow what has already been done to support %d formatters in box types, but it still requires some work and I won't be able to work on it soon. I hope that the details below will let someone else implement the feature and propose the patch for inclusion upstream.

The support for %d in box types was contributed by Benoît Vaugon (the main author of the format-GADTs code, but this feature was implemented after the bulk of the work was done) in

  https://github.com/ocaml/ocaml/commit/49d3f7b9f89826ed1b2d33a144277b390bbc3f2e

but I hope the description below will be more helpful to understand what needs to be changed than just looking at the patch.

The format-as-GADT general idea is that literal strings that are detected to be formats at type-checking time are translated (still at type-checking time) into GADT constructors of the type `_ format6`. GADTs completely represent the format strings, and the functions that manipulate format strings are defined by manipulating these GADTs. There are three important sorts of files in stdlib/ for this implementation:

- CamlinternalFormatBasics contains the bare minimum, namely the definition of the format GADTs (that is already quite a lot of code) and the format concatenation operation (used in pervasives)

- CamlinternalFormat contains the generic operations on format GADTs, for example the parsing code that turns arbitrary strings into formats (and is invoked at type-checking time by the compiler), various conversion functions (for example code transforming a format value into its format type representation, also represented as a GADT), and a generic printing function that takes a format string, consumes as many arguments as it requires, and outputs an "accumulator", a data structure representing the output after format substitution in an output-agnostic way.

- Printf, Scanf and Format contain the format-manipulating code that is specific to their logic. Format and Printf rely on the generic printing function of CamlinternalFormat, and interpret the "accumulator" in specific ways.
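
As a small illustration of formats being typed values rather than plain strings, the sketch below binds format literals to names, concatenates two of them with `(^^)`, and recovers the textual form with `string_of_format`. The names `count_fmt` and `pair_fmt` are made up for the example.

```ocaml
(* A format literal is a typed value; its type records the arguments it
   consumes and the result it produces. *)
let count_fmt : (int -> string, unit, string) format = "%d item(s)"

(* (^^) concatenates two formats while keeping track of their arguments. *)
let pair_fmt : (int -> string -> string, unit, string) format =
  "a=%d " ^^ "b=%s"

let () =
  print_endline (Printf.sprintf count_fmt 3);
  print_endline (Printf.sprintf pair_fmt 1 "x");
  (* string_of_format returns the textual representation kept with the format. *)
  print_endline (string_of_format count_fmt)
```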

Note that "formatting hints" (the @ stuff in Format) are always parsed into the structure of Format GADT values, even for format strings that are passed to Printf: we have only one GADT structure for all format consumers. When we represent those formatting hints in the GADT, we are careful to always keep the textual representation somewhere; when Printf finds a formatting hint it just treats it as a string literal, outputting its string representation.
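
A quick way to observe this difference is to pass the same literal to both consumers. This is a minimal sketch; the names `with_printf` and `with_format` are made up for the example.

```ocaml
(* Printf outputs the hints using their textual representation. *)
let with_printf : string = Printf.sprintf "@[<h>a@ b@]"

(* Format interprets the same hints as a box and a break. *)
let with_format : string = Format.asprintf "@[<h>a@ b@]"

let () =
  print_endline with_printf;
  print_endline with_format
```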

Now that the high-level map is drawn, here is how we support @[<hov %d> in the implementation:

- the GADT definitions in CamlinternalFormatBasics distinguish two kinds of formatting hints (the "@" stuff in Format): Formatting_lit that only takes constant parameters, and Formatting_gen that may contain %d and other formats and may thus consume arguments dynamically. Their definitions are in
  https://github.com/ocaml/ocaml/blob/520fb2d/stdlib/camlinternalFormatBasics.ml#L425-L431

- the parsing code in CamlinternalFormat (parse_tag), when it sees a "<" after a "@[", tries to find a closing ">" and parse the stuff in between as a generic format (that is, "hov" is considered as a literal string, "%d" is parsed, etc.). See the implementation at

  https://github.com/ocaml/ocaml/blob/520fb2d/stdlib/camlinternalFormat.ml#L2600-L2610

- the generic printing functions in CamlinternalFormat, in the Formatting_gen case, will recursively "print" the format argument into a new accumulator list, and then go on printing the rest of the format.

  https://github.com/ocaml/ocaml/blob/520fb2d/stdlib/camlinternalFormat.ml#L1561-L1564

- finally, at runtime (when the user program executes) Format will take this accumulator list (representing the content of the "<...>" hint), print it into a string (consuming %-arguments etc.), and re-parse that string to get the actual hint. The code to do that is at

  https://github.com/ocaml/ocaml/blob/520fb2d/stdlib/format.ml#L1194-L1197

it calls the `compute_tag` function to print the accumulator into a string

  https://github.com/ocaml/ocaml/blob/520fb2d/stdlib/format.ml#L1143-L1150
and the `open_box_of_string` function (defined in CamlinternalFormat) to do the parsing

  https://github.com/ocaml/ocaml/blob/520fb2d/stdlib/camlinternalFormat.ml#L1924
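
The "print into a string, then re-parse" step for box hints can be sketched as follows. This is a simplified, self-contained stand-in and not the stdlib code: `box_kind`, `render_hint` and `box_of_string` are hypothetical names that only mimic the roles of `compute_tag` and `open_box_of_string`.

```ocaml
(* Hypothetical box kinds, mirroring the documented h, v, hv, hov and b. *)
type box_kind = Hbox | Vbox | Hvbox | Hovbox | Box

(* Step 1 (stand-in): render the hint contents, consuming its arguments. *)
let render_hint (kind : string) (indent : int) : string =
  Printf.sprintf "%s %d" kind indent

(* Step 2 (stand-in): re-parse the rendered text into a kind and an indent. *)
let box_of_string (s : string) : box_kind * int =
  let kind_of = function
    | "h" -> Hbox
    | "v" -> Vbox
    | "hv" -> Hvbox
    | "hov" -> Hovbox
    | "b" -> Box
    | other -> invalid_arg ("unknown box kind: " ^ other)
  in
  let words = List.filter (fun w -> w <> "") (String.split_on_char ' ' s) in
  match words with
  | [] -> (Box, 0)
  | [ w ] -> (
      (* A single word is either an indentation or a box kind. *)
      match int_of_string_opt w with
      | Some n -> (Box, n)
      | None -> (kind_of w, 0))
  | [ k; n ] -> (
      match int_of_string_opt n with
      | Some n -> (kind_of k, n)
      | None -> invalid_arg ("invalid indentation: " ^ n))
  | _ -> invalid_arg ("invalid box description: " ^ s)

let () =
  match box_of_string (render_hint "v" 4) with
  | Vbox, n -> Printf.printf "vertical box, indent %d\n" n
  | _ -> print_endline "other box"
```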

In contrast, break hints are parsed in the parse_good_break function of CamlinternalFormat

  https://github.com/ocaml/ocaml/blob/520fb2d/stdlib/camlinternalFormat.ml#L2620

which only expects literal numbers, not arbitrary formats, and thus creates a Formatting_lit constructor.

To implement the required feature, we would thus need to change this parse_good_break function to be closer to parse_tag, by parsing an arbitrary format and using Formatting_gen to store it. (The current parsing logic would be moved to a break_hint_of_string function.) Then we need to implement the same logic in Format to print the sub-accumulator into a string and call break_hint_of_string on it.
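
A rough sketch of what the runtime half of this proposal could look like is given below. It is an assumption-laden illustration, not a patch: `break_hint_of_string` takes its name from the proposal above but is written here from scratch with Scanf, and `print_dynamic_break` is a hypothetical helper standing in for the code that would live in Format.

```ocaml
(* Hypothetical: parse the rendered contents of a break hint, such as "4 2",
   into (nspaces, offset). No such function is claimed to exist in the
   standard library. *)
let break_hint_of_string (s : string) : int * int =
  try Scanf.sscanf s " %d %d %!" (fun nspaces offset -> (nspaces, offset))
  with Scanf.Scan_failure _ | Failure _ | End_of_file ->
    invalid_arg ("invalid break hint: " ^ s)

(* Hypothetical runtime step: render the hint with its arguments, re-parse
   it, then emit the corresponding break. *)
let print_dynamic_break (ppf : Format.formatter) (nspaces : int) (offset : int)
    : unit =
  let rendered = Printf.sprintf "%d %d" nspaces offset in
  let nspaces, offset = break_hint_of_string rendered in
  Format.pp_print_break ppf nspaces offset

let () =
  print_endline
    (Format.asprintf "@[<h>a%tb@]" (fun ppf -> print_dynamic_break ppf 4 2))
```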

I don't expect it to be a *lot* of work, but it could still take half-a-day.

Also, the testsuite should be improved with examples of use of this feature. Currently there is almost no coverage of complex formats in the testsuite, and any tests would be welcome. For this specific regression having a test called pr7376.ml in testsuite/tests/lib-format would probably be enough -- but having a more complete testsuite which includes this would be even better, of course.
(0017329)
xleroy (administrator)
2017-02-18 16:17

I note that this feature (%d in sizes of boxes and breaks) has never been documented since its inception in 2002, commit 9a43942. If it were me we would silently drop it.
(0018528)
frisch (developer)
2017-10-10 11:29

Postponing to 4.07, but Xavier suggested simply dropping the feature. Gabriel: what's your opinion?

- Issue History
Date Modified Username Field Change
2016-09-27 19:16 eponier New Issue
2016-09-27 19:18 eponier Note Added: 0016340
2016-09-27 21:00 gasche Note Added: 0016341
2016-09-27 21:00 gasche Status new => feedback
2016-09-28 11:07 eponier Note Added: 0016345
2016-09-28 11:07 eponier Status feedback => new
2016-09-28 12:48 gasche Note Added: 0016352
2016-09-28 17:15 doligez Status new => acknowledged
2016-09-28 17:15 doligez Target Version => 4.04.0 +dev / +beta1 / +beta2
2016-09-28 17:44 gasche Note Added: 0016364
2016-10-26 17:25 doligez Target Version 4.04.0 +dev / +beta1 / +beta2 => 4.05.0 +dev/beta1/beta2/beta3/rc1
2017-02-18 16:17 xleroy Note Added: 0017329
2017-02-18 16:17 xleroy Target Version 4.05.0 +dev/beta1/beta2/beta3/rc1 => 4.06.0 +dev/beta1/beta2/rc1
2017-02-20 12:05 frisch Summary Format printf regression => Format printf regression (%d in sizes of boxes and breaks)
2017-02-23 16:43 doligez Category OCaml standard library => standard library
2017-10-10 11:29 frisch Note Added: 0018528
2017-10-10 11:29 frisch Target Version 4.06.0 +dev/beta1/beta2/rc1 => 4.07.0+dev

