Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Format printf regression (%d in sizes of boxes and breaks) #7376

Closed
vicuna opened this issue Sep 27, 2016 · 11 comments
Closed

Format printf regression (%d in sizes of boxes and breaks) #7376

vicuna opened this issue Sep 27, 2016 · 11 comments

Comments

@vicuna
Copy link

vicuna commented Sep 27, 2016

Original bug ID: 7376
Reporter: eponier
Status: acknowledged (set by @damiendoligez on 2016-09-28T15:15:53Z)
Resolution: open
Priority: normal
Severity: minor
Platform: Linux
OS: Ubuntu
OS Version: 16.04
Version: 4.03.0
Target version: 4.07.0+dev/beta2/rc1/rc2
Category: standard library

Bug description

The arguments of Format.printf can no longer be used to specify the size of the breaks.

Format.printf "@[a@;<%d %d>b@]@." 4 2

returns

a <4 2>b (in 4.03.0 and 4.02.1)

instead of

a b (in 3.12.1)

The size of the vertical box can still be specified though.

Format.printf "@[<v %d>a@;b@]@." 4

always returns

a
b

@vicuna
Copy link
Author

vicuna commented Sep 27, 2016

Comment author: eponier

Note there should be 4 spaces between "a" and "b" in the second case of the first example.

@vicuna
Copy link
Author

vicuna commented Sep 27, 2016

Comment author: @gasche

I was not aware that <n m> was accepted as a box type specification. To my knowledge this format was only used for break hints. Is this feature documented anywhere? Format.mli merely says:

  • [@[]: open a pretty-printing box. The type and offset of the
    box may be optionally specified with the following syntax:
    the [<] character, followed by an optional box type indication,
    then an optional integer offset, and the closing [>] character.
    Box type is one of [h], [v], [hv], [b], or [hov].
    '[h]' stands for an 'horizontal' box,
    '[v]' stands for a 'vertical' box,
    '[hv]' stands for an 'horizontal-vertical' box,
    '[b]' stands for an 'horizontal-or-vertical' box demonstrating indentation,
    '[hov]' stands a simple 'horizontal-or-vertical' box.
    For instance, [@[<hov 2>] opens an 'horizontal-or-vertical'
    box with indentation 2 as obtained with [open_hovbox 2].
    For more details about boxes, see the various box opening
    functions [open_*box].

What is the semantics that you would expect of <n m> when used as a box specification?

@vicuna
Copy link
Author

vicuna commented Sep 28, 2016

Comment author: eponier

I do not understand your remark as the bug I am discussing is about break hints and not boxes. My second example illustrates that it works for boxes, and surprisingly not for break hints (while it used to work in 3.12.1).

@vicuna
Copy link
Author

vicuna commented Sep 28, 2016

Comment author: @gasche

Ah, indeed, I misread the example, sorry.

@vicuna
Copy link
Author

vicuna commented Sep 28, 2016

Comment author: @gasche

I have an idea of how this could be re-implemented in the current Format implementation. It is not terribly hard, because one can mostly follow what has already been done to support %d formatters in box types, but it still require some work and I won't be able to work on it soon. I hope that the details below would let someone else implement the feature and propose the patch for inclusion upstream.

The support for %d in box types was contributed by Benoît Vaugon (the main author of the format-GADTs code, but this feature was implemented after the bulk of the work was done) in

49d3f7b

but I hope the description below will be more helpful to understand what needs to be changed than just looking at the patch.

The format-as-GADT general idea is that literal strings that are detected to be formats at type-checking time are translated (still at type-checking time) into GADT constructors of the type _ format6. GADTs completely represent the format strings, and the functions that manipulate format strings are defined by manipulating these GADTs. There are three important sort of files in stdlib/ for this implementation:

  • CamlinternalFormatBasics contains the bare minimum, namely the definition of the format GADTs (that is already quite a lot of code) and the format concatenation operation (used in pervasives)

  • CamlinternalFormat contains the generic operations on format GADTs, for example the parsing code that turns arbitrary strings into formats (and is invoked at type-checking time by the compiler), various conversion functions (for example code transforming a format value into its format type representation, also represented as a GADT), and a generic printing functions that takes a format string, consumes as many arguments as it requires, and outputs an "accumulator" that is a data-structure representing the output after format substitution in an output-agnostic way.

  • Printf, Scanf and Format contains the format-manipulating code that is specific to their logic. Format and Printf rely on the generic printing function of CamlinternalFormat, and interprets the "accumulator" in specific ways.

Note that "formatting hints" (the @ stuff in Format) are always parsed into the structure of Format GADT values, even for format strings that are passed to Printf: we have only one GADT structure for all format consumers. When we represent those formatting hints in the GADT, we are careful to always keep the textual representation somewhere; when Printf finds a formatting hints it just treats it as a string literal, outputting its string representation.

Now that the high-level map is drawn, here is how we support @[<hov %d> in the implementation:

it calls the compute_tag function to print the accumulator into a string

https://github.com/ocaml/ocaml/blob/520fb2d/stdlib/format.ml#L1143-L1150
and the open_box_of_string function (defined in CamlinternalFormat) to do the parsing

https://github.com/ocaml/ocaml/blob/520fb2d/stdlib/camlinternalFormat.ml#L1924

In opposition, break hints are parsed in the parse_good_break function of CamlinternalFormat

https://github.com/ocaml/ocaml/blob/520fb2d/stdlib/camlinternalFormat.ml#L2620

which only expect literal numbers, not arbitrary formats, and thus creates a Formatting_lit constructor.

To implement the required feature, we would thus need to change this parse_good_break function to be closer to parse_tag, by parsing an arbitrary format and using Formatting_gen to store it. (The current parsing logic would be moved to an break_hint_of_string function.) Then we need to implement the same loginc in Format to print the sub-accumulator into a string and call break_hint_of_string on it.

I don't expect it to be a lot of work, but it could still take half-a-day.

Also, the testsuite should be improved with examples of use of this feature. Currently there is almost no coverage of complex formats in the testsuite, and any tests would be welcome. For this specific regression having a test called pr7376.ml in testsuite/tests/lib-format would probably be enough -- but having a more complete testsuite which includes this would be even better, of course.

@vicuna
Copy link
Author

vicuna commented Feb 18, 2017

Comment author: @xavierleroy

I note that this feature (%d in sizes of boxes and breaks) has never been documented since its inception in 2002, commit 9a43942. If it were me we would silently drop it.

@vicuna
Copy link
Author

vicuna commented Oct 10, 2017

Comment author: @alainfrisch

Postponing to 4.07, but Xavier simply suggested to drop the feature. Gabriel: what's your opinion?

@vicuna vicuna added the stdlib label Mar 14, 2019
@vicuna vicuna added this to the 4.07.0 milestone Mar 14, 2019
@vicuna vicuna added the bug label Mar 20, 2019
@github-actions
Copy link

github-actions bot commented May 9, 2020

This issue has been open one year with no activity. Consequently, it is being marked with the "stale" label. What this means is that the issue will be automatically closed in 30 days unless more comments are added or the "stale" label is removed. Comments that provide new information on the issue are especially welcome: is it still reproducible? did it appear in other contexts? how critical is it? etc.

@github-actions github-actions bot added the Stale label May 9, 2020
@Drup
Copy link
Contributor

Drup commented May 9, 2020

The equivalent feature on boxes is definitely used in the wild. I would rather move we make things consistent and allow it on breaks too.

In fact, I'm surprised I didn't noticed (and fixed) that part when I was digging down there ....

@gasche gasche removed the Stale label May 9, 2020
@gasche
Copy link
Member

gasche commented May 9, 2020

If no one was motivated to do the implementation work, we can assume that no actively-maintained program is still using the feature in the wild. It sounds mildly useful but it is also a painful feature to implement and not very principled (this dynamic-size stuff relies on dynamic parsing instead of static processing of formats as we usually do). Personally I would be happy to close the issue and forget about it unless someone wants to implement it. But @Drup seems to be volunteering to implement it?

@github-actions
Copy link

This issue has been open one year with no activity. Consequently, it is being marked with the "stale" label. What this means is that the issue will be automatically closed in 30 days unless more comments are added or the "stale" label is removed. Comments that provide new information on the issue are especially welcome: is it still reproducible? did it appear in other contexts? how critical is it? etc.

@github-actions github-actions bot added the Stale label Oct 11, 2021
@gasche gasche closed this as completed Oct 11, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants