Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

custom format #6452

Closed
vicuna opened this issue Jun 5, 2014 · 19 comments
Closed

custom format #6452

vicuna opened this issue Jun 5, 2014 · 19 comments

Comments

@vicuna
Copy link

vicuna commented Jun 5, 2014

Original bug ID: 6452
Reporter: @diml
Assigned to: @diml
Status: assigned (set by @diml on 2016-04-19T12:52:03Z)
Resolution: reopened
Priority: normal
Severity: feature
Target version: undecided
Category: standard library
Monitored by: @Drup bvaugon @hcarty @yakobowski

Bug description

It would be good if we could put custom functions into a format. There doesn't need to be a syntax for it, this could be the job of a ppx.

I'm thinking of something like this:

| Custom :
('b -> 'x -> 'c) * ('a, 'b, 'c, 'd, 'e, 'f) fmt ->
('x -> 'a, 'b, 'c, 'd, 'e, 'f) fmt

This would allow to write things like github.com/janestreet/custom_printf in a much cleaner way. At the moment we modify the call site to stuff the conversion functions in the list of arguments: it's not pretty and becoming more and more complex. With the suggested modification we could simply 'map' the format and stuff the conversion functions in it.

I don't have a use case but it might also be interesting to allow custom functions to take 0 or more than 1 argument, here is a more generic proposal:

| Custom :
('x, 'y, 'a, 'c) custom_arity * ('b -> 'y) * ('a, 'b, 'c, 'd, 'e, 'f) fmt ->
('x, 'b, 'c, 'd, 'e, 'f) fmt

type (_, _, _, _) custom_arity =
| Zero : ('a, 'b, 'a, 'b) custom
| Succ : ('a, 'b, 'c, 'd) custom -> ('x -> 'a, 'x -> 'b, 'c, 'd) custom

File attachments

@vicuna
Copy link
Author

vicuna commented Mar 8, 2015

Comment author: @gasche

The patch proposed in
#140
was just merged in trunk. Thanks.

@vicuna
Copy link
Author

vicuna commented Mar 10, 2015

Comment author: @diml

Added a patch to make custom functions always return a string. To avoid pinning the format to one specific class of printf functions.

@vicuna
Copy link
Author

vicuna commented May 18, 2015

Comment author: @diml

Merged in 4.02 (revisions 16124 and 16125).

@vicuna
Copy link
Author

vicuna commented May 18, 2015

Comment author: @gasche

Thanks for taking care of the 4.02 merge.

@vicuna
Copy link
Author

vicuna commented May 22, 2015

Comment author: bvaugon

I submitted a new version that complete the custom format
implementation of Jeremie by adding support for formatting and
scanning usages.

In this version, instead of computing a string, a custom node takes a
printer that writes data in a Format.formatter. It is usefull for
streaming data rather than constructing a string in-memory, and for
generating formatting annotations when targeting Format.xxx functions.

In addition, I suggest that a custom node now takes a fourth argument,
a scanner, that is used by Scanf.xxx functions to scan arbitrary data
and give them to the callback function.

The printers and scanners may be arbitrary functions that manually
manipulate the in/out data-flows. However, for simple use-cases, a
custom node may be simply constructed as follows, by forwarding
printing to Format.fprintf and scanning to Scanf.bscanf:

{[
let arity = Custom_succ (Custom_succ Custom_zero)

let printer mode fmt n b = match mode with
| Called_from_Printf -> Format.fprintf fmt "[%X/%B]" n b
| Called_from_Format -> Format.fprintf fmt "@[[%X/@,%B]@]" n b

let scanner ib f = Scanf.bscanf ib "[%X/%B]" f

let fmt = Custom (arity, printer, scanner, End_of_format)
]}

The first argument, "arity", is an integer in a à la Peano
representation from range [ 0; +inf [ (or from range [ 0; +inf ] using
-rectypes (ok, forget it ^^)). It defines the number of values
expected by the printer and the number of values scanned by the
scanner and given to the callback function.

The second argument is the printer, it is called with a mode, a
Format.formatter and [arity] arguments; only when all arguments have
been passed to the printing function. It is then robust to partial
calls of printing functions. The [mode] is usefull to provide printers
that change their behaviour depending on the calling context. Such a
printer should provide formatting annotations when it is called by a
Format.[x]printf function, and not when it is called by a
Printf.[x]printf function.

The scanning function takes as arguments a Scanf.Scanning.in_channel
and a callback function [f]. After scanning (or computing) the [arity]
values from the in_channel, the scanner have to call [f] with the
[arity] values as argument and to return the value returned by [f].

The last argument is the rest of the format.

As the Jeremie one, this implementation is of course type safe (no
magic).

This patch however has some drawbacks :

  • it is much bigger, in particular because I needed to move
    some type declarations and printing primitives from Format and Scanf
    modules to CamlinternalFormat and CamlinternalFormatBasics modules.
  • for basic custom printers, writting data through a formatter is a bit
    slower than constructing a string and directly pushing it
    in the Buffer.t or the output channel as Jeremie does.

@vicuna
Copy link
Author

vicuna commented May 22, 2015

Comment author: @gasche

(As we already discussed together,) I am unconvinced by the idea of adding a Format dependency to the format-string logic and implementation. I think that Printf and Scanf are reasonably general in their interface, but there are many other pretty-printing libraries than Format in the OCaml world that make different assumptions.

(Of course the dependency is already there to some degree, in the format of the dedicated formatting converters in format strings (eg. "[<h 3>...]"), but I think a hard dependency on the internal type of Format buffers is qualitatively different.)

(One possibility would be to remove the Format support of this patch, but keep a mode argument with only one constructor calling_from_Printf, keeping the door open for other modes in the future.)

Jérémie, would you be interested in either the Format or the Scanf support?

I'm reluctant to consider a change as invasive as the proposed patch just weeks before a minor release. Is there a strong consensus among the people that care about format strings?

@vicuna
Copy link
Author

vicuna commented May 22, 2015

Comment author: bvaugon

Yes, but remarks that having the support of Format features in Custom
nodes does not impose to use them and "leave the door opened" to use
any extern custom formatting functions. It is just a functionnality,
that allow programmers to use uniformly the different printing and
scanning functions offered by the OCaml stdlib.

Moreover, there is no new dependencies to the Format module nor
between stdlib modules. Furthermore, before this patch, there were
already dependencies from CamlinternalFormatBasics to format-specific
constructions before this patch (I think about the types "block_type",
"formatting_lit", and "formatting_gen"). Instead, this patch gathers
type declarations that were previously spread in multiple modules.

But you are right, this patch is huge (not really because it consits
in deep modifications, but because it moves large units of code and
type declarations), and it might be normal to take more time to
deliberate about it if the OCaml release process impose it.

@vicuna
Copy link
Author

vicuna commented May 24, 2015

Comment author: @gasche

I think that there is something fundamentally problematic with importing a complete pretty-printing library logic into a module that currently only handle formatting strings.

There are some weird things with the patch as proposed, for example:

  • the From_file constructor for Scanf scanning buffers apparently changed his signature (why?)
  • some code that used a Buffer before now uses a Bytes with home-duplicated regrowing logic
  • a lot of code moves from buffer_foo to Buffer.foo; this makes the patch harder to review for no good reason, I think (but it would be fine as a separate patch, along with some justification of why this part of the code may or may not depend on Buffer)

@vicuna
Copy link
Author

vicuna commented May 24, 2015

Comment author: bvaugon

The reason is that the Scanning.in_channel has been moved in
CamlinternalFormatBasics, which is loaded before Pervasives, it can
now no longer use the OCaml stdlib.

The old "in_channel_name" (declaring the constructor From_file) made
multiple references to the type Pervasives.in_channel (that was also
not very logic) which were only used to close the channel. So, I
transformed it in a proper generic function "close" stored in a field
of the Scanning.in_channel record. This modification removed the same
time the dependance to Pervasives.in_channel.

A Buffer.t was also mentionned in the type Scanning.in_channel and for
the same reason, I suppressed the reference to Buffer.t and changed it
to a two fields (bytes, int). Independently, when I changed this
reference to Buffer.t, I saw a partial implementation of buffers in
the CamlinternalFormat which had nothing to do there because the
module CamlinternalFormat, contrary to CamlinternalFormatBasics,
already depends on the module Buffer. That's why I cleaned this code
by removing the partial implementation of buffers from
CamlinternalFormat and replace it with the standard Buffer module.

Since these modifications only concern internals of the stdlib (I
meen, is hidden behind abstract types or masked inside camlinternalXXX
modules), it does'nt break compatibility.

@vicuna
Copy link
Author

vicuna commented May 25, 2015

Comment author: @diml

Jérémie, would you be interested in either the Format or the Scanf support?

If Format was supported, for the !"%{sexp:type}" syntax custom_printf would use [Sexp.pp] when called from Format. Same for Scanf, !"{sexp:type}" could allow to parse a S-expression.

But I don't think there is a high demand for it, at Jane Street for instance we use custom_printf a lot more than Format or Scanf.

@vicuna
Copy link
Author

vicuna commented May 25, 2015

Comment author: @gasche

I'm thinking of not trying to rush this patch into 4.02, but eventually merge the feature in 4.03. That would mean that the printf-relevant interface would be different between the two major versions. Is that ok for you?

@vicuna
Copy link
Author

vicuna commented May 25, 2015

Comment author: @diml

That's OK for me.

@vicuna
Copy link
Author

vicuna commented Jun 30, 2015

Comment author: @gasche

Just a note to whoever may wonder what happen{s,ed} to this patch. I am still interesting in merging improved Custom_format in trunk, but I am still a bit worried by the bulkiness of Benoît's patch and in particular the import of a large quantity of pretty-printing logic deeper in the OCaml distribution's entrails. I would be interested in knowing what the Scanf-relevant part of this patch looks like, but I personally lack the time to play with the current patch in this way.

If someone wished to move this forward, I think a second opinion on "can this proposal be made less invasive?", or maybe even "is there a better compromise in term of expressiveness / invasiveness ratio?" would be helpful.

@vicuna
Copy link
Author

vicuna commented Oct 6, 2015

Comment author: @Drup

I don't have much opinion on the presumed "bulkiness" of the patch (except that moving code around with barely changing it, in a backward compatible manner, is not my definition of "invasive"), however I do have an opinion on the feature.

As I mentioned in ppx_custom_printf's bug tracker1, composing with to_string function is wrong, and not in line with the general spirit of composable pretty printers that is presented in the Format module.

Maybe janestreet doesn't care about proper indentation of pretty printed output, but I do, and as much as I'm interested by ppx_custom_printf, It's unusable for me at the moment due to this limitation.

@vicuna
Copy link
Author

vicuna commented Oct 7, 2015

Comment author: @diml

Note: since the patch moves a lot of code around, it'd be nice to provide diffing commands, such as:

patdiff <(git show :stdlib/format.ml | sed -n 12,42p) stdlib/camlinternalFormat.mli

It will make it easier for people to review the changes (once the patch has been rebased to the current trunk and submitted as a pull request on github).

@vicuna
Copy link
Author

vicuna commented Apr 18, 2016

Comment author: @gasche

I'm moving this to 4.04; I think we are too late in the 4.03 cycle to make a decision on the different implementation.

This means that, if we integrate the new interface in a future release, existing code using Custom_format will have to be changed. As long as everyone understands that the internal format definitions are still subject to future change, this is fine.

I would be happy if someone else wished to take decisions on this work. (I was not assigned on the issue, but consider that I am explicit un-assigning myself anyway.)

@vicuna
Copy link
Author

vicuna commented Apr 19, 2016

Comment author: @diml

OK, I assigned the issue to myself. I won't have time to look at this in the near future, but I'll try to come back to this later

@vicuna vicuna added the stdlib label Mar 14, 2019
@vicuna vicuna assigned ghost Mar 14, 2019
@github-actions
Copy link

This issue has been open one year with no activity. Consequently, it is being marked with the "stale" label. What this means is that the issue will be automatically closed in 30 days unless more comments are added or the "stale" label is removed. Comments that provide new information on the issue are especially welcome: is it still reproducible? did it appear in other contexts? how critical is it? etc.

@github-actions github-actions bot added the Stale label May 13, 2020
@nojb
Copy link
Contributor

nojb commented May 13, 2020

This was added in #140

@nojb nojb closed this as completed May 13, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants