Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Format: %(...%)-heavy code no longer compiles with trunk #6421

Closed
vicuna opened this issue May 13, 2014 · 27 comments
Closed

Format: %(...%)-heavy code no longer compiles with trunk #6421

vicuna opened this issue May 13, 2014 · 27 comments
Assignees

Comments

@vicuna
Copy link

vicuna commented May 13, 2014

Original bug ID: 6421
Reporter: @yakobowski
Assigned to: @gasche
Status: closed (set by @xavierleroy on 2015-12-11T18:27:37Z)
Resolution: fixed
Priority: normal
Severity: major
Fixed in version: 4.02.0+beta1 / +rc1
Category: typing

Bug description

(This code is directly extracted from Frama-C)

The code in the attached filed can no longer be compiled using trunk, probably because of the new GADT-based representation for format strings. The trouble arises in functions that take formats as arguments, before substituting them using %(%). Unfortunately, we make an heavy use of those.

File attachments

@vicuna
Copy link
Author

vicuna commented May 13, 2014

Comment author: @gasche

The typing issue with %(..%) is known. The plan is to fix it soon-ish: if we can't fix the gadt-based version, I will have to revert it (long enough) before any release candidate.

I planned to write some documentation to let people know, but I didn't expect to be flooded with reports just the day after the version branching!

@vicuna
Copy link
Author

vicuna commented May 14, 2014

Comment author: bvaugon

This incompatibility problem is due to the (involuntary) usage of
a bug from the legacy version of formats. In fact, your attached file
pretty_utils.ml is incorrect and may be used to produce a segmentation
fault. For exemple, if you write a simple other file main.ml
containing the following two lines:

let pp_int oc n = Format.fprintf oc "%d" n in
Pretty_utils.pp_list ~pre:"%s" pp_int Format.std_formatter [ 1; 2; 3 ]

and compile it separately with your file pretty_utils.ml using the
legacy version of the compiler:

$ ocamlc -c pretty_utils.ml
$ ocamlc -c -rectypes main.ml # warning: -rectypes
$ ocamlc pretty_utils.cmo main.cmo -o p

You obtain a sympathetic behavior:
$ ./p
Segmentation fault

This bug have been fixed in the new version, that's why
pretty_utils.ml is now rejected with a typing error (that is
unreadable, I confess). In fact, to prevent this kind of error, the
new inferred type for the %(%) construction is much more complex than
the old wrong-one, and is in particular impacted by format
constructions that follows it. It is then (unfortunately) sometimes
hard to consider error messages on formats using "%(%)", and to fix
codes using weaknesses of the legacy format implementation.

I attach a fixed version of pretty_utils.ml, that compiles with the
old and new versions of formats. Of course, it prevents (at typing
time) invalid usages like above, but only with the new implementation
of formats since the old-one is broken. It consists in deletions of
usage of %(%), that is regrettable for lisibility but necessary for
validity.

@vicuna
Copy link
Author

vicuna commented May 14, 2014

Comment author: @gasche

I don't completely agree with Benoît's interpretation above.

The following things are clear:

  1. There was a soundness bug in the legacy implementation of format, which is due to the author of the code making the same assumption that is broken in unsound interaction of -rectypes and GADTs #6405
  2. The trunk implementation (Benoît's) is free of such soundness bugs (it doesn't use Obj.magic), so we know for sure that anything it accepts is correct
  3. The current implementation of trunk has a weaker typing of %(...%) that makes them unusable in a lot of cases
  4. We are not yet sure that it is possible to tweak the trunk implementation to give %(..%) a richer type that would accept Boris' code. Doing this change is what I called "fix [the typing issue]" in my message above. While minor in comparison to the overall surface of a format implementation, it is a difficult change to make.

We know for sure that the version of the code proposed by Benoît above is "correct" (in the sense that it can't be used to break soundness), as it is accepted by the current implementation. We disagree on whether Boris' code itself is correct: my opinion (and the one of the code's author) is that it is correct (it can be accepted as expected by a sound implementation). Generalizing the typing of %(..%) would let us be sure of that.

@vicuna
Copy link
Author

vicuna commented May 14, 2014

Comment author: bobot

I wonder what is the semantic and the intent with "...%(%)...". I always though that only 'constant' format6 can be used. So %(%) wait the same type than "".

@vicuna
Copy link
Author

vicuna commented May 14, 2014

Comment author: @gasche

Indeed: the point is that you can pass formatting annotations (@. etc.), so that is not at all equivalent to requiring %s. It's reasonable to pass, for example, a pretty-printing line break as the separator of a list to pretty-print.

@vicuna
Copy link
Author

vicuna commented May 14, 2014

Comment author: @mauny

@gasche> We are not yet sure that it is possible to tweak the trunk implementation to give %(..%) a richer type that would accept Boris' code.

Well, the fact is that Boris' code is buggy (unsound), as shown by
@bvaugon in his message: calling Boris' function segfaults. It's hard to
understand why you want this code to typecheck!

@vicuna
Copy link
Author

vicuna commented May 14, 2014

Comment author: @yakobowski

I'm still a bit lost on this issue. For me, our code is not buggy (of course :-) ). The real bug appears in Benoît's example:

let pp_int oc n = Format.fprintf oc "%d" n in
Pretty_utils.pp_list ~pre:"%s" pp_int Format.std_formatter [ 1; 2; 3 ]
^^
Passing "%s" here should be forbidden, as this format expects a string and is not compatible with the default one for ~pre, namely "@[". So, although I agree there is a major soundness bug in the original implementation of formats, I don't think our use of them in Pretty_utils should be rejected outright. (Of course, this is only my layman understanding.)

@vicuna
Copy link
Author

vicuna commented May 15, 2014

Comment author: bobot

The fact that the type of the argument asked by "...%( %)..." depends of the position of %( %) is wrong. I think it has a correct type when it is in last position (result type is unit). We just have to figure out how to allow that at any position.

For sake of readability I simplify the format type to the argument and the result type ('arg,'result) format and the format gadt.

type ('arg,'result) format =
...
| Format_subst: ('b,'c) t -> (('a,'b) t -> 'a,'c) t
...

This type allows to concatenate the format. But it is not position independent. What do you think of this?

| Format_subst: ('a,'b,'c) proof * ('a,'c) format -> (('b,unit) format -> 'a,'c) format

where ('a,'b,'c) proof is a proof that
it exists t1, ..., tn such that
'a = t1 -> ... -> tn -> 'c
'b = t1 -> ... -> tn -> unit

The skeleton of the gadt definition of proof is similar to fmtty. A special concatenation with the following type should be implementable:

val concat: ('b,unit) format -> ('a,'b,'c) proof -> ('c,'d) format -> ('a,'d) format

I can write a toy formatter for a proof of concept.

@vicuna
Copy link
Author

vicuna commented May 15, 2014

Comment author: bvaugon

The problem is that the type inferred by the legacy implementation of
formats for the optionnal "pre" argument of the function pp_list is:

pre: ('a, 'b, 'c, 'd, 'd, 'a) format6

Contrary to what one might think, this type does not force the format
argument associated to "pre" to be a "constant format". In fact, the
"%s" format is alowed for pre because "%s" may be typed with:

"%s": ((string -> 'a) as 'a, 'b, 'c, 'd, 'd, 'a) format6

which is compatible with ('a, 'b, 'c, 'd, 'd, 'a) format6.

That's why the legacy implementation of formats allow to pass "%s" to
pp_list, which results in a seg-fault. Any other implementation of
formats (based on any variant of GADTs or other technique) that infer
the same type (('a, 'b, 'c, 'd, 'd, 'a) format6) for the "pre"
parameter will be incorrect because it will allow to pass the same "%s"
as argument.

The question is, Boris: what type would you like to have for your pre
parameter?

@vicuna
Copy link
Author

vicuna commented May 15, 2014

Comment author: @gasche

Bobot: Yes, your analysis is correct and this is the direction towards which I've started working on this (before the merge). See the very rough work in progress in my branch format-gadts-rel (for a "relational" version of fmtty):
https://github.com/gasche/ocaml/tree/format-gadts-rel

The problem is that things get quite complex when you actually implement the various functions that operate on formats. You need a generalized concatenation function, but also to prove/implement by hand inversion principles, and it quickly gets quite difficult.

My plan is to first simplify the fmtty datatype (factorize the simple argument-taking cases that only affect the 'a parameter) to make the implementation more manageable. In any case, so far I've focussed on fixing actual regressions rather than improving this. The main ones are now fixed, so I shall work on that this week-end.

@vicuna
Copy link
Author

vicuna commented May 15, 2014

Comment author: @gasche

Any other implementation of formats (based on any variant of GADTs
or other technique) that infer the same type
(('a, 'b, 'c, 'd, 'd, 'a) format6) for the "pre" parameter
will be incorrect because it will allow to pass the same "%s"
as argument.

An implementation could simply fail dynamically with an error message in that edge case that is only possible in presence of -rectypes and is clearly a misuse of formats. There are already a few dynamic failures in the trunk implementation, and it is perfectly acceptable to add a few more, as long as the code that users rely on and worked properly (for their use cases) with the previous implementation still works properly in the next release.

Benoît: I don't see the point of re-hashing this discussion that we've already had (privately) during my patch review. Either I manage to generalize the type of the format implementation (I would have appreciated help from you to do this but you're too busy trying to prove that the users are wrong), or I'll revert to an unsafe implementation of %(...%).

@vicuna
Copy link
Author

vicuna commented May 15, 2014

Comment author: bobot

Gabriel: I'm not sure what you mean by generalizing. I think it is more choosing another type (not greater or smaller than the current (GADT)).

Benoît: the type for pre should be (unit,...,unit), generally I think the result type of the argument of "...%( %)..." must be unit, forbidding a recursive type.

Gabriel: do you come up with a solution using a toy formatter? (just constant, string, and subst)? I will start with that.

@vicuna
Copy link
Author

vicuna commented May 15, 2014

Comment author: @mauny

you're too busy trying to prove that the users are wrong
I don't understand where you are going with such personal attacks.

@vicuna
Copy link
Author

vicuna commented May 15, 2014

Comment author: bvaugon

François: your idea to enforce the first and last type parameters
to be unit is interessting, and it could work for printf. But
these type parameters are here for polymorphism over other
printing/scanning functions (like sprintf for example), and for
concatenation. In fact, for sprintf, the last parameter is binded
to string. The problem is that declaration and typing of formats
are independent to usage, and it is "complicated" to know, at
compile time, in which context a format will be used. When a
format is used in different contexts, it is just impossible to
choose between unit, string, or some other for the last parameter.

@vicuna
Copy link
Author

vicuna commented May 15, 2014

Comment author: bobot

The type of the argument of %( ... %) doesn't need to depend of the context of application, it is an argument of the application. Here I think a sound and usable typing of the following functions:

let f x y = Printf.sprintf "%(%) %s %(%)" x "toto" y;;

val f :
(unit, 'b, 'c, 'd, 'd, unit) format6 ->
(unit, 'b, 'c, 'd, 'd, unit) format6 -> string =

let g x y = Printf.printf "%(%) %s %(%)" x "toto" y;;

val g :
(unit, 'b, 'c, 'd, 'd, unit) format6 ->
(unit, 'b, 'c, 'd, 'd, unit) format6 -> unit =

let h x y = Printf.sprintf "%( %s %) %(%)" x "toto" y;;

val f :
(string -> unit, 'b, 'c, 'd, 'd, unit) format6 ->
(unit, 'f, 'g, 'h, 'h, unit) format6 -> string =

@vicuna
Copy link
Author

vicuna commented May 15, 2014

Comment author: bvaugon

Yes, but what's about:

let f fmt =
ignore (Printf.sprintf "%(%)" fmt);
Printf.sprintf fmt

?

@vicuna
Copy link
Author

vicuna commented May 15, 2014

Comment author: bobot

It is not well typed ;). In the same way this function is not well-typed:

let f fmt =
ignore (Printf.printf fmt);
Printf.sprintf fmt

@vicuna
Copy link
Author

vicuna commented May 15, 2014

Comment author: bvaugon

It is not well typed ;)

Ah ok, it is not well typed with your implementation, but it is well typed with the legacy version and with my GADT-based version too.

@vicuna
Copy link
Author

vicuna commented May 15, 2014

Comment author: bobot

The legacy version accept unsafe terms, so we need to restrict the accepted terms. The perfect restriction accept all safe terms that were accepted before, and no unsafe terms. However usable type system never ensure that. So we have choice to make for the safe terms that are not accepted anymore. And doing so by breaking as few as possible legitimate use of the feature.

I think
fun sep -> Printf.sprintf "a %(%) b %(%) c" sep sep

is a legitimate and intended use of the feature which is not accepted by your GADT-based version.

@vicuna
Copy link
Author

vicuna commented May 15, 2014

Comment author: bvaugon

Ok, that's clear, it is a choice between programs on which we want to break compatibility and programs we want to preserve. It seems to be difficult to have the best of the two worlds. As you told, this kind of code pattern:

fun sep -> Printf.sprintf "a %(%) b %(%) c" sep sep

may be used more frequently than:

let f fmt =
ignore (Printf.sprintf "%(%)" fmt);
Printf.sprintf fmt

So, it may be interesting to try this approach.

@vicuna
Copy link
Author

vicuna commented May 15, 2014

Comment author: @yakobowski

@bvaugon ( ~11468 )

The question is, Boris: what type would you like to have for your pre
parameter?

I'm not interested in typing issues, I'm only a user :-p

More seriously, my point is the following. There is a specification of format strings, given by the documentation of the Printf, Format and Scanf modules. The real question is, is it possible to safely type all the formats described by those modules? Asking "what type would you give to this program" when the program is a legitimate use of formats is taking the problem backwards. Instead, we should focus on which programs should be accepted, and on finding a type system that accepts them -- and no unsafe programs. (Of course, if no sane typing system exists, then proper restrictions for the use of %(%) will have to be found, and the documentation will have to be adapted [#] :-( )

Furthermore, if I may weigh in the design space, I think that severely limiting a useful feature ( %(%) ) for an unsoundness that can only appear because of an obscure option (-rectypes) is a bad choice. Coming from a lambda-calculus background, I can see the appeal of having -rectypes available. On the other hand, it is a pain to program with this option active (too many programs become typable), and there are other ways in the language to get recursive types.

[#] Your changes to our module Pretty_utils show that the current typing rule is not that intuitive, as it is not clear what you can possibly write after a "%(%)".

@vicuna
Copy link
Author

vicuna commented May 16, 2014

Comment author: bobot

fmt_location_independent.ml:
It is a toy implementation with GADT of %s, %(%) and constant text that gives to %(%) a position independent type. The type proof is a corrected version of the one I proposed above. There is two print functions one that use a specific concatenation (print) and one that use continuation (direct_print). The hardest part was to accept "%( %( %) %)".

(**
proof of:
'a = t1 -> ... -> tn -> 'c
'b = t1 -> ... -> tn -> unit
*)
type ('a,'b,'c) proof =
| PNil : ('a,unit,'a) proof
| PSubst : ('a,'b,'c) proof * ('f,'b,'d) proof * ('c,'d,'e) proof ->
(('b,unit) t -> 'a, ('b,unit) t -> 'f, 'e) proof
| PString : ('a,'b,'c) proof -> (string -> 'a,string -> 'b,'c) proof

and ('a,'c) t =
| Nil : (unit,unit) t
| Constant: string * ('a,'c) t -> ('a,'c) t
| Subst: ('a,'b,'c) proof * ('c,'d) t -> (('b,unit) t -> 'a,'d) t
| String : ('a,'c) t -> (string -> 'a,'c) t

Nb: proving properties using gadt in ocaml is currently, 4.01, a pain since you can't name the existential variable easily in match case. Is it for 4.02?

@vicuna
Copy link
Author

vicuna commented May 18, 2014

Comment author: @gasche

After a productive discussion with François Bobot and some more work, I am convinced that the fmtty_rel approach is the right one: now I know that it works. I'll be working with François on massaging it into something mergeable, but I can already claim that the code posted by Boris will be supported, unchanged, in 4.02.

@vicuna
Copy link
Author

vicuna commented May 21, 2014

Comment author: @gasche

A preliminary fix was applied in trunk and 4.02 -- it's unfortunately not part of the -beta1 release which was tagged yesterday. Boris code now compiles, and hopefully it also does the right thing!

Boris, can you confirm that %(..%)-related regressions are gone from the Frama-C codebase?

@vicuna
Copy link
Author

vicuna commented May 21, 2014

Comment author: @yakobowski

Good news, the original problem is fixed! (But you already knew that.)

I'm however afraid that there are some bad news, this time related to '@{'. I'm trying to come up with a minimal example, and to see how widespread it is. Anyway, there are a finite number of punctuation sign that can appear after '%' and '@' so this should ultimately work :-D

@vicuna
Copy link
Author

vicuna commented May 21, 2014

Comment author: @yakobowski

OCaml version 4.02.0+dev2-2013-09-12

Format.printf "@{@}";;

  • : string -> unit =

      OCaml version 4.03.0+dev0-2014-05-12
    

Format.printf "@{@}";;

  • : unit = ()

@vicuna
Copy link
Author

vicuna commented May 21, 2014

Comment author: @gasche

This is #6418; I'm working on a (much smaller) fix.

@vicuna vicuna added the bug label Mar 20, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants