Mantis Bug Tracker

View Issue Details Jump to Notes ] Issue History ] Print ]
IDProjectCategoryView StatusDate SubmittedLast Update
0006421OCamlOCaml typingpublic2014-05-13 16:492014-05-21 16:49
ReporterBoris Yakobowski 
Assigned Togasche 
PlatformOSOS Version
Product Version 
Target VersionFixed in Version4.02.0+beta1 
Summary0006421: Format: %(...%)-heavy code no longer compiles with trunk
Description(This code is directly extracted from Frama-C)

The code in the attached filed can no longer be compiled using trunk, probably because of the new GADT-based representation for format strings. The trouble arises in functions that take formats as arguments, before substituting them using %(%). Unfortunately, we make an heavy use of those.
TagsNo tags attached.
Attached Files? file icon [^] (5,451 bytes) 2014-05-13 16:49 [Show Content]
? file icon [^] (5,483 bytes) 2014-05-14 16:37 [Show Content]
? file icon [^] (2,326 bytes) 2014-05-16 09:38 [Show Content]

- Relationships

-  Notes
gasche (developer)
2014-05-13 16:55

The typing issue with %(..%) is known. The plan is to fix it soon-ish: if we can't fix the gadt-based version, I will have to revert it (long enough) before any release candidate.

I planned to write some documentation to let people know, but I didn't expect to be flooded with reports just the day after the version branching!
bvaugon (reporter)
2014-05-14 16:37

This incompatibility problem is due to the (involuntary) usage of
a bug from the legacy version of formats. In fact, your attached file is incorrect and may be used to produce a segmentation
fault. For exemple, if you write a simple other file
containing the following two lines:

let pp_int oc n = Format.fprintf oc "%d" n in
Pretty_utils.pp_list ~pre:"%s" pp_int Format.std_formatter [ 1; 2; 3 ]

and compile it separately with your file using the
legacy version of the compiler:

$ ocamlc -c
$ ocamlc -c -rectypes # warning: -rectypes
$ ocamlc pretty_utils.cmo main.cmo -o p

You obtain a sympathetic behavior:
$ ./p
Segmentation fault

This bug have been fixed in the new version, that's why is now rejected with a typing error (that is
unreadable, I confess). In fact, to prevent this kind of error, the
new inferred type for the %(%) construction is much more complex than
the old wrong-one, and is in particular impacted by format
constructions that follows it. It is then (unfortunately) sometimes
hard to consider error messages on formats using "%(%)", and to fix
codes using weaknesses of the legacy format implementation.

I attach a fixed version of, that compiles with the
old and new versions of formats. Of course, it prevents (at typing
time) invalid usages like above, but only with the new implementation
of formats since the old-one is broken. It consists in deletions of
usage of %(%), that is regrettable for lisibility but necessary for
gasche (developer)
2014-05-14 16:55

I don't completely agree with Benoît's interpretation above.

The following things are clear:
1. There was a soundness bug in the legacy implementation of format, which is due to the author of the code making the same assumption that is broken in PR#6405
2. The trunk implementation (Benoît's) is free of such soundness bugs (it doesn't use Obj.magic), so we know for sure that anything it accepts is correct
3. The current implementation of trunk has a weaker typing of %(...%) that makes them unusable in a lot of cases
4. We are not yet sure that it is possible to tweak the trunk implementation to give %(..%) a richer type that would accept Boris' code. Doing this change is what I called "fix [the typing issue]" in my message above. While minor in comparison to the overall surface of a format implementation, it is a difficult change to make.

We know for sure that the version of the code proposed by Benoît above is "correct" (in the sense that it can't be used to break soundness), as it is accepted by the current implementation. We disagree on whether Boris' code itself is correct: my *opinion* (and the one of the code's author) is that it is correct (it can be accepted as expected by a sound implementation). Generalizing the typing of %(..%) would let us be sure of that.
bobot (reporter)
2014-05-14 18:10

I wonder what is the semantic and the intent with "...%(%)...". I always though that only 'constant' format6 can be used. So %(%) wait the same type than "".
gasche (developer)
2014-05-14 18:13

Indeed: the point is that you can pass formatting annotations (@. etc.), so that is not at all equivalent to requiring %s. It's reasonable to pass, for example, a pretty-printing line break as the separator of a list to pretty-print.
mauny (developer)
2014-05-14 21:42

@gasche> We are not yet sure that it is possible to tweak the trunk implementation to give %(..%) a richer type that would accept Boris' code.

Well, the fact is that Boris' code is buggy (unsound), as shown by
@bvaugon in his message: calling Boris' function segfaults. It's hard to
understand why you want this code to typecheck!
Boris Yakobowski (reporter)
2014-05-14 22:44

I'm still a bit lost on this issue. For me, our code is not buggy (of course :-) ). The real bug appears in Benoît's example:

> let pp_int oc n = Format.fprintf oc "%d" n in
> Pretty_utils.pp_list ~pre:"%s" pp_int Format.std_formatter [ 1; 2; 3 ]
Passing "%s" here should be forbidden, as this format expects a string and is not compatible with the default one for ~pre, namely "@[". So, although I agree there is a major soundness bug in the original implementation of formats, I don't think our use of them in Pretty_utils should be rejected outright. (Of course, this is only my layman understanding.)
bobot (reporter)
2014-05-15 09:15
edited on: 2014-05-15 09:17

The fact that the type of the argument asked by "...%( %)..." depends of the position of %( %) is wrong. I think it has a correct type when it is in last position (result type is unit). We just have to figure out how to allow that at any position.

For sake of readability I simplify the format type to the argument and the result type ('arg,'result) format and the format gadt.

type ('arg,'result) format =
| Format_subst: ('b,'c) t -> (('a,'b) t -> 'a,'c) t

This type allows to concatenate the format. But it is not position independent. What do you think of this?

| Format_subst: ('a,'b,'c) proof * ('a,'c) format -> (('b,unit) format -> 'a,'c) format

where ('a,'b,'c) proof is a proof that
  it exists t1, ..., tn such that
    'a = t1 -> ... -> tn -> 'c
    'b = t1 -> ... -> tn -> unit

The skeleton of the gadt definition of `proof` is similar to `fmtty`. A special concatenation with the following type should be implementable:

  val concat: ('b,unit) format -> ('a,'b,'c) proof -> ('c,'d) format -> ('a,'d) format

I can write a toy formatter for a proof of concept.

bvaugon (reporter)
2014-05-15 10:08

The problem is that the type inferred by the legacy implementation of
formats for the optionnal "pre" argument of the function pp_list is:

pre: ('a, 'b, 'c, 'd, 'd, 'a) format6

Contrary to what one might think, this type does not force the format
argument associated to "pre" to be a "constant format". In fact, the
"%s" format is alowed for pre because "%s" may be typed with:

"%s": ((string -> 'a) as 'a, 'b, 'c, 'd, 'd, 'a) format6

which is compatible with ('a, 'b, 'c, 'd, 'd, 'a) format6.

That's why the legacy implementation of formats allow to pass "%s" to
pp_list, which results in a seg-fault. Any other implementation of
formats (based on any variant of GADTs or other technique) that infer
the same type (('a, 'b, 'c, 'd, 'd, 'a) format6) for the "pre"
parameter will be incorrect because it will allow to pass the same "%s"
as argument.

The question is, Boris: what type would you like to have for your pre
gasche (developer)
2014-05-15 10:09
edited on: 2014-05-15 10:09

Bobot: Yes, your analysis is correct and this is the direction towards which I've started working on this (before the merge). See the very rough work in progress in my branch format-gadts-rel (for a "relational" version of fmtty): [^]

The problem is that things get quite complex when you actually implement the various functions that operate on formats. You need a generalized concatenation function, but also to prove/implement by hand inversion principles, and it quickly gets quite difficult.

My plan is to first simplify the fmtty datatype (factorize the simple argument-taking cases that only affect the 'a parameter) to make the implementation more manageable. In any case, so far I've focussed on fixing actual regressions rather than improving this. The main ones are now fixed, so I shall work on that this week-end.

gasche (developer)
2014-05-15 10:16

> Any other implementation of formats (based on any variant of GADTs
> or other technique) that infer the same type
> (('a, 'b, 'c, 'd, 'd, 'a) format6) for the "pre" parameter
> will be incorrect because it will allow to pass the same "%s"
> as argument.

An implementation could simply fail dynamically with an error message in that edge case that is only possible in presence of -rectypes and is clearly a misuse of formats. There are already a few dynamic failures in the trunk implementation, and it is perfectly acceptable to add a few more, as long as the code that users rely on and worked properly (for their use cases) with the previous implementation still works properly in the next release.

Benoît: I don't see the point of re-hashing this discussion that we've already had (privately) during my patch review. Either I manage to generalize the type of the format implementation (I would have appreciated help from you to do this but you're too busy trying to prove that the users are wrong), or I'll revert to an unsafe implementation of %(...%).
bobot (reporter)
2014-05-15 10:30

Gabriel: I'm not sure what you mean by generalizing. I think it is more choosing another type (not greater or smaller than the current (GADT)).

Benoît: the type for pre should be (unit,...,unit), generally I think the result type of the argument of "...%( %)..." must be unit, forbidding a recursive type.

Gabriel: do you come up with a solution using a toy formatter? (just constant, string, and subst)? I will start with that.
mauny (developer)
2014-05-15 10:43

> you're too busy trying to prove that the users are wrong
I don't understand where you are going with such personal attacks.
bvaugon (reporter)
2014-05-15 11:47

François: your idea to enforce the first and last type parameters
to be unit is interessting, and it could work for printf. But
these type parameters are here for polymorphism over other
printing/scanning functions (like sprintf for example), and for
concatenation. In fact, for sprintf, the last parameter is binded
to string. The problem is that declaration and typing of formats
are independent to usage, and it is "complicated" to know, at
compile time, in which context a format will be used. When a
format is used in different contexts, it is just impossible to
choose between unit, string, or some other for the last parameter.
bobot (reporter)
2014-05-15 12:08

The type of the argument of %( ... %) doesn't need to depend of the context of application, it is an argument of the application. Here I think a sound and usable typing of the following functions:

let f x y = Printf.sprintf "%(%) %s %(%)" x "toto" y;;

val f :
  (unit, 'b, 'c, 'd, 'd, unit) format6 ->
  (unit, 'b, 'c, 'd, 'd, unit) format6 -> string = <fun>

let g x y = Printf.printf "%(%) %s %(%)" x "toto" y;;

val g :
  (unit, 'b, 'c, 'd, 'd, unit) format6 ->
  (unit, 'b, 'c, 'd, 'd, unit) format6 -> unit = <fun>

let h x y = Printf.sprintf "%( %s %) %(%)" x "toto" y;;

val f :
  (string -> unit, 'b, 'c, 'd, 'd, unit) format6 ->
  (unit, 'f, 'g, 'h, 'h, unit) format6 -> string = <fun>
bvaugon (reporter)
2014-05-15 12:20
edited on: 2014-05-15 12:24

Yes, but what's about:

let f fmt =
  ignore (Printf.sprintf "%(%)" fmt);
  Printf.sprintf fmt


bobot (reporter)
2014-05-15 12:35

It is not well typed ;). In the same way this function is not well-typed:

let f fmt =
  ignore (Printf.printf fmt);
  Printf.sprintf fmt
bvaugon (reporter)
2014-05-15 12:38

> It is not well typed ;)

Ah ok, it is not well typed with your implementation, but it is well typed with the legacy version and with my GADT-based version too.
bobot (reporter)
2014-05-15 12:58

The legacy version accept unsafe terms, so we need to restrict the accepted terms. The perfect restriction accept all safe terms that were accepted before, and no unsafe terms. However usable type system never ensure that. So we have choice to make for the safe terms that are not accepted anymore. And doing so by breaking as few as possible legitimate use of the feature.

I think
 fun sep -> Printf.sprintf "a %(%) b %(%) c" sep sep

is a legitimate and intended use of the feature which is not accepted by your GADT-based version.
bvaugon (reporter)
2014-05-15 14:38

Ok, that's clear, it is a choice between programs on which we want to break compatibility and programs we want to preserve. It seems to be difficult to have the best of the two worlds. As you told, this kind of code pattern:

 fun sep -> Printf.sprintf "a %(%) b %(%) c" sep sep

may be used more frequently than:

let f fmt =
  ignore (Printf.sprintf "%(%)" fmt);
  Printf.sprintf fmt

So, it may be interesting to try this approach.
Boris Yakobowski (reporter)
2014-05-15 15:51

@bvaugon ( 0006421:0011468 )

> The question is, Boris: what type would you like to have for your pre
> parameter?

I'm not interested in typing issues, I'm only a user :-p

More seriously, my point is the following. There is a specification of format strings, given by the documentation of the Printf, Format and Scanf modules. The real question is, is it possible to safely type all the formats described by those modules? Asking "what type would you give to this program" when the program is a legitimate use of formats is taking the problem backwards. Instead, we should focus on which programs should be accepted, and on finding a type system that accepts them -- and no unsafe programs. (Of course, if no sane typing system exists, then proper restrictions for the use of %(%) will have to be found, and the documentation will have to be adapted [#] :-( )

Furthermore, if I may weigh in the design space, I think that severely limiting a useful feature ( %(%) ) for an unsoundness that can only appear because of an obscure option (-rectypes) is a bad choice. Coming from a lambda-calculus background, I can see the appeal of having -rectypes available. On the other hand, it is a pain to program with this option active (too many programs become typable), and there are other ways in the language to get recursive types.

[#] Your changes to our module Pretty_utils show that the current typing rule is not that intuitive, as it is not clear what you can possibly write after a "%(%)".
bobot (reporter)
2014-05-16 09:47
  It is a toy implementation with GADT of %s, %(%) and constant text that gives to %(%) a position independent type. The type proof is a corrected version of the one I proposed above. There is two print functions one that use a specific concatenation (print) and one that use continuation (direct_print). The hardest part was to accept "%( %( %) %)".

    proof of:
    'a = t1 -> ... -> tn -> 'c
    'b = t1 -> ... -> tn -> unit
type ('a,'b,'c) proof =
| PNil : ('a,unit,'a) proof
| PSubst : ('a,'b,'c) proof * ('f,'b,'d) proof * ('c,'d,'e) proof ->
  (('b,unit) t -> 'a, ('b,unit) t -> 'f, 'e) proof
| PString : ('a,'b,'c) proof -> (string -> 'a,string -> 'b,'c) proof

and ('a,'c) t =
| Nil : (unit,unit) t
| Constant: string * ('a,'c) t -> ('a,'c) t
| Subst: ('a,'b,'c) proof * ('c,'d) t -> (('b,unit) t -> 'a,'d) t
| String : ('a,'c) t -> (string -> 'a,'c) t

Nb: proving properties using gadt in ocaml is currently, 4.01, a pain since you can't name the existential variable easily in match case. Is it for 4.02?
gasche (developer)
2014-05-18 11:58
edited on: 2014-05-18 11:59

After a productive discussion with François Bobot and some more work, I am convinced that the fmtty_rel approach is the right one: now I *know* that it works. I'll be working with François on massaging it into something mergeable, but I can already claim that the code posted by Boris will be supported, unchanged, in 4.02.

gasche (developer)
2014-05-21 15:27

A preliminary fix was applied in trunk and 4.02 -- it's unfortunately not part of the -beta1 release which was tagged yesterday. Boris code now compiles, and hopefully it also does the right thing!

Boris, can you confirm that %(..%)-related regressions are gone from the Frama-C codebase?
Boris Yakobowski (reporter)
2014-05-21 16:30
edited on: 2014-05-21 16:35

Good news, the original problem is fixed! (But you already knew that.)

I'm however afraid that there are some bad news, this time related to '@{'. I'm trying to come up with a minimal example, and to see how widespread it is. Anyway, there are a finite number of punctuation sign that can appear after '%' and '@' so this should ultimately work :-D

Boris Yakobowski (reporter)
2014-05-21 16:32

OCaml version 4.02.0+dev2-2013-09-12

# Format.printf "@{<td class=%s>@}";;
- : string -> unit = <fun>

        OCaml version 4.03.0+dev0-2014-05-12

# Format.printf "@{<td class=%s>@}";;
- : unit = ()
gasche (developer)
2014-05-21 16:38
edited on: 2014-05-21 16:38

This is PR#6418; I'm working on a (much smaller) fix.

- Issue History
Date Modified Username Field Change
2014-05-13 16:49 Boris Yakobowski New Issue
2014-05-13 16:49 Boris Yakobowski File Added:
2014-05-13 16:55 gasche Note Added: 0011443
2014-05-13 16:55 gasche Assigned To => gasche
2014-05-13 16:55 gasche Status new => assigned
2014-05-14 16:37 bvaugon Note Added: 0011458
2014-05-14 16:37 bvaugon File Added:
2014-05-14 16:55 gasche Note Added: 0011460
2014-05-14 18:10 bobot Note Added: 0011465
2014-05-14 18:13 gasche Note Added: 0011466
2014-05-14 21:42 mauny Note Added: 0011467
2014-05-14 22:44 Boris Yakobowski Note Added: 0011468
2014-05-15 09:15 bobot Note Added: 0011472
2014-05-15 09:17 bobot Note Edited: 0011472 View Revisions
2014-05-15 10:08 bvaugon Note Added: 0011474
2014-05-15 10:09 gasche Note Added: 0011475
2014-05-15 10:09 gasche Note Edited: 0011475 View Revisions
2014-05-15 10:16 gasche Note Added: 0011476
2014-05-15 10:30 bobot Note Added: 0011477
2014-05-15 10:43 mauny Note Added: 0011478
2014-05-15 11:47 bvaugon Note Added: 0011480
2014-05-15 12:08 bobot Note Added: 0011481
2014-05-15 12:20 bvaugon Note Added: 0011482
2014-05-15 12:24 bvaugon Note Edited: 0011482 View Revisions
2014-05-15 12:35 bobot Note Added: 0011483
2014-05-15 12:38 bvaugon Note Added: 0011485
2014-05-15 12:58 bobot Note Added: 0011486
2014-05-15 14:38 bvaugon Note Added: 0011487
2014-05-15 15:51 Boris Yakobowski Note Added: 0011488
2014-05-16 09:38 bobot File Added:
2014-05-16 09:47 bobot Note Added: 0011489
2014-05-18 11:58 gasche Note Added: 0011515
2014-05-18 11:59 gasche Note Edited: 0011515 View Revisions
2014-05-21 15:27 gasche Note Added: 0011528
2014-05-21 16:30 Boris Yakobowski Note Added: 0011530
2014-05-21 16:32 Boris Yakobowski Note Added: 0011532
2014-05-21 16:35 Boris Yakobowski Note Edited: 0011530 View Revisions
2014-05-21 16:38 gasche Note Added: 0011533
2014-05-21 16:38 gasche Note Edited: 0011533 View Revisions
2014-05-21 16:39 gasche Status assigned => resolved
2014-05-21 16:39 gasche Fixed in Version => 4.02.0+beta1
2014-05-21 16:39 gasche Resolution open => fixed
2014-05-21 16:49 gasche Summary Format-heavy code no longer compiles with trunk => Format: %(...%)-heavy code no longer compiles with trunk

Copyright © 2000 - 2011 MantisBT Group
Powered by Mantis Bugtracker