Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Signatures as ppx payload #6681

Closed
vicuna opened this issue Nov 28, 2014 · 27 comments
Closed

Signatures as ppx payload #6681

vicuna opened this issue Nov 28, 2014 · 27 comments
Assignees
Labels
Milestone

Comments

@vicuna
Copy link

vicuna commented Nov 28, 2014

Original bug ID: 6681
Reporter: @Drup
Assigned to: @alainfrisch
Status: closed (set by @xavierleroy on 2017-02-16T14:18:19Z)
Resolution: fixed
Priority: normal
Severity: minor
Version: 4.02.1
Target version: 4.03.0+dev / +beta1
Fixed in version: 4.03.0+dev / +beta1
Category: ~DO NOT USE (was: OCaml general)
Related to: #6688
Monitored by: @Drup @hcarty

Bug description

Currently, the payload of an extension point is either

  • a structure [%foo ]
    This case also covers expressions
  • a type [%foo: ]
  • a pattern [%foo? ]

I would like to be able to have signatures as a payload too. Gabriel Scherer proposed the syntax [%foo sig end ].

The use case I have in mind is to create sections in a .mli, in particular in eliom.

[%%client sig
val foo : ...;
end ]

and the shorter version:
val%client foo : ....

Some people argued that a syntax [%%client start] is better, but it doesn't allow for the shortcut syntax, which I really would like to have.

As far as I can tell, this is the only last piece of syntax that is not valid as a payload (you can manage to put most of the other syntaxes, one way or another).

@vicuna
Copy link
Author

vicuna commented Dec 3, 2014

Comment author: @alainfrisch

Another direction would be to allow in the parser (and Parsetree) to have "val" item in structures. They would be rejected by the type-checker for now, although there are some interesting ways to use them in structures (as a nice way to specify the expected type for a function, or perhaps to support local forward declarations). Or do you have other kinds of signature items that are not also structure items that you'd like to use?

For now, you could also turn "val" into "external":

[%%client external foo : .... = ""]

@vicuna
Copy link
Author

vicuna commented Dec 3, 2014

Comment author: @Drup

.eliomi are .mli where you can put client/shared/server annotations. I would like that anything valid in a .mli be valid in a .eliomi, annotated.

I'm pretty sure other people will have the same need (add %foo annotations on signature items).

I don't see adding val to structures as a valid solution for this problem.

@vicuna
Copy link
Author

vicuna commented Dec 3, 2014

Comment author: @lpw25

I don't see adding val to structures as a valid solution for this problem.

But it does add support for your suggested short-cut version.

@vicuna
Copy link
Author

vicuna commented Dec 3, 2014

Comment author: @Drup

Only because I gave a simple example. It wouldn't help as soon we want other signature items:

[%client
module M : .....
]

(module%foo is not possible currently, unfortunately)

@vicuna
Copy link
Author

vicuna commented Dec 3, 2014

Comment author: @alainfrisch

If you don't use class-based features and with the addition of "val" to structures, you could mostly translate all signatures to structures:

[%%client module M : sig type t val x : .... end]

would be written:

[%%client module M = struct type t val x : .... end]

Alternatively, one could also consider adding "module M : S" to structures directly, but the case for it is less clear than for "val".

@vicuna
Copy link
Author

vicuna commented Dec 3, 2014

Comment author: @alainfrisch

.eliomi are .mli where you can put client/shared/server annotations. I would like that anything valid in a .mli be valid in a .eliomi, annotated.

Did you consider using attributes instead of extension nodes:

module M : ...
[@@client]

val foo : ..
[@@server]

?

@vicuna
Copy link
Author

vicuna commented Dec 3, 2014

Comment author: @lpw25

Only because I gave a simple example. It wouldn't help as soon we want other signature items:

Sorry, I was thinking that the proposal was for all signature items to be valid as structure items, but it is indeed only vals which are proposed.

@vicuna
Copy link
Author

vicuna commented Dec 3, 2014

Comment author: @Drup

frisch: I could do that but:

  1. It doesn't fit the semantic of attributes vs extensions.
  2. It doesn't allow to annotate several signature items at once, and I would like to allow both individual and grouped annotations (which is perfectly possible with extension nodes on structures).
  3. The compiler will not reject the annotation if uninterpreted, see 1).

This seems to me as a logical extension of how extension nodes behave on structures, It is the same behavior, but available on signatures (and without workaround syntax which main purpose will be to confuse the user).

@vicuna
Copy link
Author

vicuna commented Dec 4, 2014

Comment author: @alainfrisch

First, note that I'm not opposed in principle to adding signatures as a new valid form of attribute/extension payload. But the proposed syntax [%%client sig ... end] seems a bit heavy, and I'm trying to see if other solutions couldn't be preferred. Considering the large overlap between the syntax of structure and signature items, I've a natural tendency to try bringing the two syntactic classes closer to each other. Supporting both as payload will probably lead to cases where users write [%%client type t] instead of [%%client sig type t end]; and then you'll feel some pressure to support both forms in your ppx for the intersection of the two syntactic categories, leading to code duplication.

(For the same reason, I've also tried to merge expressions and patterns, but failed to do so; that just introduced too many conflicts in the grammar.)

  1. It doesn't fit the semantic of attributes vs extensions.

That's a matter of interpretation. It seems to me that a signature item under a %client is interpreted in the normal way (when not filtered out) by the compiler. It has never been said that attributes don't change the (static or dynamic) meaning of a program: typically, attributes used to trigger deriving-like code generation adds more declarations to the actual code, which of course affects its meaning. In your case, if I understand correctly, attributes would be used to drop items, but not really change their meaning.

Extension nodes are quite different: they are supposed to embed a "foreign" sub-language which happens to share the same concrete syntax as a fragment of OCaml, or to extend OCaml with "new" local features. Typical uses of extensions would be to define lexers/parsers, embed "data query sublanguages", non-standard forms of let-binding or pattern matching, etc.

But assuming you really don't want to use attributes, can we try to see what it would take to stick to structures as payload? I take your point about avoiding syntactic workaround for 'module X : S'. It's more intrusive than the proposed change for 'val', but it's not very difficult to support them as structure items as well. If we want to avoid extending the Parsetree, it's pretty easy to do by adding the following rule to module_binding_body in parser.mly:

  | COLON module_type
      {
        let abstr = ghmod(Pmod_extension (mkloc "MISSING" (symbol_rloc()) , PStr[])) in
        mkmod(Pmod_constraint(abstr, $2))
      }

or:

  | COLON module_type
      {
        let ext =
          Ast_mapper.extension_of_error
            (error ~loc:(symbol_rloc ())
               "Error: module declarations are not allowed in structures (a module expression must be provided)")
        in
        let abstr = ghmod(Pmod_extension ext) in
        mkmod(Pmod_constraint(abstr, $2))
      }

i.e. considering:

module X : S

in structures to be equivalent to:

module X : S = [%...]

(And same for classes.)

(A more ambitious approach would be to really merge structure and signature items both in the parser and in the Parsetree. This would reduce significantly code duplication in the parser and all syntactic boilerplate code (Ast_helper, Ast_mapper, printast), while adding not too much noise in the typechecker, and providing better error message when users use structure items in signatures or the opposite.)

@vicuna
Copy link
Author

vicuna commented Dec 4, 2014

Comment author: @Drup

I agree that the syntax [%%foo sig ... end ] is not optimal.

My initial idea was to say "if the extension node is in a structure, then the payload is a structure, if it's in a signature, the payload is a signature" but gabriel pointed out several issues with that, and I tend to agree that it's a bad solution too.

Your proposition for module signature in the parsetree means that a ppx that want to output a signature for it will have to transform the structure. It seems a bit adhoc and unnecessary to me. I suppose it would probably be ok if the transformation is provided in compiler-libs.

I actually like the "more ambitious" approach. Would the core team be interested by such a change ?

@vicuna
Copy link
Author

vicuna commented Dec 4, 2014

Comment author: @alainfrisch

I actually like the "more ambitious" approach. Would the core team be interested by such a change ?

Warning: some heavy discussions to be expected!

Technically, it seems the most difficult part of merging signatures and structures at the syntactic level comes from the "include" statement. It would force to merge module_expr and module_type as well, although they are quite different.

@vicuna
Copy link
Author

vicuna commented Dec 4, 2014

Comment author: @lpw25

Technically, it seems the most difficult part of merging signatures and structures at the syntactic level comes from the "include" statement. It would force to merge module_expr and module_type as well, although they are quite different.

How about a slightly less ambitious approach where structures and signatures share most of their items, but not everything? This could also include merging the syntactic descriptions of:

let open ... in
let module ... in
(* and in the future
let type ... in
let exception ... in
let class ... in
etc. *)

with the structure and signature versions.

@vicuna
Copy link
Author

vicuna commented Dec 5, 2014

Comment author: @alainfrisch

How about a slightly less ambitious approach where structures and signatures share most of their items, but not everything?

I've given it a try in the merge_sig_str SVN branch: merge the definitions of signature/structure item in the Parsetree definition (which already allows to share some boilerplate code), and be more liberal in the parser (which also removes some duplication). Actually, the only remaining differences in the parser between structures and signatures are:

  • top-level expressions are only allowed in structures
  • include is parsed differently in structures and in signatures

and the rest is treated by the type-checker (i.e. reject signature-only items in struct and structure-only items in signatures, with special care to reject exception/extension constructor rebinding in signatures and non-primitive value declarations in structure). Module definitions (module X = ME) and declarations (module X : MT) are represented differently (resp. with the Pstr_module and Psig_module constructors, which now belong to the same type), contrary to a previously mentioned hack. Module aliases in signatures (module X = Y) are represented as module definitions and interpreted as declaration when type-checking signatures (one should probably go one step further and remove the Pmty_alias constructor altogether, treating module X = Y directly in the type-checker).

I don't know why, but merging class definitions and class declarations introduced some conflicts (on SHARP) in the grammar. I'm confident that someone more fluent than me with yacc could fix that.

Waiting for some feedback before daring to officially propose that for inclusion...

@vicuna
Copy link
Author

vicuna commented Dec 6, 2014

Comment author: @Drup

I tracked down the reduce/reduce conflict. It's a combination of ... interesting things.

  1. When reading a file with #use, it's authorized not to put ";;" at the end of lines and to use directives starting by "#".
  2. In type expressions "#foo" (and " #foo") is authorized and means "[< foo]" (resp "[< foo]"). It was declared deprecated 13 years ago in this commit: 42d1811#diff-283538db7c320a41442d987f0c880fc5R166

The following line, when read using #use, is such ambiguous in LR(1):
"class t : foo #bar -> ...."

The following line is equally ambiguous
"type t = foo #bar"
and is resolved in favor of considering (foo #bar) as the whole type.

It's not obvious to me how to remove the ambiguity without making the grammar more complex. Removing the deprecated feature solves the problem, though, but I don't know if it's acceptable or not.

@vicuna
Copy link
Author

vicuna commented Dec 7, 2014

Comment author: @lpw25

#foo is not just for variants (which is deprecated), but also for objects (which is very much not deprecated).

@vicuna
Copy link
Author

vicuna commented Dec 7, 2014

Comment author: @lpw25

The conflict would probably be best avoided by applying the same rules for directives in files as those used for toplevel expressions (i.e. often must be preceeded by ;;).

This is not backwards compatible but might be acceptable.

@vicuna
Copy link
Author

vicuna commented Dec 7, 2014

Comment author: @Drup

Ah, indeed, I didn't know this one, thanks! If I understand correctly, it transforms a class into the relevant open object type.

@vicuna
Copy link
Author

vicuna commented Dec 7, 2014

Comment author: @alainfrisch

Do you guys understand why the conflict does not appear in the current OCaml grammar, but shows up after the merger of class definitions/declarations rules?

@vicuna
Copy link
Author

vicuna commented Dec 7, 2014

Comment author: @lpw25

The conflict is because when parsing a class type, the following two forms are allowed:

class_type

type -> class_type

which means we might be parsing a class type or we might be parsing a type. These two forms overlap in two ways:

  1. Either could be an identifier
  2. Either could be an extension node

Previously these two forms were disambiguated by what could follow them. Class types could only be followed by a signature item (type, let, module etc.), whereas types could not possibly be followed by a signature item.

With the merge of signature items and structure items, they could now both be followed by #. In the case of a class type this would be the start of a toplevel directive, in the case of a regular type this would be a #foo object type.

So it is now not possible (in LR(1)) to disambiguate:

class foo : bar
#directive

and

class foo : bar #directive -> baz

at the point where we have shifted bar and have # as our lookahead token.

@vicuna
Copy link
Author

vicuna commented Dec 7, 2014

Comment author: @lpw25

The conflict would probably be best avoided by applying the same rules for directives in files as those used for toplevel expressions (i.e. often must be preceeded by ;;).

I think this would be the best solution. Top-level directives are rarely mixed with normal definitions, and I conjecture that when they are they are placed at the top of the file rather than mixed in amongst the normal definitions.

Perhaps a scan over OPAM and a survey on the caml-list could verify this conjecture?

@vicuna
Copy link
Author

vicuna commented Dec 7, 2014

Comment author: @Drup

To add to what leo said: Having # as a lookahead token in this position is only possible with the parsing rules for #use, which could only parse structure items previously, not signature items.

@vicuna
Copy link
Author

vicuna commented Dec 8, 2014

Comment author: @alainfrisch

I just realized that "class type" definitions can only refer to class_signatures, not arbitrary class_types, which prevents the conflict on:

class type foo = bar #directive -> baz

Do you know if there is a deep reason to reject such definitions? If not, it would give an extra argument to drop the conflict on such items, which would also benefit to "class" declarations.

@vicuna
Copy link
Author

vicuna commented Dec 8, 2014

Comment author: @lpw25

Do you know if there is a deep reason to reject such definitions?

I don't know how deep the reason is, but it matches the semantics of class definitions.

If I define a class foo:

class foo x = object method m = x + 1 end

then I can use foo as a class type:

class bar : foo = object method m = 3 method private n = 4 end

This doesn't force bar to have the same full class type as foo, only to have the same class signature.

I think that supporting class types like:

class type baz = int -> object method m : int end

would require us to change the above behaviour for consistency (which is obviously not backwards compatible).

@vicuna
Copy link
Author

vicuna commented Oct 19, 2015

Comment author: @Drup

Where are we on this one ? Do we still have unresolved grammar issues ?

@vicuna
Copy link
Author

vicuna commented Oct 20, 2015

Comment author: @alainfrisch

I don't think there has been any progress on this, so the conflict remains in the branch. Also, synchronizing with trunk would take some effort given the amount of changes/refactoring that went into the parser recently. More importantly, this hasn't really been discussed among maintainers (and the proposal is probably controversial).

If you're interested in pushing this forward, it might be worth creating a PR on Github, perhaps starting from my SVN branch, merging with trunk, and fixing the conflict.

@vicuna
Copy link
Author

vicuna commented Nov 26, 2015

Comment author: @Drup

PR here: #312

frisch: The rebase was acrobatic, and I basically changed most of your code in the process, so review would be welcome.

@vicuna
Copy link
Author

vicuna commented Dec 9, 2015

Comment author: @alainfrisch

#326 has been merged.

@vicuna vicuna closed this as completed Feb 16, 2017
@vicuna vicuna added this to the 4.03.0 milestone Mar 14, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants