Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open Extensible Types #5584

Closed
vicuna opened this issue Apr 11, 2012 · 40 comments
Closed

Open Extensible Types #5584

vicuna opened this issue Apr 11, 2012 · 40 comments
Assignees

Comments

@vicuna
Copy link

vicuna commented Apr 11, 2012

Original bug ID: 5584
Reporter: @lpw25
Assigned to: @garrigue
Status: closed (set by @xavierleroy on 2015-12-11T18:26:46Z)
Resolution: fixed
Priority: normal
Severity: feature
Fixed in version: 4.02.0+dev
Category: ~DO NOT USE (was: OCaml general)
Tags: patch
Related to: #6219
Monitored by: @gasche @lpw25 mehdi @diml @ygrek @yallop @hcarty @yakobowski @alainfrisch

Bug description

OCaml already has one open extensible type (exn), it would be useful to extend this and allow abstract types to be declared open.

This could be particularly useful when combined with GADTs.

Example syntax:

open type foo

extend foo with Constr of int

open type 'a bar

extend _ bar with Gadt1: int bar

extend char bar with Gadt2

I've attached a patch (tested with revision 12270) and more details can be found at:

https://sites.google.com/site/ocamlopen/

Note that this patch does not yet add these extensions to the camlp4 parser, and may break existing camlp4 code.

File attachments

@vicuna
Copy link
Author

vicuna commented Apr 12, 2012

Comment author: @alainfrisch

I like the proposal very much, but I'm not so fond of the proposed syntax.

  1. "open type foo" seems to indicate that one wants to open the type foo. For instance, if foo had been declared as a record type (e.g. in a different module, or with labels that have been shadowed by another declaration), this could mean that one wants to bring its labels in scope. Here, the intention is to declare a new datatype, so the syntax should start with "type foo" like for any other type declaration. For instance: "type foo = open" or "type foo = ..".

  2. "extend foo with Constr of int" introduces a new keyword, which is rather common and is thus likely to break existing code. From my experience, there is little chance this can be accepted in OCaml. Another proposal: "type foo += Constr of int". This also has the advantage that one can add several cases:

type foo += Foo of int | Bar of string

  1. "type foo = A | .." could be a short-hand for "type foo = .. ;; type foo += A".

@vicuna
Copy link
Author

vicuna commented Apr 13, 2012

Comment author: @lpw25

I agree that the proposed syntax is not very good. Regarding the above suggestions:

For type declaratons, I think that the "type 'a foo = .." syntax would be fine. Another alternative would be "type > 'a foo", borrowing the ">" from polymorphic variants.

Whichever syntax is used for type declarations, I think that it should definitely allow type abbreviations to be checked for their openness. For instance:

type foo = bar

should create a type abbreviation that is open if bar is open, but

type > foo = bar

should check that bar is open and create a type abbreviation.

2 and 3.
I like the "type foo +=" syntax, it is certainly preferable to a new keyword. The only downside is that it provides no obvious syntax for rebinding extensions, but I don't think that this is a particularly useful feature anyway.

I'm not sure about allowing multiple extensions to be declared at once, or allowing extensions to be declared within the type declaration, because:

type foo = A | B | ..

will be compatible with

type foo = B | A | ..

which could cause confusion.

I also think that if it is too easy to create an open type and extensions that are equivalent to an ordinary variant type, then people will create them unnecessarily. Essentially, the additional runtime cost of using extensions compared to ordinary variants should be reflected by the syntax in some way.

@vicuna
Copy link
Author

vicuna commented Apr 13, 2012

Comment author: @alainfrisch

  1. I strongly prefer "type 'a foo = .." to "type > 'a foo". We are declaring a type, which happens to be of a new kind (previously, we had only "abstract", "variant", "record"; now there is "open"). Now similarly to other kinds of type declarations with a manifest type, we can write both:

type bar = foo
type bar = foo = ..

  1. For rebinding, what about: "type foo += M.X = X of int * int"?

I agree with your remark about defining constructors directly within the type declaration. But I don't see the problem with multiple extensions:

type foo = ..
type foo += A of int | B of string

@vicuna
Copy link
Author

vicuna commented Apr 13, 2012

Comment author: @lpw25

For the rebinding it would be more like: "type foo += X = M.Y", which is probably fine.

I think that having all of

"type foo += A of int | B of string"

"type foo += B of string | A of int"

"type foo += A of int
...
type foo += B of string"

as equivalent definitions may cause confusion, because there aren't really any other features in OCaml that allow similar duplication.

Such syntactic sugar might be better added using camlp4.

@vicuna
Copy link
Author

vicuna commented Apr 13, 2012

Comment author: @garrigue

I have not yet tried the patch, but the idea seems fine.
Concerning the syntax, I would vote for

type open foo = ...
type foo += ...

Since open is already a keyword there is no problem.
By the way I believe that private should have been the same way, but wasn't for historical reasons.
Also, private was originally intended to apply to individual constructors, so it would make sense to use it with that meaning.
Last, I believe that you should allow

type foo += A of int | B of bool

Since the syntax follows the one for variants, there is no reason have people write more than necessary.

For me the advantage of extensible types is double:

  • allow extension of an already defined type (but this was already possible with exn)
  • allow GADT constructors (which is difficult with polymorphic variants)

The real goal of polymorphic variants was to allow reusing the same constructor in different types, with fine grain pattern-matching, and this is orthogonal to this.

@vicuna
Copy link
Author

vicuna commented May 18, 2012

Comment author: @lpw25

I have attached a new version of the patch (tested with revision 12464).

The new version uses the syntax:

type 'a foo = ..

to define open types. To match this the behaviour has been changed so that open types are a new kind of type, rather than a property of types.

The syntax for adding extensions has also been changed to:

type 'a foo +=
A of 'a
| B: int foo

To match this syntax, these have been changed to behave more like extensions to a type's definition, rather than declarations of constructors. Intuitively, they behave like:

type 'a foo_new = 'a foo + A of 'a | B: int foo_new

and can be used wherever the definition:

type 'a foo_new = 'a foo = ..

would be allowed.

In the above definition, although A and B are defined in a single extension, they are separate members of a module signature so that the following definitions are equivalent to the one above:

type 'a foo += A of 'a

type 'a foo += B: int foo

Extensions can be made private using the expected syntax:

type 'a foo += private C of int

They can also be rebound:

type 'a foo += D = A

The patch also changes the implementation to use Alain's suggestion of using an object tag instead of needing a new tag for extensions.

I have added the extension to Ocamldoc and the other tools, but I did not add the new syntax to either camlp4 or the emacs mode, as I thought that was better left to someone who knows more about them.

@vicuna
Copy link
Author

vicuna commented Jun 9, 2012

Comment author: @hcarty

Is this patch still being considered for inclusion? I tried applying patch 2.0.0 against the trunk revision 12594 and it does not apply cleanly.

@vicuna
Copy link
Author

vicuna commented Jun 9, 2012

Comment author: @lpw25

I don't know whether it's being considered for inclusion, but patch 2.0.0 was made for revision 12464 and should work with that one. I wasn't planning on rebasing the patch on a newer revision until OCaml version 4.0.0 is released.

@vicuna
Copy link
Author

vicuna commented Aug 10, 2012

Comment author: @hcarty

Will there be an update to this patch now that 4.00.0 is out?

@vicuna
Copy link
Author

vicuna commented Jan 9, 2013

Comment author: @lpw25

The latest version of this patch (currently about 6 months behind trunk) can be found at https://github.com/lpw25/ocaml-open/.

There is also a version built against 4.00.1 at https://github.com/lpw25/ocaml-open/archive/4.00.1+open_types.tar.gz

@vicuna
Copy link
Author

vicuna commented Feb 27, 2014

Comment author: @lpw25

I've finally brought my patch up to date. The attached patch (extension.patch-3.0.0) was tested with revision 14443. To make review easier I've also added a pull request to the Github mirror (#18).

@vicuna
Copy link
Author

vicuna commented Mar 19, 2014

Comment author: @garrigue

This is a large patch, but actually a big part of it is about updating ocamldoc to handle extensible types...
The approach of defining them as a variant of exception types makes most of the changes smooth, so from the point of view of typing I expect no big problem.
This raises the question of why not just replace exceptions with extensible types (they are a special case).

I've created a branch open_types in subversion, merging extension.patch-3.0.0.
Bootstrap went fine, and all the tests go through.
I'll still need to check in detail what happens in Typedecl.
In particular variance inference is easy to get wrong, and I'm slightly disturbed by the size
of the transl_extension_constructor function (couldn't duplication be avoided, will it be easy to
maintain?)
Out of that my only gripe is that some lines are more than 100 characters (coding rule is 80 characters per line)...

@vicuna
Copy link
Author

vicuna commented Mar 19, 2014

Comment author: @lpw25

In particular variance inference is easy to get wrong, and I'm slightly disturbed by the size
of the transl_extension_constructor function (couldn't duplication be avoided, will it be easy to
maintain?)

I've attached a patch (shorter-transl-extension-constructor.patch) which makes transl_extension_constructor shorter by factoring out the record creation in each branch of the match. It is still quite a long function, but most of that is the logic for rebinding extensions which can't really be avoided.

Out of that my only gripe is that some lines are more than 100 characters (coding rule is 80 characters per line)...

Damn, I thought I caught all of those (excluding ocamldoc whose code does not obey the 80 column rule at all).

@vicuna
Copy link
Author

vicuna commented Mar 25, 2014

Comment author: @garrigue

Sorry for the slow reaction.
I merged the patch. It makes the function a bit shorter, but this is mostly cosmetic.
I would have hoped more sharing with normal constructors, but maybe this is difficult.

Also, what about the reason to keep exceptions as an independent mechanism?
Differences in internal representations?
Well, it could certainly make bootstrapping difficult...

@vicuna
Copy link
Author

vicuna commented Mar 25, 2014

Comment author: @lpw25

I would have hoped more sharing with normal constructors, but maybe this is difficult.

Possibly make_cstr could be extracted from transl_declaration and made a bit more general. Then it could be used by both regular constructors and type extensions.

Also, what about the reason to keep exceptions as an independent mechanism?

There is a slight difference in representations of exception declarations and type extensions. The exception declarations contain a string with the name of the exception. This allows the name to be printed if the exception is not caught. Type extensions do not include such a string.

So currently exception Foo and type exn += Foo are not considered equivalent in a module type (and the second one is currently not allowed -- which ensures that all exn constructors can have their names printed).

Note that the constructors themselves are already represented the same (using Cstr_extension).

@vicuna
Copy link
Author

vicuna commented Mar 25, 2014

Comment author: @alainfrisch

Wouldn't it be useful to keep the constructor name, for debugging purposes? I don't know precisely how this would be exported to the user, maybe with a special primitive that can be instantiated only on extensible types. I.e. the user would write:

type 'a foo = ..

val name: 'a foo -> string = "%caml_constructor_name"

(The compiler would ensure that the primitive is only used at the proper types.)

@vicuna
Copy link
Author

vicuna commented Mar 26, 2014

Comment author: @lpw25

Wouldn't it be useful to keep the constructor name, for debugging purposes? I don't know precisely how this would be exported to the user, maybe with a special primitive that can be instantiated only on extensible types.

The simplest way would probably be via a function in Obj. A similar function could be provided to obtain the unique id of the extension/exception.

@vicuna
Copy link
Author

vicuna commented Mar 26, 2014

Comment author: @alainfrisch

The simplest way would probably be via a function in Obj.

Yes, but I assume its signature cannot be restricted to accept only values of extensible types, hence my proposal. Another approach would be to define a built-in supertype of all extensible types, and expose a single function working on that type:

val name: extensible -> string

...

name (x :> extensible)

@vicuna
Copy link
Author

vicuna commented Mar 26, 2014

Comment author: @lpw25

Yes, but I assume its signature cannot be restricted to accept only values of extensible types

As the use cases for this feature are just debugging, I'm not sure that it is worth the additional complexity. A run-time exception for misusing the function should be sufficient (and achievable since I don't think an object tagged block whose first parameter is a string appears elsewhere).

@vicuna
Copy link
Author

vicuna commented Apr 11, 2014

Comment author: @lpw25

I've uploaded another patch (to be applied on top of the last one). This patch does 4 things:

  • Extracts make_cstr from transl_declaration and makes it a bit more general. This allows it to be used for both regular constructors and type extensions.

  • Fixes a bug (forgot to check that extension arguments had closed types).

  • Makes exceptions a special case of type extensions. This is basically done by making the run-time representations identical and then replacing all Sig_exceptions with the equivalent Sig_typext.

  • Adds Obj.extension_name and Obj.extension_id which can be used to print the name/id of an extension (or exception).

These commits have also been added to the pull request, where they can be viewed as separate diffs.

@vicuna
Copy link
Author

vicuna commented Apr 11, 2014

Comment author: @lpw25

Note that a side-effect of the most recent patch is support for exceptions with GADT syntax:

exception Foo : 'a -> exn

which addresses #6219.

@vicuna
Copy link
Author

vicuna commented Apr 14, 2014

Comment author: @alainfrisch

Jacques: I think you forgot to add ocamldoc/odoc_extension.ml to the branch.

@vicuna
Copy link
Author

vicuna commented Apr 14, 2014

Comment author: @alainfrisch

I'm testing the branch, not the latest patches, so maybe this doesn't apply:

# exception A;;
exception A
# A;;
- : exn = A
# type foo = ..;;
type foo = ..
# type foo += A;;
type foo += A
# A;;
- : foo = 

It could be useful to have the toplevel uses the same printing heuristics for extensible types as for exceptions.

@vicuna
Copy link
Author

vicuna commented Apr 14, 2014

Comment author: @lpw25

I'm testing the branch, not the latest patches, so maybe this doesn't apply:
...
It could be useful to have the toplevel uses the same printing heuristics for extensible types as for exceptions.

That is indeed added by the latest patch, since it depends on storing the name in the "slot". It is slightly less aggressive than the exception printer (which kind of breaks abstraction when it cannot find the correct exception definition), but it will now print - : foo = A for your example.

@vicuna
Copy link
Author

vicuna commented Apr 14, 2014

Comment author: @alainfrisch

The patch represents type parameters with "core_type" instead of "string loc option" in the Parsetree. This was discussed in April 2013 on the wg-camlp4 mailing list ("Changes to the parsetree" thread). My position is still that it's a good change, but only if the parser accepts arbitrary core types in position of type parameters, and have the type-checker reject forms which are not allowed (the argument is that it reduces the "syntactic invariants" required for a parsetree to be printed and parsed backed). Moreover, it could certainly be useful to allow attributes on type parameters. As far as I remember, allowing arbitrary type expressions introduced some conflicts in the grammar, though.

How is this change related to the current proposal? If there is no real need for it, I'd propose to dissociate this point from the patch.

@vicuna
Copy link
Author

vicuna commented Apr 14, 2014

Comment author: @lpw25

How is this change related to the current proposal?

I found it useful to be able to obtain the Types.type_expr corresponding to a type parameter directly from a Typedtree.type_extension. This meant changing the parameter from a string loc to a core_type. For consistency I also changed it in Parsetree.type_extension. Then for even more consistency, I changed it everywhere else.

Note that, unlike a Typedtree.type_declaration, a Typedtree.type_extension does not have a corresponding Types.type_extension from which to extract this information.

I think that it is also a more sensible representation. The parameter is represented with (restricted) type expression syntax and has a corresponding type_expr in the Types.type_decl, so it is both syntactically and semantically a type.

Also note that, in the presence of constraints, the type_expr contains information that is not available in the string loc.

Moreover, it could certainly be useful to allow attributes on type parameters

That should certainly be easy after this change.

@vicuna
Copy link
Author

vicuna commented Apr 14, 2014

Comment author: @alainfrisch

Do you think it would be possible to tweak the grammar to accept all kinds of type expressions, then? It would be good that the type-checker checks for accepted forms anyway, to prevent weird behavior with badly buggy ppx rewriters for instance.

@vicuna
Copy link
Author

vicuna commented Apr 14, 2014

Comment author: @lpw25

Do you think it would be possible to tweak the grammar to accept all kinds of type expressions, then?

I'm not sure, I'm assuming that there is some kind of conflict. But it should certainly be possible to extend the grammar to allow things like attributes and extensions.

It would be good that the type-checker checks for accepted forms anyway, to prevent weird behavior with badly buggy ppx rewriters for instance.

This is a good point, although it raises some much wider issues: what invariants does the type checker assume about its input from the parser and are they all checked? What error should be raised for a case that is only possible with a faulty ppx rewriter? Should there be a sanity check that encodes all the invariants and can be run before type-checking?

@vicuna
Copy link
Author

vicuna commented Apr 15, 2014

Comment author: @garrigue

I've merged merge-exceptions in the open_types branch, and added forgotten files.
I should probably have been more careful about where to base the branch, and I'm getting some conflicts...

@vicuna
Copy link
Author

vicuna commented Apr 15, 2014

Comment author: @garrigue

I do not think it is a good idea to accept arbitrary type expressions as formal parameters in a declaration.
If I understand correctly, Leo's idea was just to use core_type rather than string loc in the parse tree to get a more uniform representation, without changing the parser itself.
However, as Alain points, this creates a problem with ppx rewriters, as they may create badly formed declarations.
Overall, I do not think this kind of relaxing of the parse tree is a good idea, as it creates a distance between the type declaration and allowed input.

The question of attributes is of course orthogonal to that.
However, allowing attributes on parameters in the type declaration does not seem very nice syntactically...

@vicuna
Copy link
Author

vicuna commented Apr 15, 2014

Comment author: @lpw25

If I understand correctly, Leo's idea was just to use core_type rather than string loc in the parse tree to get a more uniform representation, without changing the parser itself.

That's the idea.

However, as Alain points, this creates a problem with ppx rewriters, as they may create badly formed declarations.
Overall, I do not think this kind of relaxing of the parse tree is a good idea, as it creates a distance between the type declaration and allowed input.

Without searching properly for an actual example, I assume that there are probably a few places where particular parsetree does not correspond to a valid output of the parser. Do you think we should try and locate these assumptions and ensure that they are checked properly during type-checking? Perhaps we should have a "Illegal Parse Tree" error for such cases.

@vicuna
Copy link
Author

vicuna commented Apr 15, 2014

Comment author: @alainfrisch

I assume that there are probably a few places where particular parsetree does not correspond to a valid output of the parser

"Structural" invariants are now better expressed in the Parsetree. For instance, there are two different nodes for Pexp_fun and Pexp_function, while previously a kind of node could express forms that don't have a valid syntax.

There are still some constraints such as Ptyp_poly and Pexp_poly which can only appear in specific contexts (documented in the parsetree.mli) -- I don't remember if the type-checker behaves correctly when the constraint is not satisfied.

There are also a lot of side conditions, such as many lists being non empty, identifier being well-formed, etc.

Do you think we should try and locate these assumptions and ensure that they are checked properly during type-checking?

This seems a good goal, and a first step is to document missing constraints in parsetree.mli.

@vicuna
Copy link
Author

vicuna commented Apr 23, 2014

Comment author: @lpw25

Is there anything else I should be doing to help this get merged? I would really like to see it in 4.02.

This seems a good goal, and a first step is to document missing constraints in parsetree.mli.

I think I might work on a patch to give "illegal parse tree" errors instead of assertion failures in cases where parse tree invariants are not met, but I would prefer to keep that separate form this extensible variants patch.

@vicuna
Copy link
Author

vicuna commented Apr 24, 2014

Comment author: @garrigue

I think it's ready for merging, but I don't have much time to do it.
Can you first send a mail to caml-devel?

@vicuna
Copy link
Author

vicuna commented Apr 29, 2014

Comment author: @lpw25

Patch made up-to-date again (extension.patch-4.0.1). This is against current trunk (r14708).

@vicuna
Copy link
Author

vicuna commented May 4, 2014

Comment author: @garrigue

Merged extension.patch-4.0.2 in trunk at revision 14737.

@vicuna
Copy link
Author

vicuna commented May 5, 2014

Comment author: @gasche

Congratulations! That was a lot of work, and we now have a nice feature.

@vicuna
Copy link
Author

vicuna commented May 5, 2014

Comment author: @alainfrisch

Indeed, this is great!

Leo: can you propose a patch to the manual as well?

@vicuna
Copy link
Author

vicuna commented May 5, 2014

Comment author: @lpw25

Leo: can you propose a patch to the manual as well?

Sure. I'll have a look at that sometime in the next few weeks.

@vicuna
Copy link
Author

vicuna commented May 5, 2014

Comment author: ybarnoy

Awesome job! This is shaping up to be a terrific release.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants