Support for custom #... symbols (processed by ppx rewriters) #6583
Comments
Comment author: @Drup Thanks a lot for this patch. This would be very useful for js_of_ocaml's ppx. The operator-based solutions proposed before exhibit several issues: notably, they don't have the right priorities, they play badly with indentation engines, and they are awfully implicit.
Indeed, I think this is a reasonable solution, especially since "obj#m <- e" is not valid OCaml code; "<-" and ":=" have the same priority with respect to the target. I consider it quite important to have this in a minor release soon. Otherwise we would be stuck with a sub-optimal ppx for js_of_ocaml until the next major version of OCaml, which would prevent adoption.
Comment author: @whitequark I just noticed that this patch will break -dsource: extension nodes with names like "##" can't be parsed back.
Comment author: @damiendoligez
This is a bit ad hoc: I see no reason to reserve this syntax for object-oriented stuff, so ideally the payload should be just the pair (obj, m). Is it possible to do something like that?
How hard is it to fix? Anyway, -dsource is an undocumented option that we only use for debugging the compiler, so it doesn't matter too much if we break it, right? (Though I'm not saying we should deliberately leave it broken now that we've noticed the problem.)
Comment author: @Drup
If I understand the remark correctly, this is technically already the case, since the operation used is always the expression immediately inside the extension. If this custom-symbol technique is extended to other constructs, for example the "let!" presented in the second patch, the extension name will be the custom version used, and the first node in the extension will be Pexp_let. I'm not sure whether the "normal" part of the operator/construct should be kept (extension #** or ** for the operator obj#**meth).
-dsource and -dparsetree are essential to any ppx programming, so they're not only used by compiler folks anymore!
Comment author: @lpw25 Not relevant to the main point of this discussion, but as has been pointed out before, any "monadic let" should use something like "let*", because "let!" implies a relationship to "method!" and "open!".
Comment author: @gasche
I think it should be #**, because it makes it clearer (for example, to the reader of the extension code) that the thing being matched upon is a "funny method call" and not something else.
Comment author: @alainfrisch Instead of supporting specific cases in Pprintast, one could:
I'd rather go for 1.
Comment author: @damiendoligez
Me too, but once again, is it possible to have the argument be (o, m) rather than o#m?
Comment author: @Drup If we add new constructs supporting this custom suffix, should it still be (o, m)? What about those with three arguments (like let)? I don't see any upside to using (o, m), but I see some downsides if we add other syntaxes later on. I don't really understand why you want it.
Comment author: @whitequark Personally, I think the most elegant way is to use Drup's method. For Pprintast, I propose encoding non-alphanumeric extension node names as OCaml string literals, i.e.:
becomes:
An alternative solution that does not require any Pprintast modification is as follows:
However, it is significantly less convenient for ppx extension authors.
Comment author: @Drup The more I think about it, the more I think we should drop the leading "#" and any leading prefix. There are two reasons:
Comment author: @lpw25
I don't think we should be supporting arbitrary suffixes on Also, the more syntax we parse for ppx extensions the less syntax
Since the payloads of extensions are structure items rather than
It looks a bit heavy-weight when written out like that, but in
Comment author: @ygrek Please don't make ppx do arbitrary syntax modifications; we already have camlp4 for that.
Comment author: @whitequark No one is talking about arbitrary modifications; the only things discussed are variants of existing keywords and syntax: let -> let* or let# or ..., obj#meth -> obj##meth or obj #* meth or ... Having to wrap every obj#meth as [%js obj#meth] for js_of_ocaml is obtuse. It's not really an option if you want the code to stay readable.
Comment author: @alainfrisch Ok, let's drop the '#' or 'let' prefix from the extension node identifier. This removes the redundancy (you're welcome, Damien!) and makes it clear that the payload should be (o # m). I don't think it makes sense to use a tuple to encode sub-parts of the "customized" expression in the payload. What if we want, in the future, to provide a custom version of (e : t)? Here 't' is a type expression, so encoding the payload as (e, t) won't work. And even for 'let', simply consider "let** rec p1 = e1 and ... and pn = en in e"... So my proposal is:
I've attached a patch that implements that (for both method calls and local let bindings).
Comment author: @lpw25
I'll phrase my objection to the local let bindings thing another way: why do
Comment author: @alainfrisch Good point. Let's keep "let!*#" out of the proposal (but it's a good illustration of why it's a bad idea to encode the sub-parts in a tuple). -> custom_forms_meth3.diff This one also adapts pprintast to use the new syntax for attribute/extension identifiers when needed (i.e. when the identifier is not a dot-separated sequence of uident/lident/keyword).
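A minimal sketch of the test described above: deciding when an attribute/extension identifier cannot be printed in the plain [%id ...] form. It assumes that "uident/lident/keyword" can be approximated as "starts with a letter or underscore, followed by letters, digits, '_' or '''". The function names are hypothetical, and this is not the actual Pprintast code, which is not shown in the thread.

```ocaml
(* Hypothetical helper: one component of a dotted identifier such as
   "js.configure.op". Lowercase idents, uppercase idents and keywords
   all have this shape. *)
let is_ident_component s =
  let is_start = function 'a' .. 'z' | 'A' .. 'Z' | '_' -> true | _ -> false in
  let is_rest = function
    | 'a' .. 'z' | 'A' .. 'Z' | '0' .. '9' | '_' | '\'' -> true
    | _ -> false
  in
  let n = String.length s in
  let rec rest_ok i = i >= n || (is_rest s.[i] && rest_ok (i + 1)) in
  n > 0 && is_start s.[0] && rest_ok 1

(* True when the identifier is NOT a dot-separated sequence of such
   components, i.e. when the printer has to fall back to another syntax. *)
let needs_special_syntax id =
  not (List.for_all is_ident_component (String.split_on_char '.' id))
```

With this check, an extension named "##" (the case that broke -dsource) is flagged, while ordinary names such as "js.configure.op" are printed as before.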
Comment author: @damiendoligez lpw25> A key benefit of ppx is that it makes clear when code Indeed, so why don't we use the same kind of syntax as for let, namely: obj #%foo m I don't expect to see a lot of such extensions, so using one or two letters, this This would solve three problems at once:
Comment author: @alainfrisch I'll let the js_of_ocaml people comment on Damien's suggestion. The "#%" idea is very much in line with the existing light notation for extensions on expressions starting with a keyword. The "normal" expression is considered to be the payload of that extension. For attributes, it's less clearly useful, since the compact syntax doesn't allow specifying a payload for the attribute (the normal expression is the expression to which the attribute is attached).
Comment author: @Drup "obj#%foo m" is barely better than "[%foo obj#m]" (only one character shorter, and only because I don't add the space before "#"). Arguably, there are no delimiters, so it's easier to parse visually (I don't really agree). The goal is to have something that is terse enough to be used (very) commonly and that is not syntactically too disruptive. #%foo doesn't achieve that. It's not a bad solution, and as Alain said, it's consistent with the rest of ppx, but I would dislike having even more syntactic load in my programs (which is already a bit of an issue with ppx).
Comment author: @alainfrisch
I don't think that backward compatibility is the issue: the switch to ppx for js_of_ocaml will require other changes anyway. The question is rather to find something light enough syntactically.
Comment author: @lpw25
What I meant was that we can't just appropriate all operators which start with a
Comment author: @damiendoligez Quite frankly, I don't think this proposal is mature enough to include in 4.02.1. It's a rather big syntax change, and that doesn't belong in a bug-fix release. Syntax additions should not be rushed (lest we paint ourselves into a corner) and need to be discussed thoroughly on the developers' list. As the 4.02.1 release is getting very close, we don't have enough time to do a good job of it.
Comment author: @Drup I agree this is not really a bugfix, but as I explained in my first post, I'm quite worried about waiting a year or so, for the next OCaml version, before having the syntax for js_of_ocaml's ppx.
Comment author: @lpw25 I don't know whether the patch already does this, but the associativity and precedence of the transformation should probably be based on the first symbol after the "#" symbol. So 1 #+ 2 #+ 3 => [%(#+) [%(#+) 1 + 2] + 3] whilst 1 #@ 2 #@ 3 => [%(#@) 1 @ [%(#@) 2 @ 3]]
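lpw25's rule can be sketched as follows: drop the leading '#' and let the remaining operator decide associativity, as OCaml already does for ordinary infix operators. This is a simplified model (only "**...", "@..." and "^..." are treated as right-associative; special cases such as "&&", "||" and ":=" are ignored), and the function names are hypothetical. Note that the design eventually adopted in this thread made all #-operators left-associative instead.

```ocaml
type assoc = Left | Right

(* Simplified version of OCaml's rule for ordinary infix operators:
   "**...", "@..." and "^..." are right-associative, the rest left. *)
let assoc_of_operator op =
  if String.length op >= 2 && op.[0] = '*' && op.[1] = '*' then Right
  else if String.length op >= 1 && (op.[0] = '@' || op.[0] = '^') then Right
  else Left

(* Proposed rule: a '#'-operator takes the associativity of the operator
   obtained by dropping its leading '#'. Expects op.[0] = '#'. *)
let assoc_of_sharp_op op =
  assoc_of_operator (String.sub op 1 (String.length op - 1))

(* Fully parenthesize a chain "x1 op x2 op ... op xn" according to that rule. *)
let parenthesize op operands =
  let bin a b = Printf.sprintf "(%s %s %s)" a op b in
  match assoc_of_sharp_op op, operands with
  | _, [] -> ""
  | Left, x :: rest -> List.fold_left bin x rest
  | Right, _ ->
      let rec go = function
        | [] -> ""
        | [x] -> x
        | x :: rest -> bin x (go rest)
      in
      go operands
```

This reproduces the two examples in the comment: 1 #+ 2 #+ 3 groups to the left, and 1 #@ 2 #@ 3 groups to the right.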
Comment author: @Drup It does not, as far as I can see, but it was done in my infix constructor patch (with ":" as the starting char instead of "#"). It can be copied directly. That being said, half the purpose of using # is the ability to use its precedence (which is higher than function application).
Comment author: @alainfrisch
I thought it was actually the only point of the request to have "custom method application" constructions, with exactly the same syntactic precedence as the regular one. (And, as for the regular one, the custom method applications should not be binary operators, since their rhs is a method name, not an expression.)
Comment author: @lpw25
Sure, and
The same rule can be applied to the right-hand side of the operator. It should be whatever would go on the right-hand side of operators that start with the symbol after the This scheme also allows for unary operators: #?, #! and #~.
I just think that if we are going to add yet another extension syntax, it should try to be more general than "the thing that js-of-ocaml's camlp4 extension already provides". A general set of operators that are reserved for use in extensions seems a reasonable option.
Comment author: @alainfrisch
If we wanted to do so, we should use '%', not '#' (but the problem is that '%' is already reserved). The current rule is that the precedence of operators is governed by their first character, and it would only add confusion to say that it's actually the second one when the first is '#'.
I agree that we should resist the temptation to do something overly specific, and there are several existing solutions for js-of-ocaml (see ocsigen/js_of_ocaml#144), but none of them was deemed satisfactory. Personally, I'm still not convinced that the original proposal (using simply '#', interpreted differently in the scope of a [%js ....] block) wouldn't work. The argument against it is that it makes it harder to use both OCaml and JS objects in the same piece of code (it'd still be possible, provided you have a way to escape from [%js]). How often does this happen in practice? I suspect this is rare enough that some extra burden in these cases is ok.
Comment author: @lpw25
Good point. We could instead add We don't even need to translate them into some kind of
Comment author: @Drup
I agree strongly with that. I like the "fit # into an operator anywhere and consider them special" proposal. I agree % would be better, but it's not possible.
Comment author: @alainfrisch What about "t *# s", which is a valid type expression?
Comment author: @Drup Can we crawl opam to check whether any code uses this without whitespace in the middle?
Comment author: @lpw25
That would indeed be a grammar conflict. It could probably be worked around within the lexer and parser, but it is a bit of a pain. Perhaps there is no need to provide general support for operators as extensions, since a ppx can always just hijack a regular operator. In that case, I guess the issue really is just about providing some operator symbols with the same precedence as So, how about we just add This would mean foo##(1 + 2) was syntactically valid, but the js_of_ocaml ppx could just give an error for such expressions.
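To illustrate the consequence mentioned above: once ## is an ordinary binary operator, foo##(1 + 2) parses, so it is up to the ppx to reject it. A minimal sketch over a toy expression type; the type, the function and the error messages are hypothetical and do not come from the real Parsetree or from js_of_ocaml.

```ocaml
(* Toy expression type standing in for the Parsetree. *)
type expr =
  | Ident of string
  | Int of int
  | Add of expr * expr
  | Binop of string * expr * expr   (* lhs <op> rhs, e.g. foo ## bar *)

(* A js_of_ocaml-style rewriter only accepts a bare method name on the
   right of "##"; anything else parses fine but is rejected here. *)
let method_call_of = function
  | Binop ("##", obj, Ident meth) -> Ok (obj, meth)
  | Binop ("##", _, _) -> Error "expected a method name after ##"
  | _ -> Error "not a ## expression"
```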
Comment author: @alainfrisch
I've attached a patch that does that, adding a new kind of token for the regexp '#' (symbolchar | '#')+ and recognizing it as a binary operator (with "simple_expr" on both sides). This means that the js_of_ocaml ppx would need to hijack an existing construction, and some people strongly oppose that. gasche and others: what's your opinion?
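The token described above can be modelled in a few lines of plain OCaml. This sketch hard-codes the operator characters of OCaml's lexer ("symbolchar") and mimics the longest-match behaviour of an ocamllex rule; the function names are hypothetical and this is not the actual lexer code.

```ocaml
(* Operator characters ("symbolchar") as accepted by OCaml's lexer. *)
let is_symbolchar = function
  | '!' | '$' | '%' | '&' | '*' | '+' | '-' | '.' | '/'
  | ':' | '<' | '=' | '>' | '?' | '@' | '^' | '|' | '~' -> true
  | _ -> false

let is_trailing c = is_symbolchar c || c = '#'

(* Does the whole string match '#' (symbolchar | '#')+ ? *)
let is_sharp_op s =
  let n = String.length s in
  let rec trailing_ok i = i >= n || (is_trailing s.[i] && trailing_ok (i + 1)) in
  n >= 2 && s.[0] = '#' && trailing_ok 1

(* Longest match starting at position i, as a lexer would read it.
   A lone '#' (ordinary method call) is not a sharp operator. *)
let sharp_op_at s i =
  let n = String.length s in
  if i >= n || s.[i] <> '#' then None
  else begin
    let j = ref (i + 1) in
    while !j < n && is_trailing s.[!j] do incr j done;
    if !j - i >= 2 then Some (String.sub s i (!j - i)) else None
  end
```

In this model "obj##m" yields the token "##" at position 3, while "obj#m" keeps its ordinary single '#'. Because the token must begin with '#', an operator character followed by '#' (as in the type expression t *# s) is not read as one of these tokens.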
Comment author: @gasche I'm still strongly convinced that diverting meaningful syntax in rewriter-specific ways is a bad idea. Forbidding those #-operators at type-checking would prevent this. There is an inherent tension about this in ppx that wasn't present for Camlp4. As the payload has to be a valid OCaml AST, the temptation to reuse (or extend) meaningful code in syntactically convenient ways is strong. Maybe we should discuss to what extent that is reasonable (maybe it is reasonable if scoped under an explicit [%foo ...] node? My personal hunch is that it is not). I think that making the AST more flexible while keeping those extra forms without meaning -- by rejecting them at type-checking time -- is the best way forward.
Comment author: @Drup I would tend to agree with you, gasche, except on one point: having an operator with priority higher than function application is clearly something lacking in current OCaml, and this could be an occasion to fix it. For forms which have no meaning whatsoever, I completely agree with you.
Comment author: @lpw25
I think that in general this is true. However, there are cases where I think it is fine. In particular, cases where the whole file can reasonably be considered to be inside of a giant It seems to me that So whilst I would not recommend hijacking operators more generally (except within a [%foo ...] block), I think it is a reasonable behaviour for the js_of_ocaml extension.
Comment author: @lpw25 (It seems that if your connection cuts out, Mantis will post half a comment; I've edited the previous comment to fill in the rest.)
Comment author: @alainfrisch I sympathize with Leo's point of view that js_of_ocaml is actually a dialect of OCaml, and it's not overly shocking to have it redefine some existing concepts. (One could also argue that #*** operators are useful for OCaml, and leave the responsibility to js_of_ocaml developers to do the Bad Thing or not; but I'm not sure gasche would like this kind of rhetorical argument :-)) The only concrete risk I see is that some normal OCaml library could expose a ( ## ) binary operator, and it would become difficult to use it from a piece of code processed with js_of_ocaml's ppx syntax. This can be mitigated by letting the user choose explicitly which #*** operator would be used by js_of_ocaml in a specific compilation unit. For instance, the ppx could detect [%%js.configure.op "#@"] and then react on "#@" instead of the default "##". Or one could even go as far as not providing any default (and only supporting a more verbose syntax based on normal extension nodes by default), so that the user must explicitly decide (in each compilation unit) to use the "##" operator (or another one). If things are explicitly under the control of users, is it so bad to let ppx mappers hijack existing syntax for syntactic convenience?
Comment author: @gasche (Alain privately asked for my opinion on the latter arguments.) I'm still convinced that overriding valid syntax is a mistake, but you don't need everyone to like a change to decide to apply it (consensus does not mean unanimity, I don't believe I have "veto powers", and if I had I wouldn't use them here because I decided months ago that other people could take care of camlp4/ppx and make their own decisions). I don't think anyone has argued in a convincing (to me) way that we need infix operators that bind stronger than application -- we certainly don't need them for js_of_ocaml which is the topic of this PR. I don't get the "it's a dialect anyway" argument -- every ppx is a dialect, and in my opinion no ppx should globally override the meaning of meaningful programs. For what it is worth (again, if you all agree on something, you can do it), here is my order of preference, from the thing I prefer to the thing I like the least:
Comment author: @alainfrisch
Drup: can you give some examples where such operators would be useful?
Comment author: @Drup I'm not going to fight over it. I remember working around their absence by adding @@ everywhere, which is rather ugly and hurts readability, but works.
Comment author: @lpw25
I would have thought anything "lens-like" would benefit from such operators, allowing:
instead of:
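A minimal sketch of the kind of lens-like code meant here, assuming OCaml 4.02.2 or later (where the #-operators discussed in this thread exist). The lens type, the record fields and the choice of #. as a getter operator are all hypothetical. The point is that a #-operator binds more tightly than function application and is left-associative, so a chain of getters can be passed as an argument without parentheses or @@.

```ocaml
(* Hypothetical lens type: a getter and a functional setter. *)
type ('s, 'a) lens = { get : 's -> 'a; set : 'a -> 's -> 's }

(* Compose an outer lens with an inner one. *)
let compose outer inner = {
  get = (fun s -> inner.get (outer.get s));
  set = (fun a s -> outer.set (inner.set a (outer.get s)) s);
}

(* Hypothetical getter operator; '#.' has no '#' after the leading one and
   is assumed to be accepted as a user-definable #-operator. *)
let ( #. ) s l = l.get s

type address = { city : string }
type person = { name : string; address : address }

let address_l = { get = (fun p -> p.address); set = (fun a p -> { p with address = a }) }
let city_l = { get = (fun a -> a.city); set = (fun c _ -> { city = c }) }

let alice = { name = "Alice"; address = { city = "Paris" } }

(* Parsed as String.length ((alice #. address_l) #. city_l). *)
let city_length = String.length alice #. address_l #. city_l

(* Setting through a composed lens. *)
let moved = (compose address_l city_l).set "Lyon" alice
```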
Comment author: @alainfrisch Ok, let's try the following:
(Variant: instead of rejecting at type-checking time, reject in the "operator" rule of the parser, which implies that such operators cannot be defined in user code.) Rationale: the characters allowed in the "trail" of operators don't currently include '#' (and there is no demand to allow e.g. +#). Opinions?
Comment author: @Drup Seems good to me. I also vote for rejection at type-checking time.
Comment author: @damiendoligez This latest version looks good to me. Rejecting at type-checking is better, because it lets ppx rewriters extend the language in a way that lets the user define such operators.
Comment author: @alainfrisch Yes, the difference with gasche's third choice is that the criterion is not on the second symbol (the one following the initial '#') but on all symbols (after the first one). This is now committed to trunk (15892) and 4.02 (15893). Thanks to everyone for the exciting syntactic discussion :-) Now looking forward to ppx support in js_of_ocaml!
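A minimal sketch of the criterion as described in this comment, assuming it means: a #-operator is reserved for ppx rewriters (accepted by the parser, rejected by the type-checker) as soon as any character after the leading '#' is itself '#'. The exact text of the proposal is not preserved above, and the function name is hypothetical.

```ocaml
(* Reserved #-operators: some character after the leading '#' is '#'.
   Expects a string already lexed as a #-operator, e.g. "##" or "#+". *)
let is_reserved_sharp_op op =
  let n = String.length op in
  let rec has_sharp i = i < n && (op.[i] = '#' || has_sharp (i + 1)) in
  n >= 2 && op.[0] = '#' && has_sharp 1
```

Under this reading, "##" (the operator js_of_ocaml wants) stays free for ppx use, while operators such as "#+" can still be defined in ordinary code. A check on the second symbol only would classify "#+#" as not reserved, whereas this one reserves it.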
Comment author: @Drup Thanks, Alain, working on it! ;)
Comment author: @Drup I think SHARPOP should be right-associative to allow chaining foo##bar##baz. It's consistent with #. The SHARP token doesn't need an associativity declaration because it's not symmetric; left associativity is not valid. Here is a patch: diff --git a/parsing/parser.mly b/parsing/parser.mly
Comment author: @gasche If you want foo##bar##baz to have field-lookup-like or method-call-like semantics (like "#", which sounds normal for consistency), then it should be understood as (foo##bar)##baz, and this is left-associative. When would right-associativity be the right choice?
Comment author: @Drup When I'm not paying attention, mostly. You are absolutely right.
Comment author: @alainfrisch Commits 15952 on 4.02 and 15953 on trunk made #-operators left-associative.
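With OCaml 4.02.2 or later, the resulting precedence and associativity can be observed with any user-definable #-operator. In this sketch, #> is a hypothetical reverse-application operator chosen only for illustration (it contains no '#' after the leading one, so it should not fall into the reserved class).

```ocaml
(* Hypothetical reverse-application operator. *)
let ( #> ) x f = f x

let double n = n * 2

(* Left-associative: parsed as (3 #> succ) #> double, i.e. double (succ 3).
   With right-associativity this would not even type-check. *)
let chained = 3 #> succ #> double

(* Binds more tightly than function application:
   parsed as string_of_int (3 #> succ). *)
let applied = string_of_int 3 #> succ
```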
…#' in trailing symbols, although the operator is then rejected by the type-checker). (Cherry-picked from trunk, rev 15892.) git-svn-id: http://caml.inria.fr/svn/ocaml/version/4.02@15893 f963ae5c-01c2-4b8c-9fe0-0dff7051ff02
git-svn-id: http://caml.inria.fr/svn/ocaml/version/4.02@15952 f963ae5c-01c2-4b8c-9fe0-0dff7051ff02
Original bug ID: 6583
Reporter: @alainfrisch
Assigned to: @alainfrisch
Status: closed (set by @xavierleroy on 2016-12-07T10:49:03Z)
Resolution: fixed
Priority: normal
Severity: minor
Target version: 4.02.2+dev / +rc1
Fixed in version: 4.02.2+dev / +rc1
Category: ~DO NOT USE (was: OCaml general)
Monitored by: @Drup @gasche @hcarty
Bug description
People working on ppx-based support for js_of_ocaml would like to be able to use syntax such as:
obj ## m
The attached patch (sharp_op.diff) allows this kind of customized "method invocation" expression, where the # symbol is replaced by # followed by a non-empty sequence of operator symbols or '#'. This is encoded in the Parsetree as a normal extension node, using the sequence of operator symbols (including the leading '#') as the identifier and (obj # m) as the payload. The same approach could be used later to provide more syntactic hooks for ppx, e.g. "let! x = e1 in e2" as illustrated in sharp_let_op.diff.
Without any ppx rewriting, the expression above results in the following error message:
Uninterpreted extension '##'.
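The encoding described above can be modelled with a toy expression type. This is only a sketch: the real Parsetree types live in compiler-libs and are much richer, and the constructor and function names here are hypothetical. A plain '#' stays an ordinary method call, any longer '#'-symbol becomes an extension node named after the symbol and wrapping (obj # m), and an extension node that no ppx has rewritten is reported as uninterpreted.

```ocaml
(* Toy stand-in for the relevant Parsetree shapes. *)
type expr =
  | Ident of string
  | Send of expr * string        (* obj # m *)
  | Extension of string * expr   (* [%id payload] *)

(* Encoding used by the patch: [obj <op> m] with op = "#" is a method call;
   any other '#'-symbol is an extension node whose payload is (obj # m). *)
let encode_sharp obj op m =
  if op = "#" then Send (obj, m) else Extension (op, Send (obj, m))

(* Without a ppx rewriter, any remaining extension node is an error. *)
let rec check = function
  | Ident _ -> Ok ()
  | Send (e, _) -> check e
  | Extension (id, _) -> Error (Printf.sprintf "Uninterpreted extension '%s'." id)
```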
I'm setting Target Version = 4.02.1+dev since the people asking for it would really like to have it. Note that they also request:
obj ## m <- e
but maybe they could use:
obj ## m := e