ocamllex and python-style indentation
| From: | Andreas Rossberg <rossberg@m...> |
| Subject: | Re: [Caml-list] ocamllex and python-style indentation |
On Jun 12, 2009, at 10.20 h, Andrej Bauer wrote:
> I think I understand the general idea of inserting "virtual" tokens,
> but the details confuse me still. So starting with
>
>> if True:
>>     x = 3
>>     y = (2 +
>>         4 + 5)
>> else:
>>     x = 5
>>     if False:
>>         x = 8
>>         z = 2
>
> Martin suggests the following:
>
>> {
>> if True:
>> ;
>> {
>> x = 3
>> ;
>> y = (2 +
>> ;
>> {
>> 4 + 5)
>> }
>> }
>> ;
>> else:
>> ;
>> {
>> x = 5
>> ;
>> if False:
>> ;
>> {
>> x = 8
>> ;
>> z = 2
>> }
>> }
>> }
>
> I have two questions. Notice that the { ... } and ( ... ) need not be
> correctly nested (in the top half), so how are we going to deal with
> this? The second question is, why are there the separators after and
> just before "else:". I would expect separators inside { .... }, but
> not around "else".
It depends on how exactly you define your layout rules. The usual
approach is to tie the start of a layout-sensitive block to particular
keywords -- this is essentially what Python and Haskell do. In that
case the binding of y is not affected, because its continuation line
does not follow a layout keyword and so opens no block. Haskell's
rules for optional layout would rewrite your original program as
>> if True:
>>     {x = 3
>>     ;y = (2 +
>>         4 + 5)
>> }else:
>>     {x = 5
>>     ;if False:
>>         {x = 8
>>         ;z = 2
>> }}
The basic rules are fairly simple:
1. Insert "{" (assume width 0) before the first token following a
layout keyword (usually ":" in Python). This opens a block.
2. While inside a block, insert ";" before each token that is in
the _same_ column as the current (i.e. innermost) "{".
3. A block ends as soon as you see a line whose first token is _left_
of the current "{". Insert "}" before that token.
Blocks can be nested, so you need to maintain a stack of starting
columns in the parser. Note that rule 3 may end several blocks at
once. EOF is treated as a token at column 0.
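
As a rough illustration, rules 1-3 can be written as a pure function over
a list of positioned tokens. This is only a sketch under assumptions: the
types `token` and `positioned`, the crude `tokenize` helper and the
function names are all invented for this example, ":" is taken as the
only layout keyword, and rule 2 is checked only for the first token of a
line. Empty blocks and tab characters are not handled.

```ocaml
(* Illustrative token type; all names here are hypothetical. *)
type token =
  | Word of string          (* any ordinary token *)
  | Colon                   (* the layout keyword *)
  | LBrace | RBrace | Semi  (* virtual tokens inserted by layout *)
  | Eof

(* A token with its position: 1-based line, 0-based column. *)
type positioned = { tok : token; line : int; col : int }

(* Crude whitespace tokenizer, for demonstration only: a trailing ':'
   on a word becomes a separate Colon token. *)
let tokenize (src : string) : positioned list =
  let lines = String.split_on_char '\n' src in
  let toks = ref [] in
  let emit tok line col = toks := { tok; line; col } :: !toks in
  List.iteri (fun i text ->
    let n = String.length text in
    let j = ref 0 in
    while !j < n do
      if text.[!j] = ' ' then incr j
      else begin
        let start = !j in
        while !j < n && text.[!j] <> ' ' do incr j done;
        let w = String.sub text start (!j - start) in
        let len = String.length w in
        if w.[len - 1] = ':' then begin
          if len > 1 then emit (Word (String.sub w 0 (len - 1))) (i + 1) start;
          emit Colon (i + 1) (start + len - 1)
        end else
          emit (Word w) (i + 1) start
      end
    done) lines;
  emit Eof (List.length lines + 1) 0;  (* EOF counts as a token at column 0 *)
  List.rev !toks

(* Apply the three layout rules, keeping a stack of block start columns. *)
let layout (input : positioned list) : token list =
  (* Rule 3: pop every block that starts to the right of [col]. *)
  let rec close stack col acc =
    match stack with
    | top :: rest when col < top -> close rest col (RBrace :: acc)
    | _ -> (stack, acc)
  in
  let step (stack, pending, prev_line, acc) p =
    let stack, acc =
      if pending && p.tok <> Eof then
        (* Rule 1: first token after a layout keyword opens a block. *)
        (p.col :: stack, LBrace :: acc)
      else if p.line <> prev_line || p.tok = Eof then begin
        let stack, acc = close stack p.col acc in
        match stack with
        (* Rule 2: same column as the innermost "{" gets a ";". *)
        | top :: _ when p.col = top && p.tok <> Eof -> (stack, Semi :: acc)
        | _ -> (stack, acc)
      end else
        (stack, acc)
    in
    (stack, p.tok = Colon, p.line, p.tok :: acc)
  in
  let (_, _, _, acc) = List.fold_left step ([], false, 0, []) input in
  List.rev acc

let show = function
  | Word w -> w | Colon -> ":" | LBrace -> "{" | RBrace -> "}"
  | Semi -> ";" | Eof -> "<eof>"

let render toks = String.concat " " (List.map show toks)

let example =
  "if True:\n    x = 3\n    y = (2 +\n        4 + 5)\nelse:\n    x = 5\n\
  \    if False:\n        x = 8\n        z = 2"

let () = print_endline (render (layout (tokenize example)))
(* prints: if True : { x = 3 ; y = (2 + 4 + 5) } else : { x = 5 ;
   if False : { x = 8 ; z = 2 } } <eof>   (on a single line) *)
```

The continuation line "4 + 5)" starts to the right of the block column,
so neither ";" nor "}" is inserted there, and the two blocks still open
at EOF are closed together.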
The way I implemented this is by wrapping the ocamllex-generated lexer
with a function that compares each token's column with the top of the
layout stack and inserts auxiliary tokens as necessary.
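
A minimal sketch of such a wrapper follows; it is not the code described
above, only one plausible shape for it. It assumes the wrapped lexer has
the usual ocamllex type `Lexing.lexbuf -> token` and keeps line positions
up to date (for instance by calling `Lexing.new_line` on each newline).
The `token` type, `with_layout`, `replay` and `drain` are hypothetical
names; `replay` is merely a stand-in for a generated lexer so that the
example runs on its own.

```ocaml
(* Hypothetical token type, as a parser generator might declare it. *)
type token = WORD of string | COLON | LBRACE | RBRACE | SEMI | EOF

(* Wrap a raw lexer so that it also emits the virtual tokens. *)
let with_layout (raw : Lexing.lexbuf -> token) : Lexing.lexbuf -> token =
  let stack = ref [] in           (* start columns of open blocks *)
  let pending = ref false in      (* last real token was a layout keyword *)
  let prev_line = ref 0 in        (* line of the last real token *)
  let queue = Queue.create () in  (* tokens waiting to be handed out *)
  fun lexbuf ->
    if Queue.is_empty queue then begin
      let tok = raw lexbuf in
      let pos = Lexing.lexeme_start_p lexbuf in
      let line = pos.Lexing.pos_lnum in
      (* EOF is treated as a token at column 0. *)
      let col =
        if tok = EOF then 0 else pos.Lexing.pos_cnum - pos.Lexing.pos_bol in
      if !pending && tok <> EOF then begin
        (* Open a block at the column of this token. *)
        stack := col :: !stack;
        Queue.add LBRACE queue
      end else if line <> !prev_line || tok = EOF then begin
        (* Close every block that starts to the right of this token. *)
        let rec close () =
          match !stack with
          | top :: rest when col < top ->
            stack := rest; Queue.add RBRACE queue; close ()
          | _ -> ()
        in
        close ();
        (* Separate items that line up with the innermost block. *)
        (match !stack with
         | top :: _ when col = top && tok <> EOF -> Queue.add SEMI queue
         | _ -> ())
      end;
      pending := (tok = COLON);
      prev_line := line;
      Queue.add tok queue
    end;
    Queue.pop queue

(* Stand-in for an ocamllex-generated lexer, for demonstration only:
   replays (token, line, column) triples and sets the start position. *)
let replay (items : (token * int * int) list) : Lexing.lexbuf -> token =
  let rest = ref items in
  fun lexbuf ->
    match !rest with
    | [] -> EOF
    | (tok, line, col) :: tl ->
      rest := tl;
      lexbuf.Lexing.lex_start_p <-
        { lexbuf.Lexing.lex_start_p with
          Lexing.pos_lnum = line; pos_bol = 0; pos_cnum = col };
      tok

(* Pull tokens until EOF, as a parser would. *)
let drain lexer lexbuf =
  let rec go acc =
    match lexer lexbuf with
    | EOF -> List.rev (EOF :: acc)
    | t -> go (t :: acc)
  in
  go []

(* Token positions for:   if c:
                              a
                              b
                          d                *)
let demo =
  drain
    (with_layout
       (replay [ (WORD "if", 1, 0); (WORD "c", 1, 3); (COLON, 1, 4);
                 (WORD "a", 2, 4); (WORD "b", 3, 4); (WORD "d", 4, 0) ]))
    (Lexing.from_string "")
```

With a real grammar, the wrapped function would be passed to the
generated parser in place of the raw lexer. The queue is needed because
a single real token may be preceded by several virtual ones.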
Haskell has another rule for inserting "}" if there would be a parse
error without it (this is to allow inline blocks). This rule is pretty
fudgy, and almost impossible to implement properly with a conventional
parser generator. IMO, the only sane way to reformulate this rule is
again to tie it to specific keywords, e.g. insert "}" before "else" if
missing. This can be implemented in the parser by making closing
braces optional in the right places.
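
To make the last point concrete, here is a small hand-written
recursive-descent sketch in which the closing brace of a block may be
omitted when the next token is "else". It stands in for the equivalent
grammar rule in a parser generator. The token and AST types and all
function names are hypothetical, and the sketch leaves open how the
lexer's layout stack would learn that the block has ended.

```ocaml
type token = IF | ELSE | COLON | LBRACE | RBRACE | SEMI | WORD of string | EOF

type stmt =
  | Simple of string list
  | If of string list * stmt list * stmt list  (* condition, then, else *)

exception Parse_error of string

(* block ::= "{" stmt (";" stmt)* close
   close ::= "}"  |  nothing, provided the next token is "else" *)
let rec parse_block toks =
  match toks with
  | LBRACE :: rest ->
    let body, rest = parse_stmts rest in
    (match rest with
     | RBRACE :: rest -> (body, rest)
     | ELSE :: _ -> (body, rest)   (* "}" missing before "else": accept *)
     | _ -> raise (Parse_error "expected }"))
  | _ -> raise (Parse_error "expected {")

and parse_stmts toks =
  let s, rest = parse_stmt toks in
  match rest with
  | SEMI :: rest ->
    let more, rest = parse_stmts rest in
    (s :: more, rest)
  | _ -> ([s], rest)

and parse_stmt toks =
  match toks with
  | IF :: rest ->
    let cond, rest = parse_words rest in
    (match rest with
     | COLON :: rest ->
       let th, rest = parse_block rest in
       (match rest with
        | ELSE :: COLON :: rest ->
          let el, rest = parse_block rest in
          (If (cond, th, el), rest)
        | _ -> (If (cond, th, []), rest))
     | _ -> raise (Parse_error "expected :"))
  | _ ->
    let ws, rest = parse_words toks in
    if ws = [] then raise (Parse_error "expected statement")
    else (Simple ws, rest)

and parse_words toks =
  match toks with
  | WORD w :: rest ->
    let ws, rest = parse_words rest in
    (w :: ws, rest)
  | _ -> ([], toks)

(* Hand-written token stream for the inline form "if c: a else: b",
   with no "}" before "else". *)
let inline_form =
  [IF; WORD "c"; COLON; LBRACE; WORD "a";
   ELSE; COLON; LBRACE; WORD "b"; RBRACE; EOF]

let parsed = parse_stmt inline_form
```

Both the inline form and the fully braced form yield the same tree, since
the missing "}" is accepted only directly before "else".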
- Andreas