ocamllex and python-style indentation
From: Martin Jambon <martin.jambon@e...>
Subject: Re: [Caml-list] Re: ocamllex and python-style indentation
Sylvain Le Gall wrote:
> Hello,
> 
> On 01-07-2009, Andreas Rossberg <rossberg@mpi-sws.org> wrote:
>> Mike Lin wrote:
>>> OK, now I'm curious :) how does your lexer match balanced parentheses,
>>> or in this case comments?
>>>   
>> Easily, with a few side effects (I think that's roughly how all ML 
>> compilers do it):
>>
>> ------------------------------------------------
>> let error l s = (* ... *)
>> let commentDepth = ref 0
>> let start = ref 0
>> let loc length = let pos = !start in (pos, pos+length)
>>
>> rule lex =
>>     parse eof            { EOF }
>>     (* | ... *)
>>     | "(*"            { start := pos lexbuf; commentDepth := 1;
>>                   lexNestComment lexbuf }
>>
>> and lexNestComment =
>>     parse eof            { error (loc 2) "unterminated comment" }
>>     | "(*"            { incr commentDepth;
>>                   lexNestComment lexbuf }
>>     | "*)"            { decr commentDepth;
>>                   if !commentDepth > 0
>>                   then lexNestComment lexbuf
>>                   else lex lexbuf }
>>     | _            { lexNestComment lexbuf }
>> ------------------------------------------------
>>
>> If you also want to treat strings in comments specially (like OCaml), 
>> then you need to do a bit more work, but it's basically the same idea.
>>
> 
> May I suggest a simpler way to write this:
> 
> -------------------------------------------------------------------------
> rule lex =
>   parse eof    { () }
>   | "(*"       { start := pos lexbuf; lexNestComment lexbuf; lex lexbuf }
> 
> and lexNestComment =
>   parse eof    { error (loc 2) "unterminated comment" }
> | "(*"         { lexNestComment lexbuf; lexNestComment lexbuf }
> | "*)"         { () }
> | _            { lexNestComment lexbuf }
> -------------------------------------------------------------------------
> 
> I think it works the same way, except that it uses fewer global
> variables.
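
The depth-counter technique from Andreas's rules can be tried outside ocamllex. The sketch below is a minimal, hypothetical version that assumes the input is an in-memory string rather than a `Lexing.lexbuf`; `skip_comment` and `Unterminated_comment` are illustrative names, not part of any library. The counter starts at 1 because the opening `(*` has already been consumed by the time the comment rule runs; starting it at 0 would let the first inner `*)` end the whole comment.

```ocaml
(* Raised with the offset of the opening "(*" when input ends too early. *)
exception Unterminated_comment of int

(* Skip a possibly nested comment. [i] is assumed to point just past an
   opening "(*"; the result is the index just past the matching "*)". *)
let skip_comment (s : string) (i : int) : int =
  let n = String.length s in
  let start = i - 2 in   (* offset of the opening "(*", for error reports *)
  let depth = ref 1 in   (* the opening "(*" has already been consumed *)
  let pos = ref i in
  while !depth > 0 do
    if !pos >= n then raise (Unterminated_comment start)
    else if !pos + 1 < n && s.[!pos] = '(' && s.[!pos + 1] = '*' then begin
      incr depth;        (* a nested comment opens *)
      pos := !pos + 2
    end else if !pos + 1 < n && s.[!pos] = '*' && s.[!pos + 1] = ')' then begin
      decr depth;        (* the innermost open comment closes *)
      pos := !pos + 2
    end else
      incr pos           (* ordinary comment text *)
  done;
  !pos
```

For example, `skip_comment "(* a (* b *) c *) d" 2` returns 17, the index of the space before `d`. Here the counter is a local `ref`, so nothing global is needed even with the counting approach.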


You can even get rid of global variables completely:


rule lex x = parse
  eof    { () }
| "(*"   { x.start <- pos lexbuf; lexNestComment x lexbuf; lex x lexbuf }

and lexNestComment x = parse
  eof          { error (loc x 2) "unterminated comment" }
| "(*"         { lexNestComment x lexbuf; lexNestComment x lexbuf }
| "*)"         { () }
| _            { lexNestComment x lexbuf }
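
The same idea (state passed as an argument, nesting handled by recursion) can be sketched in plain OCaml over a string. This is an illustration under assumptions: the type of `x` is not shown above, so `state` here is a hypothetical record with a single mutable `start` field, and `nest_comment` and `strip_comments` are hypothetical names. On an inner `(*`, `nest_comment` calls itself once to consume the inner comment and again to finish the outer one; if that case merely continued scanning, the first `*)` would close the outer comment too.

```ocaml
(* Hypothetical per-lexer state standing in for the record x above. *)
type state = { mutable start : int }

exception Unterminated_comment of int

(* Skip comment text from index [i]; return the index just past the
   closing "*)". Nesting uses the call stack instead of a depth counter. *)
let rec nest_comment x s i =
  let n = String.length s in
  if i >= n then raise (Unterminated_comment x.start)
  else if i + 1 < n && s.[i] = '(' && s.[i + 1] = '*' then
    (* consume the inner comment, then keep scanning the outer one *)
    nest_comment x s (nest_comment x s (i + 2))
  else if i + 1 < n && s.[i] = '*' && s.[i + 1] = ')' then i + 2
  else nest_comment x s (i + 1)

(* Drop every (possibly nested) comment from [s], keeping other text. *)
let strip_comments s =
  let x = { start = 0 } in
  let n = String.length s in
  let buf = Buffer.create n in
  let rec lex i =
    if i >= n then ()
    else if i + 1 < n && s.[i] = '(' && s.[i + 1] = '*' then begin
      x.start <- i;                  (* remember where the comment began *)
      lex (nest_comment x s (i + 2))
    end else begin
      Buffer.add_char buf s.[i];
      lex (i + 1)
    end
  in
  lex 0;
  Buffer.contents buf
```

For example, `strip_comments "a (* b (* c *) d *) e"` evaluates to `"a  e"`, and `strip_comments "x (* (* y *)"` raises `Unterminated_comment 2`, the offset of the outermost opening `(*`. The recursive form needs no counter, at the cost of stack depth proportional to the nesting depth.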





Martin

-- 
http://mjambon.com/