ocamllex and python-style indentation
From: Martin Jambon <martin.jambon@e...>
Subject: Re: [Caml-list] ocamllex and python-style indentation
Andrej Bauer wrote:
> My parsing powers are not sufficient to easily come up with a
> lexer/parser for a simple language that uses python-style indentation
> and newline rules. Does anyone have such a thing lying around, written
> in ocamllex/yacc or menhir? I would appreciate a peek to see how
> you've dealt with it.
> 
> For example, suppose we want just a very simple fragment of Python
> involving True, False, conditional statements, variables, and
> assignments, such as:
> 
> if True:
>     x = 3
>     y = (2 +
>       4 + 5)
> else:
>     x = 5
>     if False:
>         x = 8
>         z = 2
> 
> How would I go about writing a lexer/parser for such a thing in ocaml?

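I would use a first pass that converts the input lines into this imaginary
structure, where "{" and "}" delimit each indented block and ";" separates
consecutive items at the same level: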


{
if True:
;
    {
    x = 3
    ;
    y = (2 +
    ;
      {
      4 + 5)
      }
    }
;
else:
;
    {
    x = 5
    ;
    if False:
    ;
        {
        x = 8
        ;
        z = 2
        }
    }
}


You could create a generic tool that parses a file into this:

type t = Line of loc * string | Block of loc * t list
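For illustration, here is a minimal sketch of such a generic tool in OCaml. It is not code from this thread. It assumes that `loc` is just a line number, that indentation is counted in spaces only (tabs are not handled), and that blank lines are ignored. It also raises a hypothetical `Indentation_error` when a line dedents to a column that no enclosing block uses.

```ocaml
(* Sketch of the "generic tool".  Assumptions (not from the post): [loc] is
   a 1-based line number, indentation is counted in spaces only, and blank
   lines are ignored. *)
type loc = int

type t = Line of loc * string | Block of loc * t list

(* Hypothetical: raised when a line dedents to a column that no enclosing
   block uses. *)
exception Indentation_error of loc

(* Number of leading spaces in [s]. *)
let indentation s =
  let n = String.length s in
  let rec count i = if i < n && s.[i] = ' ' then count (i + 1) else i in
  count 0

(* Non-blank lines as (line number, indentation, trimmed text). *)
let prepare text =
  String.split_on_char '\n' text
  |> List.mapi (fun i s -> (i + 1, indentation s, String.trim s))
  |> List.filter (fun (_, _, s) -> s <> "")

(* Collect the siblings found at column [col], stopping at the first line
   indented less than [col].  Returns the items and the unconsumed lines. *)
let rec parse_items col lines acc =
  match lines with
  | [] -> (List.rev acc, [])
  | (_, ind, _) :: _ when ind < col -> (List.rev acc, lines)
  | (loc, ind, s) :: rest when ind = col ->
      parse_items col rest (Line (loc, s) :: acc)
  | (loc, ind, _) :: _ ->
      (* A deeper line opens a nested block at its own column. *)
      let items, rest = parse_items ind lines [] in
      (match rest with
       | (loc', ind', _) :: _ when ind' > col -> raise (Indentation_error loc')
       | _ -> ());
      parse_items col rest (Block (loc, items) :: acc)

(* The whole file is one outer block, like the outermost "{ ... }". *)
let parse text =
  let items, _ = parse_items 0 (prepare text) [] in
  Block (1, items)

(* Andrej's example. *)
let example =
  String.concat "\n"
    [ "if True:"; "    x = 3"; "    y = (2 +"; "      4 + 5)";
      "else:"; "    x = 5"; "    if False:"; "        x = 8"; "        z = 2" ]
```

On Andrej's example, `parse example` produces the same nesting as the braces above. The continuation line `4 + 5)` becomes a one-line nested block, since this pass looks at indentation only.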


but as suggested by Yoann, the next step should probably be to flatten this
into a stream by introducing artificial tokens:

type gen_token =
   Open of loc          (* fake "{" *)
 | Close of loc         (* fake "}" *)
 | Separator of loc     (* fake ";" *)
 | Line of loc * string
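A possible sketch of that flattening, again assuming a line-number `loc`. The tree type is wrapped in a module `Tree` (my choice, not Martin's) only so that its `Line` constructor does not clash with the one in `gen_token`. Each `Separator` is given the location of the item that follows it, which is also an assumption. The helper `show` prints tokens in the `{`, `}`, `;` notation used above.

```ocaml
(* Sketch of the flattening step.  Assumptions (not from the post): [loc] is
   a line number, [Tree] exists only to avoid the clash between the two
   [Line] constructors, and a [Separator] carries the location of the item
   that follows it. *)
type loc = int

module Tree = struct
  type t = Line of loc * string | Block of loc * t list
end

type gen_token =
   Open of loc          (* fake "{" *)
 | Close of loc         (* fake "}" *)
 | Separator of loc     (* fake ";" *)
 | Line of loc * string

let loc_of = function Tree.Line (loc, _) | Tree.Block (loc, _) -> loc

(* Prepend the tokens of [node] to [acc]. *)
let rec flatten_node (node : Tree.t) (acc : gen_token list) : gen_token list =
  match node with
  | Tree.Line (loc, s) -> Line (loc, s) :: acc
  | Tree.Block (loc, items) ->
      Open loc :: flatten_items items (Close loc :: acc)

(* Siblings, with a [Separator] between each pair of consecutive items. *)
and flatten_items items acc =
  match items with
  | [] -> acc
  | [ x ] -> flatten_node x acc
  | x :: (y :: _ as rest) ->
      flatten_node x (Separator (loc_of y) :: flatten_items rest acc)

let flatten (tree : Tree.t) : gen_token list = flatten_node tree []

(* Render tokens with the "{", "}", ";" notation used in the post. *)
let show = function
  | Open _ -> "{"
  | Close _ -> "}"
  | Separator _ -> ";"
  | Line (_, s) -> s
```

For the three lines `if True:`, `    x = 3`, `else:`, the flattened tokens printed with `show` and joined by spaces read `{ if True: ; { x = 3 } ; else: }`.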


then tokenize each Line and flatten the result into a single token
stream:

type token =
   OPEN_BLOCK of loc  (* fake "{" *)
 | CLOSE_BLOCK of loc (* fake "}" *)
 | SEPARATOR of loc   (* fake ";" *)
 | ... (* your language-specific tokens here *)
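To make this step concrete, here is a sketch for Andrej's fragment. The language-specific token names (`IF`, `TRUE`, `COLON`, `INT`, `IDENT`, `EOF`, etc.) are my own hypothetical choices, and only the block tokens carry a location. To keep everything in one self-contained file, a small hand-written scanner stands in for an ocamllex-generated one. With ocamllex you would instead run the generated rule on `Lexing.from_string s` for the text `s` of each Line.

```ocaml
(* Sketch for Andrej's fragment.  Assumptions (not from the post): [loc] is
   a line number, the token names are hypothetical, and a hand-written
   scanner replaces an ocamllex-generated one. *)
type loc = int

type gen_token = Open of loc | Close of loc | Separator of loc
               | Line of loc * string

type token =
   OPEN_BLOCK of loc  (* fake "{" *)
 | CLOSE_BLOCK of loc (* fake "}" *)
 | SEPARATOR of loc   (* fake ";" *)
 | IF | ELSE | TRUE | FALSE
 | COLON | EQUAL | PLUS | LPAREN | RPAREN
 | INT of int
 | IDENT of string
 | EOF

exception Lex_error of loc * string

let is_digit c = c >= '0' && c <= '9'
let is_ident_char c =
  (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || is_digit c || c = '_'

let keyword_or_ident = function
  | "if" -> IF
  | "else" -> ELSE
  | "True" -> TRUE
  | "False" -> FALSE
  | s -> IDENT s

(* Tokenize the text of one [Line]. *)
let lex_line (loc : loc) (s : string) : token list =
  let n = String.length s in
  (* Index of the first character at or after [i] not satisfying [p]. *)
  let span p i =
    let rec go j = if j < n && p s.[j] then go (j + 1) else j in
    go i
  in
  let rec scan i acc =
    if i >= n then List.rev acc
    else
      match s.[i] with
      | ' ' | '\t' -> scan (i + 1) acc
      | ':' -> scan (i + 1) (COLON :: acc)
      | '=' -> scan (i + 1) (EQUAL :: acc)
      | '+' -> scan (i + 1) (PLUS :: acc)
      | '(' -> scan (i + 1) (LPAREN :: acc)
      | ')' -> scan (i + 1) (RPAREN :: acc)
      | c when is_digit c ->
          let j = span is_digit i in
          scan j (INT (int_of_string (String.sub s i (j - i))) :: acc)
      | c when is_ident_char c ->
          let j = span is_ident_char i in
          scan j (keyword_or_ident (String.sub s i (j - i)) :: acc)
      | c -> raise (Lex_error (loc, Printf.sprintf "unexpected character %C" c))
  in
  scan 0 []

(* Replace each gen_token by its token(s) and end the stream with EOF. *)
let to_tokens (gen : gen_token list) : token list =
  List.concat
    (List.map
       (function
         | Open loc -> [ OPEN_BLOCK loc ]
         | Close loc -> [ CLOSE_BLOCK loc ]
         | Separator loc -> [ SEPARATOR loc ]
         | Line (loc, s) -> lex_line loc s)
       gen)
  @ [ EOF ]
```

With Andrej's example, `y = (2 +` and `4 + 5)` end up separated by `SEPARATOR OPEN_BLOCK`, so either the grammar has to accept block tokens inside parentheses or an extra pass has to drop them there.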


The token stream could then be processed by ocamlyacc/menhir.
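One way to hand a finished token list to the parser: the entry points generated by ocamlyacc, and by menhir's ocamlyacc-compatible monolithic API, take a function of type `Lexing.lexbuf -> token` and a `lexbuf`. A closure over the list can serve the tokens one at a time and ignore the lexbuf. `lexer_of_list` and `Parser.program` below are hypothetical names. Because the lexbuf is a dummy, its positions mean nothing, so error messages should rely on the `loc` values carried by the tokens.

```ocaml
(* Hypothetical helper: serve a precomputed token list through the
   [Lexing.lexbuf -> token] interface expected by ocamlyacc/menhir entry
   points.  The lexbuf argument is ignored. *)
let lexer_of_list (eof : 'tok) (toks : 'tok list) : Lexing.lexbuf -> 'tok =
  let remaining = ref toks in
  fun (_ : Lexing.lexbuf) ->
    match !remaining with
    | [] -> eof  (* keep answering [eof] if the parser asks again *)
    | t :: rest ->
        remaining := rest;
        t

(* Hypothetical usage, with [Parser.program] generated from your grammar:

     let ast = Parser.program (lexer_of_list EOF toks) (Lexing.from_string "")
*)
```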


That's the approach I would follow if I had to solve this problem again.



Martin

-- 
http://mjambon.com/