Typesystem and Parsers
Date: -- (:)
From: Oliver Bandel <oliver@f...>
Subject: Typesystem and Parsers
Hello,

in papers and books on parsing techniques,
parsing is often done in distinct steps,
where type checking and semantic checks are
done after the parse tree has been built.

This may be the classical way, for example when doing it in C.

When using OCaml with its strong type system,
IMHO the big advantage is that the type system can be used to
restrict the input data so that correct input is enforced without
additional checks; anything else is detected as a parse error.

When setting up a parser (e.g. with ocamlyacc), either way can be
taken: the parser can just accept everything and build up the
tree, and errors in syntax or type are detected later (for example, all
scanned entities are returned as strings or string lists)...

...or the parser can just accept only what the type system would accept,
which would be enforced by using sum types.
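
To make the two options concrete, here is a minimal sketch. The command names (get, select, print) and all type names are invented for illustration and are not the actual DSL:

```ocaml
(* Loose representation: every command is a name plus string arguments.
   Anything the scanner produces fits into this type. *)
type loose_cmd = string * string list

(* Strict representation: only well-formed commands can be constructed.
   All constructors here are hypothetical. *)
type selector =
  | Tag of string
  | Class of string

type cmd =
  | Get of string        (* URL to fetch *)
  | Select of selector
  | Print

(* With the loose form, a separate checking pass is needed after parsing.
   With the strict form, the grammar actions would build [cmd] directly
   and this function would not exist. *)
let check (c : loose_cmd) : (cmd, string) result =
  match c with
  | ("get", [url]) -> Ok (Get url)
  | ("select", ["tag"; name]) -> Ok (Select (Tag name))
  | ("select", ["class"; name]) -> Ok (Select (Class name))
  | ("print", []) -> Ok Print
  | (name, args) ->
    Error (Printf.sprintf "bad command %s with %d argument(s)"
             name (List.length args))
```

In the loose variant the malformed input survives parsing and can still be inspected; in the strict variant it never gets past the parser.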

I think both ways have their advantages, but the strongly typed
approach does not seem to be discussed in books and papers.

So I'm looking for arguments and techniques on how to use
OCaml's type system effectively, but maybe also take advantage of the
flexibility of a weak type system (e.g. by exploring a tree with
wrong syntax, to analyze it nevertheless).

Can you point me to some arguments on how you would do your parsing, and
why and in which situation you would choose one approach or the other?


To make it more specific: the program I'm currently working on is a simple
interpreter that helps me analyze webpages, i.e. a DSL for doing that.
It already runs and has some nice features, but during development
I switched back and forth between using sum types (trying several
solutions) and using just simple basic types (string and string list).

I once read an interesting paper on which types to use...
... it was by... hmhhh forgot the name, but I think it was
one of the outstanding FP-programmers. The argument was: use
basic types as often as you can, and avoid specific types.

This approach has helped: string and string list are easy to use
in a parser when using recursive rules: just prepend the new value
to the list built so far. Easily done.
But this throws away the type system's strength.
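
For comparison, a minimal sketch of the same prepend-in-a-recursive-rule idea with a sum type as the element type. It is a hand-written recursive descent function rather than an ocamlyacc grammar, and the token and arg types are invented for illustration:

```ocaml
(* Hypothetical tokens as a scanner might deliver them. *)
type token =
  | WORD of string
  | NUM of int
  | SEMI

(* Typed argument values instead of plain strings. *)
type arg =
  | Str of string
  | Int of int

(* args ::= arg args | empty
   Mirrors a right-recursive grammar rule: parse the rest first,
   then prepend the new value to the list built so far.
   Returns the parsed arguments and the remaining tokens. *)
let rec parse_args (toks : token list) : arg list * token list =
  match toks with
  | WORD s :: rest ->
    let (args, rest') = parse_args rest in
    (Str s :: args, rest')
  | NUM n :: rest ->
    let (args, rest') = parse_args rest in
    (Int n :: args, rest')
  | _ -> ([], toks)
```

The prepend step is the same as with a string list; only the element type changes.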

Giving the values sum types, which is more specific, would also have  
advantages.

One reason why I'm still undecided about which way to go is that
at the moment it's an interpreter, but maybe later I want to make it
generate intermediate code, maybe even optimize it.
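
A minimal sketch of that concern, assuming a tiny arithmetic language that has nothing to do with the real DSL: one typed tree can feed both a direct interpreter and a code generator for a hypothetical stack machine.

```ocaml
(* Invented expression type, for illustration only. *)
type expr =
  | Num of int
  | Add of expr * expr
  | Mul of expr * expr

(* Interpreter: evaluate the tree directly. *)
let rec eval = function
  | Num n -> n
  | Add (a, b) -> eval a + eval b
  | Mul (a, b) -> eval a * eval b

(* Intermediate code for a hypothetical stack machine. *)
type instr =
  | Push of int
  | IAdd
  | IMul

(* Code generator: same tree, different consumer. *)
let rec compile = function
  | Num n -> [Push n]
  | Add (a, b) -> compile a @ compile b @ [IAdd]
  | Mul (a, b) -> compile a @ compile b @ [IMul]

(* Run the generated code; the stack is a list with the top at the head. *)
let run (code : instr list) : int =
  let step stack ins =
    match ins, stack with
    | Push n, _ -> n :: stack
    | IAdd, b :: a :: rest -> (a + b) :: rest
    | IMul, b :: a :: rest -> (a * b) :: rest
    | _ -> failwith "stack underflow"
  in
  match List.fold_left step [] code with
  | [result] -> result
  | _ -> failwith "malformed code"
```

Because the compiler matches exhaustively on the constructors, adding a new kind of expression makes the type checker warn about every place that still has to handle it.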

So, can you please elaborate on advantages and disadvantages,
when to use which way?

Ciao,
     Oliver

P.S.: I also have writing a compiler in mind, but it has nothing to do with
      the web-parser DSL and is related to microcontroller programming... but
      if you could also elaborate on the compiler issues regarding the use of
      the type system in parsers, this would be very interesting to me too.