Version française
Home     About     Download     Resources     Contact us    
Browse thread
bug in "developing applications with objective caml" (english translation)
[ Home ] [ Index: by date | by threads ]
[ Search: ]

[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
Date: -- (:)
From: Jack Andrews <effbiae@i...>
Subject: some comments on ocaml{lex,yacc} from a novice's POV
hi,

this is a little long.  i'm new to ocaml, but like most, have been
educated in FLs and experimented with and applied functional languages and
techniques.  python has been the first language i turn to for a few years
now.

i need to parse text as a sequence of records (with odd variations). i
have used ply (python lex-yacc) most recently for parsing and believe it
to be one of the more elegant mechanisms i've seen. 
http://systems.cs.uchicago.edu/ply/ply.html

elegant because there are no lex and yacc input files, but rather the
tokens and grammar rules are defined in python code -- succinctly!  eg:

# calclex.py
import lex
tokens = ( 'NUMBER', 'PLUS', 'MINUS', 'TIMES', 'DIVIDE', 'LPAREN', 'RPAREN',)
t_PLUS    = r'\+'  # in python, the r prefix to a string literal
t_MINUS   = r'-'   #  means as-is.  r'\' in python is "\\" in c
[snip]
def t_NUMBER(t):
    r'\d+'
    try: t.value = int(t.value)
    except ValueError:
         print "Line %d: Number %s is too large!" % (t.lineno,t.value)
         t.value = 0
    return t

by reflection/introspection ply finds all the token definitions in
calclex.py.  the only trick here is the first line of the t_NUMBER
function.  in python, any string literal as the first expression in a
function is the doc_string (accessible by t_NUMBER.__doc__ in this case)

#!/usr/local/bin/python
import yacc
from calclex import tokens  # this is where python builds the lexer

def p_expression_plus(p):
    'expression : expression PLUS term'
    p[0] = p[1] + p[3]
[snip]
def p_factor_expr(p):
    'factor : LPAREN expression RPAREN'
    p[0] = p[2]
# this is where python builds the parser
yacc.yacc()   # or yacc.yacc(method="LALR") for alternate parsing methods
while 1:
   try: s = raw_input('calc > ')
   except EOFError: break
   if not s: continue
   result = yacc.parse(s)
   print result

once again, using the names of functions and their docstrings, ply can
build a parser.

but i want to use ocaml, not python because i know i need (more) speed. 
after using ply, the ocaml{yacc,lex} implementation looks like it's just
glued on GNU tools.  not that there's anything wrong with that, but
integration with the language is nothing like that of ply.

don't get me wrong, i don't think ply is perfect, and i don't know enough
about parsing to be any kind of authority, but it seems to me a bit odd
that a comment in a caml parser is either (**) or /**/ depending on
context and in lexical analysis, a character set is expressed as ['A'-'Z'
'a'-'z' '_'] rather than usual (succinct) regexp syntax: [A-Za-z_]  (less
than half the characters)   really, the .mll and .mly look nothing like
caml

take what i say with a grain of salt, i'm no authority on anything i've said.


jack