Re: lexing strings

From: Pierre Weis (Pierre.Weis@inria.fr)
Date: Mon Jun 02 1997 - 17:35:43 MET DST


From: Pierre Weis <Pierre.Weis@inria.fr>
Message-Id: <199706021535.RAA22238@pauillac.inria.fr>
Subject: Re: lexing strings
In-Reply-To: <199706012353.SAA06495@kimbark.uchicago.edu> from Lyn A Headley at "Jun 1, 97 06:53:12 pm"
To: laheadle@midway.uchicago.edu (Lyn A Headley)
Date: Mon, 2 Jun 1997 17:35:43 +0200 (MET DST)

> [^'\n']*[^'\\']'\''
>
> which should match any sequence of non-newlines until it reaches a '
> not preceded by a backslash. slurp returns the token: STRING(!build)).
>
> My intent, when reading a string, is for the lexer to see the first ',
> jump into 'slurp,' eat up the string and return it as the STRING token,
> then have the parser read a newline and return EOL, thus matching the
> main grammar rule and printing the result. This almost works, but not
> until the user types _two_ newlines will the "interpreter" respond
> by printing the expression value! i.e., typing
>
> 'hi' [newline]
>
> at the prompt is not enough; two newlines are required. Other than
> that, the expected value is returned. Does this mean that the first
> newline is interpreted as part of the STRING? Why would my regex match
> the newline?

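Yes, 'hi'\n' matches your regexp: [^'\\'] accepts a newline, so after
reading hi' the lexer has to look past the newline to check whether a
longer match is possible. I guess you want something along the lines of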

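(* Assumes Objective Caml's ocamllex (List.rev, Lexing.lexeme_char). *)
and slurp = parse
    "'"                       (* closing quote: return the string *)
    { STRING(List.rev !build) }
  | '\\' "'"                  (* escaped quote: keep the quote character *)
    { build := '\'' :: !build;
      slurp lexbuf }
  | eof
    { raise(Lexical_error "unterminated slurp") }
  | _                         (* any other character, newline included *)
    { build := Lexing.lexeme_char lexbuf 0 :: !build;
      slurp lexbuf }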
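
To make the two-newline behaviour concrete, here is a small sketch in plain OCaml (not ocamllex) that computes the longest prefix matched by the original pattern [^'\n']*[^'\\']'\''. The function name longest_match is made up for illustration, and the sketch assumes the pattern is applied to the input that follows the opening quote.

```ocaml
(* Hypothetical helper, for illustration only: length of the longest
   prefix of [s] matched by  [^'\n']* [^'\\'] '\''  or None if no
   prefix matches. *)
let longest_match (s : string) : int option =
  let n = String.length s in
  let best = ref None in
  (* [i] is the index of the candidate closing quote; s.[i-1] plays the
     role of [^'\\'] and s.[0 .. i-2] the role of the starred part. *)
  let i = ref 1 in
  let prefix_ok = ref true in  (* s.[0 .. i-2] contains no newline *)
  while !prefix_ok && !i < n do
    if s.[!i] = '\'' && s.[!i - 1] <> '\\' then best := Some (!i + 1);
    (* Moving the quote one step right adds s.[i-1] to the starred part. *)
    if s.[!i - 1] = '\n' then prefix_ok := false;
    incr i
  done;
  !best
```

With this helper, longest_match "hi'\n'" is Some 5 while longest_match "hi'\n\n" is Some 3. A lexer that follows the longest-match rule therefore cannot commit to hi' until it has seen the character after the first newline, which at an interactive prompt means waiting for a second line.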

Hope this helps,

(Note: you need to define the exception Lexical_error of string beforehand,
in order to signal the error "unterminated slurp".)
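
For readers who want to try the logic without running ocamllex, the slurp rule above can be approximated in plain OCaml. This is only a sketch under stated assumptions: it reads from a string and an index instead of a lexbuf, the token type is a stand-in with a single constructor, and it threads an accumulator through a local loop rather than using the global build reference.

```ocaml
exception Lexical_error of string

(* Stand-in token type; only STRING matters for this sketch. *)
type token = STRING of char list

(* Mirrors the slurp rule over a plain string.  [pos] is the index just
   after the opening quote.  Returns the token together with the index
   just after the closing quote. *)
let slurp (s : string) (pos : int) : token * int =
  let n = String.length s in
  let rec loop acc i =
    if i >= n then raise (Lexical_error "unterminated slurp")
    else
      match s.[i] with
      | '\'' -> (STRING (List.rev acc), i + 1)          (* closing quote *)
      | '\\' when i + 1 < n && s.[i + 1] = '\'' ->
          loop ('\'' :: acc) (i + 2)                    (* escaped quote *)
      | c -> loop (c :: acc) (i + 1)                    (* anything else *)
  in
  loop [] pos
```

Here slurp "hi'\n" 0 stops at the first unescaped quote and leaves the newline unread, so the next token can be EOL without any further input. The accumulator also avoids having to reset build between strings.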

Pierre Weis

INRIA, Projet Cristal, Pierre.Weis@inria.fr, http://pauillac.inria.fr/~weis/


