<?xml version="1.0" encoding="ISO-8859-1"?>

<!DOCTYPE message PUBLIC
  "-//MLarc//DTD MLarc output files//EN"
  "../../mlarc.dtd"[
  <!ATTLIST message
    listname CDATA #REQUIRED
    title CDATA #REQUIRED
  >
]>

  <?xml-stylesheet href="../../mlarc.xsl" type="text/xsl"?>


<message 
  url="2003/10/d831cd6b1f0e966eb6a36a5f34c782db"
  from="Christian Lindig &lt;lindig@c...&gt;"
  author="Christian Lindig"
  date="2003-10-06T08:07:35"
  subject="Re: [Caml-list] backslashes in ocamllex"
  prev="2003/10/925fbd349fd7a57457dbdae0cc2507bb"
  next="2003/10/fb66bbc6c00437c91178e4754d5acdb7"
  prev-in-thread="2003/10/925fbd349fd7a57457dbdae0cc2507bb"
  next-in-thread="2003/10/fe01f456e9804b0aadcf1664b4eb356b"
  prev-thread="2003/10/674fd596aeeccf052ea3d6535738d4a1"
  next-thread="2003/10/ce7c6be72bb67c47c5a7a60474df99bd"
  root="../../"
  period="month"
  listname="caml-list"
  title="Archives of the Caml mailing list">

<thread subject="[Caml-list] backslashes in ocamllex">
<msg 
  url="2003/10/925fbd349fd7a57457dbdae0cc2507bb"
  from="Rafael &apos;Dido&apos; Sevilla &lt;dido@i...&gt;"
  author="Rafael &apos;Dido&apos; Sevilla"
  date="2003-10-06T06:32:04"
  subject="[Caml-list] backslashes in ocamllex">
<msg 
  url="2003/10/d831cd6b1f0e966eb6a36a5f34c782db"
  from="Christian Lindig &lt;lindig@c...&gt;"
  author="Christian Lindig"
  date="2003-10-06T08:07:35"
  subject="Re: [Caml-list] backslashes in ocamllex">
<msg 
  url="2003/10/fe01f456e9804b0aadcf1664b4eb356b"
  from="Alain.Frisch@e..."
  author="Alain.Frisch@e..."
  date="2003-10-06T08:26:42"
  subject="Re: [Caml-list] backslashes in ocamllex">
</msg>
</msg>
<msg 
  url="2003/10/fb66bbc6c00437c91178e4754d5acdb7"
  from="Jean-Christophe Filliatre &lt;Jean-Christophe.Filliatre@l...&gt;"
  author="Jean-Christophe Filliatre"
  date="2003-10-06T08:10:01"
  subject="Re: [Caml-list] backslashes in ocamllex">
</msg>
</msg>
</thread>

<contents>
On Mon, Oct 06, 2003 at 09:37:40AM +0800, Rafael 'Dido' Sevilla wrote:
&gt; Now I'm stuck again.  I'm revising the lexical analyzer for my compiler
&gt; to enable it to recognize escaped strings, with conventions different
&gt; from OCaml's.  Currently, I'm using this regex:
&gt; 
&gt; '\'' ("\\\\"|"\\'"|[^'\''])* '\''
&gt; 
&gt; in an attempt to recognize strings that begin and end with single
&gt; quotes, but may possibly include sequences like \' that represent
&gt; escaped quotes, and '\\' that represent escaped backslashes.

As you discovered, you cannot recognize strings with a single regular
expression. You need a sub-lexer:

{
let get         = Lexing.lexeme
let getchar     = Lexing.lexeme_char
let error msg   = failwith msg (* assumed helper, not part of Lexing *)
}

rule token = parse (* main lexer *)
    eof     { ... }
  | ...
  | "'"     { string lexbuf (Buffer.create 80) } (* use sub-lexer *)


and string = parse (* lexer for strings *)
    eof     { fun buf -&gt; error "EOF in string" }
  | '\\' _  { fun buf -&gt; let c = getchar lexbuf 1 in (* char after the backslash *)
                            let k = match c with
                            | 'n'   -&gt; '\n'
                            | 't'   -&gt; '\t'
                            | ....
                            in
                                ( Buffer.add_char buf k
                                ; string lexbuf buf
                                )
            }
  (* the closing quote must come before _ : both match one character,
     and ocamllex breaks such ties in favour of the earlier rule *)
  | "'"     { fun buf -&gt; Buffer.contents buf } (* return string *)
  | _       { fun buf -&gt; ( Buffer.add_string buf (get lexbuf)
                            ; string lexbuf buf
                            )
            }
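
If you want to try the buffer-and-escape logic without running ocamllex,
here is a minimal hand-written sketch of what the string sub-lexer above
does, working on a plain OCaml string instead of a lexbuf. The names
unescape and read_string are made up for this example, and the rule that
any other escaped character stands for itself is an assumption, not
something taken from your grammar.

```ocaml
(* Hypothetical helper: map the character after a backslash to the
   character it denotes. Assumption: unknown escapes stand for themselves,
   which covers \' and \\ . *)
let unescape = function
  | 'n' -> '\n'
  | 't' -> '\t'
  | c   -> c

(* Hypothetical helper: read a single-quoted literal from [s], where
   [start] is the index of the opening quote. Returns the decoded
   contents and the index just past the closing quote. *)
let read_string s start =
  let n = String.length s in
  let buf = Buffer.create 80 in
  (* [i] is the index of the next unread character *)
  let rec loop i =
    if i >= n then failwith "EOF in string"
    else
      match s.[i] with
      | '\'' -> (Buffer.contents buf, i + 1)
      | '\\' when i + 1 >= n -> failwith "EOF in string"
      | '\\' -> Buffer.add_char buf (unescape s.[i + 1]); loop (i + 2)
      | c -> Buffer.add_char buf c; loop (i + 1)
  in
  if start >= n then invalid_arg "read_string: start out of range"
  else
    match s.[start] with
    | '\'' -> loop (start + 1)
    | _ -> invalid_arg "read_string: no opening quote"
```

The escape case is tested before the closing-quote case only in the sense
that a backslash consumes the character after it, so an escaped quote never
reaches the branch that ends the string. That is the same reason the
sub-lexer needs a separate rule for a backslash followed by any character.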

-- Christian

--
Christian Lindig         http://www.st.cs.uni-sb.de/~lindig/


</contents>

</message>

