| From: | Jake Donham <jake.donham@s...> |
| Subject: | ocamllex regexp problem |
Hi list,
I am trying to parse an RSS feed using OCaml-RSS. It uses XML-Light, which
does not support CDATA blocks, so I added support in the ocamllex-based
lexer as follows:
let ends_sq = [^']']* ']'
let ends_sq_sq = ends_sq ([^']'] ends_sq)* ']'+
let ends_sq_sq_ang = ends_sq_sq ([^'>'] ends_sq_sq)* '>'
or expanded:
let ends_sq_sq_ang = (([^']']*']') ([^']'] ([^']']*']'))* ']'+) ([^'>']
(([^']']*']') ([^']'] ([^']']*']'))* ']'+))* '>'
rule token = parse
[...]
| "<![CDATA[" (ends_sq_sq_ang as data)
[...]
Here ends_sq_sq_ang is supposed to match strings ending in ]]> which may
contain ] and >. If I give it an input like "foo]]]>bar]]>" (note the extra
square bracket after foo), ocamllex matches the whole input instead of just
"foo]]]>" as I would expect. But Micmatch, when given the same regexp, does
the right thing. (The ']'+ bits are supposed to handle the "]]]>" case.)
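To pin down the behaviour I expect, here is a minimal sketch in plain OCaml
(no ocamllex involved) that returns the input up to and including the first
"]]>". The name first_cdata_end is made up for illustration and is not part
of XML-Light or OCaml-RSS; it only states the intended result and is not a
replacement for the lexer rule.

```ocaml
(* Hypothetical helper, not part of XML-Light or OCaml-RSS: return the
   prefix of [s] up to and including the first "]]>", or None if the
   terminator never occurs. *)
let first_cdata_end (s : string) : string option =
  let n = String.length s in
  let rec scan i =
    if i + 3 > n then None                      (* fewer than 3 chars left *)
    else if String.sub s i 3 = "]]>" then Some (String.sub s 0 (i + 3))
    else scan (i + 1)
  in
  scan 0

let () =
  match first_cdata_end "foo]]]>bar]]>" with
  | Some data -> print_endline data             (* prints foo]]]> *)
  | None -> print_endline "no terminator found"
```

On the example input this yields "foo]]]>", which is what I expected the
ocamllex rule to bind to data.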
I have probably done something stupid and am embarrassing myself by
advertising it to the list, but I did check it carefully. Any idea why this
doesn't work? Thanks,
Jake