| Date: | -- (:) |
| From: | Markus Mottl <markus@o...> |
| Subject: | Re: [Caml-list] Strange PCRE bug |
On Thu, 16 Sep 2004, Richard Jones wrote:
> # #load "pcre.cma";;
> # let rex = Pcre.regexp "(:?([a-z]+)\\s+)*";;
> val rex : Pcre.regexp = <abstr>
> # Pcre.extract_all ~rex "a b c d ee ff ";;
>
> (* Hangs, rapidly consuming memory. Killed with ^C ... *)
This is a bug concerning null patterns (i.e. ones that match empty
strings, too). I have fixed this now.
> On a more general point, how do I access all the strings captured by
> the inner brackets in a pattern like (:? (..) )* ?
The "(:?" should be "(?:".
Anyway, to answer your question: you can't. The capturing subpattern
"([a-z]+)" will always capture only the last iteration in a series (the
repetition introduced by "*" in your example).
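To illustrate the point, here is a minimal sketch using the corrected pattern. It assumes the pcre-ocaml library is loaded (e.g. via the #load "pcre.cma" from your session) and uses "Pcre.extract", which returns the substrings of the first match only:

```ocaml
(* Corrected pattern: a non-capturing group "(?:" wrapped around one
   capturing group, repeated with "*". *)
let rex = Pcre.regexp "(?:([a-z]+)\\s+)*"

(* Index 0 holds the full match, index 1 holds capturing group 1. *)
let sstrs = Pcre.extract ~rex "a b c d ee ff "

let () =
  Printf.printf "full match: %S\n" sstrs.(0);
  Printf.printf "group 1:    %S\n" sstrs.(1)
```

The full match covers the whole subject string, but group 1 holds only "ff", the word from the last iteration.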
I'm not sure what you want to do, but I guess you want to extract all
words consisting of the characters a-z from a string? In that case I'd
rather use the much simpler pattern "[a-z]+". "extract_all" will then
return an array of arrays of strings. Each inner array holds the
substrings of one match. Unless you specify "~full_match:false", each
inner array will contain the full match in position 0. The full match
is what we want here.
E.g.:
  let () =
    let rex = Pcre.regexp "[a-z]+" in
    let subj = "this is 1 test" in
    let many_sstrs = Pcre.extract_all ~rex subj in
    let words = Array.map (fun sstrs -> sstrs.(0)) many_sstrs in
    Array.iter print_endline words
This will print:
this
is
test
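If you do want to keep a capturing group, a possible variant is to drop the outer repetition and let "extract_all" iterate over the matches instead. This is only a sketch under the assumption that each word is followed by whitespace, as in your subject string; with "~full_match:false" the captured group moves to position 0:

```ocaml
(* One word per match; the repetition is done by extract_all, not by "*". *)
let rex = Pcre.regexp "([a-z]+)\\s+"
let subj = "a b c d ee ff "

(* Without the full match, each inner array holds only capturing group 1. *)
let many_sstrs = Pcre.extract_all ~full_match:false ~rex subj
let words = Array.map (fun sstrs -> sstrs.(0)) many_sstrs

let () = Array.iter print_endline words
```

This should print each of the six words on its own line.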
"extract_all" is the dual of "split". In contrast to the latter it
does not remove the matches of the pattern but keeps them (including
captured substrings), and ignores everything else.
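A small sketch of this relationship, using a whitespace pattern chosen only for illustration: "split" returns the text between the matches, whereas "extract_all" returns the matches themselves.

```ocaml
let rex = Pcre.regexp "\\s+"
let subj = "this is 1 test"

(* split: the separators are removed, the pieces in between are kept. *)
let pieces = Pcre.split ~rex subj

(* extract_all: the separators are kept, everything else is ignored. *)
let kept = Array.map (fun sstrs -> sstrs.(0)) (Pcre.extract_all ~rex subj)

let () =
  List.iter (Printf.printf "piece: %S\n") pieces;
  Array.iter (Printf.printf "kept:  %S\n") kept
```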
Regards,
Markus
--
Markus Mottl http://www.oefai.at/~markus markus@oefai.at