[Caml-list] Strange PCRE bug
Date: -- (:)
From: Markus Mottl <markus@o...>
Subject: Re: [Caml-list] Strange PCRE bug
On Thu, 16 Sep 2004, Richard Jones wrote:
> # #load "pcre.cma";;
> # let rex = Pcre.regexp "(:?([a-z]+)\\s+)*";;
> val rex : Pcre.regexp = <abstr>
> # Pcre.extract_all ~rex "a b c d ee ff ";;
> 
>   (* Hangs, rapidly consuming memory.  Killed with ^C ... *)

This is a bug in the handling of null patterns (i.e. ones that can also
match the empty string).  I have fixed it now.

> On a more general point, how do I access all the strings captured by
> the inner brackets in a pattern like (:? (..)  )*  ?

The "(:?" should be "(?:".

Anyway, to answer your question: you can't.  The capturing subpattern
"([a-z]+)" will only ever capture the last match in a series of
repetitions (as introduced by "*" in your example).
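
A minimal sketch of this behaviour, assuming the pcre-ocaml library is
loaded and using the corrected "(?:" form of the pattern.  The helper
name "last_capture" is made up for illustration.  It uses "Pcre.extract",
which performs a single match and returns the full match at index 0
followed by the captured groups:

```ocaml
(* Requires the pcre-ocaml library, e.g. #load "pcre.cma" in the toplevel. *)

(* Corrected pattern from the question: "(?:" instead of "(:?". *)
let rex = Pcre.regexp "(?:([a-z]+)\\s+)*"

(* Hypothetical helper: returns whatever capture group 1 holds once the
   whole repetition has matched. *)
let last_capture subj =
  let substrings = Pcre.extract ~rex subj in
  substrings.(1)

(* Only the final iteration of the group survives, so this prints "ff". *)
let () = print_endline (last_capture "a b c d ee ff ")
```

The earlier words ("a", "b", "c", "d", "ee") are matched but not
retained, which is why a simpler pattern plus "extract_all" is the
easier route.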

I'm not sure what you want to do, but I guess you want to extract all
words consisting of the characters a-z from a string?  In that case I'd
rather use the much simpler pattern "[a-z]+".  "extract_all" will then
return an array of arrays of strings.  Each inner array holds the
substrings of one match.  Unless you specify "~full_match:false", each
inner array will contain the full match at position 0.  The full match
is what we want here.

E.g.:

  let () =
    let rex = Pcre.regexp "[a-z]+" in
    let subj = "this is 1 test" in
    let many_sstrs = Pcre.extract_all ~rex subj in
    let words = Array.map (fun sstrs -> sstrs.(0)) many_sstrs in
    Array.iter print_endline words

This will print:

  this
  is
  test
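
To make the "~full_match:false" remark above concrete, here is a small
sketch, again assuming the pcre-ocaml library is loaded.  The pattern
with one capturing group (the first letter of each word) is an
illustrative choice, not something from the original question:

```ocaml
(* Requires the pcre-ocaml library, e.g. #load "pcre.cma" in the toplevel. *)

(* Illustrative pattern: group 1 captures the first letter of each word. *)
let rex = Pcre.regexp "([a-z])[a-z]*"
let subj = "this is 1 test"

(* Default: each inner array is [| full match; group 1 |]. *)
let with_full = Pcre.extract_all ~rex subj

(* With ~full_match:false each inner array is just [| group 1 |]. *)
let without_full = Pcre.extract_all ~full_match:false ~rex subj

let () =
  Array.iter (fun m -> Printf.printf "%s %s\n" m.(0) m.(1)) with_full;
  Array.iter (fun m -> print_endline m.(0)) without_full
```

With the default, index 0 is the whole word and index 1 its first
letter; with "~full_match:false", index 0 is already the first letter.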

"extract_all" is the dual of "split".  In contrast to the latter, it
does not remove the text matched by the pattern but keeps it (including
captured substrings), and ignores everything else.
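
A short sketch of that duality, assuming the pcre-ocaml library is
loaded.  The digit pattern and the subject string are made up for
illustration; the same regexp is passed to both functions:

```ocaml
(* Requires the pcre-ocaml library, e.g. #load "pcre.cma" in the toplevel. *)

(* Illustrative pattern and subject: runs of digits between letters. *)
let rex = Pcre.regexp "[0-9]+"
let subj = "ab12cd345ef"

(* "split" drops the matched text and returns what lies between matches. *)
let kept_by_split = Pcre.split ~rex subj

(* "extract_all" keeps the matched text and drops everything in between;
   index 0 of each inner array is the full match. *)
let kept_by_extract_all =
  Array.to_list (Array.map (fun m -> m.(0)) (Pcre.extract_all ~rex subj))

let () =
  print_endline (String.concat " " kept_by_split);        (* ab cd ef *)
  print_endline (String.concat " " kept_by_extract_all)   (* 12 345 *)
```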

Regards,
Markus

-- 
Markus Mottl          http://www.oefai.at/~markus          markus@oefai.at