Date: 2002-08-07 (07:36)
From: Jerome Vouillon <vouillon@p...>
Subject: Re: [Caml-list] Regarding regular expressions

> For a language where backreferences are available, I thought it would be
> interesting to see how often people make use of this feature. So I
> downloaded the CPAN archive and analyzed the perl scripts there. Here
> is what I found:
>
> - I found 165228 perl scripts/modules in CPAN [1].
> - Of those, 68501 use regular expressions [2].
> - Of those, 32359 (or 47 percent) use backreferences [3].
>
> So, nearly half of all perl scripts on CPAN that use regular expressions
> make use of the backreference feature. IMO this argues strongly in
> favor of supporting backreferences in C++. (Backreferences can only be
> handled by a backtracking NFA engine, IIRC.)

What he means by backreference is a way to refer to a submatch. For
instance, with the regular expression "^([^ ]*) *([^ ]*)", the
backreference "$1" will refer to the substring matched by the first
parenthesized subexpression "([^ ]*)". As long as the references do
not occur in the regular expression itself, they can be handled
perfectly well with a DFA engine. So, the numbers above do not prove
anything.

> There are other features besides backreferences that can only be
> provided by a backtracking NFA. These features include non-greedy
> quantification, positive and negative look-ahead and look-behind
> assertions, independent sub-expressions, conditional sub-expressions,
> and backreferences within the pattern itself.

I believe all these features except backreferences within a pattern
can be provided by a DFA engine (though my RE library only supports
one of these features, non-greedy quantification, at the moment).

So, the real questions are:
- how often are backreferences used within a pattern?
- when they are used, is it just for convenience, or would it be hard
  to rewrite the pattern without using backreferences?

[Feel free to forward this mail back to the C++ committee.]
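
To make the distinction concrete, here is a minimal sketch in Python (not
from the original thread; the sample strings and variable names are
purely illustrative). Python's `re` module happens to be a backtracking
engine, so it accepts both forms; the point is only to show which kind
of reference appears outside the pattern and which appears inside it.

```python
import re

# Submatch extraction: the groups are referenced only AFTER matching,
# outside the pattern. This is what Perl writes as $1 and $2.
words = re.compile(r"^([^ ]*) *([^ ]*)")
m = words.match("hello world")
first, second = m.group(1), m.group(2)

# In-pattern backreference: \1 must match the same text that group 1
# matched. The set of strings accepted is no longer a regular language.
doubled = re.compile(r"^([a-z]+) \1$")
is_doubled = doubled.match("abc abc") is not None
is_not_doubled = doubled.match("abc abd") is not None

print(first, second)               # the two extracted submatches
print(is_doubled, is_not_doubled)  # whether each string repeats its word
```

Only the second pattern needs the matcher to remember an arbitrary
substring while it is still matching; the first one only asks where
each group started and ended once a match has been found.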
-- Jerome