[Caml-list] Str.string_match raising Invalid_argument "String.sub" in gc
From: Brian Rogoff <bpr@b...>
Subject: Re: [Caml-list] Str.string_match raising Invalid_argument "String.sub" in gc
On Thu, 23 Aug 2001, Nicolas George wrote:
> On Wednesday, 22 August 2001 at 13:31, Miles Egan wrote:
> >> PCRE-library (Perl Compatible Regular Expressions):
> > I've asked this several times before, but I think it's worth asking again: is
> > there any chance of adding pcre to the stock distribution?  It's superior in
> > every way to the str module and much friendlier to Python/Perl refugees.
> 
> I second that too. And because PCRE is under LGPL (Str is based on GNU
> regex, which is under GPL), it could be in the standard library and not
> only in the distribution. 

Some other "pure OCaml" regexp engines were discussed here recently, including 
Claude Marche's and the one from Unison. Since the Unison code is under GPL 
and not LGPL, and I'm an (inverse) license ayatollah, I can only use the
LGPL'ed one. I've been playing with it and it's quite nice, though I think it
needs a few more bells and whistles to satisfy the Perlers. I don't know how 
it compares in performance against the Pcre C code. 

I agree that Str is suboptimal, but I think that there are also a few
other ways in which string handling could be improved, like 

(1) Very long strings (Sys.max_string_length = 16777211 on 32-bit
    machines). Please don't tell me that slurping a 100M file into a
    string is probably not smart; I know that. But it's a restriction
    that annoys some (many?) programmers.

(2) Wide character strings

(3) Functional strings (and functional arrays while we're at it :)

(4) Substrings

(1) and (3) could be fixed by adding a "ropes" library, or (1) alone could
be fixed by building strings over Bigarrays. (2) can also be fixed using 
Bigarrays, either building on top of them or just stealing the C code and 
specializing it. I ported the SML Basis library for substrings over to
OCaml, but I much prefer Hansen's subsequence reference approach (if
you've read Finkel's "Advanced Programming Language Design" you know what
I mean) and I've made a new module based on that which I'll release after
some more tire kicking; e-mail me if you want a version. Interestingly, it 
depends on physical reference equality, so a semantics-preserving port to
SML would require some uglification.
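
To make the "ropes" suggestion for (1) and (3) concrete, here is a minimal
sketch of a functional rope: an immutable tree of ordinary string chunks.
It is only an illustration under stated assumptions, not an existing
library; every name in it (rope, concat, get, sub, to_string) is
hypothetical, and it does no rebalancing.

```ocaml
(* A minimal rope sketch: an immutable string built as a binary tree of
   ordinary string chunks. All names here are hypothetical. *)
type rope =
  | Leaf of string
  | Node of rope * rope * int   (* left, right, cached total length *)

let length = function
  | Leaf s -> String.length s
  | Node (_, _, n) -> n

(* Concatenation allocates at most one node and copies no characters. *)
let concat a b =
  match length a, length b with
  | 0, _ -> b
  | _, 0 -> a
  | la, lb -> Node (a, b, la + lb)

(* Character lookup descends the tree; cost is proportional to its depth. *)
let rec get r i =
  match r with
  | Leaf s -> s.[i]
  | Node (l, r', _) ->
      let ll = length l in
      if i < ll then get l i else get r' (i - ll)

(* Functional substring: copies characters only inside the leaves it cuts. *)
let rec sub r pos len =
  if pos < 0 || len < 0 || pos + len > length r then invalid_arg "Rope.sub";
  match r with
  | Leaf s -> Leaf (String.sub s pos len)
  | Node (l, r', _) ->
      let ll = length l in
      if pos + len <= ll then sub l pos len
      else if pos >= ll then sub r' (pos - ll) len
      else concat (sub l pos (ll - pos)) (sub r' 0 (pos + len - ll))

(* Flatten to an ordinary string; this step is still subject to
   Sys.max_string_length. *)
let to_string r =
  let b = Buffer.create (length r) in
  let rec go = function
    | Leaf s -> Buffer.add_string b s
    | Node (l, r', _) -> go l; go r'
  in
  go r;
  Buffer.to_string b
```

Because only the leaves are native strings, the total length is bounded by
the range of int rather than by Sys.max_string_length, and no operation
mutates an existing rope. A real library would also want rebalancing and
iterators.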

So, I think we could use a richer set of string datatypes, and operations 
over them. It's not clear to me how much of this needs to be part of OCaml 
proper, and how much should just be, say, part of the CDK. It is clear that,
if there is going to be built-in regexp matching, Str is not the way to go.
 
> Maybe we could even hope for regexp pattern matching as a syntax extension :-)

Some version of Haskell had a built-in regexp matcher that worked on regexps
over types other than characters. I don't think it survived, but it's certainly
a cool idea.
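
To illustrate what regexps over non-character types could look like in
OCaml, here is a small sketch that matches lists of any element type using
Brzozowski derivatives. It is not a reconstruction of the Haskell feature
mentioned above, which I don't have details on; the type and function names
(re, nullable, deriv, matches) are hypothetical, and the expression is never
simplified, so it can grow on long inputs.

```ocaml
(* Regular expressions over an arbitrary element type 'a, matched with
   Brzozowski derivatives. All names are hypothetical. *)
type 'a re =
  | Empty                      (* matches nothing *)
  | Eps                        (* matches only the empty sequence *)
  | Sym of ('a -> bool)        (* one element satisfying a predicate *)
  | Seq of 'a re * 'a re
  | Alt of 'a re * 'a re
  | Star of 'a re

(* Does the expression accept the empty sequence? *)
let rec nullable = function
  | Empty | Sym _ -> false
  | Eps | Star _ -> true
  | Seq (a, b) -> nullable a && nullable b
  | Alt (a, b) -> nullable a || nullable b

(* Derivative: what remains to be matched after consuming element x. *)
let rec deriv x = function
  | Empty | Eps -> Empty
  | Sym p -> if p x then Eps else Empty
  | Alt (a, b) -> Alt (deriv x a, deriv x b)
  | Star a as s -> Seq (deriv x a, s)
  | Seq (a, b) ->
      let d = Seq (deriv x a, b) in
      if nullable a then Alt (d, deriv x b) else d

(* Whole-sequence match over any list. *)
let matches re xs = nullable (List.fold_left (fun r x -> deriv x r) re xs)

(* Example over ints: one or more even numbers followed by one odd number. *)
let even = Sym (fun n -> n mod 2 = 0)
let odd = Sym (fun n -> n mod 2 <> 0)
let evens_then_odd = Seq (even, Seq (Star even, odd))
```

With these definitions, matches evens_then_odd [2; 4; 6; 7] holds, while
[7] and [2; 4] are rejected. Using a predicate for each symbol is a design
choice that avoids requiring equality on the element type.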

-- Brian

