[Caml-list] Str.string_match raising Invalid_argument "String.sub" in gc
From: Frank Atanassow <franka@c...>
Subject: Re: [Caml-list] Str.string_match raising Invalid_argument "String.sub" in gc
Neale Pickett wrote (on 22-08-01 13:41 -0700):
> Alain Frisch writes:
> > On 22 Aug 2001 neale-caml@woozle.org wrote:
> 
> >> # let rec f l =
> >> let sep = Str.regexp "^[ \t\n]*\\(.+\\)" in
> >> match l with
> >> | [] -> []
> >> | [""] -> []
> >> | s :: rest -> if (Str.string_match sep s 0) then
> >> let foo = print_string ("match " ^ Str.matched_group 1 s ^ "\n") in
> >>             (Str.matched_group 1 s) :: (f rest)
> >                                     ^^
>                                     
> > This is wrong; with the current OCaml implementation, the right
> > operand of (::) is called first; so (Str.matched_group 1 s) is called
> > after subsequent calls to Str.string_match, which is obviously
> > incorrect.
> 
> Aha!  Thank you.
> 
> This makes sense, but it is certainly not obvious, especially in a
> language which purports to have no side-effects.

OCaml does not purport to have no side-effects. It has plenty of side-effects.
You must be thinking of Haskell or Miranda.
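
The side effect that matters here is the match state that Str keeps between
calls. A minimal sketch of one way to repair the function quoted above: read
the group into a let-bound name before recursing, so the evaluation order of
(::) no longer matters. The name strip_leading is made up for illustration, and
skipping strings that do not match is an assumption, since the quoted code is
cut off before its else branch. It needs the str library to be linked.

```ocaml
(* strip_leading is a hypothetical name standing in for the quoted f. *)
let strip_leading =
  let sep = Str.regexp "^[ \t\n]*\\(.+\\)" in
  let rec go = function
    | [] -> []
    | s :: rest ->
      if Str.string_match sep s 0 then begin
        (* Read the group now, while Str's match state still refers to s. *)
        let m = Str.matched_group 1 s in
        m :: go rest
      end else
        (* Assumption: strings that do not match are dropped. *)
        go rest
  in
  go
```

The let forces Str.matched_group to run before the recursive call performs the
next Str.string_match, whichever operand of (::) is evaluated first.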

> I can't help thinking
> that s should be a different string for every invocation, but clearly it
> is somehow related to the initial input string.  No doubt this is a
> clever optimization within OCaml which makes for drastically reduced
> memory usage when processing strings, but it does make things a bit
> confusing to the beginner.

I'm pretty sure there is no such optimization, but I'm not sure what you're
talking about here. Anyway, if an optimization affected the behavior of a
program, it would not be an optimization but rather a compiler bug.
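
What probably looks like sharing is something else. As far as I can tell, Str
remembers only the positions of the last successful match, and
Str.matched_group applies those positions to whatever string it is given. The
sketch below relies on that behaviour, which I am inferring from the
implementation rather than from a documented guarantee. The names re, long,
short, stale_read and out_of_range are made up for illustration.

```ocaml
let re = Str.regexp "[ ]*\\(.+\\)"
let long = "    long string"
let short = "x"

(* Match long, then short, then ask for group 1 "of" long.
   The positions belong to the match in short, so we get a
   slice of long at those offsets, not "long string". *)
let stale_read () =
  ignore (Str.string_match re long 0);
  ignore (Str.string_match re short 0);
  Str.matched_group 1 long

(* Match long, then ask for group 1 "of" short. The saved offsets
   lie outside short, so the underlying substring call fails. *)
let out_of_range () =
  ignore (Str.string_match re long 0);
  try ignore (Str.matched_group 1 short); false
  with Invalid_argument _ -> true
```

The second case looks like the failure in the subject line: the offsets come
from one string and are applied to another.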

> I don't have any good suggestions on how else to do it, although my base
> desire is to have a regexp matching function which returns a string list
> of the matched groups.

There is no need to mutate the list/string(s).

If I understand you correctly (but I don't think I do):

  let sep_list =
    let sep = Str.regexp "[ \t\n]+\\([^ \t\n]*\\)" in
    fun s ->
      let rec loop i =
        if Str.string_match sep s i then
           let m = Str.matched_group 1 s in
           m :: loop (Str.match_end ())
        else
           []
      in loop 0

# sep_list "  abc def  ghi j";;
- : string list = ["abc"; "def"; "ghi"; "j"]

But this is what the Str.split procedure does already:

# Str.split (Str.regexp "[ \t\n]+") "  abc def  ghi j";;
- : string list = ["abc"; "def"; "ghi"; "j"]

Your function has type string list -> string list, and it seems like it just
does the same match on every element of the list, so it's much easier:

  let map_match =
    let sep = Str.regexp "[ \t\n]*\\(.+\\)" in
    fun l ->
      (* matched_group is only meaningful after a successful match *)
      let f s = if Str.string_match sep s 0 then Str.matched_group 1 s else "" in
      List.map f l

# map_match ["  arf"; " barf"];;
- : string list = ["arf"; "barf"]
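
As for a matching function that returns the matched groups as a list, here is
a rough sketch. Str has no such function that I know of; matched_groups is a
hypothetical helper, and having the caller pass the number of groups n is an
assumption made to keep it short. All groups are read immediately after the
match, before any other Str call can overwrite the match state.

```ocaml
(* matched_groups is a hypothetical helper, not part of Str.
   n is the number of \( ... \) groups in re, supplied by the caller. *)
let matched_groups re n s =
  if Str.string_match re s 0 then begin
    (* Walk from group n down to 1, consing onto the front,
       so the result lists groups 1..n in order. *)
    let rec collect i acc =
      if i < 1 then acc
      else
        (* A group that took no part in the match raises Not_found. *)
        let g = try Some (Str.matched_group i s) with Not_found -> None in
        collect (i - 1) (g :: acc)
    in
    Some (collect n [])
  end else
    None
```

The result is None when the string does not match at position 0, and otherwise
one option per group, so the caller never has to touch Str's match state.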

-- 
Frank Atanassow, Information & Computing Sciences, Utrecht University
Padualaan 14, PO Box 80.089, 3508 TB Utrecht, Netherlands
Tel +31 (030) 253-3261 Fax +31 (030) 251-379
-------------------
Bug reports: http://caml.inria.fr/bin/caml-bugs  FAQ: http://caml.inria.fr/FAQ/
To unsubscribe, mail caml-list-request@inria.fr  Archives: http://caml.inria.fr