
Str.matched_group throws Not_found after a successful Str.string_partial_match #4193

Closed
vicuna opened this issue Jan 7, 2007 · 2 comments

@vicuna

vicuna commented Jan 7, 2007

Original bug ID: 4193
Reporter: jjhellst
Status: closed (set by @xavierleroy on 2007-01-28T11:05:34Z)
Resolution: won't fix
Priority: normal
Severity: minor
Version: 3.09.3
Category: otherlibs

Bug description

The test case below throws a "Not_found" exception when I try to access the match result after a successful Str.string_partial_match.

To me this looks like an off-by-one error, because the example below seems to throw Not_found only when the input string is completely matched by the regexp. That is, the input "FooBar" doesn't work but "FooBar " does. Both match my example regexp.

Or am I misunderstanding the way string_partial_match is supposed to work?

FWIW I'm running Mac OS X x86.

Thanks,
Janne

8<
module P = Printf

(* Requires the str library. The backslashes are doubled because Str's
   group syntax \( ... \) is written inside an OCaml string literal. *)
let wikilink_re = Str.regexp "\\([A-Z][a-z]+\\([A-Z][a-z]+\\)+\\)"

let _ =
  (* Change "FooBar" to "FooBar " (added space) and matched_group 1
     is found. *)
  let s = "FooBar" in
  if Str.string_partial_match wikilink_re s 0 then
    begin
      P.printf "matched string: '%s'\n" s;
      P.printf "matched group 0: '%s'\n" (Str.matched_group 0 s);
      (* This throws a Not_found exception, even though it seems that
         group 1 should have been successfully matched? *)
      P.printf "matched group 1: '%s'\n" (Str.matched_group 1 s)
    end
  else ()
8<

@vicuna

vicuna commented Jan 7, 2007

Comment author: jjhellst

There's another bug (at least I think it is one) that I assume is related to the above problem. See the source code below:

I don't think "Foo" should match at all; the intention is that "FooBar" (CamelCase wikilinks) should match but "Foo" shouldn't.

8<

module P = Printf

(* Requires the str library; the backslashes are doubled inside the
   OCaml string literal. *)
let wikilink_re = Str.regexp "\\([A-Z][a-z]+\\([A-Z][a-z]+\\)+\\)"

let _ =
  (* "Foo" is found to match the above regexp (which I think definitely
     shouldn't match), whereas "Foo " no longer matches the regexp. *)
  let s = "Foo" in
  if Str.string_partial_match wikilink_re s 0 then
    begin
      P.printf "matched string: '%s'\n" s;
      P.printf "matched group 0: '%s'\n" (Str.matched_group 0 s);
      (* This throws a Not_found exception, even though it seems that
         group 1 should have been successfully matched? *)
      P.printf "matched group 1: '%s'\n" (Str.matched_group 1 s)
    end
  else ()
8<

@vicuna

vicuna commented Jan 28, 2007

Comment author: @xavierleroy

Concerning your additional note: yes, "Foo" partially matches your regexp
since "Foo" is a prefix of a string that would match your regexp,
say, "FooB". Str.string_partial_match has unintuitive semantics and
is very rarely useful. I'm pretty sure you should use Str.string_match
(or the Str.search_* functions, or an ocamllex lexer) if what you want
to do is find occurrences of CamelCase identifiers in a string.
Matched groups will work just fine then.
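Following that suggestion, here is a minimal sketch that scans a string with Str.search_forward and reads each CamelCase word from matched group 1 of an ordinary (non-partial) search. The helper name find_wikilinks and the sample text are illustrative assumptions, not from the thread; the regexp is the reporter's, with the backslashes doubled as OCaml string literals require.

```ocaml
(* Sketch: collect CamelCase words with Str.search_forward.
   Requires the str library (e.g. ocamlfind ocamlopt -package str -linkpkg). *)

(* The reporter's pattern, with backslashes doubled for the string literal. *)
let wikilink_re = Str.regexp "\\([A-Z][a-z]+\\([A-Z][a-z]+\\)+\\)"

(* Hypothetical helper: return every CamelCase word in [s], in order,
   reading each one from matched group 1 of an ordinary search. *)
let find_wikilinks s =
  let rec loop pos acc =
    match Str.search_forward wikilink_re s pos with
    | _start ->
        let word = Str.matched_group 1 s in
        (* The pattern never matches the empty string, so match_end ()
           is always past [pos] and the loop terminates. *)
        loop (Str.match_end ()) (word :: acc)
    | exception Not_found -> List.rev acc
  in
  loop 0 []

let () =
  List.iter (Printf.printf "wikilink: %s\n")
    (find_wikilinks "See FooBar and WikiWikiWeb, but not Foo.")
```

This should print "wikilink: FooBar" followed by "wikilink: WikiWikiWeb"; the trailing "Foo" is not reported, because search_forward only accepts complete matches.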

Concerning your initial bug report: yes, this is a bizarre interaction
between partial matching and the way groups (...) are recorded.
Basically, the non-deterministic automaton hits end-of-text and
partially accepts the input string before having an opportunity to
record the end of the matched group number 1. There is no easy fix
and I believe it isn't worth investing considerable effort in this
issue given that Str.string_partial_match is very rarely useful to
begin with.
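If partial matching is still needed, one way to sidestep the missing-group problem is to take groups only from an ordinary match. The sketch below is an assumption, not part of Str: matched_group_opt returns None instead of raising Not_found, and classify calls Str.string_match first, consulting Str.string_partial_match only to detect input that is merely a prefix of a possible match. The names matched_group_opt, classify and outcome are hypothetical.

```ocaml
(* Hedged sketch, not part of Str. Requires the str library. *)

let wikilink_re = Str.regexp "\\([A-Z][a-z]+\\([A-Z][a-z]+\\)+\\)"

(* Hypothetical helper: like Str.matched_group, but returns None instead
   of raising Not_found when group [n] was not recorded. *)
let matched_group_opt n s =
  try Some (Str.matched_group n s) with Not_found -> None

type outcome =
  | Matched of string option (* ordinary match at position 0, with group 1 *)
  | Prefix_only (* text is only a prefix of a possible match, e.g. "Foo" *)
  | No_match

(* Hypothetical helper: groups are read right after Str.string_match, so
   the partial-match recording issue never comes into play. *)
let classify s =
  if Str.string_match wikilink_re s 0 then Matched (matched_group_opt 1 s)
  else if Str.string_partial_match wikilink_re s 0 then Prefix_only
  else No_match

let () =
  List.iter
    (fun s ->
      let label =
        match classify s with
        | Matched (Some g) -> "match, group 1 = " ^ g
        | Matched None -> "match, group 1 missing"
        | Prefix_only -> "prefix of a possible match"
        | No_match -> "no match"
      in
      Printf.printf "%S: %s\n" s label)
    [ "FooBar"; "Foo"; "foo" ]
```

With the reporter's regexp, "FooBar" is classified as a match with group 1 equal to "FooBar", "Foo" as a prefix of a possible match, and "foo" as no match.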
