Silent max number of 32 capturing groups in Str regexps #6989

Closed
vicuna opened this issue Sep 13, 2015 · 4 comments

Comments

vicuna commented Sep 13, 2015

Original bug ID: 6989
Reporter: @mjambon
Status: closed (set by @xavierleroy on 2017-02-16T14:18:33Z)
Resolution: fixed
Priority: normal
Severity: minor
Version: 4.02.3
Target version: 4.03.0+dev / +beta1
Fixed in version: 4.03.0+dev / +beta1
Category: otherlibs
Monitored by: @gasche

Bug description

The current implementation of the str library uses a hardcoded limit of 32 capturing groups.

In practice this limit can be reached when regexps are generated, and the lack of non-capturing groups (feature request #3969) makes it worse. We had a bug in mikmatch_str reported by a user: mjambon/mikmatch#9

I suggest the following minimal changes:

Of course it would be ideal to support an unlimited number of groups, but I'm not sure it's worth the effort given that people with advanced needs will just use PCRE.

Steps to reproduce

File groups.ml:

#load "str.cma";;

let re =
  Str.regexp
    "\\(\\)\\(\\)\\(\\)\\(\\)\\(\\)\\(\\)\\(\\)\\(\\)\\(\\)\\(\\)\
     \\(\\)\\(\\)\\(\\)\\(\\)\\(\\)\\(\\)\\(\\)\\(\\)\\(\\)\\(\\)\
     \\(\\)\\(\\)\\(\\)\\(\\)\\(\\)\\(\\)\\(\\)\\(\\)\\(\\)\\(\\)\
     \\(x\\)\\(y\\)"

let s =
  let input = "xy" in
  if Str.string_match re input 0 then (
    if Str.matched_group 31 input = "x" then
      print_endline "x OK";
    if Str.matched_group 32 input = "y" then
      print_endline "y OK";
  )
  else
    assert false

This gives us:

$ ocaml groups.ml
x OK
Exception: Invalid_argument "Str.matched_group".

It should have printed:

x OK
y OK
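
To try other group counts, the pattern above could be generated instead of typed out. A minimal sketch, assuming a hypothetical helper named repro_pattern (it is not part of the original report or of Str):

```ocaml
(* Hypothetical helper: build the source of a regexp in Str syntax with
   [n] empty groups followed by \(x\) and \(y\), so that "x" is captured
   by group n+1 and "y" by group n+2. *)
let repro_pattern n =
  String.concat "" (List.init n (fun _ -> "\\(\\)")) ^ "\\(x\\)\\(y\\)"
```

With n = 30 this yields the same pattern as in groups.ml, and the result can be passed to Str.regexp as before.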

vicuna commented Sep 13, 2015

Comment author: @xavierleroy

I agree the current behavior of \(...\) groups is unintuitive: the first N = 32 groups are binding, and the others are just non-binding parentheses.

It is trivial to fail at regexp compilation time if more than N groups are encountered. I'll do that soon if there is no objection.

It is also trivial to bump N to a higher value (say, 100) if this can help.
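
Until the library itself rejects such regexps, a caller that generates patterns could count the groups before compiling. This is only a user-side sketch, not the check done inside Str: count_groups, skip_set and check_group_limit are hypothetical names, and the handling of character sets assumes that backslash is not special inside [...] and that a leading ] is literal.

```ocaml
(* Hypothetical helpers, not part of Str. *)

(* [i] is the index just past a '['. Return the index just past the
   closing ']' of that character set, or the string length if unclosed. *)
let skip_set re i =
  let n = String.length re in
  let i = if i < n && re.[i] = '^' then i + 1 else i in
  let i = if i < n && re.[i] = ']' then i + 1 else i in
  match String.index_from_opt re i ']' with
  | Some j -> j + 1
  | None -> n

(* Count occurrences of \( in a regexp written in Str syntax, skipping
   escaped backslashes and the contents of character sets. *)
let count_groups re =
  let n = String.length re in
  let rec go i acc =
    if i >= n then acc
    else
      match re.[i] with
      | '\\' when i + 1 < n ->
          go (i + 2) (if re.[i + 1] = '(' then acc + 1 else acc)
      | '[' -> go (skip_set re (i + 1)) acc
      | _ -> go (i + 1) acc
  in
  go 0 0

(* Raise Invalid_argument if the pattern has more than [limit] groups. *)
let check_group_limit ~limit re =
  let k = count_groups re in
  if k > limit then
    invalid_arg (Printf.sprintf "regexp has %d groups, limit is %d" k limit)
```

Calling check_group_limit just before Str.regexp turns the silent truncation into an early, explicit failure; the value to pass as limit depends on the Str version in use.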

vicuna commented Sep 13, 2015

Comment author: @mjambon

Generated regexps can grow really fast in size, so increasing N from 32 to 100 may only benefit a few applications. If we wanted to go big, I'd suggest something like 10000, but it doesn't seem wise to statically allocate so much space.

vicuna commented Sep 13, 2015

Comment author: @xavierleroy

10000 groups, ouch!

We could play a trick that serves us well in other parts of the OCaml runtime system: use a statically-allocated array if the number of groups is small, and allocate the array dynamically otherwise. I'll look into this soon.
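
The idea can be illustrated as follows. This is an OCaml illustration only, not the code from the str stubs; small_limit, static_buf and group_buffer are hypothetical names, and the layout (two positions per group) is an assumption made for the example.

```ocaml
(* Illustration only: all names here are hypothetical. *)

(* Number of groups served by the preallocated buffer. *)
let small_limit = 32

(* Preallocated once: a start and an end position per group. *)
let static_buf : int array = Array.make (2 * small_limit) (-1)

(* Return a buffer with room for [ngroups] groups. Small requests reuse
   the shared buffer (reset to -1); large ones get a fresh array.
   Sharing means the small case is not reentrant. *)
let group_buffer ngroups =
  if ngroups <= small_limit then begin
    Array.fill static_buf 0 (Array.length static_buf) (-1);
    static_buf
  end else
    Array.make (2 * ngroups) (-1)
```

The common case pays no allocation, and only unusually large regexps pay for a dynamically sized array.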

vicuna commented Dec 6, 2015

Comment author: @xavierleroy

Commit [trunk 45343f6] lifts the limit of 32 binding groups. Now the sky is the limit.

@vicuna vicuna closed this as completed Feb 16, 2017
@vicuna vicuna added this to the 4.03.0 milestone Mar 14, 2019
@vicuna vicuna added the bug label Mar 20, 2019