Mantis Bug Tracker

ID: 0007760
Project: OCaml
Category: tools (ocaml{lex,yacc,dep,debug,...})
View Status: public
Date Submitted: 2018-04-02 19:49
Last Update: 2018-04-10 11:05
Assigned To: maranget
Product Version: 4.06.0
Fixed in Version: 4.07.0+dev/beta2/rc1/rc2
Summary: 0007760: Segfault in ocamllex-generated code using 'shortest'
Description: On my machine (amd64 Debian), the following program usually segfaults:

    rule read = shortest
      | ("aa" | "bbb") (_ as x) _? { x }
      | _ as y { y }

    { let _ = read (Lexing.from_string "asdf") }

when compiled and run as:

    ocamllex -q -o lexer.ml lexer.mll && ocamlopt -o lexer lexer.ml && ./lexer

This example is reduced from a larger lexer. The segfault only seems to occur when using 'shortest' instead of 'parse', but I'm not sure exactly which combination of features triggers the bug. The problem is reproducible using OCaml versions back to at least 3.11.2.
Tags: No tags attached.

-  Notes
frisch (developer)
2018-04-03 10:09

FWIW, using "ocamllex -ml" seems to work (at least, no segfault).
xclerc (reporter)
2018-04-03 11:02

The latest trunk (db6891f2) does not segfault on this code.
nojebar (developer)
2018-04-03 11:09
edited on: 2018-04-03 14:04

db6891f2 segfaults for me; it looks like some bad indexing on the Lexing.lex_buffer field.

def (developer)
2018-04-06 09:25

I started investigating this issue.

The problem triggers when one branch captures sub-values (the `(_ as x)` in Stephen's example), while another branch catches all cases (the `_ as y`; note that this is not a sub-capture: the whole lexeme is returned, so it doesn't stress the same code path).

The automaton produced is correct (though not minimal :)), which is why the `-ml` output works. However, the bytecode generated is wrong: there will be out-of-bounds writes to the lexbuf.Lexing.lex_mem field (a vector that is used to store the locations of capture groups).

If you don't capture sub-values, the lexer will use the `caml_lex_engine` primitive for interpretation which is correct as far as I can tell.

However, if one of the branches captures sub-values, `caml_new_lex_engine` is used, which can do arbitrary writes (via the `run_tag` C function).

Btw, this is not an initialization issue (one could think that the position vector is too short). A tag is interpreted wrongly, so it consumes garbage values and writes at some arbitrary offset of lex_mem.

My next step will be to instrument bytecode generation to understand what goes wrong, but I am progressing slowly as I have found few resources on that part :).
def (developer)
2018-04-06 09:28

xclerc: sometimes the random write corrupts the heap, sometimes it doesn't. You will have to test in different memory conditions (and for good measure, put an assertion in run_tag to check for the bounds).
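
As an illustration of the kind of bounds check suggested above, here is a minimal, self-contained sketch in C. It is not the OCaml runtime's code: `lex_mem_model` and `run_tag_checked` are hypothetical names, the cells are plain ints rather than OCaml values, and the tag encoding (a list of `dst, src` byte pairs ended by `0xff`, with `src == 0xff` meaning "clear the cell") is an assumption made for the example.

```c
#include <stddef.h>

/* Hypothetical stand-in for the lex_mem vector: plain ints instead of
   OCaml values, plus an explicit length so indices can be checked. */
typedef struct {
    int    *cells;
    size_t  len;
} lex_mem_model;

/* Interpret a tag program under the ASSUMED encoding:
     dst, src, dst, src, ..., 0xff
   where src == 0xff means "set cell dst to -1" and any other src means
   "copy cell src into cell dst".
   Returns 0 on success, -1 as soon as an index or the program itself
   would go out of bounds (instead of writing at an arbitrary offset). */
static int run_tag_checked(const unsigned char *pc, size_t pc_len,
                           lex_mem_model *mem)
{
    size_t i = 0;
    while (i < pc_len) {
        unsigned char dst = pc[i++];
        if (dst == 0xff)
            return 0;              /* terminator: program finished */
        if (i >= pc_len)
            return -1;             /* truncated program: src is missing */
        unsigned char src = pc[i++];
        if (dst >= mem->len)
            return -1;             /* destination outside the vector */
        if (src == 0xff) {
            mem->cells[dst] = -1;  /* clear the cell */
        } else {
            if (src >= mem->len)
                return -1;         /* source outside the vector */
            mem->cells[dst] = mem->cells[src];
        }
    }
    return -1;                     /* no terminator before end of program */
}
```

With a three-cell vector, a program such as `{2, 0, 1, 0xff, 0xff}` copies cell 0 into cell 2 and clears cell 1, whereas a garbage program such as `{7, 0, 0xff}` is rejected before any write happens. In a debugging build, the `return -1` paths could equally be turned into assertions so that the failure is reported at the faulty tag rather than as later heap corruption.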
xleroy (administrator)
2018-04-06 19:59

Maybe @maranget could look into this issue as well.
maranget (manager)
2018-04-09 14:25

I am having a look.
maranget (manager)
2018-04-09 16:40
edited on: 2018-04-10 09:58

I think I have found the bug, but I am lacking time to submit
a pull request now.

Basically, the problem originates from the table compaction function being
able to optimize away memory instructions [in], while the
main output function does not notice it (and thus emits a call to caml_new_lex_engine, while a call to caml_lex_engine would be appropriate).

maranget (manager)
2018-04-10 10:45
edited on: 2018-04-10 10:46

There is now a pull request that corrects this PR,
see [^]

- Issue History
Date Modified Username Field Change
2018-04-02 19:49 stedolan New Issue
2018-04-03 10:09 frisch Note Added: 0018970
2018-04-03 11:02 xclerc Note Added: 0018971
2018-04-03 11:09 nojebar Note Added: 0018972
2018-04-03 14:04 nojebar Note Edited: 0018972
2018-04-06 09:25 def Note Added: 0018981
2018-04-06 09:28 def Note Added: 0018982
2018-04-06 19:59 xleroy Note Added: 0018990
2018-04-06 19:59 xleroy Status new => acknowledged
2018-04-09 14:25 maranget Note Added: 0018992
2018-04-09 16:40 maranget Note Added: 0018996
2018-04-10 09:58 maranget Note Edited: 0018996
2018-04-10 10:45 maranget Note Added: 0019001
2018-04-10 10:46 maranget Note Edited: 0019001
2018-04-10 11:05 maranget Assigned To => maranget
2018-04-10 11:05 maranget Status acknowledged => resolved
2018-04-10 11:05 maranget Fixed in Version => 4.07.0+dev/beta2/rc1/rc2
