New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Genlex fails to parse (e.g.) the string "42e" #6967
Comments
Comment author: @gasche I'm not sure the failure is a bug, given that 42e is an invalid float literal (partial scientific notation), to be continued in 42e2 for example. If you type "42e" in an OCaml toplevel, you will get an error. Given that Genlex claims to follow OCaml's lexical conventions, this does not seem very surprising. Otherwise would you expect "42e" to be parsed as "42 e", but "42e10" as "42e10"? |
Comment author: erwan I expect "42e" to be parsed as (String "42e"). "42e" is a valid string. Why should it reject "42e" and accept "42f" anyway? Moreover, it would nice (and easy) if make_lexer would accept as (optional) arguments an int_of_string and a float_of_string function. ps: my patch is not very good actually ; I can provide a better one. |
Comment author: @gasche The input "42e" cannot be parsed as (String "42e"), because String denotes string literals. The input ""42e"" (note the quotes included in the input) is parsed as (String "42e"). I explained in my message above a justification of why "42e" fails (I'm not actually sure that this is why the implementation fail, but that should not matter): Genlex sees the beginning of a float literal such as "42e3" (which is the floating point number also written 42_000.). There is no continuation of "42f" that is a valid literal, so it is parsed as "42" followed by "f". Note that the fact of accepting an alphabetic identifier just after an integer literal is now seen as a mistake, and has been removed from the in-development OCaml version (currently Genlex still accepts "42f" because it has not been updated). The point of Genlex is to make it easy to develop quick&dirty parsers for prototypes and experiments, following the OCaml lexical conventions -- this is explicitly spelled out in the documentation. If you want a flexible lexer, you should use something else, such as an ocamllex-generated lexer (or other lexer projects, see for example the promising sedlex https://github.com/alainfrisch/sedlex ). |
Comment author: erwan I'm not convinced at all. I'll have a look to sedlex, thanks. ps: Actually I meant « Ident "42e"» not « String "42e"», sorry. I get your point concerning identifier following integers that are now rejected by ocaml and not genlex. Still accepting "42f" as an ident and not "42e" looks like an unresolved (yet minor) bug. |
Original bug ID: 6967
Reporter: erwan
Assigned to: @gasche
Status: closed (set by @xavierleroy on 2017-02-16T14:15:00Z)
Resolution: not a bug
Priority: low
Severity: minor
Version: 4.02.3
Category: standard library
Bug description
Genlex fails to parse the string "42e" (and much more)
and raises a float_of_string exception.
the string "123456f" is fine on the other hand. The problem occurs when a 'e'
follows some int.
Steps to reproduce
let l = Genlex.make_lexer [] (Stream.of_char "42 ok 42still_ok 42e_not_ok);;
Stream.next l; Stream.next l; Stream.next l; Stream.next l;;
(* ouch! *)
Stream.next l;;
Additional information
a possible patch would be something like:
@@ -129,7 +129,10 @@
match Stream.peek strm__ with
Some ('0'..'9' as c) ->
Stream.junk strm__; let s = strm__ in store c; end_exponent_part s
The text was updated successfully, but these errors were encountered: