Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Genlex fails to parse (e.g.) the string "42e" #6967

Closed
vicuna opened this issue Aug 21, 2015 · 4 comments
Closed

Genlex fails to parse (e.g.) the string "42e" #6967

vicuna opened this issue Aug 21, 2015 · 4 comments
Assignees

Comments

@vicuna
Copy link

vicuna commented Aug 21, 2015

Original bug ID: 6967
Reporter: erwan
Assigned to: @gasche
Status: closed (set by @xavierleroy on 2017-02-16T14:15:00Z)
Resolution: not a bug
Priority: low
Severity: minor
Version: 4.02.3
Category: standard library

Bug description

Genlex fails to parse the string "42e" (and much more)
and raises a float_of_string exception.

the string "123456f" is fine on the other hand. The problem occurs when a 'e'
follows some int.

Steps to reproduce

let l = Genlex.make_lexer [] (Stream.of_char "42 ok 42still_ok 42e_not_ok);;
Stream.next l; Stream.next l; Stream.next l; Stream.next l;;
(* ouch! *)
Stream.next l;;

Additional information

a possible patch would be something like:
@@ -129,7 +129,10 @@
match Stream.peek strm__ with
Some ('0'..'9' as c) ->
Stream.junk strm__; let s = strm__ in store c; end_exponent_part s

  • | _ -> Some (Float (float_of_string (get_string ())))
  • | _ ->
  •   let str = get_string () in
    
  •   try Some (Float (float_of_string str))
    
  •   with _ -> Some (String str)
    
@vicuna
Copy link
Author

vicuna commented Aug 21, 2015

Comment author: @gasche

I'm not sure the failure is a bug, given that 42e is an invalid float literal (partial scientific notation), to be continued in 42e2 for example. If you type "42e" in an OCaml toplevel, you will get an error. Given that Genlex claims to follow OCaml's lexical conventions, this does not seem very surprising.

Otherwise would you expect "42e" to be parsed as "42 e", but "42e10" as "42e10"?

@vicuna
Copy link
Author

vicuna commented Aug 23, 2015

Comment author: erwan

I expect "42e" to be parsed as (String "42e"). "42e" is a valid string. Why should it reject "42e" and accept "42f" anyway?

Moreover, it would nice (and easy) if make_lexer would accept as (optional) arguments an int_of_string and a float_of_string function.

ps: my patch is not very good actually ; I can provide a better one.

@vicuna
Copy link
Author

vicuna commented Aug 23, 2015

Comment author: @gasche

The input "42e" cannot be parsed as (String "42e"), because String denotes string literals. The input ""42e"" (note the quotes included in the input) is parsed as (String "42e").

I explained in my message above a justification of why "42e" fails (I'm not actually sure that this is why the implementation fail, but that should not matter): Genlex sees the beginning of a float literal such as "42e3" (which is the floating point number also written 42_000.). There is no continuation of "42f" that is a valid literal, so it is parsed as "42" followed by "f". Note that the fact of accepting an alphabetic identifier just after an integer literal is now seen as a mistake, and has been removed from the in-development OCaml version (currently Genlex still accepts "42f" because it has not been updated).

The point of Genlex is to make it easy to develop quick&dirty parsers for prototypes and experiments, following the OCaml lexical conventions -- this is explicitly spelled out in the documentation. If you want a flexible lexer, you should use something else, such as an ocamllex-generated lexer (or other lexer projects, see for example the promising sedlex https://github.com/alainfrisch/sedlex ).

@vicuna
Copy link
Author

vicuna commented Aug 24, 2015

Comment author: erwan

I'm not convinced at all. I'll have a look to sedlex, thanks.

ps: Actually I meant « Ident "42e"» not « String "42e"», sorry. I get your point concerning identifier following integers that are now rejected by ocaml and not genlex. Still accepting "42f" as an ident and not "42e" looks like an unresolved (yet minor) bug.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants