Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

little bug in int_of_string #4210

Closed
vicuna opened this issue Feb 19, 2007 · 7 comments
Closed

little bug in int_of_string #4210

vicuna opened this issue Feb 19, 2007 · 7 comments
Assignees
Labels

Comments

@vicuna
Copy link

vicuna commented Feb 19, 2007

Original bug ID: 4210
Reporter: @oandrieu
Assigned to: @xavierleroy
Status: closed (set by @xavierleroy on 2011-05-29T10:14:04Z)
Resolution: fixed
Priority: normal
Severity: minor
Version: 3.10+dev
Fixed in version: 3.12.0+dev
Category: ~DO NOT USE (was: OCaml general)
Has duplicate: #4245 #4876
Related to: #3302 #3582
Monitored by: rpjames till "Julien Signoles" yminsky nogin @mmottl

Bug description

int_of_string "max_int + 1" does not raise an exception as it should.

max_int ;;

  • : int = 1073741823

int_of_string "1073741824" ;;

  • : int = -1073741824

in byterun/ints.c we have this comment:
/* Signed representation expected, allow -2^(nbits-1) to 2^(nbits - 1) */
the upper bound should be "2^(nbits - 1) - 1"

File attachments

@vicuna
Copy link
Author

vicuna commented Feb 19, 2007

Comment author: @oandrieu

This is suspicious too :

int_of_string "0x4000_0000" ;;

  • : int = -1073741824

int_of_string "-0x4000_0000" ;;

  • : int = -1073741824

@vicuna
Copy link
Author

vicuna commented Feb 21, 2007

Comment author: @xavierleroy

This is a known issue: when lexing source files, the (valid) integer literal
-1073741824 is read as "token -", then "integer (int_of_string "1073741824")),
and converted to -1073741824 later during parsing. It is therefore necessary that int_of_string "1073741824" does not fail. A cleaner approach would be to avoid int_of_string during lexing and use a specific string->int conversion function, but that's more work...

Concerning int_of_string "0x4000_0000", we adopt the convention that hexadecimal, octal and binary literals can be written either as unsigned (range 0..0x7FFF_FFFF) or signed (range -0x4000_0000..0x3FFF_FFFF). The same convention is used by Java for instance.

@vicuna
Copy link
Author

vicuna commented Jun 23, 2009

Comment author: @xavierleroy

I'm reopening this PR following discussions with Consortium members. It appears possible to fix int_of_string (so that it fails if given "max_int + 1") without breaking the lexer/parser. To be done for 3.12.

@vicuna
Copy link
Author

vicuna commented Jun 26, 2009

Comment author: @mshinwell

Here's one I prepared earlier :)

This patch fixes the system so that int_of_string and friends do not permit the relevant largest positive integer, plus one, to be converted. It does not affect lexing and parsing in any way.

This has only been tested with 3.11.0 64-bit so far, using an automatic test generator. We can do some further testing internally if the patch is likely to be of use.

@vicuna
Copy link
Author

vicuna commented Jun 26, 2009

Comment author: @mshinwell

I should add that the patch I've just uploaded only affects conversions in base 10. As far as I can work out, conversions in other bases are to be treated as if they are unsigned conversions (which makes sense, since you might want to directly set the sign bit in a hex representation).

@vicuna
Copy link
Author

vicuna commented Jul 15, 2009

Comment author: @xavierleroy

Tentative fix in CVS head. String -> integer conversion functions
(int_of_string, Int32.of_string, etc) now have tight bounds.

However, the lexer still accepts the literal representing
"max_int + 1", as before, and converts it to min_int, as before.

The fix should be functionally equivalent to Mark's patch, except that
we don't add new C functions; instead, we're using the folklore trick
of converting an integer literal s via -(int_of_string "-" ^ s).

@talex5
Copy link
Contributor

talex5 commented Feb 5, 2023

Concerning int_of_string "0x4000_0000", we adopt the convention that hexadecimal, octal and binary literals can be written either as unsigned (range 0..0x7FFF_FFFF) or signed (range -0x4000_0000..0x3FFF_FFFF). The same convention is used by Java for instance.

This makes some sense when the width of the int is known (e.g. for int32 or int64), but for native ints it seems a bit odd. It means that whether a constant is positive or negative depends on the word size of the platform. Note that Java defines int to be 32-bit on all platforms (https://docs.oracle.com/javase/specs/jls/se7/html/jls-4.html#jls-4.2), so it does not have this problem.

(we hit this in ocaml-multicore/eio#432, where we made the mistake of defining max_luv_buffer_size = 0x7fffffff, which gave the wrong result on 32-bit platforms, though luckily it was caught by a unit test)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants