New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
As possible, eliminate undefined behavior from stdlib #7812
Comments
Comment author: @xavierleroy Concerning the FP-to-integer conversions, your analysis is overly pessimistic. Of course we assume the C compiler to implement IEEE-754 FP arithmetic as described in Annex F of the ISO C standards. The Annex says: F.4 Floating to integer conversion If the floating value is infinite or NaN or if the integral part of F.7.3 Execution At program startup the floating-point environment is initialized as So, an overflow in FP->integer conversion is not UB and will just return an unspecified integer value. Additionally, chapter 7 (standard library) specifies no library function to enable trapping or stopping behaviors for FP exceptions. (In glibc, it's a nonstandard function, feenableexcept().) So, you have to go out of your way to get a SIGFPE out of an FP->integer overflow. |
Comment author: @xavierleroy More generally, I agree that the use of "undefined behavior" in OCaml's library function descriptions is a bad choice of words, given the sinister implications "undefined behavior" has in C. For OCaml functions, it's more like "unspecified results", i.e. the function can return any value of its return type, or raise any exception, but will have no other effect. |
Comment author: @whitequark I think you're right that this is not UB, but an unspecified bit pattern is still quite bad because upon "use of an unspecified value, or other behavior where this International Standard provides two or more possibilities and imposes no further requirements on which is chosen in any instance". That is, if you have a function uint8_t f() { return (uint8_t)1e100; } then both of these programs may print "taken": if(f() == 1 && f() == 2) puts("taken"); uint8_t x = f(); (LLVM has a section explicitly calling out this behavior as valid: https://llvm.org/docs/LangRef.html#undefined-values). In practice, This is of course less bad than UB but I would argue that this still does not match the mental model an OCaml programmer may have about unspecified bit patterns. In particular, an OCaml programmer (and, really, most C programmers) would expect an unspecified bit pattern to stay the same so long as whatever operation produces it stays the same, but this isn't the case in C. |
Comment author: @whitequark I should perhaps talk more about real-world impact of this use of unspecified values in OCaml runtime. I don't think any real-world C compiler out there would actually produce code for the OCaml runtime that would lead to algebraically impossible results like However, real-world C compilers will definitely produce code that in case of out-of-range conversions will have different results on different platforms (off the top of my head I think x86 and ARM64 would have different behavior here, though I'm far from an expert) and with different optimization levels, and in my opinion the OCaml compiler and runtime should either not permit this, or very clearly state that this may happen. |
This issue has been open one year with no activity. Consequently, it is being marked with the "stale" label. What this means is that the issue will be automatically closed in 30 days unless more comments are added or the "stale" label is removed. Comments that provide new information on the issue are especially welcome: is it still reproducible? did it appear in other contexts? how critical is it? etc. |
Still an issue as far as I remember. |
This issue has been open one year with no activity. Consequently, it is being marked with the "stale" label. What this means is that the issue will be automatically closed in 30 days unless more comments are added or the "stale" label is removed. Comments that provide new information on the issue are especially welcome: is it still reproducible? did it appear in other contexts? how critical is it? etc. |
Original bug ID: 7812
Reporter: @whitequark
Status: acknowledged (set by @xavierleroy on 2018-06-23T15:39:30Z)
Resolution: open
Priority: normal
Severity: minor
Category: standard library
Bug description
Unlike the C language, the OCaml language does not have a strict notion of undefined behavior that gives huge leeway to optimizer and is relatively well understood by programmers. In spite of that, the OCaml standard library documents certain behaviors as "undefined".
This is undesirable for several reasons. First, just saying "undefined" does not communicate very well what could happen, which is anything. In C, if an execution ever encounters undefined behavior, the entire execution (before, during, and after) is illegal, and can perform any operation, but the OCaml programmer is not familiar with this.
Second, the standard library leaves as undefined certain behaviors that could trivially be made perfectly well defined, or at least aren't actually undefined and ought to be documented as e.g. raising an exception.
Here is the complete list:
Filename.chop_suffix: would raise an exception or chop an unrelated part of filename. Should be fixed to raise Invalid_argument in that case instead, for consistency with chop_extension.
Buffer.to_seq, Buffer.to_seqi: raises an exception if a buffer is truncated, or returns no new elements if it's extended. There doesn't appear to be any implementation benefit from leaving it undefined, although I may just be missing it and in that case please correct me.
Sort.merge: returns incorrect values. (It's deprecated since 2000 so it can probably be removed already.)
Int32.of_float, Int64.of_float, Nativeint.of_float, Targetint.of_float: would actually invoke UB in C on out-of-range conversions. Either the FP environment should be set up appropriately, or the documentation should get a very stern warning that anything can happen, including the OCaml process dying with SIGFPE. This is the only case where it actually makes sense (barely, perhaps) to have UB.
The text was updated successfully, but these errors were encountered: