Mantis Bug Tracker

View Issue Details Jump to Notes ] Issue History ] Print ]
IDProjectCategoryView StatusDate SubmittedLast Update
0007812OCamlstandard librarypublic2018-06-23 07:022018-07-10 06:36
Reporterwhitequark 
Assigned To 
PrioritynormalSeverityminorReproducibilityalways
StatusacknowledgedResolutionopen 
PlatformOSOS Version
Product Version 
Target VersionFixed in Version 
Summary0007812: As possible, eliminate undefined behavior from stdlib
DescriptionUnlike the C language, the OCaml language does not have a strict notion of undefined behavior that gives huge leeway to optimizer and is relatively well understood by programmers. In spite of that, the OCaml standard library documents certain behaviors as "undefined".

This is undesirable for several reasons. First, just saying "undefined" does not communicate very well what could happen, which is anything. In C, if an execution ever encounters undefined behavior, the entire execution (before, during, and after) is illegal, and can perform any operation, but the OCaml programmer is not familiar with this.

Second, the standard library leaves as undefined certain behaviors that could *trivially* be made perfectly well defined, or at least aren't actually undefined and ought to be documented as e.g. raising an exception.

Here is the complete list:

  * Filename.chop_suffix: would raise an exception or chop an unrelated part of filename. Should be fixed to raise Invalid_argument in that case instead, for consistency with chop_extension.

  * Buffer.to_seq, Buffer.to_seqi: raises an exception if a buffer is truncated, or returns no new elements if it's extended. There doesn't appear to be any implementation benefit from leaving it undefined, although I may just be missing it and in that case please correct me.

  * Sort.merge: returns incorrect values. (It's deprecated since 2000 so it can probably be removed already.)

  * Int32.of_float, Int64.of_float, Nativeint.of_float, Targetint.of_float: would actually invoke UB in C on out-of-range conversions. Either the FP environment should be set up appropriately, or the documentation should get a very stern warning that anything can happen, including the OCaml process dying with SIGFPE. This is the only case where it actually makes sense (barely, perhaps) to have UB.
TagsNo tags attached.
Attached Files

- Relationships

-  Notes
(0019209)
xleroy (administrator)
2018-06-23 17:39
edited on: 2018-06-23 18:31

Concerning the FP-to-integer conversions, your analysis is overly pessimistic. Of course we assume the C compiler to implement IEEE-754 FP arithmetic as described in Annex F of the ISO C standards. The Annex says:

F.4 Floating to integer conversion

If the floating value is infinite or NaN or if the integral part of
the floating value exceeds the range of the integer type, then the
‘‘invalid’’ floating-point exception is raised and the resulting value
is unspecified.

F.7.3 Execution

At program startup the floating-point environment is initialized as
prescribed by IEC 60559:
[...]
— Trapping or stopping (if supported) is disabled on all
  floating-point exceptions.

So, an overflow in FP->integer conversion is not UB and will just return an unspecified integer value.

Additionally, chapter 7 (standard library) specifies no library function to enable trapping or stopping behaviors for FP exceptions. (In glibc, it's a nonstandard function, feenableexcept().) So, you have to go out of your way to get a SIGFPE out of an FP->integer overflow.

(0019210)
xleroy (administrator)
2018-06-23 17:41

More generally, I agree that the use of "undefined behavior" in OCaml's library function descriptions is a bad choice of words, given the sinister implications "undefined behavior" has in C. For OCaml functions, it's more like "unspecified results", i.e. the function can return any value of its return type, or raise any exception, but will have no other effect.
(0019225)
whitequark (developer)
2018-07-10 06:32

I think you're right that this is not UB, but an unspecified bit pattern is still quite bad because upon "use of an unspecified value, or other behavior where this International Standard provides two or more possibilities and imposes no further requirements on which is chosen in any instance".

That is, if you have a function

  uint8_t f() { return (uint8_t)1e100; }

then both of these programs may print "taken":

  if(f() == 1 && f() == 2) puts("taken");

  uint8_t x = f();
  if(x ^ x) puts("taken");

(LLVM has a section explicitly calling out this behavior as valid: https://llvm.org/docs/LangRef.html#undefined-values [^]).

In practice, `f` may be translated to just (on e.g. x86) `ret`, and the return value will be taken from whatever junk was in rax. You can also stumble upon code that would exhibit apparently absurd yet deterministic behavior when the optimizer chooses to interpret the unspecified bit pattern differently in different instances, or behavior differing depending on optimization level. (There are well-known examples of such code that you will be able to find, though I don't consider them important for discussion because the standard clearly permits this behavior already.)

This is of course less bad than UB but I would argue that this still does not match the mental model an OCaml programmer may have about unspecified bit patterns. In particular, an OCaml programmer (and, really, most C programmers) would expect an unspecified bit pattern to stay the same so long as whatever operation produces it stays the same, but this isn't the case in C.
(0019226)
whitequark (developer)
2018-07-10 06:36
edited on: 2018-07-10 06:39

I should perhaps talk more about real-world impact of this use of unspecified values in OCaml runtime. I don't think any real-world C compiler out there would actually produce code for the OCaml runtime that would lead to algebraically impossible results like `f () != f ()`, although it would certainly be permitted to.

However, real-world C compilers will definitely produce code that in case of out-of-range conversions will have different results on different platforms (off the top of my head I think x86 and ARM64 would have different behavior here, though I'm far from an expert) and with different optimization levels, and in my opinion the OCaml compiler and runtime should either not permit this, or very clearly state that this may happen.


- Issue History
Date Modified Username Field Change
2018-06-23 07:02 whitequark New Issue
2018-06-23 07:10 whitequark Description Updated View Revisions
2018-06-23 07:20 whitequark Description Updated View Revisions
2018-06-23 17:39 xleroy Note Added: 0019209
2018-06-23 17:39 xleroy Status new => acknowledged
2018-06-23 17:41 xleroy Note Added: 0019210
2018-06-23 18:27 xleroy Note Edited: 0019209 View Revisions
2018-06-23 18:31 xleroy Note Edited: 0019209 View Revisions
2018-07-10 06:32 whitequark Note Added: 0019225
2018-07-10 06:36 whitequark Note Added: 0019226
2018-07-10 06:39 whitequark Note Edited: 0019226 View Revisions


Copyright © 2000 - 2011 MantisBT Group
Powered by Mantis Bugtracker