Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

No clearly correct way to handle Failure exceptions from int_of_string and similar functions #7392

Closed
vicuna opened this issue Oct 23, 2016 · 15 comments

Comments

@vicuna
Copy link

vicuna commented Oct 23, 2016

Original bug ID: 7392
Reporter: bartjacobs
Status: resolved (set by @xavierleroy on 2016-12-07T18:14:17Z)
Resolution: duplicate
Priority: normal
Severity: minor
Version: 4.03.0
Category: documentation
Related to: #7394
Monitored by: @gasche

Bug description

Since #254 (Warning on fragile literals in constructor argument patterns) (released in OCaml 4.03), the documentation for int_of_string, which says "Raise Failure "int_of_string" if the given string is not a valid representation of an integer, or if the integer represented exceeds the range of integers representable in type int.", is inconsistent with the OCaml implementation (specifically, file typing/predef.ml), which declares exception Failure with the warn_on_literal_pattern attribute. The documentation for int_of_string is pretty unequivocal that the behavior of int_of_string in case of malformed input is exactly to raise the exception Failure "int_of_string". The warn_on_literal_pattern attribute, on the other hand, suggests that the value of the argument of Failure should not be relied upon, i.e. that instead of throwing Failure "int_of_string", int_of_string might in a future OCaml release throw something else in case of malformed input.

This is problematic for someone trying to write code that uses int_of_string and that does something other than propagating the exception in case of malformed input.

The warning 52 documentation (Sec. 8.5.1, in the Batch compilation chapter) suggests that this person should simply catch any exception that matches Failure _.

However, this raises the following question: What if int_of_string throws a Failure exception for a reason other than malformed input? Perhaps, for int_of_string this is a rather far-fetched scenario, but (without looking at the source code) it is not entirely inconceivable. And for some of the other similar cases, such as int_of_big_int, with presumably a more complex implementation with more failure modes, this is somewhat more conceivable still. And in any case, the documentation for these functions does not state that this will not happen.

Client code should propagate such an exception instead of treating it as if it indicated malformed input.

To fix this problem in a backward-compatible way, I suggest to update the documentation for int_of_string and similar functions (including int_of_big_int), by adding the sentence "Do not raise a Failure for any other reason." Another way to put it is to replace "Raise Failure "int_of_string" if the given string is not a valid representation of an integer, or if the integer represented exceeds the range of integers representable in type int." by "Raise Failure if (and only if) the given string is not a valid representation of an integer, or if the integer represented exceeds the range of integers representable in type int." (while of course continuing to raise only Failure "int_of_string" for backward compatibility).

(Of course, one should also check that the current implementations of these functions do indeed not throw Failure for other reasons!)

(Note: a (much less severe) lack of clarity in the documentation of int_of_string and similar functions existed already before OCaml 4.03: it did not explicitly guarantee that it would not throw Failure "int_of_string" for a reason other than malformed input. To see how this is not entirely inconceivable, one could vaguely imagine an implementation that would in some cases directly or indirectly perform a nested int_of_string call with some other input. Again, this is more conceivable with functions that are more complex than int_of_string. But again, the main argument is not about whether it is conceivable, but about whether the documentation should be explicit about it, which seems a clear "yes" since people do rely on it and furthermore it's not hard to fix.)

((Even more parenthesized note: for any function with a non-trivial implementation that involves nested function calls, it is not actually easy to guarantee that the function will not throw any given exception, especially if the exception is a generic one such as Failure. This suggests that attaching any postcondition (other than "true") to a function raising an exception is dubious, and instead the function's return type should probably be enriched (from 'a to 'a option or 'a + string) so that the exceptional condition can be indicated as a special return value. A client who wishes to ignore the exceptional case can easily turn the special return value into an exception using a partial match (or a helper function that raises an appropriate exception). Note that this is pretty much how error handling is done in Erlang. So, perhaps a better solution would be to introduce a new function, perhaps called int_option_of_string, that returns an int option.))

Additional information

(I put Severity "minor" because this issue is unlikely to lead to many problems at run time, but on the other hand I think this is currently impacting many people who care about writing clearly correct code.)

@vicuna
Copy link
Author

vicuna commented Oct 27, 2016

Comment author: @alainfrisch

the documentation for int_of_string, which says "Raise Failure "int_of_string" if the given string is not a valid representation of an integer, or if the integer represented exceeds the range of integers representable in type int.", is inconsistent with the OCaml implementation (specifically, file typing/predef.ml), which declares exception Failure with the warn_on_literal_pattern attribute.

I don't see any inconsistency here. It it true that the function raises Failure "int_of_string" (this describes the current version), and it is also true that it is discouraged to match on this exact string (it is a recommendation to make your code more future proof).

That said, I'd be rather in favor of dropping the explicit literal from the documentation.

"Raise Failure if (and only if)"

I'm not necessarily against turning all "Raise ... if" into "Raise ... iff" where it makes sense, although I'm not sure the extra information brought by the change deserves the effort. But if you feel like this is important enough, and if nobody else objects to the change in principle, feel free to submit a PR implementing this suggestion (and dropping the explicit literals in the doc as well).

@vicuna
Copy link
Author

vicuna commented Oct 27, 2016

Comment author: bartjacobs

Actually, on closer thought, I'm withdrawing my fix proposal above. I think it's too dangerous to attach any postcondition to a Failure exception being raised. It's too easy for a Failure exception to be propagated from a nested call inadvertently. And even if that is "obviously" not the case for the functions involved (int_of_string, int_of_big_int, etc.), it is setting a bad precedent pedagogically.

In my view, a better approach is to introduce a new variant of each of these functions that returns an option. The only problem is that I don't know a good name: int_opt_of_string? int_of_string_opt? int_of_string'? int_of_string_? If backward compatibilty was not an issue, I'd propose just int_of_string...

@vicuna
Copy link
Author

vicuna commented Oct 27, 2016

Comment author: @gasche

I think several extensions of the standard library have adopted the scheme of using foo_opt and foo_exn for the explicitly option-returning or exception-raising versions, with foo being one of them determined by history or convenience or performance.

@vicuna
Copy link
Author

vicuna commented Oct 27, 2016

Comment author: bartjacobs

OK, so that argues in favor of the foo_opt approach.

So, my proposal would be to add the option-returning variants, and to leave the docs for the current functions unchanged, including the explicit literals (because I see no good reason to further annoy the people relying on this; the spurious warning 52 they're now getting is plenty annoying already). (I call it spurious because for the specific case of int_of_string, it's hard to argue in favor of breaking people's code for the dubious benefit of improving the purported helpfulness of the Failure argument, and even harder when we're only talking about a hypothetical future improvement.)

Note that I opened another PR, 0007394 , to deal with the issue that the warning 52 docs dangerously encourage users to catch Failure _. I propose that in the present PR we track the more specific technical issue that with the current docs there is no clearly correct way to handle Failure exceptions from int_of_string and similar functions, because they don't explicitly guarantee that no Failure _ will be thrown for other reasons. The current proposed fix for that is to introduce int_of_string_opt and similar functions.

I'll try to submit a GPR for this in the near future.

@vicuna
Copy link
Author

vicuna commented Oct 27, 2016

Comment author: @alainfrisch

I think there is a rather widespread consensus that stdlib should rely much less on exceptions, in an ideal world. This is not just for Failure exceptions, but also for all lookup functions currently raising Not_found (Hashtbl.find, etc). (I think nobody argues against Invalid_argument, though. For Sys_error and co, I don't know.)

We cannot simply change the type of existing functions in the stdlib, but adding non-raising versions is of course possible. If you are ready to spend some time on it, I'd still suggest to check first that the idea has a chance of being accepted before embarking in the endeavor. I'm in favor, but other might disagree.

dangerously encourage users to catch Failure _

Catching "Failure _" is not in itself dangerous if you have the discipline to do it locally enough, as in:

try Some (int_of_string x) with Failure _ -> None

or:

match int_of_string x with
| exception Failure _ -> raise My_own_error (* or something else *)
| i -> ...

@vicuna
Copy link
Author

vicuna commented Oct 27, 2016

Comment author: @gasche

I'm skeptical of any wide-sweeping change to implement options-rather-than-exceptions in a consistent way in the library. There are other libraries in the wild that do this, and can move faster and better than the stdlib (... maybe). On the other way, having the base building blocks that rely on C runtime code also come with a non-exceptional version may make sense from a performance perspective (... or would that be just as slow as (try ... with _ -> ...) on the OCaml side?).

@vicuna
Copy link
Author

vicuna commented Oct 27, 2016

Comment author: @alainfrisch

I think there is general interest in improving the stdlib, and I don't see why one would not be able to be as consistent as external projects. (Yes, I know, the current situation is already inconsistent, with char_of_int/bool_of_string raising Invalid_argument and int_of_string/float_of_string raising Failure.)

@vicuna
Copy link
Author

vicuna commented Oct 28, 2016

Comment author: @xavierleroy

For the record, I'm still unconvinced that the Failure exception should have warn_on_literal_pattern behavior. The warn_on_literal_pattern mechanism was introduced to deal with Invalid_argument, where it makes complete sense. Whether it also makes sense for Failure was silently assumed but might deserve an actual discussion.

On another topic, the intent of "Raise Failure xxx if yyy" is really that no Failure exception is raised in any other situation. This documentation is written in English, not in legalese nor in standardese nor in mathematical logic. One could regret that it is not written in mathematical logic. But Adding "if (and only if)" is just legalese and I don't think it helps.

@vicuna
Copy link
Author

vicuna commented Oct 28, 2016

Comment author: @alainfrisch

The warn_on_literal_pattern mechanism was introduced to deal with Invalid_argument, where it makes complete sense. Whether it also makes sense for Failure was silently assumed but might deserve an actual discussion.

Can you elaborate on the distinction? I don't see any deep difference, especially that the stdlib is not very consistent in its distinction between Failure and Invalid_argument; see examples in my previous note, but also for instance: List.nth can raise both Invalid_argument and Failure while String.sub can only raise Invalid_argument for similar "errors".

@vicuna
Copy link
Author

vicuna commented Oct 28, 2016

Comment author: @alainfrisch

It's useful to look at 8aad777, which adapts the core distribution to the new warning:

  • In debugger, the scanner could raise Failure "int_of_string" in its rule that parses integers. This used to be detected (what that literal) in command_line.ml and turned into a user-facing "Integer overflow" error. I claim this is way too fragile (imagine another rule raising the same exception for another reason) and hard to read (no way to know that this Failure "int_of_string" can only correspond to an actual overflow without reading careful the whole lexer). Of course, one should not blindly turn the pattern into "Failure _". The right fix is to detect the error much more locally and turn it into a more specific error just around the call to Int64.of_string.

  • The other place that was fixed was in the Arg module. The code used to be:

    try f (int_of_string arg)
    with Failure "int_of_string" ->
    raise (Stop (Wrong (s, arg, "an integer")))

    which is buggy: the user-provided callback "f" could raise itself "inf_of_string" for an unrelated reason, and this would be incorrectly reported as a command-line error.

    (and other simular places, sometimes with Invalid_argument instead of Failure)

@vicuna
Copy link
Author

vicuna commented Oct 28, 2016

Comment author: @xavierleroy

Can you elaborate on the distinction?

I refer you to the fine manual, http://caml.inria.fr/pub/docs/manual-ocaml/core.html. Invalid_argument denotes a serious programming error in the application, the library function should never be called with arguments that lead to raising Invalid_argument. Failure is more like Not_found: it can be raised in legitimate situations, then it's just the application's responsibility to handle it.

In other words, if we were to move to OK | Error results instead of exceptions, Failure cases would return Error results, but Invalid_argument cases would still raise exceptions.

@vicuna
Copy link
Author

vicuna commented Oct 28, 2016

Comment author: @alainfrisch

In other words, if we were to move to OK | Error results instead of exceptions, Failure cases would return Error results

Agreed! This would basically force to match the result immediately, which can be achieved today with:

match int_of_string s with
| exception (Failure _) -> ...
| i -> ...

In such a local handler, matching on the literal does not bring much information. And if we let Failure escape, the argument becomes purely informative, as for Invalid_argument. Matching on the literal too far away (as was done in debugger/ before 8aad777) is error prone and obscure.

I'm yet to see a concrete case of code where matching on the literal under Failure is the right approach.

@vicuna
Copy link
Author

vicuna commented Oct 28, 2016

Comment author: bartjacobs

Catching "Failure _" is not in itself dangerous if you have the discipline to do it locally enough, as in:

try Some (int_of_string x) with Failure _ -> None

or:

match int_of_string x with
| exception Failure _ -> raise My_own_error (* or something else *)
| i -> ...

Yes it is dangerous! What if int_of_string's implementation, and particularly functions called by that implementation, raise Failure _ for another reason? For the particular case of int_of_string, that might be far-fetched, but in general this is a dangerous pattern.

the intent of "Raise Failure xxx if yyy" is really that no Failure exception is raised in any other situation. This documentation is written in English, not in legalese nor in standardese nor in mathematical logic. One could regret that it is not written in mathematical logic. But Adding "if (and only if)" is just legalese and I don't think it helps.

I agree that the docs need not be written in logic, but they should be sufficiently clear that all parties (clients and implementors) are sufficiently reminded of their obligations. In particular, I fear that an implementor reading "Raise Failure xxx if yyy" is insufficiently reminded of the need to ascertain that Failure xxx is not raised for other reasons. And a fortiori they are not reminded of the need to ascertain that Failure _ is not raised for other reasons. This is the more problematic since for non-trivial implementations ascertaining same is highly non-trivial (and, in fact, not possible modularly because functions typically do not document an upper bound for the exceptions they may raise).

The problem with vague docs is that each party will interpret them so as to minimize their own obligations, leading to inconsistent interpretations.

Given this issue, a client would be rightly hesitant to rely on the function not raising Failure _. Hence the title of this PR, and the need for more clarity.

@vicuna
Copy link
Author

vicuna commented Nov 2, 2016

Comment author: @alainfrisch

#885 is a proposal to add option-returning variants for stdlib functions (including *_of_string ones).

@vicuna
Copy link
Author

vicuna commented Dec 7, 2016

Comment author: @xavierleroy

Apparently, and for no good reason, the discussion was continued under PR #7394, so I'm marking this one as resolved/duplicate.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant