New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Document Char.unsafe_chr #6141
Comments
Comment author: pminten Here's a patch to add a bit of documentation to Char.unsafe_chr. |
Comment author: @damiendoligez "the program might crash" is bad (and I don't believe it's true with the current implementation). We're trying to move away from unsafe functions and anything that can cause a crash if misused: for example we sometimes daydream about removing Array.unsafe_get and unsafe_set. So I'm a bit unsatisfied with this version. |
Comment author: @dbuenzli I understand the philosophical point of view but when working on encoders/decoders it's still useful to be able to bypass the array bound checking on each access since you usually already checked sizes just before accessing a chunk of bytes and if I (once) measured correctly the performance improvements in doing so are non-negligible. So usually that's what I do: https://github.com/dbuenzli/uutf/blob/master/src/uutf.ml#L14 But that's not really satisfactory, as you need to change the source code to make the module safe. Now we have the -unsafe option that turns all safe accesses into unsafe ones, which may be a little bit to coarse grained (not to mention people actually wrongly relying on the Invalid_argument exception raised by out of bound accesses). I would argue that we actually want the converse, i.e. a -safe option that turns unsafe accesses into safe ones. Do you think such a scheme could be implemented in the compiler ? |
Comment author: @johnwhitington The unsafe_ functions are, indeed, most dangerous because they can cause corruption which eventually causes a crash distant from the call to an _unsafe function. This makes for an even worse problem than a stack overflow from a non tail-recursive Standard Library function which crashes quickly, or a floating Not_found. Perhaps the text in the patch needs finessing, but I'd like to see it documented, if only to prevent people using Obj.magic. it would be a nice idea to grep through the whole of OPAM for uses of Obj.magic in third-party code, and see how many are actually required, and how many could be replaced with existing functions like Char.unsafe_chr. |
Comment author: @gasche While I don't think the "The program might crash" part is necessary, and I see why Damien wouldn't like it (you shouldn't start auditing uses of this function first when you investigate a crash), it is actually the case that unsafe_chr, when used in combination with other unsafe features, may crash a program by breaking the invariant that let rot13_cache = (* performance matters! *) [..] rot13 (Char.unsafe_chr 257) I agree with the broad sentiment that we should document unsafe functions by stating what their (non-dynamically-checked) invariant is, instead of letting users reinvent them or use them in incorrect ways. (I'm less sure about whether they should appear in the generated ocamldoc.) What about simply the following documentation? (** Convert an integer into a character, assuming it is in the range 0--255. *) (The ocamldoc header about "unsafe functions" should be enough to imply that breaking this assumption is Very Bad.) |
This issue has been open one year with no activity. Consequently, it is being marked with the "stale" label. What this means is that the issue will be automatically closed in 30 days unless more comments are added or the "stale" label is removed. Comments that provide new information on the issue are especially welcome: is it still reproducible? did it appear in other contexts? how critical is it? etc. |
Original bug ID: 6141
Reporter: @johnwhitington
Status: feedback (set by @damiendoligez on 2014-11-24T17:05:49Z)
Resolution: open
Priority: normal
Severity: feature
Category: documentation
Tags: patch, junior_job
Monitored by: @gasche @dbuenzli
Bug description
Rationale:
Char.unsafe_chr is widely used in third-party low level libraries which are not part of stdlib.
People who don't know Char.unsafe_chr exists replicate it using Obj.magic (this is what I used to do) This is bad.
It's no more unsafe that other unsafe_ functions in the stdlib which are documented.
As an example, in very fast lexing code, where we know an integer is actually a byte in the range 0...255, Char.chr can be quite a cost.
File attachments
The text was updated successfully, but these errors were encountered: