|Anonymous | Login | Signup for a new account||2017-10-24 00:21 CEST|
|Main | My View | View Issues | Change Log | Roadmap|
|View Issue Details|
|ID||Project||Category||View Status||Date Submitted||Last Update|
|0006141||OCaml||documentation||public||2013-08-27 16:22||2014-11-24 18:05|
|Priority||normal||Severity||feature||Reproducibility||have not tried|
|Target Version||Fixed in Version|
|Summary||0006141: Document Char.unsafe_chr|
1) Char.unsafe_chr is widely used in third-party low level libraries which are not part of stdlib.
2) People who don't know Char.unsafe_chr exists replicate it using Obj.magic (this is what I used to do) This is bad.
3) It's no more unsafe that other unsafe_ functions in the stdlib which are documented.
As an example, in very fast lexing code, where we know an integer is actually a byte in the range 0...255, Char.chr can be quite a cost.
|Attached Files||unsafe_chr_documentation.diff [^] (581 bytes) 2013-12-29 16:58 [Show Content]|
|Here's a patch to add a bit of documentation to Char.unsafe_chr.|
"the program might crash" is bad (and I don't believe it's true with the current implementation). We're trying to move away from unsafe functions and anything that can cause a crash if misused: for example we sometimes daydream about removing Array.unsafe_get and unsafe_set.
So I'm a bit unsatisfied with this version.
I understand the philosophical point of view but when working on encoders/decoders it's still useful to be able to bypass the array bound checking on each access since you usually already checked sizes just before accessing a chunk of bytes and if I (once) measured correctly the performance improvements in doing so are non-negligible. So usually that's what I do:
But that's not really satisfactory, as you need to change the source code to make the module safe. Now we have the -unsafe option that turns *all* safe accesses into unsafe ones, which may be a little bit to coarse grained (not to mention people actually wrongly *relying* on the Invalid_argument exception raised by out of bound accesses). I would argue that we actually want the converse, i.e. a -safe option that turns unsafe accesses into safe ones. Do you think such a scheme could be implemented in the compiler ?
The unsafe_ functions are, indeed, most dangerous because they can cause corruption which eventually causes a crash distant from the call to an _unsafe function. This makes for an even worse problem than a stack overflow from a non tail-recursive Standard Library function which crashes quickly, or a floating Not_found.
Perhaps the text in the patch needs finessing, but I'd like to see it documented, if only to prevent people using Obj.magic.
it would be a nice idea to grep through the whole of OPAM for uses of Obj.magic in third-party code, and see how many are actually required, and how many could be replaced with existing functions like Char.unsafe_chr.
While I don't think the "The program might crash" part is necessary, and I see why Damien wouldn't like it (you shouldn't start auditing uses of this function first when you investigate a crash), it is actually the case that unsafe_chr, when used in combination with other unsafe features, may crash a program by breaking the invariant that `char` are in the 0..255 range.
let rot13_cache =
String.init 256 (fun i -> Char.of_int ((i + 13) mod 256))
(* performance matters! *)
let rot13 c = Array.unsafe_get rot13_cache (Char.of_int c)
rot13 (Char.unsafe_chr 257)
I agree with the broad sentiment that we should document unsafe functions by stating what their (non-dynamically-checked) invariant is, instead of letting users reinvent them or use them in incorrect ways. (I'm less sure about whether they should appear in the generated ocamldoc.)
What about simply the following documentation?
(** Convert an integer into a character, assuming it is in the range 0--255. *)
(The ocamldoc header about "unsafe functions" should be enough to imply that breaking this assumption is Very Bad.)
|2013-08-27 16:22||johnwhitington||New Issue|
|2013-08-28 11:05||doligez||Status||new => confirmed|
|2013-08-28 11:05||doligez||Target Version||=> 4.01.1+dev|
|2013-09-03 17:18||gasche||Tag Attached: junior_job|
|2013-12-29 16:58||pminten||File Added: unsafe_chr_documentation.diff|
|2013-12-29 16:59||pminten||Note Added: 0010769|
|2014-01-17 17:57||doligez||Tag Attached: patch|
|2014-05-25 20:20||doligez||Target Version||4.01.1+dev => 4.02.0+dev|
|2014-07-24 20:18||doligez||Note Added: 0011906|
|2014-07-24 20:18||doligez||Severity||minor => feature|
|2014-07-24 20:18||doligez||Status||confirmed => feedback|
|2014-07-24 20:18||doligez||Target Version||4.02.0+dev =>|
|2014-07-24 20:53||dbuenzli||Note Added: 0011907|
|2014-07-25 16:48||johnwhitington||Note Added: 0011923|
|2014-07-25 16:48||johnwhitington||Status||feedback => new|
|2014-08-05 19:20||gasche||Note Added: 0011972|
|2014-11-24 18:05||doligez||Status||new => feedback|
|2017-02-23 16:35||doligez||Category||OCaml documentation => Documentation|
|2017-02-23 16:44||doligez||Category||Documentation => documentation|
|Copyright © 2000 - 2011 MantisBT Group|