Mantis Bug Tracker

View Issue Details Jump to Notes ] Issue History ] Print ]
IDProjectCategoryView StatusDate SubmittedLast Update
0006141OCamlOCaml documentationpublic2013-08-27 16:222014-08-05 19:20
Reporterjohnwhitington 
Assigned To 
PrioritynormalSeverityfeatureReproducibilityhave not tried
StatusnewResolutionopen 
PlatformOSOS Version
Product Version 
Target VersionFixed in Version 
Summary0006141: Document Char.unsafe_chr
DescriptionRationale:

1) Char.unsafe_chr is widely used in third-party low level libraries which are not part of stdlib.

2) People who don't know Char.unsafe_chr exists replicate it using Obj.magic (this is what I used to do) This is bad.

3) It's no more unsafe that other unsafe_ functions in the stdlib which are documented.

As an example, in very fast lexing code, where we know an integer is actually a byte in the range 0...255, Char.chr can be quite a cost.
Tagsjunior_job, patch
Attached Filesdiff file icon unsafe_chr_documentation.diff [^] (581 bytes) 2013-12-29 16:58 [Show Content]

- Relationships

-  Notes
(0010769)
pminten (reporter)
2013-12-29 16:59

Here's a patch to add a bit of documentation to Char.unsafe_chr.
(0011906)
doligez (administrator)
2014-07-24 20:18

"the program might crash" is bad (and I don't believe it's true with the current implementation). We're trying to move away from unsafe functions and anything that can cause a crash if misused: for example we sometimes daydream about removing Array.unsafe_get and unsafe_set.

So I'm a bit unsatisfied with this version.
(0011907)
dbuenzli (reporter)
2014-07-24 20:53

I understand the philosophical point of view but when working on encoders/decoders it's still useful to be able to bypass the array bound checking on each access since you usually already checked sizes just before accessing a chunk of bytes and if I (once) measured correctly the performance improvements in doing so are non-negligible. So usually that's what I do:

https://github.com/dbuenzli/uutf/blob/master/src/uutf.ml#L14 [^]
https://github.com/dbuenzli/jsonm/blob/master/src/jsonm.ml#L12 [^]
https://github.com/dbuenzli/dicomm/blob/master/src/dicomm.ml#L23 [^]
https://github.com/dbuenzli/otfm/blob/master/src/otfm.ml#L11 [^]
etc.

But that's not really satisfactory, as you need to change the source code to make the module safe. Now we have the -unsafe option that turns *all* safe accesses into unsafe ones, which may be a little bit to coarse grained (not to mention people actually wrongly *relying* on the Invalid_argument exception raised by out of bound accesses). I would argue that we actually want the converse, i.e. a -safe option that turns unsafe accesses into safe ones. Do you think such a scheme could be implemented in the compiler ?
(0011923)
johnwhitington (reporter)
2014-07-25 16:48

The unsafe_ functions are, indeed, most dangerous because they can cause corruption which eventually causes a crash distant from the call to an _unsafe function. This makes for an even worse problem than a stack overflow from a non tail-recursive Standard Library function which crashes quickly, or a floating Not_found.

Perhaps the text in the patch needs finessing, but I'd like to see it documented, if only to prevent people using Obj.magic.

it would be a nice idea to grep through the whole of OPAM for uses of Obj.magic in third-party code, and see how many are actually required, and how many could be replaced with existing functions like Char.unsafe_chr.
(0011972)
gasche (developer)
2014-08-05 19:20

While I don't think the "The program might crash" part is necessary, and I see why Damien wouldn't like it (you shouldn't start auditing uses of this function first when you investigate a crash), it is actually the case that unsafe_chr, when used in combination with other unsafe features, may crash a program by breaking the invariant that `char` are in the 0..255 range.

  let rot13_cache =
    String.init 256 (fun i -> Char.of_int ((i + 13) mod 256))

  (* performance matters! *)
  let rot13 c = Array.unsafe_get rot13_cache (Char.of_int c)

  [..]

  rot13 (Char.unsafe_chr 257)

I agree with the broad sentiment that we should document unsafe functions by stating what their (non-dynamically-checked) invariant is, instead of letting users reinvent them or use them in incorrect ways. (I'm less sure about whether they should appear in the generated ocamldoc.)

What about simply the following documentation?

  (** Convert an integer into a character, assuming it is in the range 0--255. *)

(The ocamldoc header about "unsafe functions" should be enough to imply that breaking this assumption is Very Bad.)

- Issue History
Date Modified Username Field Change
2013-08-27 16:22 johnwhitington New Issue
2013-08-28 11:05 doligez Status new => confirmed
2013-08-28 11:05 doligez Target Version => 4.01.1+dev
2013-09-03 17:18 gasche Tag Attached: junior_job
2013-12-29 16:58 pminten File Added: unsafe_chr_documentation.diff
2013-12-29 16:59 pminten Note Added: 0010769
2014-01-17 17:57 doligez Tag Attached: patch
2014-05-25 20:20 doligez Target Version 4.01.1+dev => 4.02.0+dev
2014-07-24 20:18 doligez Note Added: 0011906
2014-07-24 20:18 doligez Severity minor => feature
2014-07-24 20:18 doligez Status confirmed => feedback
2014-07-24 20:18 doligez Target Version 4.02.0+dev =>
2014-07-24 20:53 dbuenzli Note Added: 0011907
2014-07-25 16:48 johnwhitington Note Added: 0011923
2014-07-25 16:48 johnwhitington Status feedback => new
2014-08-05 19:20 gasche Note Added: 0011972


Copyright © 2000 - 2011 MantisBT Group
Powered by Mantis Bugtracker