Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Document Char.unsafe_chr #6141

Closed
vicuna opened this issue Aug 27, 2013 · 6 comments
Closed

Document Char.unsafe_chr #6141

vicuna opened this issue Aug 27, 2013 · 6 comments

Comments

@vicuna
Copy link

vicuna commented Aug 27, 2013

Original bug ID: 6141
Reporter: @johnwhitington
Status: feedback (set by @damiendoligez on 2014-11-24T17:05:49Z)
Resolution: open
Priority: normal
Severity: feature
Category: documentation
Tags: patch, junior_job
Monitored by: @gasche @dbuenzli

Bug description

Rationale:

  1. Char.unsafe_chr is widely used in third-party low level libraries which are not part of stdlib.

  2. People who don't know Char.unsafe_chr exists replicate it using Obj.magic (this is what I used to do) This is bad.

  3. It's no more unsafe that other unsafe_ functions in the stdlib which are documented.

As an example, in very fast lexing code, where we know an integer is actually a byte in the range 0...255, Char.chr can be quite a cost.

File attachments

@vicuna
Copy link
Author

vicuna commented Dec 29, 2013

Comment author: pminten

Here's a patch to add a bit of documentation to Char.unsafe_chr.

@vicuna
Copy link
Author

vicuna commented Jul 24, 2014

Comment author: @damiendoligez

"the program might crash" is bad (and I don't believe it's true with the current implementation). We're trying to move away from unsafe functions and anything that can cause a crash if misused: for example we sometimes daydream about removing Array.unsafe_get and unsafe_set.

So I'm a bit unsatisfied with this version.

@vicuna
Copy link
Author

vicuna commented Jul 24, 2014

Comment author: @dbuenzli

I understand the philosophical point of view but when working on encoders/decoders it's still useful to be able to bypass the array bound checking on each access since you usually already checked sizes just before accessing a chunk of bytes and if I (once) measured correctly the performance improvements in doing so are non-negligible. So usually that's what I do:

https://github.com/dbuenzli/uutf/blob/master/src/uutf.ml#L14
https://github.com/dbuenzli/jsonm/blob/master/src/jsonm.ml#L12
https://github.com/dbuenzli/dicomm/blob/master/src/dicomm.ml#L23
https://github.com/dbuenzli/otfm/blob/master/src/otfm.ml#L11
etc.

But that's not really satisfactory, as you need to change the source code to make the module safe. Now we have the -unsafe option that turns all safe accesses into unsafe ones, which may be a little bit to coarse grained (not to mention people actually wrongly relying on the Invalid_argument exception raised by out of bound accesses). I would argue that we actually want the converse, i.e. a -safe option that turns unsafe accesses into safe ones. Do you think such a scheme could be implemented in the compiler ?

@vicuna
Copy link
Author

vicuna commented Jul 25, 2014

Comment author: @johnwhitington

The unsafe_ functions are, indeed, most dangerous because they can cause corruption which eventually causes a crash distant from the call to an _unsafe function. This makes for an even worse problem than a stack overflow from a non tail-recursive Standard Library function which crashes quickly, or a floating Not_found.

Perhaps the text in the patch needs finessing, but I'd like to see it documented, if only to prevent people using Obj.magic.

it would be a nice idea to grep through the whole of OPAM for uses of Obj.magic in third-party code, and see how many are actually required, and how many could be replaced with existing functions like Char.unsafe_chr.

@vicuna
Copy link
Author

vicuna commented Aug 5, 2014

Comment author: @gasche

While I don't think the "The program might crash" part is necessary, and I see why Damien wouldn't like it (you shouldn't start auditing uses of this function first when you investigate a crash), it is actually the case that unsafe_chr, when used in combination with other unsafe features, may crash a program by breaking the invariant that char are in the 0..255 range.

let rot13_cache =
String.init 256 (fun i -> Char.of_int ((i + 13) mod 256))

(* performance matters! *)
let rot13 c = Array.unsafe_get rot13_cache (Char.of_int c)

[..]

rot13 (Char.unsafe_chr 257)

I agree with the broad sentiment that we should document unsafe functions by stating what their (non-dynamically-checked) invariant is, instead of letting users reinvent them or use them in incorrect ways. (I'm less sure about whether they should appear in the generated ocamldoc.)

What about simply the following documentation?

(** Convert an integer into a character, assuming it is in the range 0--255. *)

(The ocamldoc header about "unsafe functions" should be enough to imply that breaking this assumption is Very Bad.)

@github-actions
Copy link

This issue has been open one year with no activity. Consequently, it is being marked with the "stale" label. What this means is that the issue will be automatically closed in 30 days unless more comments are added or the "stale" label is removed. Comments that provide new information on the issue are especially welcome: is it still reproducible? did it appear in other contexts? how critical is it? etc.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant