New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
more efficient Digest.to_hex #6116
Comments
Comment author: @lefessan Indeed, the implementation is not as efficient as it should be. I would change your code a little, to get rid of Obj.magic first (a Digest.t is already a string), and to get the same errors than the former version: let hex n = Char.unsafe_chr (n + if n < 10 then Char.code '0' else (Char.code 'a' - 10)) let md5_hex (md5:Digest.t) = Do you think this new version is correct ? could still be optimized ? |
Comment author: @gasche Fabrice, the patch attached by ygrek has no Obj.magic, it is only the measurement code that contains a copy of the function with Obj.magic (indeed, useless in that case). I do think it's better to include the length test, in case an user would pass an invalid string (unsafe_get used unsafely is bad). A comment saying that "index out of bounds" is what the precedent implementation (in fact the String.get in ygrek, out of curiosity, in which kind of applications does this become a bottleneck? |
Comment author: @lefessan @gabriel: Indeed, I just thought the patch contained the same code. I also changed "hex n" to use Char.unsafe_chr instead if Char.chr, as it is known that the value is between 0 and 15.
Just my example: in ocp-build, for every rule to be executed, I compute a digest of what the rule would do, to check whether it has changed since the last execution, these digests are then saved in a file for the next run. For debug reason, I prefer to save them with Digest.to_hex, so that I can easily grep and lookup these digests in debug logs... |
Comment author: @ygrek My bad, I was completely sure that Digest.t is abstract and didn't bother to check, sorry. I have updated the patch, but using String.get instead of hardcoding exception string (it doesn't influence speed). Char.unsafe_chr contribution is more noticeable. Gabriel, in my case it is a bottleneck when computing hashes for strings (to create index on text columns) while generating csv to load infile into mysql database. In this case no cryptographic strength is required and actual md5 computation is fast, but outputting to hex is not, hence the patch. |
Comment author: @gasche Committed in trunk. Thanks! If you have time, would you consider adding tests in the testsuite (lib-digest) to exercise Digest.to_hex? (It could be interesting to add this to version/4.01 as well, but I'm a bit reluctant to do so, in particular in absence of tests. I'll try to ask Damien his opinion.) |
Comment author: @ygrek patch for testsuite attached, used vectors from wikipedia page and http://www.nsrl.nist.gov/testdata/ |
Comment author: @gasche Thanks! Committed. |
Original bug ID: 6116
Reporter: @ygrek
Assigned to: @gasche
Status: closed (set by @xavierleroy on 2015-12-11T18:21:27Z)
Resolution: fixed
Priority: normal
Severity: minor
Version: 4.00.1
Fixed in version: 4.01.1+dev
Category: standard library
Bug description
Digest.to_hex allocates too much. And sometimes it can be a hot spot. Compare (200K calls) :
Digest.to_hex : allocated 4.9GB, heap 0B, collect 0 362 2531, elapsed 1.525 sec
md5_hex_string : allocated 9.2MB, heap 0B, collect 0 0 4, elapsed 0.023 sec
Patch against 4.01 attached
Steps to reproduce
(*
Compile with ocamlbuild -lflags unix.cmxa md5.native
Measure module from http://ygrek.org.ua/p/code/measure.ml
*)
let hex n = Char.chr (n + if n < 10 then Char.code '0' else (Char.code 'a' - 10))
let md5_hex_string (md5:Digest.t) =
let md5 = (Obj.magic md5 : string) in
let s = String.create 32 in
for i = 0 to 15 do
let x = Char.code (String.unsafe_get md5 i) in
String.unsafe_set s (i2) (hex (x lsr 4));
String.unsafe_set s (i2+1) (hex (x land 0x0f));
done;
s
let () =
(*
let n = 1024 in
let s = String.create n in
for i = 0 to pred n do s.[i] <- Char.chr (Random.int 256) done;
*)
let s = "make it run, make it right, make it fast" in
let md5 = Digest.string s in
let run f = for i = 1 to 200_000 do f () done in
assert (Digest.to_hex md5 = md5_hex_string md5);
Measure.show " Digest.to_hex" run (fun () -> Digest.to_hex md5);
Measure.show "md5_hex_string" run (fun () -> md5_hex_string md5);
()
File attachments
The text was updated successfully, but these errors were encountered: