Mantis Bug Tracker

View Issue Details Jump to Notes ] Issue History ] Print ]
IDProjectCategoryView StatusDate SubmittedLast Update
0006116OCamlstandard librarypublic2013-08-04 07:572015-12-11 19:21
Assigned Togasche 
PlatformOSOS Version
Product Version4.00.1 
Target VersionFixed in Version4.01.1+dev 
Summary0006116: more efficient Digest.to_hex
DescriptionDigest.to_hex allocates too much. And sometimes it can be a hot spot. Compare (200K calls) :

 Digest.to_hex : allocated 4.9GB, heap 0B, collect 0 362 2531, elapsed 1.525 sec
md5_hex_string : allocated 9.2MB, heap 0B, collect 0 0 4, elapsed 0.023 sec

Patch against 4.01 attached
Steps To Reproduce(*
Compile with ocamlbuild -lflags unix.cmxa md5.native
Measure module from [^]

let hex n = Char.chr (n + if n < 10 then Char.code '0' else (Char.code 'a' - 10))

let md5_hex_string (md5:Digest.t) =
  let md5 = (Obj.magic md5 : string) in
  let s = String.create 32 in
  for i = 0 to 15 do
    let x = Char.code (String.unsafe_get md5 i) in
    String.unsafe_set s (i*2) (hex (x lsr 4));
    String.unsafe_set s (i*2+1) (hex (x land 0x0f));

let () =
  let n = 1024 in
  let s = String.create n in
  for i = 0 to pred n do s.[i] <- Char.chr ( 256) done;
  let s = "make it run, make it right, make it fast" in
  let md5 = Digest.string s in
  let run f = for i = 1 to 200_000 do f () done in
  assert (Digest.to_hex md5 = md5_hex_string md5); " Digest.to_hex" run (fun () -> Digest.to_hex md5); "md5_hex_string" run (fun () -> md5_hex_string md5);
TagsNo tags attached.
Attached Filesdiff file icon digest_to_hex_speedup.diff [^] (644 bytes) 2013-08-04 07:57 [Show Content]
diff file icon digest_to_hex_speedup2.diff [^] (633 bytes) 2013-08-04 17:39 [Show Content]
diff file icon digest_to_hex_test.diff [^] (1,678 bytes) 2013-08-05 05:28 [Show Content]

- Relationships

-  Notes
lefessan (developer)
2013-08-04 11:49

Indeed, the implementation is not as efficient as it should be. I would change your code a little, to get rid of Obj.magic first (a Digest.t is already a string), and to get the same errors than the former version:

let hex n = Char.unsafe_chr (n + if n < 10 then Char.code '0' else (Char.code 'a' - 10))

let md5_hex (md5:Digest.t) =
  if String.length md5 < 16 then invalid_arg "index out of bounds";
  let s = String.create 32 in
  for i = 0 to 15 do
    let x = Char.code (String.unsafe_get md5 i) in
    String.unsafe_set s (i*2) (hex (x lsr 4));
    String.unsafe_set s (i*2+1) (hex (x land 0x0f));

Do you think this new version is correct ? could still be optimized ?
gasche (developer)
2013-08-04 14:08

Fabrice, the patch attached by ygrek has no Obj.magic, it is only the measurement code that contains a copy of the function with Obj.magic (indeed, useless in that case).

I do think it's better to include the length test, in case an user would pass an invalid string (unsafe_get used unsafely is bad). A comment saying that "index out of bounds" is what the precedent implementation (in fact the String.get in `d.[i]`) raised and was preserved for backward-compatibility would be helpful.

ygrek, out of curiosity, in which kind of applications does this become a bottleneck?
lefessan (developer)
2013-08-04 15:27

@Gabriel: Indeed, I just thought the patch contained the same code. I also changed "hex n" to use Char.unsafe_chr instead if Char.chr, as it is known that the value is between 0 and 15.

> ygrek, out of curiosity, in which kind of applications does this become a bottleneck?

Just my example: in ocp-build, for every rule to be executed, I compute a digest of what the rule would do, to check whether it has changed since the last execution, these digests are then saved in a file for the next run. For debug reason, I prefer to save them with Digest.to_hex, so that I can easily grep and lookup these digests in debug logs...
ygrek (reporter)
2013-08-04 17:38

My bad, I was completely sure that Digest.t is abstract and didn't bother to check, sorry. I have updated the patch, but using String.get instead of hardcoding exception string (it doesn't influence speed). Char.unsafe_chr contribution is more noticeable.

Gabriel, in my case it is a bottleneck when computing hashes for strings (to create index on text columns) while generating csv to load infile into mysql database. In this case no cryptographic strength is required and actual md5 computation is fast, but outputting to hex is not, hence the patch.
gasche (developer)
2013-08-04 22:04

Committed in trunk. Thanks!

If you have time, would you consider adding tests in the testsuite (lib-digest) to exercise Digest.to_hex?

(It could be interesting to add this to version/4.01 as well, but I'm a bit reluctant to do so, in particular in absence of tests. I'll try to ask Damien his opinion.)
ygrek (reporter)
2013-08-05 05:29

patch for testsuite attached, used vectors from wikipedia page and [^]
gasche (developer)
2013-08-05 11:43

Thanks! Committed.

- Issue History
Date Modified Username Field Change
2013-08-04 07:57 ygrek New Issue
2013-08-04 07:57 ygrek File Added: digest_to_hex_speedup.diff
2013-08-04 11:49 lefessan Note Added: 0010088
2013-08-04 11:49 lefessan Status new => feedback
2013-08-04 14:08 gasche Note Added: 0010089
2013-08-04 15:27 lefessan Note Added: 0010092
2013-08-04 17:38 ygrek Note Added: 0010093
2013-08-04 17:38 ygrek Status feedback => new
2013-08-04 17:39 ygrek File Added: digest_to_hex_speedup2.diff
2013-08-04 20:22 gasche Assigned To => gasche
2013-08-04 20:22 gasche Status new => assigned
2013-08-04 22:04 gasche Note Added: 0010098
2013-08-04 22:04 gasche Status assigned => resolved
2013-08-04 22:04 gasche Fixed in Version => 4.01.1+dev
2013-08-04 22:04 gasche Resolution open => fixed
2013-08-05 05:28 ygrek File Added: digest_to_hex_test.diff
2013-08-05 05:29 ygrek Note Added: 0010101
2013-08-05 11:43 gasche Note Added: 0010110
2015-12-11 19:21 xleroy Status resolved => closed
2017-02-23 16:43 doligez Category OCaml standard library => standard library

Copyright © 2000 - 2011 MantisBT Group
Powered by Mantis Bugtracker