|Anonymous | Login | Signup for a new account||2014-08-27 19:06 CEST|
|Main | My View | View Issues | Change Log | Roadmap|
|View Issue Details|
|ID||Project||Category||View Status||Date Submitted||Last Update|
|0004747||OCaml||OCaml standard library||public||2009-03-16 10:35||2014-08-18 20:20|
|Target Version||after-4.02.0||Fixed in Version|
|Summary||0004747: Hashtbl.resize is not tail recursive|
|Description||The function in Hashtbl.resize that copies the linked list attached to one bucket isn't tail recursive, which means that if you have the (admittedly undesirable) situation of a very long chain of buckets it can cause a stack overflow.|
let ht = Hashtbl.create 100
let () = for x = 1 to 2500000 do Hashtbl.add ht 0 () done
|Attached Files|| hashtbl.patch [^] (3,070 bytes) 2013-06-13 10:22 [Show Content]
hashtbl2.diff [^] (4,686 bytes) 2013-06-22 06:37 [Show Content]
hashtbl-benchmark.zip [^] (10,453 bytes) 2013-06-22 06:39
I encountered this bug while using Coq extraction feature, because of a misuse of Hashtbl.add instead of Hashtbl.replace that has been reported upstream.
I attached a proposed patch.
Beware that other functions build results in a non-tail-rec fashion over buckets:
Therefore, this patch may only delay stack overflows some more, as long as it can shrink buckets enough. A more thorough patch would probably incur both performance and efficiency loss.
This patch will slow down the resizing process too, since it builds the reversed bucket first.
Valentin Robert has given me a full patch. However, due to the non-critical status¹ and potential for performance regressions, there is no need to do that before the release. I'm changing the "target version" to reflect this.
¹: If the bucket becomes so long that the overflows becomes a problem, it means that there is a problem with the way hashtables are used. If it comes from hashing conflicts, you will get terrible performances due to the linked-list bucket implementation (linear cost for finding a given element), and a difference implementation must be preferred. If it comes from an over-use of the multiple binding semantics ("adding" a lot of different values on the same key), you should probably switch to a different data structure. One might argue that having a Stack Overflow is a good signal for a design defect that could otherwise go unnoticed. I'm still in favor of fixing this bug, but it can wait after the release.
|Gabriel: have you had a chance to review the patch?|
edited on: 2013-06-13 10:37
I just updated Valentin's patch; sorry, I should have done that much earlier.
I reviewed this patch and tested it for correctness. That said, we haven't done any serious measurement of the performance impact (which may be non-neglectible on small bucket lists, as the new code does one (additional) reversal when the traversal ends). I'll try to do that later this week, but in the end it will always be a judgment call on whether we favor speed on small buckets, or correctness on very large ones.
N.B.: the patch only touches the non-functorial part of Hashtbl; of course in a final commit the changes should be duplicated in the functorial part as well.
|What about starting with the "direct-style" (non-tail rec) version, decrementing a counter at each step (starting at, say, 50), and when the counter reaches 0, switching to the tail-rec (but slower) version? (Anyway, resizing is not so frequent, so it might not be worth the tiny extra complexity.)|
|I did that for List.fold_right in Batteries and the results were okay, but the implementation complexity is a bit annoying. Will try at benchmark time.|
So I did some micro-benchmarks, that were not very conclusive. There is a performance hit (that seems to disappear if you crank up the GC params (specifically "s=10M")), but hybrid methods as suggested above were not successful to bridge the gap for all changed functions.
After a bit of trial and error, I have uploaded a version of the patch that seems to bridge the gap -- at least on my x86_64 machine. I will try to get results on other machines, and re-check it for correctness (it's in a rough state now).
|2009-03-16 10:35||shinwell||New Issue|
|2009-04-29 15:41||doligez||Status||new => acknowledged|
|2012-06-20 14:31||varobert||File Added: hashtbl.patch|
|2012-06-20 14:34||varobert||Note Added: 0007590|
|2012-07-04 16:55||doligez||Priority||normal => high|
|2012-07-04 16:55||doligez||Status||acknowledged => confirmed|
|2012-07-04 17:18||doligez||Target Version||=> 4.00.0+dev|
|2012-07-04 17:33||gasche||Assigned To||=> gasche|
|2012-07-04 17:33||gasche||Status||confirmed => assigned|
|2012-07-12 14:28||gasche||Note Added: 0007730|
|2012-07-12 14:28||gasche||Target Version||4.00.0+dev => 4.01.0+dev|
|2012-07-27 12:49||frisch||Category||OCaml general => OCaml standard library|
|2012-07-31 13:36||doligez||Target Version||4.01.0+dev => 4.00.1+dev|
|2012-09-17 13:51||doligez||Target Version||4.00.1+dev => 4.00.2+dev|
|2013-06-12 17:31||frisch||Note Added: 0009466|
|2013-06-13 10:22||gasche||File Deleted: hashtbl.patch|
|2013-06-13 10:22||gasche||File Added: hashtbl.patch|
|2013-06-13 10:31||gasche||Note Added: 0009474|
|2013-06-13 10:35||frisch||Note Added: 0009475|
|2013-06-13 10:37||gasche||Note Edited: 0009474||View Revisions|
|2013-06-13 10:38||gasche||Note Added: 0009476|
|2013-06-22 06:37||gasche||File Added: hashtbl2.diff|
|2013-06-22 06:39||gasche||File Added: hashtbl-benchmark.zip|
|2013-06-22 07:23||gasche||Note Added: 0009596|
|2013-07-09 17:20||doligez||Target Version||4.00.2+dev => 4.01.0+dev|
|2013-07-22 13:01||frisch||Target Version||4.01.0+dev => 4.02.0+dev|
|2013-09-04 18:10||doligez||Tag Attached: patch|
|2014-08-18 20:20||doligez||Target Version||4.02.0+dev => after-4.02.0|
|Copyright © 2000 - 2011 MantisBT Group|