Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add a find_add function in Hashtbl (weak and not weak) #7644

Closed
vicuna opened this issue Sep 28, 2017 · 5 comments
Closed

add a find_add function in Hashtbl (weak and not weak) #7644

vicuna opened this issue Sep 28, 2017 · 5 comments

Comments

@vicuna
Copy link

vicuna commented Sep 28, 2017

Original bug ID: 7644
Reporter: ChriChri
Status: acknowledged (set by @xavierleroy on 2017-09-29T18:03:11Z)
Resolution: open
Priority: normal
Severity: feature
Version: 4.05.0
Category: standard library
Monitored by: ChriChri

Bug description

When using hash on medium or large structure,
very often the time spend in hash (reported by profiling)
is between 5% and 15% of the total time (from gprof of perf).

Often this program use a pattern like (common for memoization):

try
  Hashtbl.find tbl key
with
  Not_found ->
    let v = code__for__v in
    Hashtbl.add tbl key v;
    v

I propose to add a function that replaces the above by

Hashtbl.find_add tbl key (fun () -> code_for_v)

This both simplifies the code and divides by 2 the number of call to hash.

Similar functions could be added for weak hashtbl and Map library.

Additional information

If people agree that it is worth it, I will propose a pull request.

@vicuna
Copy link
Author

vicuna commented Sep 29, 2017

Comment author: @xavierleroy

This is a common usage pattern for hash tables, indeed. Just make sure that the "find_add" function you propose increases performance noticeably. I would expect some performance increase for hashtables but none for purely-functional maps.

@vicuna
Copy link
Author

vicuna commented Oct 1, 2017

Comment author: ChriChri

Why no increase for purely functional map ?

I think you divide by two the number of compare,
because you descend the tree only once for find_add (surely you gain nothing
if the key is there).

Anyway, I will propose it for hash tables first.

@vicuna
Copy link
Author

vicuna commented Oct 1, 2017

Comment author: ChriChri

I just did a PR with update for Hashtbl and Ephemeron as in Map. This only increase speed when

  • the key are really large and add in frequent (avoid two computing of hashes)
  • or remove is frequent (avoid two hash and two traversals of buckets)

On the test I did, find_add (which I implemented too) is faster because
it avoids the closure/option. I will propose it in a second PR after this one goes through (if it does)

@yallop
Copy link
Member

yallop commented Apr 11, 2019

Note: @craff's PR implementing the update function is here: #1389

@github-actions
Copy link

github-actions bot commented May 7, 2020

This issue has been open one year with no activity. Consequently, it is being marked with the "stale" label. What this means is that the issue will be automatically closed in 30 days unless more comments are added or the "stale" label is removed. Comments that provide new information on the issue are especially welcome: is it still reproducible? did it appear in other contexts? how critical is it? etc.

@github-actions github-actions bot added the Stale label May 7, 2020
@github-actions github-actions bot closed this as completed Jun 8, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants