Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bad behaviour when hashing and comparing functions #7537

Closed
vicuna opened this issue May 23, 2017 · 3 comments
Closed

Bad behaviour when hashing and comparing functions #7537

vicuna opened this issue May 23, 2017 · 3 comments

Comments

@vicuna
Copy link

vicuna commented May 23, 2017

Original bug ID: 7537
Reporter: @stedolan
Status: resolved (set by @xavierleroy on 2017-10-15T14:35:04Z)
Resolution: not a bug
Priority: normal
Severity: minor
Category: runtime system and C interface
Monitored by: @gasche @ygrek

Bug description

Hashtbl.hash hashes the contents of closures, including their code pointers:

# Hashtbl.hash (fun x -> x);;
- : int = 710785908
# Hashtbl.hash (fun x -> x);;
- : int = 168456855

Pervasives.compare does not throw an exception on function values which are physically equal:

# let id x = x;; 
val id : 'a -> 'a = <fun>
# compare id id;;
- : int = 0

This makes hashtables act strangely:

# let h = Hashtbl.create 20;;
val h : ('_a, '_b) Hashtbl.t = <abstr>
# Hashtbl.add h id 42;;
- : unit = ()
# Hashtbl.find h id;;
- : int = 42
# Hashtbl.find h (fun x -> x);;
Exception: Not_found.
# Hashtbl.add h (fun x -> x) 10;;
- : unit = ()
# Hashtbl.find h id;;
- : int = 42
# Hashtbl.find h (fun x -> x);;
Exception: Not_found.

In particular, the result of Hashtbl.find will depend on whether the optimiser combines the different 'fun x -> x' values. The bytecode toplevel, shown here, does not.

Futher, Hashtbl.find can sometimes raise Invalid_argument, again at the mercy of the optimiser:

# let add a b = a + b;;
val add : int -> int -> int = <fun>
# let f1 = add 1;;
val f1 : int -> int = <fun>
# let f2 = add 1;;
val f2 : int -> int = <fun>
# Hashtbl.hash f1;;      
- : int = 881405119
# Hashtbl.hash f2;;
- : int = 881405119
# Hashtbl.add h f1 10;;
- : unit = ()
# Hashtbl.find h f2;;
Exception: Invalid_argument "compare: functional value".
@vicuna
Copy link
Author

vicuna commented May 23, 2017

Comment author: @stedolan

I think that Hashtbl.hash should return the same hash for all functions, without inspecting them (so as to ignore those parts of a complex data structure which are closures when computing a hash), and that Pervasives.compare should always raise when asked to compare functions, even if the functions are physically equal.

In the example above, this would cause all calls to Hashtbl.find to raise Invalid_argument.

(This issue was reported by Nicolas Ojeda Bar. Also, this comment should be part of the issue, but I don't know how to edit issues)

@vicuna
Copy link
Author

vicuna commented Jun 1, 2017

Comment author: @chambart

I don't have the same conclusion. I agree that compare should always fail, but it is perfectly ok for me for hash to behave that way. The only sensible equality on functions is physical equality, and hash f is equal to hash g when f == g.
Having hash behaving the way you suggest might turn linear code into quadratic one.

The correct way to do what you suggest is
module M = struct
type t = int -> int
let equal = (==)
let hash = Hashtbl.hash
end
module T = Hashtbl.Make(M)

@vicuna
Copy link
Author

vicuna commented Oct 9, 2017

Comment author: @xavierleroy

Pervasives.compare should always raise when asked to compare functions, even if the functions are physically equal

This is the "NaN comparison problem" all over again. The functions may be deep inside a data structure. With your requirements it becomes impossible to ever shortcut comparisons ("if x == y then compare x y = 0"). We've already sacrificed shortcutting "=" tests in the name of NaNs; we're not going to sacrifice Pervasives.compare tests in the name of function closures.

The sensible thing to do is the custom hash table shown in @chambart's reply.

I move to resolve/no change this PR as I don't see anything we could do here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant