Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Surprising behavior of weak hashtables and hash-consing #6941

Closed
vicuna opened this issue Jul 29, 2015 · 7 comments
Closed

Surprising behavior of weak hashtables and hash-consing #6941

vicuna opened this issue Jul 29, 2015 · 7 comments
Labels

Comments

@vicuna
Copy link

vicuna commented Jul 29, 2015

Original bug ID: 6941
Reporter: disteph
Status: closed (set by @xavierleroy on 2017-02-16T14:16:41Z)
Resolution: not a bug
Priority: normal
Severity: minor
Platform: Linux
OS: Ubuntu
OS Version: Ubuntu 14.04
Version: 4.02.3
Category: ~DO NOT USE (was: OCaml general)
Monitored by: @yakobowski

Bug description

Wrote a 5000 line SAT-solver in OCaml, some answers are wrong.
While debugging, I tweaked the print_in_fmt function that I wrote to pretty-print literals (basically, these are strings)
Changing the line
Format.fprintf fmt "0"
to
Format.fprintf fmt ""
changes the output of my SAT-solver from SAT to UNSAT (the latter being the correct answer).

I don't know how Format.fprintf is implemented and compiled, it must have some important side-effect that apparently affects the rest of my (otherwise purely functional) code.

Moreover, I get different answers if I compile in native (my problem is erroneously diagnosed as SAT) or in bytecode (my problem is correctly diagnosed as UNSAT), but the source code is the same.

I know this is probably a debugging issue of my own code, and I'm trying to investigate how a side-effect of Format.fprintf could affect the rest of my code, but with different behaviours given in native and bytecode, any hint is appreciated.

Steps to reproduce

I can try to reduce my code to something minimal, but as it will take me a lot of time, any idea is appreciated is appreciated about the different behaviours of Format.fprintf in native or bytecode.

File attachments

@vicuna
Copy link
Author

vicuna commented Jul 29, 2015

Comment author: @gasche

(1) Which OCaml version are you using?
(2) Can you reproduce the issue using Printf.fprintf instead?

@vicuna
Copy link
Author

vicuna commented Jul 29, 2015

Comment author: bvaugon

It looks like a memory corruption. Since (Printf.fprintf "0") and
(Printf.fprintf "") don't allocate the same number of bytes, it may
change the memory behaviour of your program and then the semantic of
your code.

If you are in 4.02.0 or later, the OCaml implementation of Printf no
longer uses Obj or unsafe features like that. Are you sure you are not
using, in your code, usafe things like Obj, an unsafe external,
Bytes.unsafe_set (or similar), or the -unsafe option?

To debug that and track memory corruptions, a basic way is to
frequently run (Gc.compact ()) and wait for a segfault.

@vicuna
Copy link
Author

vicuna commented Jul 29, 2015

Comment author: disteph

Ah, got it.

I tested with 4.02.2 and 4.02.3, same thing.
So, what I described as "my (otherwise purely functional) code" was not completely true: I'm hash-consing terms...
...using
module H = Weak.Make(Arg)
which I was recomended to use, otherwise the gc can never get rid of my terms which are always pointed to by the hash-consing table.
I just changed it to
module H = Hashtbl.Make(Arg)
and now all of my answers are consistent (and correct).

Is the Weak module using unsafe things like Obj? I did not know.
(But somewhere in the back of my mind, I thought something magical was happening in there, I just completely forgot about it).
Anyway, using a normal hash table is good enough for me, but presumably there is a bug somewhere that OCaml may want to get rid of. And presumably, it is in the implementation of Weak?

I'm not sure how I can track it down, though. My code barely use the Weak functions.

@vicuna
Copy link
Author

vicuna commented Jul 29, 2015

Comment author: disteph

I've attached my code.
By the way, thanks for the impressively fast answers and for efficiently directing me to the culprit piece of code. I thought I was going crazy trying to debug my code, as printing stuff on stdout was changing the behaviour of my program...

@vicuna
Copy link
Author

vicuna commented Jul 29, 2015

Comment author: bvaugon

That's strange. The Weak modules is (a priori) safe and can't break
the OCaml heap. Your attached code seems to work on my computer, even
when I switch the two commented/uncommented lines. Is it this program
which have a strange behaviour on your computer?

@vicuna
Copy link
Author

vicuna commented Jul 29, 2015

Comment author: disteph

It is part of the program that has the strange behaviour.
But I've just understood what is going on, and it is entirely my fault (but I believe it makes a very good textbook example of stupid programming):

The rest of my code was organised under the assumption that if one attempts to construct the same term twice, then the second time, the term is found in the hash-consing table and a pointer to the first construct is returned, with the same id and the property that you can check the equality with the pointers or with the ids.
The rest of my code was exploiting this assumption and, in some data-structures for sets of terms where the representation of a term as an int is useful (namely, Patricia tries), it only remembered the id and didn't even store the term (which was no longer useful information). Except that nobody was pointing to the term anymore but the hash consing table.

But obviously the assumption is not true with a hash-consing table built with Weak, since the gc can remove the entry between the first time and the second time the term is built. If that happens, the id of the term at the second construction is different from the id of the same term, now erased, at the first construction. And when my code then checked whether the term, arising from the second construction, belongs to a set (implemented as a Patricia trie), the first id was in there but not the second, so the answer was incorrectly negative.
So my SAT-solver would sometimes not detect an inconsistent model with p and not p, simply because the garbage collector had done its job between the first construction of p and the second, so it would somtimes say the problem is SAT when it was in fact UNSAT (and never the other way round).

And whether the garbage collector erased the term between its first construction and the second construction was obviously sensitive to a lot of irrelevant things, such as how many Format.fprintf I had done or whether I was compiling native code or byte code.
Thus resulting in a totally erratic, unpredictable behaviour that was driving me crazy while debugging.

My bad. I should have know better. Sorry for having wasted your time...

@vicuna
Copy link
Author

vicuna commented Nov 22, 2015

Comment author: @xavierleroy

I'm taking the liberty to mark this PR as resolved. I also updated the summary.

There are ways to implement reliable hash-consing using weak hash tables, see http://moscova.inria.fr/~doligez/publications/cuoq-doligez-mlw-2008.pdf

If you still observe weird behaviors in an implementation of hash-consing similar to that of Cuoq and Doligez, please reopen this PR.

@vicuna vicuna closed this as completed Feb 16, 2017
@vicuna vicuna added the bug label Mar 20, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

1 participant