Mantis Bug Tracker

View Issue Details Jump to Notes ] Issue History ] Print ]
IDProjectCategoryView StatusDate SubmittedLast Update
0007279OCamlstandard librarypublic2016-06-27 13:312017-02-23 16:56
Reportersawfish 
Assigned To 
PrioritynormalSeveritymajorReproducibilityalways
StatusresolvedResolutionfixed 
PlatformOSOS Version
Product Version4.03.0 
Target Version4.05.0 +dev/beta1/beta2/beta3/rc1Fixed in Version4.05.0 +dev/beta1/beta2/beta3/rc1 
Summary0007279: Custom blocks are finalized more than once if stored in OCaml's weak sets (probably related to Weak.get_copy)
DescriptionIf custom blocks are stored in weak sets obtained by Weak.Make, their finalizers are sometimes called more than once.

There has been a discussion on the OCaml mailing list regarding this issue:
https://sympa.inria.fr/sympa/arc/caml-list/2016-06/msg00070.html [^]

As a bottom line, the assumption was that the behavior may be related to shallow copies that are created by Weak.get_copy during lookup operations.

The bug is reproducible since OCaml 4.02.0 but does not occur in earlier versions. As an exception, the first beta release of OCaml 4.03.0 does not exhibit the bug, whereas it reproduces in the final 4.03.0 release.
Steps To ReproduceA github repository with a minimal example that reproduces the error is available at https://github.com/martin-neuhaeusser/ocaml_bug/ [^]
TagsNo tags attached.
Attached Files

- Relationships

-  Notes
(0016008)
bobot (reporter)
2016-06-27 16:23

The fact that Weak.Make uses Weak.get_copy is certainly the culprit.

For the fact that 0Caml 4.03.0+beta1 didn't have the bug but 4.03.0 does, is not really determining since 4.03.0+beta1 add a big bug during weak hashset resizing 8204f5d1a7.
(0016062)
frisch (developer)
2016-07-13 09:49

Copying custom blocks sounds rather scary indeed. What about changing Weak.get_copy implementation (and documentation) so that it returns the original value if it is a custom block. Is there a better solution?
(0016078)
sawfish (reporter)
2016-07-18 14:44

If Weak.get_copy just returned the original value if it is a custom block would make it rather useless in my opinion, as this makes the value live.

I am working on some OCaml bindings for an SMT solver and use a weak hashtable to lookup the solver's native terms which are stored in a custom block. This implements a restricted hash-consing that avoids having multiple alive custom blocks around that refer to the same native term.

My experiments suggest that Weak.get_copy works correctly (also when storing custom blocks) in OCaml versions prior to 4.02.0. So I still hope that there is a bug in handling the weak copies that could be fixed.
(0016079)
frisch (developer)
2016-07-18 14:50

> My experiments suggest that Weak.get_copy works correctly (also when storing custom blocks) in OCaml versions prior to 4.02.0.

I think it is just that until recently, custom blocks with finalizers where normally always allocated directly in the major heap, and (to be confirmed), get_copy would instead allocate them in the minor heap, and their finalizer would not be run. Could that be the case? Of course, this is not a proper solution (and it's actually very unsafe).

> would make it rather useless in my opinion, as this makes the value live.

It makes the values that fall in the same slot live for a bit longer. Whether it leads to increased memory use or not depends on the allocation profile of your application (and the size of the weak table, and the hashing function, etc), so unless you have tried it, I would be cautious before saying it renders the Weak table "useless".

> This implements a restricted hash-consing that avoids having multiple alive custom blocks around that refer to the same native term.

This property would not be lost, of course.
(0016094)
bobot (reporter)
2016-07-19 23:10

> I think it is just that until recently, custom blocks with finalizers where normally always allocated directly in the major heap, and (to be confirmed), get_copy would instead allocate them in the minor heap, and their finalizer would not be run. Could that be the case? Of course, this is not a proper solution (and it's actually very unsafe).

By reading the code, it seems that only custom block allocated with `caml_alloc_custom` are finalized when in the minor heap. And `get_copy` allocates with `caml_alloc`. So I think nothing change on that matter.
(0016098)
frisch (developer)
2016-07-20 09:19

> So I think nothing change on that matter.

Ok, do we all agree that this behavior was already wrong? get_copy can happily create a fresh copy of a custom block, which the client code can keep around while the original custom block is finalized perhaps releasing associated native resources. But the client code can still use the copy later and try to access those resources.

get_copy on custom block is really too risky: it breaks the invariant that custom blocks are only allocated explicitly by the user (or through the unmarshaling function using a function provided by the user).
(0016100)
frisch (developer)
2016-07-20 14:38

POC:

open Bigarray
open Bigarray.Array1

let () =
  let a = create float64 c_layout 10 in
  Gc.compact ();
  set a 0 42.;

  let w = Weak.create 1 in
  Weak.set w 0 (Some a);

  let b =
    match Weak.get_copy w 0 with
    | None -> assert false
    | Some b -> b
  in
  Printf.printf "a.(0) = %f\n" (get a 0);
  Printf.printf "b.(0) = %f\n" (get b 0);
  Gc.compact ();
  assert(Weak.get_copy w 0 = None);


  let c = create float64 c_layout 10 in
  Printf.printf "b.(0) = %f\n" (get b 0);
  set c 0 123.;
  Printf.printf "b.(0) = %f\n" (get b 0);


prints on my machine:

a.(0) = 42.000000
b.(0) = 42.000000
b.(0) = 0.000000
b.(0) = 123.000000


The array malloced for [c] reuses the space for [a] (the OCaml value has been collected, thus freeing the underlying data). This can easily lead to segfaults.
(0016107)
bobot (reporter)
2016-07-20 23:30

> get_copy on custom block is really too risky

I agree. I'm going to create a merge request for removing that. But should we deprecate `get_copy` and use `get` instead? I will check on Frama-C if that has a noticeable impact. Alain since you are a great user of hash-consing could you also check? I can prepare a second branch with this change.


sawfish: The problem appears already before 4.02.0, take your gcbug.ml and replace

```
diff --git a/gcbug.ml b/gcbug.ml
index c985f3f..f82e901 100644
--- a/gcbug.ml
+++ b/gcbug.ml
@@ -21,10 +21,7 @@ let create_dummy =
   let dummy_set = WeakDummySet.create 17 in
   fun num ->
     let new_dummy = wrapper_create_dummy num in
- try WeakDummySet.find dummy_set new_dummy with
- | Not_found ->
- WeakDummySet.add dummy_set new_dummy;
- new_dummy
+ WeakDummySet.merge dummy_set new_dummy
 
 (* Create a list of cnt random dummy instances *)
 let rec create_dummies cnt accu =
```

which is a more efficient version of the code. It fails with 4.01.0
(0016108)
frisch (developer)
2016-07-21 09:24

I'd argue to keep get_copy but change its implementation and doc so that it returns the original value when it is a custom block. This would only affect people doing weak tables on custom blocks, which should be rather rare (and more or less already broken anyway).

As for using get instead of get_copy in Weak.Make, this is another question. As I reported on caml-list, LexiFi now has a local variant of Weak.Make that uses get instead of get_copy; we use it for our core contract algebra system which is performance critical and this seemed to give good results. There can of course be cases where this would increase the memory usage too much; this depends on so many factors that it is difficult to give a general recommendation. For us it worked well.
(0016109)
sawfish (reporter)
2016-07-21 13:05

@frisch: The term "useless" is too harsh, of course.

After spending some more thoughts, I agree with Alain. Two observations:

1. In the software model checker that we are implementing, we have a stand-alone implementation for hash-consing which works similar to the weak sets implemented by the Weak module: It stores the bucket entries' hashes in a usual array whereas the actual entries are stored in a Weak array. During a search, we first look for identical hashes; once a match is found, we skip the "Weak.get_copy" step and directly obtain the value from the weak array using Weak.get. In our experiments, this performs slightly better than the additional indirection via Weak.get_copy. I guess that if the underlying hash function is good, the probability of a hash collision within a bucket is low enough to take the risk of unneccessarily making an unreachable value alive.
This observation seems to be similar to @frisch's last remark.

2. In case of hash-consing for terms, I think that Weak.get_copy is of limited value: If a shallow copy of an otherwise unreachable term is obtained during a search within a bucket, Weak.get_copy leaves the term itself unreachable but makes all its subterms alive. This has already been observed by Pascal Cuoq and Damien Doligez: "Hashconsing in an incrementally garbage-collected system", Section 7.2 (ML Workshop 2008).

These two observations make me believe that Weak.get_copy could be avoided entirely without impacting performance. But as there is no strong evidence for that claim (yet), I also suggest to keep Weak.get_copy and to modify it such that it returns the actual value in case that value is a custom block.
(0016110)
frisch (developer)
2016-07-21 13:15

Thanks @sawfish. This also confirms that one should perhaps let users choose whether get or get_copy should be used in Weak.Make.
(0016116)
bobot (reporter)
2016-07-22 23:22

https://github.com/ocaml/ocaml/pull/710 [^] for "don't copy custom block". Alain your POC is integrated in the testsuite.
(0016117)
bobot (reporter)
2016-07-22 23:23

I created http://caml.inria.fr/mantis/view.php?id=7303 [^] for discussing a full (utopic?) solution.
(0016250)
frisch (developer)
2016-08-30 11:04

Bumped target version. This is too late for 4.04 (and it's not a regression).
(0017393)
xleroy (administrator)
2017-02-22 15:20

Pinging @bobot to see if GPR#710 is ready for merging.
(0017413)
doligez (administrator)
2017-02-23 16:56

Fixed by GPR#710

trunk: 2b8836cef389b0d77bd8be5d57c2c2b49d71377b
4.05: 61037b1cd485868b5bf02fa464586669fc1e37f1

- Issue History
Date Modified Username Field Change
2016-06-27 13:31 sawfish New Issue
2016-06-27 15:28 gasche Status new => acknowledged
2016-06-27 15:28 gasche Target Version => 4.04.0 +dev / +beta1 / +beta2
2016-06-27 16:23 bobot Note Added: 0016008
2016-07-13 09:49 frisch Note Added: 0016062
2016-07-18 14:44 sawfish Note Added: 0016078
2016-07-18 14:50 frisch Note Added: 0016079
2016-07-19 23:10 bobot Note Added: 0016094
2016-07-20 09:19 frisch Note Added: 0016098
2016-07-20 14:38 frisch Note Added: 0016100
2016-07-20 23:30 bobot Note Added: 0016107
2016-07-21 09:24 frisch Note Added: 0016108
2016-07-21 13:05 sawfish Note Added: 0016109
2016-07-21 13:15 frisch Note Added: 0016110
2016-07-22 23:22 bobot Note Added: 0016116
2016-07-22 23:23 bobot Note Added: 0016117
2016-08-30 11:03 frisch Target Version 4.04.0 +dev / +beta1 / +beta2 => 4.05.0 +dev/beta1/beta2/beta3/rc1
2016-08-30 11:04 frisch Note Added: 0016250
2017-02-22 15:20 xleroy Note Added: 0017393
2017-02-23 16:36 doligez Category OCaml general => -OCaml general
2017-02-23 16:56 doligez Note Added: 0017413
2017-02-23 16:56 doligez Status acknowledged => resolved
2017-02-23 16:56 doligez Resolution open => fixed
2017-02-23 16:56 doligez Category -OCaml general => standard library
2017-02-23 16:56 doligez Fixed in Version => 4.05.0 +dev/beta1/beta2/beta3/rc1


Copyright © 2000 - 2011 MantisBT Group
Powered by Mantis Bugtracker