Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RE: Possible bug in GC #8393

Closed
vicuna opened this issue Dec 1, 2003 · 9 comments
Closed

RE: Possible bug in GC #8393

vicuna opened this issue Dec 1, 2003 · 9 comments
Labels

Comments

@vicuna
Copy link

vicuna commented Dec 1, 2003

Original bug ID: 1954
Reporter: administrator
Status: closed
Resolution: fixed
Priority: normal
Severity: minor
Category: ~DO NOT USE (was: OCaml general)

Bug description

Hi. I found another case where this happened even with "-linkall"....so
I now don't have an effective workaround.

A little bit of stats for you. We run our o'caml based model checker
on about 900 real-world benchmarks that require a couple of days to run
on about 4 machines. On those benchmarks this occurs about 3 times
when running on the 900 benchmarks.

On another set of small C programs (about 200 cases, that require about
an hour to run) that we're running on it happens on one benchmark.

The problem w/ this bug is that any change in the code can cause it to
go away on one benchmark but often appear in another place. Its really
quite hard to predict ;-)

  • Byron

-----Original Message-----
From: Byron Cook
Sent: Wednesday, November 26, 2003 12:49 AM
To: 'caml-bugs@inria.fr'
Subject: Possible bug in

Hi,

I'm seeing what I think might be a bug in ocamlopt or the
ocaml runtime. I'm using "The Objective Caml native-code
compiler, version 3.04, Standard library directory:
c:\ocaml\lib" on Windows XP.

My code is setting an alarm in the GC system with
Gc.create_alarm. I then see that (for a very rare and
seemingly unrelated set of input files to this program
(SLAM), on the first garbage collection, the program core
dumps. this happens even if the function that is being
passed to Gc.create_alarm simply prints a string.

There seem to be some ways to work around the problem. Of
course if I dont use Gc.create_alarm at all the problem goes
away. ;-) Also, if I pass -linkall to the ocamlopt it seems
to go away on the particular case that I'm looking at.
However: its hard to know if this is a real workaround or if
its just working in this one case where i'm seeing the problem.

Have you all seen this problem before? If so: what's
happening? If not: is there some secret logging command that
I could use to create a dump for you all to look at (if
you're interested...)

Thanks

  • Byron
@vicuna
Copy link
Author

vicuna commented Dec 2, 2003

Comment author: administrator

Thanks for your reply. see comments below.

From: Damien Doligez [mailto:caml-bugs@pauillac.inria.fr]
Sent: Tuesday, December 02, 2003 10:07 AM
To: Byron Cook
Subject: RE: Possible bug in GC (#8393)

A little bit of stats for you. We run our o'caml based
model checker
on about 900 real-world benchmarks that require a couple of
days to run
on about 4 machines. On those benchmarks this occurs about 3 times
when running on the 900 benchmarks.

I'm really impressed that you can do statistics on such an
obscure bug...
You must be using O'Caml quite a lot.

Oh, yes, we're running o'caml compiled code pretty much day and night on
many machines.

I'm seeing what I think might be a bug in ocamlopt or the ocaml
runtime. I'm using "The Objective Caml native-code
compiler, version
3.04, Standard library directory:
c:\ocaml\lib" on Windows XP.

I'm pretty sure it is a known bug in 3.04. I think it was
there since 3.00.
It was fixed for 3.05.

If you can upgrade to 3.07, the bug should go away.
Otherwise, if you can completely avoid using Gc.finalise (and
Gc.create_alarm), you should be safe. If neither solution is
acceptable, I can try to cook up a patch for 3.04.

Anyway, thanks for your bug report. Let me know which
solution you choose, and more important, whether it works :-)

Thanks. I think that i'll try the Gc.finalize approach first. Then, no
matter what the result is, I think that I'll try upgrading to 3.07 and
see how that goes. I'll let you know. By the way: I'll be at Inria-Roc.
on the 27th and 28th.

  • Byron

@vicuna
Copy link
Author

vicuna commented Dec 2, 2003

Comment author: administrator

Oh. I'm sorry. I misread your mail. The only solution that allows me
to use Gc.create_alarm is to upgrade. I though that I saw another
suggestion in this mail. So: I'll try upgrading and see what happens.

-----Original Message-----
From: Damien Doligez [mailto:caml-bugs@pauillac.inria.fr]
Sent: Tuesday, December 02, 2003 10:07 AM
To: Byron Cook
Subject: RE: Possible bug in GC (#8393)

A little bit of stats for you. We run our o'caml based
model checker
on about 900 real-world benchmarks that require a couple of
days to run
on about 4 machines. On those benchmarks this occurs about 3 times
when running on the 900 benchmarks.

I'm really impressed that you can do statistics on such an
obscure bug...
You must be using O'Caml quite a lot.

I'm seeing what I think might be a bug in ocamlopt or the ocaml
runtime. I'm using "The Objective Caml native-code
compiler, version
3.04, Standard library directory:
c:\ocaml\lib" on Windows XP.

I'm pretty sure it is a known bug in 3.04. I think it was
there since 3.00.
It was fixed for 3.05.

If you can upgrade to 3.07, the bug should go away.
Otherwise, if you can completely avoid using Gc.finalise (and
Gc.create_alarm), you should be safe. If neither solution is
acceptable, I can try to cook up a patch for 3.04.

Anyway, thanks for your bug report. Let me know which
solution you choose, and more important, whether it works :-)

-- Damien

@vicuna
Copy link
Author

vicuna commented Dec 2, 2003

Comment author: administrator

A little bit of stats for you. We run our o'caml based model checker
on about 900 real-world benchmarks that require a couple of days to run
on about 4 machines. On those benchmarks this occurs about 3 times
when running on the 900 benchmarks.

I'm really impressed that you can do statistics on such an obscure bug...
You must be using O'Caml quite a lot.

I'm seeing what I think might be a bug in ocamlopt or the
ocaml runtime. I'm using "The Objective Caml native-code
compiler, version 3.04, Standard library directory:
c:\ocaml\lib" on Windows XP.

I'm pretty sure it is a known bug in 3.04. I think it was there since 3.00.
It was fixed for 3.05.

If you can upgrade to 3.07, the bug should go away. Otherwise, if you
can completely avoid using Gc.finalise (and Gc.create_alarm), you should
be safe. If neither solution is acceptable, I can try to cook up a patch
for 3.04.

Anyway, thanks for your bug report. Let me know which solution you
choose, and more important, whether it works :-)

-- Damien

@vicuna
Copy link
Author

vicuna commented Dec 3, 2003

Comment author: administrator

I have a question for you.

I upgraded the compiler from 3.04 to 3.07 and I've run into a problem.
The problem is that I'm depending on Ocaml's Map module to keep its
values in a canonical form. In a fairly "inner loop" in my automatic
theorem prover I use "n1 <> m2" as my termination condition where m1 and
m2 are both elements of a type constructued with Map.

This invariant appeared to be true in 3.04, but I'm seeing some cases
where its not true in 3.07.

Question: is this known? Or is this a 3.07 bug?

Thanks!

  • Byron

-----Original Message-----
From: Damien Doligez [mailto:caml-bugs@pauillac.inria.fr]
Sent: Tuesday, December 02, 2003 10:07 AM
To: Byron Cook
Subject: RE: Possible bug in GC (#8393)

A little bit of stats for you. We run our o'caml based
model checker
on about 900 real-world benchmarks that require a couple of
days to run
on about 4 machines. On those benchmarks this occurs about 3 times
when running on the 900 benchmarks.

I'm really impressed that you can do statistics on such an
obscure bug...
You must be using O'Caml quite a lot.

I'm seeing what I think might be a bug in ocamlopt or the ocaml
runtime. I'm using "The Objective Caml native-code
compiler, version
3.04, Standard library directory:
c:\ocaml\lib" on Windows XP.

I'm pretty sure it is a known bug in 3.04. I think it was
there since 3.00.
It was fixed for 3.05.

If you can upgrade to 3.07, the bug should go away.
Otherwise, if you can completely avoid using Gc.finalise (and
Gc.create_alarm), you should be safe. If neither solution is
acceptable, I can try to cook up a patch for 3.04.

Anyway, thanks for your bug report. Let me know which
solution you choose, and more important, whether it works :-)

-- Damien

@vicuna
Copy link
Author

vicuna commented Dec 3, 2003

Comment author: administrator

I'm sorry. I was wrong. The problem is not in Map. Its in Set.

The problem is here.

Imagine that we have a module defined as:

module ESet =
Set.Make(struct type t = E.expr
let compare = (compare : E.expr -> E.expr -> int)
end
)

And this little code snippit:

let diffb_1 = ESet.diff s1b s2b in
let is_diffb_1 = diffb_1 <> ESet.empty in

let diffb_2 = ESet.diff s2b s1b in
let is_diffb_2 = diffb_2 <> ESet.empty in

let diffb = is_diffb_1 || is_diffb_2 in
let diffb' = s1b <> s2b in
assert (diffb = diffb');

I'm seeing cases where this "assert" fails.

Is this a known problem?

-----Original Message-----
From: Byron Cook
Sent: Wednesday, December 03, 2003 4:42 PM
To: 'Damien Doligez'
Subject: RE: Possible bug in GC (#8393)

I have a question for you.

I upgraded the compiler from 3.04 to 3.07 and I've run into a
problem. The problem is that I'm depending on Ocaml's Map
module to keep its values in a canonical form. In a fairly
"inner loop" in my automatic theorem prover I use "n1 <> m2"
as my termination condition where m1 and m2 are both elements
of a type constructued with Map.

This invariant appeared to be true in 3.04, but I'm seeing
some cases where its not true in 3.07.

Question: is this known? Or is this a 3.07 bug?

Thanks!

  • Byron

-----Original Message-----
From: Damien Doligez [mailto:caml-bugs@pauillac.inria.fr]
Sent: Tuesday, December 02, 2003 10:07 AM
To: Byron Cook
Subject: RE: Possible bug in GC (#8393)

A little bit of stats for you. We run our o'caml based
model checker
on about 900 real-world benchmarks that require a couple of
days to run
on about 4 machines. On those benchmarks this occurs
about 3 times
when running on the 900 benchmarks.

I'm really impressed that you can do statistics on such an obscure
bug...
You must be using O'Caml quite a lot.

I'm seeing what I think might be a bug in ocamlopt or the ocaml
runtime. I'm using "The Objective Caml native-code
compiler, version
3.04, Standard library directory:
c:\ocaml\lib" on Windows XP.

I'm pretty sure it is a known bug in 3.04. I think it was
there since
3.00.
It was fixed for 3.05.

If you can upgrade to 3.07, the bug should go away.
Otherwise, if you can completely avoid using Gc.finalise (and
Gc.create_alarm), you should be safe. If neither solution is
acceptable, I can try to cook up a patch for 3.04.

Anyway, thanks for your bug report. Let me know which solution you
choose, and more important, whether it works :-)

-- Damien

@vicuna
Copy link
Author

vicuna commented Dec 4, 2003

Comment author: administrator

Thanks!

  • Byron

-----Original Message-----
From: Xavier Leroy [mailto:xavier.leroy@inria.fr]
Sent: Thursday, December 04, 2003 8:21 AM
To: Byron Cook
Cc: Caml bug tracking system
Subject: Re: Possible bug in GC (#8393)

I'm sorry. I was wrong. The problem is not in Map. Its in Set.
[...]

Both maps and sets cannot reliably be compared with the
generic equality primitive, because equivalent sets or maps
may have different representations as AVL trees. Consider
for instance the set {a;b} with a < b. Two possible
representations are

    b               and             a
   /                                 \
  a                                   b

and these are structurally different.

For sets, an easy solution is to use the "equal" function
provided in the Set.S signature instead of =. There is no
"equal" function provided for maps, and this is unfortunate
(I ran into this problem very recently!). A rather
inefficient "equal" function can be coded from the "iter" and
"get" functions. (I can send you the code if you're interested.)

One possible reason the problem shows up with 3.07 and not
with 3.04 is that there was a performance bug in sets and
maps that was fixed in 3.07. The bug in question had to do
with certain operations (such as union and intersection)
producing trees that were not properly balanced according to
the AVL criterion, thus breaking the guarantee that lookup
functions run in log n time. So, 3.07 rebalances more
aggressively in some circumstances, and this could cause a
structural equality that held before "by accident" to no longer hold.

In the unlikely case that your set elements or map keys are
integers, Patricia trees can be used to implement sets and
maps that are not only faster than modules Set and Map, but
are also compatible with generic equality (set equality is
structural equality). See
http://www.lri.fr/~filliatr/software.en.html

Best wishes,

  • Xavier Leroy

@vicuna
Copy link
Author

vicuna commented Dec 4, 2003

Comment author: administrator

I'm sorry. I was wrong. The problem is not in Map. Its in Set.
[...]

Both maps and sets cannot reliably be compared with the generic
equality primitive, because equivalent sets or maps may have different
representations as AVL trees. Consider for instance the set {a;b}
with a < b. Two possible representations are

    b               and             a
   /                                 \
  a                                   b

and these are structurally different.

For sets, an easy solution is to use the "equal" function provided in
the Set.S signature instead of =. There is no "equal" function
provided for maps, and this is unfortunate (I ran into this problem
very recently!). A rather inefficient "equal" function can be
coded from the "iter" and "get" functions. (I can send you the code
if you're interested.)

One possible reason the problem shows up with 3.07 and not with 3.04
is that there was a performance bug in sets and maps that was fixed in
3.07. The bug in question had to do with certain operations (such as
union and intersection) producing trees that were not properly
balanced according to the AVL criterion, thus breaking the guarantee
that lookup functions run in log n time. So, 3.07 rebalances more
aggressively in some circumstances, and this could cause a structural
equality that held before "by accident" to no longer hold.

In the unlikely case that your set elements or map keys are integers,
Patricia trees can be used to implement sets and maps that are not
only faster than modules Set and Map, but are also compatible with
generic equality (set equality is structural equality). See
http://www.lri.fr/~filliatr/software.en.html

Best wishes,

  • Xavier Leroy

@vicuna
Copy link
Author

vicuna commented Dec 7, 2003

Comment author: administrator

Upgrading to 3.07 fixed this issue. Thanks.

In case you're interested in such things, the performance of SLAM
compiled w/ 3.04 seems to pretty well match that of 3.07. There was
just a fraction of a slowdown between 3.04 and 3.07 on an experiment
using the same source code on 25 model checking runs that are fairly
representative of our overall results. Not horrible though: basically
the 3.07 code was about 2 minutes slower on a 5 hour set of benchmarks.

  • Byron

-----Original Message-----
From: Damien Doligez [mailto:caml-bugs@pauillac.inria.fr]
Sent: Tuesday, December 02, 2003 10:07 AM
To: Byron Cook
Subject: RE: Possible bug in GC (#8393)

A little bit of stats for you. We run our o'caml based
model checker
on about 900 real-world benchmarks that require a couple of
days to run
on about 4 machines. On those benchmarks this occurs about 3 times
when running on the 900 benchmarks.

I'm really impressed that you can do statistics on such an
obscure bug...
You must be using O'Caml quite a lot.

I'm seeing what I think might be a bug in ocamlopt or the ocaml
runtime. I'm using "The Objective Caml native-code
compiler, version
3.04, Standard library directory:
c:\ocaml\lib" on Windows XP.

I'm pretty sure it is a known bug in 3.04. I think it was
there since 3.00.
It was fixed for 3.05.

If you can upgrade to 3.07, the bug should go away.
Otherwise, if you can completely avoid using Gc.finalise (and
Gc.create_alarm), you should be safe. If neither solution is
acceptable, I can try to cook up a patch for 3.04.

Anyway, thanks for your bug report. Let me know which
solution you choose, and more important, whether it works :-)

-- Damien

@vicuna
Copy link
Author

vicuna commented Jan 7, 2004

Comment author: administrator

see also #8386
fixed by upgrading to 3.07 2004-01-07 -DD

@vicuna vicuna closed this as completed Jan 7, 2004
@vicuna vicuna added the bug label Mar 19, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

1 participant