Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Re: [Caml-list] Re: Constants immediatelly disappear from the Weak (PR#1927) #8370

Closed
vicuna opened this issue Nov 14, 2003 · 2 comments
Closed

Comments

@vicuna
Copy link

vicuna commented Nov 14, 2003

Original bug ID: 1929
Reporter: administrator
Status: closed
Resolution: fixed
Priority: normal
Severity: feature
Category: ~DO NOT USE (was: OCaml general)

Bug description

On 14.11.2003 05:03, Damien Doligez wrote:

On Thursday, November 13, 2003, at 12:22 AM, nogin@cs.caltech.edu wrote:

Basically, we used to have a very nice and completely polymorphic
weak_memo module (that solves the following problem - if you have an
arbitrary mutually-recursive data type that you are converting into an
"isomorphic" one, how do you set it up so that equal inputs are mapped
to the same output, and the whole process takes linear time); and in
3.07 this module needs to include various calls to Obj to be able to
figure out whether the output happens to be an unboxed value.

If I understand correctly, you are describing the hash-consing problem.

Yes, sort of, but not just for a specific data structure. We have a
number of different places in our project where we want to do
hash-consing, so we need some generic polymorphic module that would do
at least some portion of the work in every case.

I don't see why you need helper data to solve it.

Well, we can not hash the whole thing on every step (this would make the
whole algorithm quadratic instead of linear), so for every "cons" we
create a "header" where the pointers are replaced with indices into a
weak array (it's set up in a way that we know that header can only be
used while the corresponding data still has references to it, so the
indices must still point to Some's (*) ).

Basically, for every data type involved in a recursive type definition,
we keep two data structures:

  • A weak array that holds the "output data"
  • A normal hash table that maps "headers" to indices in the weak array.

Periodically we clean up the hash table - if we have a "header -> i"
mapping, but the weak array has None in slot i, we delete the old header
from the hash table and reuse the i'th slot for a completely new header
and data.

Two unboxed values
are (==) if and only if they are (=), so you don't need any sharing
information for unboxed values.

Right. But if all you know is that your value has type 'a, then what do
you do? In 3.06 we could proceed in a generic polymorphic fashion and it
would work, but in 3.07+2 we not have to invoke Obj to figure out
whether the value is boxed or not, and have some special code to
short-circuit the whole Weak array process in case of an unboxed value.


(*) Unfortunately in 3.07+2 it is no longer true - we also have to check
that data does not happen to be unboxed!

P.S. Here is the signature of the module I am talking about:

module type WeakMemoSig =
sig
(*
* Weak_descriptors are just indicies that
* permit the values associated with them to be collected
* by GC.
*)
type 'a weak_descriptor
val wd_hash : 'a weak_descriptor -> int

(*
 * Descriptors are analog of weak_descriptor,
 * but they do not permit their values to be collected.
 *)
type 'a descriptor

(* Memo-type:
 * 'param - an arbitrary argument (usually used for recursion on 

memo tables).
* 'arg - the type of arguments
* 'header - transformed argument where all recursive references are
replaced with
* its target's descriptors
* 'weak_header - transformed argument where all recursive
references are replaced with
* its target's weak_descriptors
* 'image - result type
*)
type ('param, 'arg, 'header, 'weak_header, 'image) t

(*
 * Release the value to GC.
 *)
val weaken : 'a descriptor -> 'a weak_descriptor

(*
 * Compare descriptors.
 * This is useful for constructing sets of values.
 *
 * The use of this function is not advised for casual
 * users since some implementations of WeakMemo may
 * not provide this function.
 *)
val compare : 'a descriptor -> 'a descriptor -> int

(*
 * Creates new memo-structure.
 *)
val create : int -> int ->              (* These numbers are size of 

header's hash table
* (and halfsize of array of
target objects)
* and critical level of
holes to restart GC.
)
string -> (
For debugging,
this is the name of the table )
('param -> 'arg -> 'header) -> (
Convert the
argument to a header )
('param -> 'header -> 'weak_header) -> (
Convert 'header
to 'weak_header )
('weak_header -> 'weak_header -> bool) -> (
Headers'
comparison function )
('param -> 'header -> 'image) -> (
Converter from
header to result *)
('param, 'arg, 'header, 'weak_header, 'image) t

(*
 * Creates new memo-structure.
 * Apply is not allowed, and table is initialized to default size.
 *)
val create_default :
   string ->                                      (* Name of the 

table )
('param -> 'header -> 'weak_header) -> (
Convert 'header
to 'weak_header )
('weak_header -> 'weak_header -> bool) -> (
Headers'
comparison function )
('param -> 'header -> 'image) -> (
Converter from
header to result *)
('param, 'arg, 'header, 'weak_header, 'image) t

(*
 * Looks for header and returns result's descriptor if succeed 

otherwise evaluate the result
* and memorize it (if result GC-ed already then reevaluate it) and
returns
* its descriptor
*)
val lookup : ('param, 'arg, 'header, 'weak_header, 'image) t ->
'param -> 'header -> 'image descriptor

(*
 * As previous but assume result has not been collected by GC
 * (if not, an exception is raised).
 *)
val unsafe_lookup : ('param, 'arg, 'header, 'weak_header, 'image) t 

-> 'param -> 'header -> 'image descriptor

(*
 * Return the value represented by the descriptor.
 *)
val retrieve : ('param, 'arg, 'header, 'weak_header, 'image) t -> 

'param -> 'image descriptor -> 'image

(*
 * The descriptor allows dereferencing.
 *)
val retrieve_hack : 'image descriptor -> 'image

(*
 * Check that a descriptor is consistent.
 * This is for descriptors that may have been passed across the
 * network.
 *)
val retrieve_check : ('param, 'arg, 'header, 'weak_header, 'image) t 

-> 'image descriptor -> bool

(*
 * Compose conversion, lookup, and retrieve
 *)
val apply : ('param, 'arg, 'header, 'weak_header, 'image) t -> 

'param -> 'arg -> 'image
end

--
Aleksey Nogin

Home Page: http://nogin.org/
E-Mail: nogin@cs.caltech.edu (office), aleksey@nogin.org (personal)
Office: Jorgensen 70, tel: (626) 395-2907

@vicuna
Copy link
Author

vicuna commented Nov 19, 2003

Comment author: administrator

On Friday, November 14, 2003, at 10:46 PM, nogin@cs.caltech.edu wrote:

Well, we can not hash the whole thing on every step (this would make
the
whole algorithm quadratic instead of linear), so for every "cons" we
create a "header" where the pointers are replaced with indices into a
weak array (it's set up in a way that we know that header can only be
used while the corresponding data still has references to it, so the
indices must still point to Some's (*) ).

Basically, for every data type involved in a recursive type definition,
we keep two data structures:

  • A weak array that holds the "output data"
  • A normal hash table that maps "headers" to indices in the weak array.

Periodically we clean up the hash table - if we have a "header -> i"
mapping, but the weak array has None in slot i, we delete the old
header
from the hash table and reuse the i'th slot for a completely new header
and data.

If I understand correctly, you shouldn't be creating headers for
unboxed data. Unboxed data are always shared, so hash-consing is
irrelevant for them.

I think you should try to insert the output data into the weak array,
use Weak.check to see if it "sticks", and only create the header
if it does.

-- Damien

@vicuna
Copy link
Author

vicuna commented Dec 29, 2003

Comment author: administrator

see also #8366 and #8368

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant