You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Basically, we used to have a very nice and completely polymorphic
weak_memo module (that solves the following problem - if you have an
arbitrary mutually-recursive data type that you are converting into an
"isomorphic" one, how do you set it up so that equal inputs are mapped
to the same output, and the whole process takes linear time); and in
3.07 this module needs to include various calls to Obj to be able to
figure out whether the output happens to be an unboxed value.
If I understand correctly, you are describing the hash-consing problem.
Yes, sort of, but not just for a specific data structure. We have a
number of different places in our project where we want to do
hash-consing, so we need some generic polymorphic module that would do
at least some portion of the work in every case.
I don't see why you need helper data to solve it.
Well, we can not hash the whole thing on every step (this would make the
whole algorithm quadratic instead of linear), so for every "cons" we
create a "header" where the pointers are replaced with indices into a
weak array (it's set up in a way that we know that header can only be
used while the corresponding data still has references to it, so the
indices must still point to Some's (*) ).
Basically, for every data type involved in a recursive type definition,
we keep two data structures:
A weak array that holds the "output data"
A normal hash table that maps "headers" to indices in the weak array.
Periodically we clean up the hash table - if we have a "header -> i"
mapping, but the weak array has None in slot i, we delete the old header
from the hash table and reuse the i'th slot for a completely new header
and data.
Two unboxed values
are (==) if and only if they are (=), so you don't need any sharing
information for unboxed values.
Right. But if all you know is that your value has type 'a, then what do
you do? In 3.06 we could proceed in a generic polymorphic fashion and it
would work, but in 3.07+2 we not have to invoke Obj to figure out
whether the value is boxed or not, and have some special code to
short-circuit the whole Weak array process in case of an unboxed value.
(*) Unfortunately in 3.07+2 it is no longer true - we also have to check
that data does not happen to be unboxed!
P.S. Here is the signature of the module I am talking about:
module type WeakMemoSig =
sig
(*
* Weak_descriptors are just indicies that
* permit the values associated with them to be collected
* by GC.
*)
type 'a weak_descriptor
val wd_hash : 'a weak_descriptor -> int
(*
* Descriptors are analog of weak_descriptor,
* but they do not permit their values to be collected.
*)
type 'a descriptor
(* Memo-type:
* 'param - an arbitrary argument (usually used for recursion on
memo tables).
* 'arg - the type of arguments
* 'header - transformed argument where all recursive references are
replaced with
* its target's descriptors
* 'weak_header - transformed argument where all recursive
references are replaced with
* its target's weak_descriptors
* 'image - result type
*)
type ('param, 'arg, 'header, 'weak_header, 'image) t
(*
* Release the value to GC.
*)
val weaken : 'a descriptor -> 'a weak_descriptor
(*
* Compare descriptors.
* This is useful for constructing sets of values.
*
* The use of this function is not advised for casual
* users since some implementations of WeakMemo may
* not provide this function.
*)
val compare : 'a descriptor -> 'a descriptor -> int
(*
* Creates new memo-structure.
*)
val create : int -> int -> (* These numbers are size of
header's hash table
* (and halfsize of array of
target objects)
* and critical level of
holes to restart GC. )
string -> ( For debugging,
this is the name of the table )
('param -> 'arg -> 'header) -> ( Convert the
argument to a header )
('param -> 'header -> 'weak_header) -> ( Convert 'header
to 'weak_header )
('weak_header -> 'weak_header -> bool) -> ( Headers'
comparison function )
('param -> 'header -> 'image) -> ( Converter from
header to result *)
('param, 'arg, 'header, 'weak_header, 'image) t
(*
* Creates new memo-structure.
* Apply is not allowed, and table is initialized to default size.
*)
val create_default :
string -> (* Name of the
table )
('param -> 'header -> 'weak_header) -> ( Convert 'header
to 'weak_header )
('weak_header -> 'weak_header -> bool) -> ( Headers'
comparison function )
('param -> 'header -> 'image) -> ( Converter from
header to result *)
('param, 'arg, 'header, 'weak_header, 'image) t
(*
* Looks for header and returns result's descriptor if succeed
otherwise evaluate the result
* and memorize it (if result GC-ed already then reevaluate it) and
returns
* its descriptor
*)
val lookup : ('param, 'arg, 'header, 'weak_header, 'image) t ->
'param -> 'header -> 'image descriptor
(*
* As previous but assume result has not been collected by GC
* (if not, an exception is raised).
*)
val unsafe_lookup : ('param, 'arg, 'header, 'weak_header, 'image) t
-> 'param -> 'header -> 'image descriptor
(*
* Return the value represented by the descriptor.
*)
val retrieve : ('param, 'arg, 'header, 'weak_header, 'image) t ->
'param -> 'image descriptor -> 'image
(*
* The descriptor allows dereferencing.
*)
val retrieve_hack : 'image descriptor -> 'image
(*
* Check that a descriptor is consistent.
* This is for descriptors that may have been passed across the
* network.
*)
val retrieve_check : ('param, 'arg, 'header, 'weak_header, 'image) t
-> 'image descriptor -> bool
(*
* Compose conversion, lookup, and retrieve
*)
val apply : ('param, 'arg, 'header, 'weak_header, 'image) t ->
Well, we can not hash the whole thing on every step (this would make
the
whole algorithm quadratic instead of linear), so for every "cons" we
create a "header" where the pointers are replaced with indices into a
weak array (it's set up in a way that we know that header can only be
used while the corresponding data still has references to it, so the
indices must still point to Some's (*) ).
Basically, for every data type involved in a recursive type definition,
we keep two data structures:
A weak array that holds the "output data"
A normal hash table that maps "headers" to indices in the weak array.
Periodically we clean up the hash table - if we have a "header -> i"
mapping, but the weak array has None in slot i, we delete the old
header
from the hash table and reuse the i'th slot for a completely new header
and data.
If I understand correctly, you shouldn't be creating headers for
unboxed data. Unboxed data are always shared, so hash-consing is
irrelevant for them.
I think you should try to insert the output data into the weak array,
use Weak.check to see if it "sticks", and only create the header
if it does.
Original bug ID: 1929
Reporter: administrator
Status: closed
Resolution: fixed
Priority: normal
Severity: feature
Category: ~DO NOT USE (was: OCaml general)
Bug description
On 14.11.2003 05:03, Damien Doligez wrote:
Yes, sort of, but not just for a specific data structure. We have a
number of different places in our project where we want to do
hash-consing, so we need some generic polymorphic module that would do
at least some portion of the work in every case.
Well, we can not hash the whole thing on every step (this would make the
whole algorithm quadratic instead of linear), so for every "cons" we
create a "header" where the pointers are replaced with indices into a
weak array (it's set up in a way that we know that header can only be
used while the corresponding data still has references to it, so the
indices must still point to Some's (*) ).
Basically, for every data type involved in a recursive type definition,
we keep two data structures:
Periodically we clean up the hash table - if we have a "header -> i"
mapping, but the weak array has None in slot i, we delete the old header
from the hash table and reuse the i'th slot for a completely new header
and data.
Right. But if all you know is that your value has type 'a, then what do
you do? In 3.06 we could proceed in a generic polymorphic fashion and it
would work, but in 3.07+2 we not have to invoke Obj to figure out
whether the value is boxed or not, and have some special code to
short-circuit the whole Weak array process in case of an unboxed value.
(*) Unfortunately in 3.07+2 it is no longer true - we also have to check
that data does not happen to be unboxed!
P.S. Here is the signature of the module I am talking about:
module type WeakMemoSig =
sig
(*
* Weak_descriptors are just indicies that
* permit the values associated with them to be collected
* by GC.
*)
type 'a weak_descriptor
val wd_hash : 'a weak_descriptor -> int
memo tables).
* 'arg - the type of arguments
* 'header - transformed argument where all recursive references are
replaced with
* its target's descriptors
* 'weak_header - transformed argument where all recursive
references are replaced with
* its target's weak_descriptors
* 'image - result type
*)
type ('param, 'arg, 'header, 'weak_header, 'image) t
header's hash table
* (and halfsize of array of
target objects)
* and critical level of
holes to restart GC.
)
string -> ( For debugging,
this is the name of the table )
('param -> 'arg -> 'header) -> ( Convert the
argument to a header )
('param -> 'header -> 'weak_header) -> ( Convert 'header
to 'weak_header )
('weak_header -> 'weak_header -> bool) -> ( Headers'
comparison function )
('param -> 'header -> 'image) -> ( Converter from
header to result *)
('param, 'arg, 'header, 'weak_header, 'image) t
table )
('param -> 'header -> 'weak_header) -> ( Convert 'header
to 'weak_header )
('weak_header -> 'weak_header -> bool) -> ( Headers'
comparison function )
('param -> 'header -> 'image) -> ( Converter from
header to result *)
('param, 'arg, 'header, 'weak_header, 'image) t
otherwise evaluate the result
* and memorize it (if result GC-ed already then reevaluate it) and
returns
* its descriptor
*)
val lookup : ('param, 'arg, 'header, 'weak_header, 'image) t ->
'param -> 'header -> 'image descriptor
-> 'param -> 'header -> 'image descriptor
'param -> 'image descriptor -> 'image
-> 'image descriptor -> bool
'param -> 'arg -> 'image
end
--
Aleksey Nogin
Home Page: http://nogin.org/
E-Mail: nogin@cs.caltech.edu (office), aleksey@nogin.org (personal)
Office: Jorgensen 70, tel: (626) 395-2907
The text was updated successfully, but these errors were encountered: