Re: Unboxed options

From: Francois Pottier (Francois.Pottier@inria.fr)
Date: Tue Oct 12 1999 - 17:06:15 MET DST


Date: Tue, 12 Oct 1999 17:06:15 +0200
From: Francois Pottier <Francois.Pottier@inria.fr>
To: caml-list@inria.fr, chet@watson.ibm.com
Subject: Re: Unboxed options
In-Reply-To: <199910121420.QAA02045@pauillac.inria.fr>; from Pierre Weis on Tue, Oct 12, 1999 at 04:20:08PM +0200

Hi Chet,

> After some thought, it occurred to me that, given ZINC's
> representation of objects, it should be possible to add a new
> type-constructor,
>
> type 'a unboxed_option =
> ubSome of 'a
> | ubNone
>
> wherein ubSome is represented by the identitty function in the code,
> and ubNone by NULL -- zero.

Definitely -- in fact, this topic has already come up on the mailing
list in the past. Adding a NULL value to every type is relatively easy,
and does not even require any modifications to the compiler -- it
suffices to add one new library module. Of course, this module has
to use some operations deemed ``unsafe'' by the compiler, but it can
be given an interface which makes it type-safe.

> Going further, things like ('a unboxed_option) unboxed_option are
> somewhat flawed, but they do work, (as they must, if this is going to
> work) -- of three possibilities,

Yes indeed. This slightly inelegant point was possibly the reason why
these ``unboxed options'' were never added to the language or to its
standard library. It was believed that the user, if unaware of this
issue, may be led to writing well-typed programs which make no sense
at all. The programmer has to avoid composing ``unboxed_option'' with
itself -- but it is easy to inadvertently break this law.

To conclude, such a feature does not seem to have its place within
the language or its standard library, but is easy to implement. I
include a possible implementation, which is very simplistic, but gives
the idea. (Text appears after my signature. Best viewed with 120 cols.)

I will conclude by saying that after trying out these ``unboxed
options'', I decided not to use them -- O'Caml's generational GC was
just as efficient working with normal, boxed options. My benchmark was
a highly optimized implementation of Paige & Tarjan's partition
refinement algorithm (which may be used to minimize non-deterministic
finite-state machines). A pretty impressive fact, I believe.

-- 
François Pottier
Francois.Pottier@inria.fr
http://pauillac.inria.fr/~fpottier/

(* pointer.mli *)

(*********************************************************************************************************************) (* *) (* Wallace *) (* *) (* François Pottier, projet Cristal, INRIA Rocquencourt *) (* *) (* Copyright 1998 Institut National de Recherche en Informatique et Automatique. Distributed only by permission. *) (* *) (*********************************************************************************************************************) (* $Header: /home/pauillac/formel1/fpottier/cvs/wallace-out-of-date/Attic/pointer.mli,v 1.1.2.1 1998/10/19 17:16:58 fpottier Exp $ *) (*

A space- and time- efficient option type. Essentially, this module implements C-style pointers in a type-safe fashion.

*) (* ----------------------------------------------------------------------------------------------------------------- *) (*

An abstract type for pointers.

*)

type 'a t

(* ----------------------------------------------------------------------------------------------------------------- *) (*

A pointer can be nil, or it can point to some piece of data.

*)

val nil: 'a t external make: 'a -> 'a t = "%identity"

(* ----------------------------------------------------------------------------------------------------------------- *) (*

If a pointer is known not to be nil, it can be turned back into a regular Caml value at no cost. This operation is *not* type-safe, but it is provided for convenience.

*)

external cast: 'a t -> 'a = "%identity"

(* ----------------------------------------------------------------------------------------------------------------- *) (*

Pattern matching is lost, so we provide various primitives to check for nil. Note that in some cases, better efficiency can be achieved by explicitly checking for nil (using == equality) and casting. This avoids the overhead of a higher-order function call, which can be especially bad when it prevents tail recursion.

*)

val use: ('a -> unit) -> 'a t -> unit val fold: ('a -> 'b -> 'b) -> 'a t -> 'b -> 'b

(* pointer.ml *)

(*********************************************************************************************************************) (* *) (* Wallace *) (* *) (* François Pottier, projet Cristal, INRIA Rocquencourt *) (* *) (* Copyright 1998 Institut National de Recherche en Informatique et Automatique. Distributed only by permission. *) (* *) (*********************************************************************************************************************) (* $Header: /home/pauillac/formel1/fpottier/cvs/wallace-out-of-date/Attic/pointer.ml,v 1.1.2.1 1998/10/19 17:16:58 fpottier Exp $ *) (*

A space- and time- efficient option type. Essentially, this module implements C-style pointers in a type-safe fashion.

*)

open Errors

(* ----------------------------------------------------------------------------------------------------------------- *) (*

An abstract type for pointers.

This type declaration actually only reflects the definition of nil, which is a pointer to a dummy record structure. Its unique field is -- as its name implies -- unused, but O'Caml does not accept empty record types.

*)

type 'a t = { unused : int }

(* ----------------------------------------------------------------------------------------------------------------- *) (*

The nil pointer must be a unique value, so as to guarantee that we will not confuse it with a pointer to a valid user object. We create a dummy object for this purpose.

It is important that this object *not* be mutable, so that it receives a polymorphic type. Also, to guarantee that this object is indeed unique, we must make sure that the compiler is not smart enough to share instances across module boundaries. (This ensures that a value provided by the user cannot happen to have the same address as our nil value, which is hidden inside this module.) Currently, the bytecode compiler does no sharing, while the native code compiler appears to be able to share immutable values only within a single module.

*)

let nil = { unused = 0 }

(* ----------------------------------------------------------------------------------------------------------------- *) (*

Converting a value to a pointer is, of course, a no-op. *)

external make: 'a -> 'a t = "%identity"

(* ----------------------------------------------------------------------------------------------------------------- *) (*

Converting a pointer back to a value is also a no-op, provided the pointer is known not to be nil.

*)

external cast: 'a t -> 'a = "%identity"

(* ----------------------------------------------------------------------------------------------------------------- *) (*

Pattern matching is lost, so we have to provide various primitives to check for nil.

*)

let use action pointer = if pointer != nil then action (cast pointer)

let fold action pointer accu = if pointer == nil then accu else action (cast pointer) accu



This archive was generated by hypermail 2b29 : Sun Jan 02 2000 - 11:58:27 MET