Compiler feature  useful or not?
[
Home
]
[ Index:
by date

by threads
]
[ Message by date: previous  next ] [ Message in thread: previous  next ] [ Thread: previous  next ]
[ Message by date: previous  next ] [ Message in thread: previous  next ] [ Thread: previous  next ]
Date:  20071115 (13:26) 
From:  weis@y... 
Subject:  Re: [Camllist] Compiler feature  useful or not? 
> > a value of type row is in fact a concrete integer (it is not hidden in > >any way), > > But you cannot apply integer operations to it, so it is not a normal > concrete integer, right? Right: a value of type row has type row ... not type int! > Can you show an example where replacing all "type t = private ..." > definitions by "type t" changes a welltyped program into an illtyped > one? I understand that if private types create real subtypes (w.r.t. > :>) then they are different than abstract types, but otherwise, I don't > see the difference for the typechecker. May be, I must recall what are private types in the first place: private types has been designed to implement quotient data structure. (*** Warning. Wao: after rereading this message I realize that it is really long. You can skip it, if you already know something about mathematical quotient structures, private types for record and variant, relational types and the mocac compiler! ***) What is a quotient ?  Here quotient has to be understood in the mathematical sense of the term: given a set S and an equivalence relation R, you consider the set S/R of the equivalence classes modulo the relation R. S/R is named the quotient structure of S by R. Quotient structures are fundamental in mathematics and there properties have been largely studied, in particular to find the relationship between operations defined on S and operations on S/R: which operations on S can be extended to operations on S/R ? Which properties of operations on S are still valid for there extension on S/R ? and so on. Simple examples of quotient structures can be found everywhere in maths, for instance consider the equivalence relation R on pairs of integer values defined as z_1 equiv z_2 if and only if z_1 and z_2 are both odd or are both even (in Caml terms z_1 equiv z_2 if and only if (z_1 mod 2) = (z_2 mod 2)) The set Z/R is named Z/2Z and it inherits properties of operations on Z: it is an abelian group (and more). A wonderful example of such inheritance of interesting properties by inheritance of operations thanks to a definition by a quotient structure is the hierarchy of sets of numbers: starting with N (the set of natural numbers) we define Z (the set of relative integer numbers) as a quotient of NxN), then Q (the rational numbers) as a quotient of ZxZ*, R (the set of real numbers) as a quotient of Q^N (the sets of sequences of rational numbers), C (the set of complex numbers) as a quotient of R[X] (the set of polynomials with one unknown and real coefficients). Note here that at each step of the hierarchy the quotient structure is richer (has more properties and/or more elements) than the original structure: first we have a monoide, then a group and a ring, then a field, then a complete field and so on. Why quotient structures ?  So quotient structures are a fundamental tool to express mathematical structures and properties. Exactly as mathematical functions, relations and sets. As you may have noticed, programming languages are extremely related to maths (due to their purely logical bases) even if, in some cases, the zealots of the language at hand try to hide the mathematical fundation into a strange anthropomorphic discurse to describe their favorite language features. In the end, the computer programing languages try hard to incorporate powerful features from mathematics, because these features have been polished by mathematicians for centuries. As a consequence of this work, we know facts, properties and theorems about the mathematical features and this is an extremely valuable benefit. Now, what is the next frontier ? What can we steal more to mathematics for the benefit of our favorite programming language ? If we review the most powerful tools of mathematicians, we found that functions have been borrowed (hello functional programming, lambdacalculus and friends :)), relations have been borrowed (data bases, hash tables, association lists), sets have been more or less borrowed (hello setle, pascal, and set definition facilities from various libraries of various languages...). More or less, we have all those basic features. >From the mathematical set construction tools, we have got in Caml:  the cartesian product of sets (the * binary type constructor, the record type definitions),  the disjunct union of sets (the  of polymorphic variants, the sum (or variant) type definitions). Unfortunately, we have no powerful way to define a quotient data structure. That what private type definitions have been designed to do. What do we need for a quotient data structure ?  In the first place, we need the ``true'' thing, the real feature that is indeed used in maths. Roughly speaking this means to assimilate the quotient set S/R to a subset of S. In the previous definition of quotient structures, there is a careful distinction between the base set S and the quotient set S/R. In fact, there always exists a canonical injection from S to S/R, and if we choose a canonical representant in each equivalence class of S/R, we get another canonical injection from S/R to S, so that S/R can be considered as a subset of S (the story is a bit more complex but that's the idea). This injection/projection trickery is intensively used in maths; for instance, in the hierarchy of number sets, we say and consider that N is a subset of Z that itself is a subset of Q, a subset of R, a subset of C. Rigourously, we should say for instance, there is a subset of Z that is canonically isomorphic to N; in fact, we abusively assimilate N to this subset of Z; hence, we say that N is ``included'' in Z, when we should say ``the image of N by the canonical isomorphism from N to Z'' is included in Z; then, we go one step further in the assimilation of N to a subset of Z, by denoting the same, the elements of N and there image in Z; for instance, ``the neutral element of the multiplication in Z'' and the successor of 0 in N is denoted ``1''; and we now can say that 1 belongs to Z. Note here that, in the first place, ``the neutral element of the multiplication in Z'' is an equivalence class (as all elements in Z are). So it is a set. More exactly, the ``neutral element of the multiplication in Z'' is the infinite set of pairs of natural numbers (n, m) such that n  m = 1 (here ``'' is an operation in N and ``1'' is the successor of the natural number ``0'', so that n  m = 1 is a shorthand for n = succ m). The assimilation between N and its isomorphic image allows to use 1 as the denotation of this infinite set of pairs of natural numbers. We understand why the mathematicians always write after having designed a quotient structure: ``thanks to this isomorphism, and if no confusion may arise, we always assimilate S to its canonical injection in S/R''. This assimilation is what private type definitions allow. How do we define a quotient data structure ?  The mathematical problem:  we have a set S and an equivalence relation R on SxS,  we construct the quotient S/R.  we state afterwards: ``if no confusion may arise, we always assimilate S to its canonical injection in S/R''. The programming problem:  we have a data structure S (defined by a type s) and a relation R on SxS (defined by a function r from s * s to bool).  we construct a data structure that represents S/R.  we have afterwards: ``No confusion may arise, we always assimilate S to its canonical injection in S/R''. The private data type solution:  the data structure S is defined as any Caml data structure and the relation R is implemented by the canonical injection(s) from S to S/R.  the canonical projection from S/R to S is automatic.  S (defined by s) is assimilated to S/R (which is then also s!). We defined S/R as a subset of S, the set of canonical representants of equivalence classes of S/R. More exactly, the canonical injection from S to S/R maps each element of S to its equivalent class in S/R; if we assimilate each equivalence class to its canonical representant (an element of S), the canonical injection maps each element of S to the canonical representant of its equivalence class. Hence the canonical injection has type S > S. An example treated without private types:  Let's take a simple example: S is the following data structure that implements a mathematical (free) structure generated by the constant 0, the unary symbol succ, and the binary symbol +. type peano =  Zero  Succ of peano  Plus of peano * peano R is the (equivalence) relation that expresses that  0 is the neutral element for + (for all x in S, 0 + x = x and x + 0 = x),  + is associative (for all x, y, z in S³, (x + (y + z)) = ((x + y) + z)). The canonical injection is a function from peano > peano that maps each value in S (the type peano) to the canonical representant of its class in S/R (an element of S (the type peano)): let rec make = function  Zero > Zero  Plus (Succ n, p) > Succ (make (Plus (n, p)))  Plus (Zero, p) > p  Plus (p, Zero) > p  Plus (Plus (x, y), z) > make (Plus (x, make (Plus (y, z))))  Succ p > Succ p ;; (This function may be wrong but you see the idea :)) So, if you want to represent S/R for peano in Caml you must:  (1) define the type peano  (2) always use the make function to create a value in S/R  (3) not to confuse S and S/R in your head (I mean in your programs) Private data types permits (1), ensures (2), and helps for (3) concerning the head part and ensures (3) for programs by means of compiletime type errors. The same example with private types:  To define a private data type you must define a module.  in the implementation, you define the carrier S and its canonical injection.  in the interface, you export the carrier and the injection. The type checker will ensure transparent projection from the quotient to the carrier and mandatory use of the canonical projection to build a value in S/R. interface peano.mli  type peano = private  Zero  Succ of peano  Plus of peano * peano ;; val zero : peano;; val succ : peano > peano;; val plus : peano * peano > peano;; implementation peano.ml  type peano =  Zero  Succ of peano  Plus of peano * peano ;; let rec make = function ... ;; let zero = make Zero;; let succ p = make (Succ p);; let plus (n, m) = make (Plus (n, m));; (See note (1) for a discussion on the design of this example.) What is given by private types:   You cannot create a value of S/R if you do not use the canonical injection # Zero;; Cannot create values of the private type peano  As a consequence, values in S (i.e. S/R) are always canonical: # let one = succ zero;; val one : M.peano = Succ Zero # let three = plus (one, succ (plus (one, zero)));; val three : M.peano = Succ (Succ (Succ Zero))  Projection is automatic # let rec int_of_peano = function  Zero > 0  Succ n > 1 + int_of_peano n  Plus (n, p) > int_of_peano n + int_of_peano p ;; val int_of_peano : M.peano > int = <fun> # int_of_peano three;;  : int = 3 What about private abbreviations ?  Private abbreviation definitions are a natural extension of private data type definitions to abbreviation type definitions. The same notions are carried out mutatismutandis:  we have a data structure S (defined by a type s) and a relation R on SxS (defined by a function r from s * s to bool).  we construct a data structure that represents S/R.  we have afterwards: ``No confusion may arise, we always assimilate S to its canonical injection in S/R''. In the case of abbreviations:  the data structure S/R is defined by a type s (which is an abbreviation) and a relation defined by a function,  the canonical injection should be defined in the implementation file of the private data type module,  we always assimilate S to its canonical injection in S/R. In pratice, as for usual private types (for which the constructive part of the data type is not available outside the implementation), the type abbreviation is not known outside the implementation (it is really private to its implementation). This prevents the construction of values of S/R without using the canonical injection. Th noticeable difference is the projection function: it cannot be fully implicit (otherwise, as you said Alain, the assimilation will turn to complete confusion: we would have S identical to S/R, hence row=int and no difference between a regular abbreviation definition and a private abbreviation definition). If not implicit, the injection function should granted to be the identity function (something that we would get for free, if we allow projection via subtyping coercion). In short: abstract data types provide values that cannot be inspected nor safely manipulated without using the functions provided by the module that defines the abstract data type. In contrast, private data types are explicitely concrete and you can freely write any algorithm you need. A good test is printing: you can always write a function to print values of a private type, it is up to the implementor of an abstract type to give you the necessary primitives to access the components of the abstracted values. Automatic generation of the canonical injection:  You may have realized that writing the canonical injection can be a real challenge. The moca compiler (see http://moca.inria.fr/) helps you to write the canonical injection by generating one for you, provided you can express the injection at hand via a set of predefined algebraic relations (and/or rewrite rules you specify) attached to the constructors of the private type. Private types with constructors having algebraic relations are named relational types. Moca generated canonical injections for relation types. For instance, you would write the peano example as the following peano.mlm file: type peano = private  Zero  Succ of peano  Plus of peano * peano begin associative neutral (Zero) rule Plus (Succ n, p) > Succ (Plus (n, p)) end;; The mocac compiler will generate the interface and implementation of the peano module for you, including the necessary canonical injection function. Best regards,  Pierre Weis INRIA Rocquencourt, http://bat8.inria.fr/~weis/ Note (1):  we can't just export the canonical injection since you could not build any value of the type without previously defined values!  we provide specialized versions of the canonical injection function introducing the convention that the lowercase name of a value constructor is the name of its associated canonical injection.  we do not export the plasin true canonical injection since it would be more confusing than useful.