Re: curried fns
 Pierre Weis
[
Home
]
[ Index:
by date

by threads
]
[ Message by date: previous  next ] [ Message in thread: previous  next ] [ Thread: previous  next ]
[ Message by date: previous  next ] [ Message in thread: previous  next ] [ Thread: previous  next ]
Date:   (:) 
From:  Pierre Weis <Pierre.Weis@i...> 
Subject:  Re: curried fns 
> Something I sometimes wonder about is which of the following is more > efficient. Perhaps a CAML Light expert can comment. > > let map f = > let rec mapf l = > match l with > [] > [] >  (x::t) > let y = f x in y::(mapf t) in > mapf;; > > or > > let rec map f l = > match l with > [] > [] >  (x::t) > let y = f x in y::(map f t);; The answer is clearly: this is compiler dependent! The first definition is a strange view of map as a (nonrecursive) function that compute a looping function. The second definition is more natural and properly exposes that map gets two arguments. If there are no efficiency reasons to prefer the first form I definitively prefer the second ``natural'' definition. The problem is more generally ``how to define functions with 2 arguments'': there is no syntactic way to define functions with multiple arguments in Caml. Thus the user has to encode it, either via curryfication or via tuples (let f x y = ... or let f (x,y) = ...). Naive compilation of these two encodings is not efficient, let's explain why. Suppose f defined as let f x y = body or equivalently let f = function x > function y > body. As is explicit on this expanded form, f can be partially applied: if e1 evaluates to v1, then the expression f e1 evaluates to a closure with code part (function y > body) and environment part (x, v1). Thus the naive compilation of ``f e1 e2'' (or equivalently (f e1) e2) starts by calling f with argument v1, then f builds and returns the intermediate closure, and this closure is then applied to (the value of) e2. This is inefficient, since we perform 2 function calls and vaste memory to build this intermediate closure which is immediately applied and becomes garbage. Now suppose f be defined as let f (x, y) = body the naive evaluation of f (e1, e2) is now: 1) evaluate e1 and e2 leading to v1 and v2 2) build the tuple (e1, e2) 3) call f with this tuple. Here we perform only one function call, but once again we build an intermediate data structure to call the function, and this data structure becomes garbage as soon as entering the function. A clever way of compiling one of this two forms is necessary since functions with multiple arguments are mandatory and heavily used. This is a bit difficult but the idea is mainly the following:  for curried functions directly call their body when they are applied with all their arguments (complete applications).  for functions with tuples directly call their body without constructing the tuple (these functions are always completely applied). Since curried functions are a bit more powerful, since they can be partially applied, we may suppose that users will adopt this way of encoding multiple arguments. Thus, generally speaking Caml compilers optimize the curried form of encoding. These optimizations are not always as efficient as the scheme described above, but compilation of currification is definitively better than tupling for Caml Light, Caml Special Light, Bigloo and Camlot. Caml compilers generally do not optimize tupling. We may now discuss the efficiency of your example. The naive form: let rec map f l = match l with [] > []  (x::t) > let y = f x in y::(map f t);; exposes that map has two arguments. Thus a compiler that optimizes complete application of curried functions will efficiently compile the recursive call. On the other hand a naive compiler will build the partial intermediate closure for each recursive call. On the other hand, with the first encoding: let map f = let rec mapf l = match l with [] > []  (x::t) > let y = f x in y::(mapf t) in mapf;; the optimizing compiler may fail to recognize that map may be optimized as a curried functions with two arguments: it will instead compute a closure for mapf, and generally speaking a closure is less efficiently applied. An interesting case: the Bigloo compiler performs a socalled $\eta$optimization that automatically rewrite this code to (an equivalent of) the first natural one! The naive compiler will compile naively as usual, but now your code effectively require to compute the intermediate closure only once, when partially applying map to f. Hence recursive calls to mapf will save this extra closure construction, and the second form will be a bit more efficient. In conclusion: prefer the natural encoding, it is clearer and compilers are designed to optimize natural code. Pierre Weis  WWW Home Page: http://pauillac.inria.fr/~weis Projet Cristal INRIA, BP 105, F78153 Le Chesnay Cedex (France) Email: Pierre.Weis@inria.fr Telephone: +33 1 39 63 55 98 Fax: +33 1 39 63 53 30 