Version française
Home     About     Download     Resources     Contact us    

This site is updated infrequently. For up-to-date information, please visit the new OCaml website at

Browse thread
[Caml-list] memoization and CPS
[ Home ] [ Index: by date | by threads ]
[ Search: ]

[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
Date: 2003-04-30 (12:00)
From: Diego Olivier Fernandez Pons <Diego-Olivier.FERNANDEZ-PONS@c...>
Subject: Re: [Caml-list] memoization and CPS

Pietro Abate wrote :

> I'm trying to understand and merge these two programming techniques
> : memoization and CPS [...] are these two styles compatible ? Or the
> very nature of the CPS make impossible to exploit memoization ?

Memoization and continuation passing style (CPS) are orthogonal but
the combination of both techniques is not very usual. I only know a
few examples, I will try to explain them. Some references are given.

1. Memoization

First of all, memoization is not restricted to a linear support
(array, hashtable, list, ...) or to equality (when you look in your
table you use an equality test on x to know if (f x)  has already been
computed or not)

There are roughly 3 parts in an search algorithm :
- prevision (where not to go because it is useless)
- heuristic (where to go when you have several choices)
- propagation (where not to go because you have already been there)

Algorithms can be classified by the amount of prevision/tabulation
they do :

Pereira and Warren (1983) showed that :
- topdown parsing (LL (k)) was purely predictive
- bottom-up parsing (CYK, LR (0)) was purely propagative
- Early and LR (k) parsers used both techniques

That is why the latter techniques are more powerful

Lang and Villemonte de la Clergerie (1988-1993) showed that :
- Prolog is purely predictive
- Datalog is purely propagative

That is why Prolog has looping problems and Datalog does a lot of
useless work (it takes all the known theorems, tries to prove all it
can, and then sees if the query belongs to the new set of theorems -
and this needs a restriction on the clauses to be correctly layered).
DyALog (and later some other tabulating logical systems like XSB or
B-prolog) solves (most of) this problems.

This is also true for 'knapsack' algorithms (Martello and Toth)
- branch and bound algorithms are predictive
- dynamic programming is propagative
- mixed algorithms use both techniques

Once more, mixed algorithms are better.

All the tabulation techniques presented use a partial order on
computations to be done. And since they have at least one of the
following properties :

 (a) Only undominated computations will be asked for
 (b) Computing a dominated element from an undominated is easy
 (c) If an dominated element is asked, a dominated fits better

only the undominated elements in the set of already computed elements
has to be recorded.

I suppose some examples will be easier to understand

(a) the fibonnaci function

let rec fib = function
  | 0 -> 0
  | 1 -> 1
  | n -> fib (n - 1) + fib (n - 2)

if you consider couples (x, y) with the domination relation -> 'needs
... to be computed' then you obtain a chain which trivially means that
you need only to compute the last state (and not the whole array of
values). This leads directly to

let rec fib = function
  | 0 -> (0, 1)
  | k -> let (n, m) = fib (k - 1) in (m, m + n)

0ther example : single source/single target shortest path in a graph
with only positive costs

You have a result by Dijkstra that says : 'if S is the set of all
nodes for which the shortest path is already known, there is a node in
the fronteer of S for which the shortest path can be directly
computed'. This means that you can free all the nodes in S that do not
belong to the fronteer of S. It also induces a partial order on the
boolean algebra of subsets of nodes in the graph.

More over, euclidian cuts (triangular inequality on metric problems)
and A* algorithm are examples of predition and heuristics for the same

(b) When you have a theorem that says 'the sum of the angles of any
triangle is 180 degrees' you can always use it for a particular

The same is true in Prolog with unifiers and substitutions.

The point is that in the latter case, the number of particular
theorems is infinite, therefor you cannot record them all (only the
most general known at the time of the computation, eventually a finite
number of less general but useful theorems). Partial orders are
unavoidable when dealing with infinite objects.

(c) Bounds for NP-hard approximation problems like the knapsack
problem. (section 2.6.1 on dynamic programming algorithms of the
Martello and Toth book) "The number of states considerated at each
stage can be considerably reduced by eliminating dominated states,
that is those states for which there exists a state with ... "

This induces a partial order different from the simple equality of
usual dynamic programming procedures.

Some times having a partial order is not enough, it is the case with
unification grammar/ attribute grammars.

A unification grammar is a grammar in which terminals and non
terminals are augmented with some data that has to unify to validate
the transformation

 P -> N (person) V (person)

which means a proposition (P) can derive a noun (N) and a verb (V) of
same number (must unify in CLP(X) where X is a classical domain for
constraing programming e.g. FD, Herbrand, ...)

ex P -> He walks (* correct *)
   P -> He walk (* incorrect *)

An attribute grammar is roughly the same thing

 E -> E + E  (value = value1 + value2)

The information flow in unification can go both ways in the parse tree
and can have arbitrary long range dependencies. The work of attribute
grammar research is to statically analyse this flow and provide
deterministic procedures with adequate data structures to handle it.

The main problem is that if there is of course a partial order on
computations to be done, this order depends on the parse tree. The
usual solution is to restrict the power of the grammar until you get
a property invariant on parse trees (ex. Strongly non circular
attribute grammars, la attribute grammars, ...)

In this case, the memoization support may be a tree, a stack, a global
variable ...

2. Examples of memoization and CPS

In 'Memoization of top-down parsing' Mark Johnson, Computational
Linguistics 21(3): 405-417 (1995)

the author notices that top-down backtracking functional parsers
(terminals and non terminals are coded by functions) have to major
problems :
 - a significant amount of redundant computation is done when
 - they fail with left-recursive grammars

The first point may be solved by classical memoization but the second
does not. This is because when you use a rule of the type

  A -> AB | a
  B -> b

To memoize the f_A function, it must end at least once.
The idea is to make f_A to work incrementially using CPS.

Mark Johnson's paper is very readable. Examples are given in Scheme
(with true CPS, not with call-cc)

The second example I know is DyALog

The first idea is to extend push-down automata (pda) with unification
: in an automata you are allowed to take a transition when the
equallity test succeeds (the label matchs with the current token). In
lpda you use unification (the label unifies with the current token)

The second idea is that automata can be interpreted in a dynamic
programming style.

When parsing context-free grammars you often do a 'normal
transformation' either explicitely putting the grammar in a Greibach
like normal form (two symbols by rule, with some constraints)

  A -> a
  B -> AB | b

either implicitely with LR items

  { A -> a . Bb }

That is why all this algorithms achieve at least cubic worst-time

You can also do it on the stack of the pda and use memoization
(ordered with unification), which DyALog does.

Now comes the problem : after having computed a substitution, you have
to apply it to the whole stack, which violates the principle of an
algorithm working only on the top of the stack.

The solution is to accumulate the substitutions to be done in a
function which is chained (composed with others) or applied when
either a symbol is pushed or popped. This function represents the
continuation of the computation to be done.

3. References

- Parsing

Pereira, F. and Warren D. [1983]: Parsing as Deduction; Proceedings of
the 21st Annual meeting of the Association for Computational
Linguistics; Cambridge MA; pp137-144

more up to date papers (available on the web)

Principles and Implementation of Deductive Parsing (1994)  Stuart M.
Shieber, Yves Schabes, Fernando C.N. Pereira Journal of Logic

Also the work of Klaas Sikkel (a book at Springer and a lot of papers)

- Tabulation (memoization) in logical programming

See the pages of the Atoll project at inria

The really best one is the tutorial (in french)

Tabulation et traitement de la langue. ATALA, Cargèse, Corse, France,
July 1999. Tutoriel présenté à la 6ème conférence annuelle sur le
Traitement Automatique des Langues Naturelles (TALN'99),
Éric Villemonte de la Clergerie.

See also XSB

Some papers by Mark Johnson (available via CiteSeer or Google)

Memoization of Coroutined Constraints (1995)

Memoization of top-down parsing (1995)

Memoization in Constraint Logic Programming (1993)

And a simple but good enough article by a Ms student

Matthew Frank, Efficient Execution of Declarative Programs (Area Exam
Report), February 9, 2001. (

- Attribute grammars and links with logical and functional programming

the classical reference

A grammatical view of logic programming (MIT press 1993)
Pierre Deransart, Jan Maluszynski

See also the work of the Oscar project with FNC-2

and the Utrecht software technology group

- Knapsack problem

S. Martello and P. Toth, Knapsack problems, Wiley, 1990.

        Diego Olivier

To unsubscribe, mail Archives:
Bug reports: FAQ:
Beginner's list: