Version française
Home     About     Download     Resources     Contact us    
Browse thread
RE: Re: [Caml-list] Persistence
[ Home ] [ Index: by date | by threads ]
[ Search: ]

[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
Date: -- (:)
From: Jon Harrop <jon@j...>
Subject: Re: [Caml-list] Persistence
On Monday 21 February 2005 20:17, Don Syme wrote:
> Could you send around the source code to this?  I intuitively knew that
> such performance results were likely, but had never had such a
> convincing example to wave at people.

Yes. The implementation of traveling salesman given in my book is another 
example which is several orders of magnitude faster than the C/C++/Fortran 
implementations given in all of the other text books that I have read.

This problem also happens to be a defacto standard test given to most natural 
science undergrads on their computing courses, which should be good for book 
sales. Tell everyone you know. :-)

> It would also be interesting to ask some C++ programmers how they would
> go about optimizing this code, for a couple of reasons
>
> (a) to see if they come up with anything faster; and

I managed to, after a while. However, doing this optimisation required me to 
have several things:

1. Estimates of run-time sizes of data structures.
2. Knowledge of the internals of the STL set implementation.
3. Knowledge of the complexities of various operations over STL sets.
4. Huge kahunas, as the resulting code is much more likely to be wrong.

The optimised C++ is also specialised by (1), whereas the OCaml code will 
always be efficient (because Xavier wrote the hard part ;-). Also, the C++ 
implementation consumes twice as much memory (which could be a real killer as 
many scientific computations use resources to the limit of the available 
computer).

Both the OCaml and the C++ can probably be optimised more, but I'm interested 
in problems where author-time is comparable to execution-time. Were I to 
continue optimising, I would move to hash sets, but these require an 
imperative approach which negates the principal advantage of the current 
OCaml implementation: persistence.

> (b) to see if they would ever even think of using
> sharing-of-complex-immutable-internal-data-structure-nodes as an
> optimization technique.

No way. :-)

Also (from the traveling salesman solution):

 (c) to see if they would ever even think of reinventing closures.

For people who just want a quick look, here is the OCaml code for the core 
function:

  let rec nth_nn =
    let memory = Hashtbl.create 1 in
    fun n (i, io) ->
      try Hashtbl.find memory (n, i)
      with Not_found -> match n with
        0 -> AtomSet.singleton (i, io)
      | 1 ->
          let nn = bonds.(i - 1) in
          if io = zero then nn else
          let aux (j, jo) s = AtomSet.add (j, add_i io jo) s in
          AtomSet.fold aux nn AtomSet.empty
      | n ->
          let pprev = nth_nn (n-2) (i, io) in
          let prev = nth_nn (n-1) (i, io) in
          let aux j t = AtomSet.union (nth_nn 1 j) t in
          let t = AtomSet.fold aux prev AtomSet.empty in
          let t = AtomSet.diff (AtomSet.diff t prev) pprev in
          Hashtbl.add memory (n, i) t;
          t in

$ time ./nth 30 1 <cfg-diamond >tmp

real    0m4.167s
user    0m3.940s
sys     0m0.000s

$ time ./nth.opt 30 1 <cfg-diamond >tmp

real    0m0.492s
user    0m0.450s
sys     0m0.000s

Here's the elegant C++ equivalent:

const AtomSet nth_nn(int n, int i, const vector<int> io) {
  const Map::key_type k=make_pair(n, make_pair(i, io));
  Map::const_iterator m=memory.find(k);
  if (m != memory.end()) return m->second;
  AtomSet s;
  if (n == 0) {
    s.insert(make_pair(i, io));
    return s;
  } else
    if (n == 1) {
      const AtomSet &nn=bonds[i-1];
      for (AtomSet::const_iterator it=nn.begin();
    it != nn.end(); it++) {
 int j = it->first;
 vector<int> jo = it->second;
 for (i=0; i<d; i++)
   jo[i] += io[i];
 s.insert(make_pair(j, jo));
      }
      return s;
    } else {
      const AtomSet
 &pprev = nth_nn(n-2, i, io),
 &prev = nth_nn(n-1, i, io);
      for (AtomSet::const_iterator it=prev.begin();
    it != prev.end(); it++) {
 const AtomSet &t = nth_nn(1, it->first, it->second);
 AtomSet t2;
 set_union(s.begin(), s.end(),
    t.begin(), t.end(),
    inserter(t2, t2.begin()));
 s=t2;
      }
      AtomSet s2;
      set_difference(s.begin(), s.end(),
       prev.begin(), prev.end(),
       inserter(s2, s2.begin()));
      s=AtomSet();
      set_difference(s2.begin(), s2.end(),
       pprev.begin(), pprev.end(),
       inserter(s, s.begin()));
    }
  memory[k] = s;
  return memory.find(k)->second;
}

With this C++, I get:

$ time ./nth 30 1 <cfg-diamond >tmp

real    1m31.178s
user    1m12.680s
sys     0m0.100s

With my sneaky optimisation:

$ time ./nth 30 1 <cfg-diamond >tmp

real    0m0.479s
user    0m0.480s
sys     0m0.000s

I've just noticed that the timing results are quite interesting functions of 
"n". I think I'll plot them...

-- 
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://ffconsultancy.com