Re: Why OCaml sucks
Date: 2008-05-12 (13:22)
From: Richard Jones <rich@a...>
Subject: Re: [Caml-list] Re: Why OCaml rocks
FWIW this is an implementation using Ancient:

let n = 1024
let a = Array.make_matrix n n 6.7
let b = Array.make_matrix n n 8.9

(* Result array, stored in shared memory. *)
let c =
  let c = Array.make_matrix n n 0. in
  let fd = Unix.openfile "/tmp/zero" [Unix.O_RDWR;Unix.O_TRUNC;Unix.O_CREAT] 0o644 in
  let md = Ancient.attach fd 0x440000000000n in
  Ancient.follow (Ancient.share md 0 c)

(* Compute rows i0 .. i1-1 of the product and store them in the shared c. *)
let parmul_aux i0 i1 n a b =
  for i = i0 to i1 - 1 do
    let ai = a.(i) in
    for j = 0 to n - 1 do
      let r = ref 0.0 in
      for k = 0 to n - 1 do
        r := !r +. Array.unsafe_get ai k *.
          Array.unsafe_get (Array.unsafe_get b k) j
      done;
      c.(i).(j) <- !r
    done
  done

let parmul n a b =
  (match Unix.fork () with 0 -> parmul_aux 0 (n/2) n a b; exit 0 | _ -> ());
  parmul_aux (n/2) n n a b;
  ignore (Unix.wait ())


let () = parmul n a b
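
The code above splits the rows between exactly two processes. As a
rough sketch only, the same split could be generalised to more workers
by computing the row ranges up front. The helper name row_ranges and
its workers parameter are hypothetical and not part of the original
post; the ranges are half-open (i0, i1) pairs, matching the convention
parmul_aux uses.

```ocaml
(* Hypothetical helper: split rows 0 .. n-1 into [workers] contiguous
   half-open ranges (i0, i1).  Ranges differ in size by at most one row
   and together cover every row exactly once. *)
let row_ranges n workers =
  if workers <= 0 then invalid_arg "row_ranges";
  List.init workers (fun w -> (w * n / workers, (w + 1) * n / workers))

let () =
  (* Two workers on a 1024-row matrix gives the same split as parmul. *)
  assert (row_ranges 1024 2 = [(0, 512); (512, 1024)])
```

Each pair could then be handed to one forked child running
parmul_aux i0 i1 n a b, with the parent calling Unix.wait once per
child. Whether that scales beyond two processes is not something the
measurements here cover.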

This is only slightly faster than Jon's OCaml version using message
passing (12% faster on my test machine[0]), which suggests that the
overhead of message passing _isn't_ the problem here[1].
Perhaps it's the bounds checking in the assignment back to the matrix?
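
One way to test the bounds-checking guess would be to make the
write-back unchecked as well. The following is a minimal sketch, not a
measured result: the function name mul_rows_unsafe is made up for
illustration, and unlike the code above it takes the result matrix c
as an argument so that it can be run without Ancient.

```ocaml
(* Sketch: compute rows i0 .. i1-1 of c = a * b with no bounds checks,
   including on the final store.  The caller must guarantee that all
   three matrices are at least n x n. *)
let mul_rows_unsafe i0 i1 n a b c =
  for i = i0 to i1 - 1 do
    let ai = Array.unsafe_get a i in
    let ci = Array.unsafe_get c i in    (* look up the result row once *)
    for j = 0 to n - 1 do
      let r = ref 0.0 in
      for k = 0 to n - 1 do
        r := !r +. Array.unsafe_get ai k
                   *. Array.unsafe_get (Array.unsafe_get b k) j
      done;
      Array.unsafe_set ci j !r          (* unchecked write-back *)
    done
  done

let () =
  let a = [| [| 1.; 2. |]; [| 3.; 4. |] |] in
  let b = [| [| 5.; 6. |]; [| 7.; 8. |] |] in
  let c = Array.make_matrix 2 2 0. in
  mul_rows_unsafe 0 2 2 a b c;
  assert (c = [| [| 19.; 22. |]; [| 43.; 50. |] |])
```

Compiling the original code with the -unsafe option would be another
way to remove the checks without changing the source. The store
happens only once per n multiply-adds, so any gain may well be small.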

Anyhow, in real life situations we'd all be using a super-optimized
hand-coded-in-assembly matrix multiplication library (LAPACK?), so
this is all very academic.


[0] Quad core Intel hardware:
model name      : Intel(R) Core(TM)2 Quad  CPU   Q9450  @ 2.66GHz

[1] Creation of the result matrix and copying it to shared memory is
almost instantaneous in my tests.

Richard Jones
Red Hat