Version française
Home     About     Download     Resources     Contact us    
Browse thread
speeding up matrix multiplication (newbie question)
[ Home ] [ Index: by date | by threads ]
[ Search: ]

[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
Date: -- (:)
From: Martin Jambon <martin.jambon@e...>
Subject: Re: [Caml-list] speeding up matrix multiplication (newbie question)
Erick Matsen wrote:
> Wow, once again I am amazed by the vitality of this list. Thank you
> for your suggestions.
> 
> Here is the context: we are interested in calculating the likelihood
> of taxonomic placement of short "metagenomics" sequence fragments from
> unknown organisms in the ocean. We start by assuming a model of
> sequence evolution, which is a reversible Markov chain. The taxonomy
> is represented as a tree, and the sequence information is a collection
> of likelihoods of sequence identities. As we move up the tree, these
> sequences "evolve" by getting multiplied by the exponentiated
> instantaneous Markov matrix.
> 
> The matrices are of the size of the sequence model: 4x4 when looking
> at nucleotides, and 20x20 when looking at proteins.
> 
> The bottleneck is (I mis-spoke before) that we are multiplying many
> length-4 or length-20 vectors by a collection of matrices which
> represent the time evolution of those sequences as follows.
> 
> Outer loop:
>   modify the amount of time each markov process runs
>   exponentiate the rate matrices to get transition matrices
> 
>   Recur over the tree, starting at the leaves:
>     at a node, multiply all of the daughter likelihood vectors together
>     return the multiplication of that product by the trasition matrix
> (bottleneck!)
> 
> The trees are on the order of 50 leaves, and there are about 500
> likelihood vectors done at once.
> 
> All of this gets run on a big cluster of Xeons. It's not worth
> parallelizing because we are running many instances of this process
> already, which fills up the cluster nodes.
> 
> So far I have been doing the simplest thing possible, which is just to
> multiply the matrices out like \sum_j a_ij v_j. Um, this is a bit
> embarassing.
> 
> let mul_vec m v =
>     if Array.length v <> n_cols m then
>       failwith "mul_vec: matrix size and vector size don't match!";
>     let result = Array.create (n_rows m) N.zero in
>     for i=0 to (n_rows m)-1 do
>       for j=0 to (n_cols m)-1 do
> 	result.(i) <- N.add result.(i) (N.mul (get m i j) v.(j))
>       done;
>     done;
>     result
> 
> I have implemented it in a functorial way for flexibility. N is the
> number class. How much improvement might I hope for if I make a
> dedicated float vector multiplication function? I'm sorry, I know
> nothing about "boxing." Where can I read about that?

Depending on the savings, you can afford to spend more or less time optimizing
this. Here are some simple things to consider:

In the OCaml land, try first getting rid of the functor (or use a
defunctorizer; ocamldefun?).

Limit memory accesses, by doing something like:

for i = 0 to m - 1 do
  let a_i = m.(i) in
  for j = 0 to n - 1 do
    let a_ij = a_i.(j) in (* instead of a.(i).(j) *)
    ...
  done
done

Also you can use Array.unsafe_get where it really matters.


You can also use bigarrays and implement the loop in C. It could be fun. I'm
not sure how much it saves.


Martin