Version française
Home     About     Download     Resources     Contact us    
Browse thread
Package with multidimensional AND sparse matrices?
[ Home ] [ Index: by date | by threads ]
[ Search: ]

[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
Date: -- (:)
From: Martin Jambon <mjambon@b...>
Subject: Re: [Caml-list] Package with multidimensional AND sparse matrices?
On Tue, 4 Jul 2006, Andreas Biegert wrote:

> Well, in short: the algorithm computes the optimal multiple alignment
> of up to n protein sequences with maximum length L. The
> straightforward and theoretically optimal approach to solve this
> problem is by dynamic programming, therefore the time and space
> complexity is O(L^n). I managed to come up with a pretty good
> heuristic that lets me boil down the number of cells that need to be
> computed giving me a complexitiy of O(L^2) instead of O(L^n), in fact
> I can even manually adjust the number of nonzero entries in the
> matrix. In the 2000*2000*2000*2000 matrix only at most 2000 cells will
> be non-zero. Furthermore for all non-zero cells M(i,j,k,l) holds
> i<j<k<l. The bad news is that performance is important since the
> program has to run under 5min. An estimation of the time requirements
> given the number of sequences n and masimum lengths L is:
>
> 24 * L^2 * T_x * 2^(n+1) - 1
> with T_x being about 100ns
>
> For n=4 and L=2000 I get just about 5min. Would it be a bad idea to
> use Hashtables all the way instead of matrices? Accessing elements in
> a hashtable has O(1), right?

Yes, if you create it big enough so that it doesn't get resized.
The best is that you look at hashtbl.ml in the standard library to see 
how it is implemented.

But I wouldn't worry too much for now: try Hashtbl and you will quickly 
see if it's fast enough or not.


Martin

--
Martin Jambon, PhD - http://martin.jambon.free.fr
Visit http://wikiomics.org, the Bioinformatics Howto Wiki