Package with multidimensional AND sparse matrices?
[
Home
]
[ Index:
by date

by threads
]
[ Message by date: previous  next ] [ Message in thread: previous  next ] [ Thread: previous  next ]
[ Message by date: previous  next ] [ Message in thread: previous  next ] [ Thread: previous  next ]
Date:  20060704 (17:58) 
From:  Martin Jambon <mjambon@b...> 
Subject:  Re: [Camllist] Package with multidimensional AND sparse matrices? 
On Tue, 4 Jul 2006, Andreas Biegert wrote: > Well, in short: the algorithm computes the optimal multiple alignment > of up to n protein sequences with maximum length L. The > straightforward and theoretically optimal approach to solve this > problem is by dynamic programming, therefore the time and space > complexity is O(L^n). I managed to come up with a pretty good > heuristic that lets me boil down the number of cells that need to be > computed giving me a complexitiy of O(L^2) instead of O(L^n), in fact > I can even manually adjust the number of nonzero entries in the > matrix. In the 2000*2000*2000*2000 matrix only at most 2000 cells will > be nonzero. Furthermore for all nonzero cells M(i,j,k,l) holds > i<j<k<l. The bad news is that performance is important since the > program has to run under 5min. An estimation of the time requirements > given the number of sequences n and masimum lengths L is: > > 24 * L^2 * T_x * 2^(n+1)  1 > with T_x being about 100ns > > For n=4 and L=2000 I get just about 5min. Would it be a bad idea to > use Hashtables all the way instead of matrices? Accessing elements in > a hashtable has O(1), right? Yes, if you create it big enough so that it doesn't get resized. The best is that you look at hashtbl.ml in the standard library to see how it is implemented. But I wouldn't worry too much for now: try Hashtbl and you will quickly see if it's fast enough or not. Martin  Martin Jambon, PhD  http://martin.jambon.free.fr Visit http://wikiomics.org, the Bioinformatics Howto Wiki