Version française
Home     About     Download     Resources     Contact us    
Browse thread
Re: [Caml-list] Substring search on an array of strings
[ Home ] [ Index: by date | by threads ]
[ Search: ]

[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
Date: -- (:)
From: Shawn Wagner <shawnw@s...>
Subject: Re: [Caml-list] Substring search on an array of strings
On Fri, Jun 25, 2004 at 07:05:13PM +0200, Diego Olivier Fernandez Pons wrote:
>     Bonjour,
> 
> > If you need to find every substring in every element of the array
> > then, yes, you'll have to look at every element of the array to see
> > if that substring appears or not in it.
> 
> No, or at least only in the worst case.
> 
> In the same way the Knuth-Morris-Pratt suffix automaton allows you not
> to search for every position in a string if it is not necessary, a
> specific procedure for searching in an array of strings could avoid
> you searching on every string.
> 
> I will try to give some simple examples to get some intuition :
> 
> - Suppose there are many strings that only contains a few different
> letters. Then you could build first a map from subsets of the alphabet
> to string indexes and only search in those strings

And, of course, to build that map you have to look at every element of the
array. Of course, you can then re-use the map, but it has to be generated
somehow.

> 
> - Suppose that the strings have a particular pattern (say
> "processortype.memory.operatingsystem"). Then you could build indexes
> on every processor type, memory or operating system, and search only
> in the corresponding strings
> 

You still have to look at the strings to find out where those corresponding
strings are. Now, with something like your second example you can sort the
array (Once again having to look at everything in the array) and use a
binary search to find the range of strings starting with, say, athlon,
without having to look at every element in that particular search.

Your original questions didn't say anything about such extra information or
patterns in the strings you're searching. If you want an answer you like,
you have to give us enough data about the problem that we have a hope of
coming up with one! Don't say: How do I find every substring in an array of
strings. Say: In an array of strings of the format
'processortype.memory.operatingsystem', how can I find all strings that have
'linux' in the operatingsystem field?


Aside: Why did you cc'ing caml-list when I only replied to you in that last
email? (This one is going to caml-list too)

-- 
Shawn Wagner
shawnw@speakeasy.org

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners