From: skaller <skaller@u...>
Subject: Re: [Caml-list] mboxlib reloaded ;-)
On Sat, 2007-04-28 at 13:44 +0200, Oliver Bandel wrote:
> On Sat, Apr 28, 2007 at 12:54:53PM +0200, Gabriel Kerneis wrote:
> > On Sat, 28 Apr 2007 12:47:47 +0200, Oliver Bandel
> > <oliver@first.in-berlin.de> wrote:
> > > > You should check the size (number of states) of the generated
> > > > lexer.
> > > 
> > > How?
> > 
> > It's printed out by ocamllex when you run it on your .mll file.
> > Regards,
> 
> Ah, ok. :)
> 
> 
> 18 states, 261 transitions, table size 1152 bytes.
> 
> Does not look very huge ;-)

Lol, no, it is tiny. You are probably right: too many calls
and too much copying of data. AFAIK OCaml channels also
add an extra buffer layer (is that right?), so there's even
more copying.

Still, although OCaml may generate more code than C,
if your code is reasonably tight it should stay in cache
and be fast: function calls are actually quite cheap.

Here's an idea: you said:

"For the about 100MB mbox there are 2.5 * 10^6 calls
to Buffer.add_string for the header and 1.6 * 10^6 calls
to Buffer.add_string for the body, 2.6 * 10^6 calls to the
function Lexing.engine, ..."

How about NOT storing the body text? Instead, just store
the integer file offset of the first byte and the length.
Not sure what your application is doing;
perhaps that would work for you?
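
A rough sketch of what I mean, not tied to mboxlib's actual
code: the names body_ref, index_bodies and read_body are made
up for illustration. It assumes messages start at lines
beginning with "From ", that the header ends at the first empty
line, and that "From " lines inside bodies are already escaped.
The scan keeps only two integers per body and reads the text
back on demand:

```ocaml
(* Hypothetical record: where a message body lives in the mbox file. *)
type body_ref = { offset : int; length : int }

let is_from_line line =
  String.length line >= 5 && String.sub line 0 5 = "From "

(* Scan the file once and return one body_ref per message body,
   in file order. No body text is copied or stored. *)
let index_bodies path =
  (* Binary mode, so channel positions are real byte offsets. *)
  let ic = open_in_bin path in
  let refs = ref [] in
  let body_start = ref None in   (* Some off while inside a body *)
  let in_header = ref false in
  let close_body end_pos =
    match !body_start with
    | Some off ->
      refs := { offset = off; length = end_pos - off } :: !refs;
      body_start := None
    | None -> ()
  in
  (try
     while true do
       let line_start = pos_in ic in
       let line = input_line ic in
       if is_from_line line then begin
         (* A new message begins: the previous body ends here. *)
         close_body line_start;
         in_header := true
       end else if !in_header && (line = "" || line = "\r") then begin
         (* Empty line ends the header; the body starts right after it. *)
         in_header := false;
         body_start := Some (pos_in ic)
       end
     done
   with End_of_file -> close_body (pos_in ic));
  close_in ic;
  List.rev !refs

(* Fetch a body only when it is actually needed. *)
let read_body path { offset; length } =
  let ic = open_in_bin path in
  seek_in ic offset;
  let s = really_input_string ic length in
  close_in ic;
  s
```

Whether this helps depends on how often the bodies are really
needed; if every body gets read anyway, it only moves the
copying to a later point.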

-- 
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net