From: Oliver Bandel <oliver@f...>
Subject: Re: [Caml-list] mboxlib reloaded ;-)
Hi,
just a short note, because I won't explore it in detail tonight...
On Fri, Apr 27, 2007 at 05:29:11PM +0100, Richard Jones wrote:
> On Fri, Apr 27, 2007 at 03:54:25PM +0200, Oliver Bandel wrote:
> > Hello,
> >
> > after two years of not working on it,
> > today I found my mboxlib, which I started
> > writing in 2005.
> >
> > I have put the mli-file on the web and
> > maybe the library itself will follow
> > soon.
> >
> > Any feedback, questions and suggestions are welcome.
> >
> > http://me.in-berlin.de/~first/software/libraries/mboxlib/
>
> The source for COCANWIKI[1] contains extensive support for threading
> of mail messages, based on JWZ's algorithm:
>
> http://www.jwz.org/doc/threading.html
Nice... you mention an optimized algorithm for threading.
I haven't explored your solution or the paper you linked
in detail yet (I think I'll have time for that tomorrow),
but IMHO the best way to handle message threads
is a trie data structure that uses message IDs
as keys (instead of characters, as tries normally do).
So: did you reimplement the trie
as an abstraction over message IDs, or did you
do it differently?
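A minimal OCaml sketch of the trie idea above, assuming every message carries a complete References chain (oldest ancestor first) and using `Map.Make (String)` for the children. The names `trie`, `insert` and `print_thread` are hypothetical and not part of mboxlib or COCANWIKI. Each edge is labelled by a Message-ID instead of a character, so a message's path from the root is its ancestry, and shared prefixes become shared thread nodes.

```ocaml
module SMap = Map.Make (String)

(* A trie node whose children are keyed by Message-ID strings instead of chars.
   [present] is true if the message itself was seen, false for a placeholder
   created only because another message referenced it. *)
type trie = { mutable present : bool; mutable children : trie SMap.t }

let empty () = { present = false; children = SMap.empty }

(* Insert one message given its ancestry path: the References chain
   (oldest first) followed by the message's own Message-ID. *)
let insert root path =
  let rec go node = function
    | [] -> node.present <- true
    | id :: rest ->
      let child =
        match SMap.find_opt id node.children with
        | Some c -> c
        | None ->
          let c = empty () in
          node.children <- SMap.add id c node.children;
          c
      in
      go child rest
  in
  go root path

(* Print a thread as an indented tree; placeholders are marked. *)
let rec print_thread depth id node =
  Printf.printf "%s%s%s\n"
    (String.make (2 * depth) ' ')
    id
    (if node.present then "" else " (not in mbox)");
  SMap.iter (fun cid child -> print_thread (depth + 1) cid child) node.children

let () =
  let root = empty () in
  insert root [ "<a@x>" ];
  insert root [ "<a@x>"; "<b@x>" ];
  insert root [ "<a@x>"; "<c@x>" ];
  insert root [ "<a@x>"; "<b@x>"; "<d@x>" ];
  SMap.iter (fun id node -> print_thread 0 id node) root.children
```

Ancestors that are referenced but missing from the mbox appear as placeholder nodes (`present = false`), similar to the empty containers in JWZ's algorithm. A plain trie struggles when References headers are truncated or inconsistent, because the same message can then end up under two different paths; coping with such mail is a large part of what JWZ's algorithm is for.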
>
> You are of course welcome to copy this. If there are any license
> issues let me know & I can fix them.
>
> I'd also like to point you to another useful JWZ doc:
>
> http://www.jwz.org/doc/mailsum.html
Well, the same here: I can look at it in more detail tomorrow.
But fast mbox access also turned out to be a problem today,
when I used a test mbox of about 100 MB for the first time.
Normally I would test with mboxes of a few MB, because I think
that is the typical size; but I had some discussions in the
Berlin Linux user group, and some people were annoyed that
mutt needs several seconds to read mbox files of about
80 MB.
So I checked my mboxlib and saw that it is quite slow
compared to what I expected (expected! I could not compare it
with mutt on my development machine, because mutt isn't installed there),
and even though native code is much faster, it is still slow...
...so I think I have to redesign my scanner stage.
(I use the Str module and ocamllex mixed together; a plain,
hand-written OCaml scanner might be better here.)
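As an illustration of what a hand-written scanner stage could look like (not how mboxlib actually works), the sketch below reads the whole file with one call and finds message boundaries by jumping from newline to newline with `String.index_from_opt`, checking whether each line starts with `From `. It assumes LF line endings and mboxo/mboxrd-style quoting, where body lines beginning with `From ` are escaped as `>From `. The names `read_file` and `message_starts` are hypothetical.

```ocaml
(* Read a whole file into one string with a single call.
   Needs a 64-bit OCaml for files larger than about 16 MB. *)
let read_file path =
  let ic = open_in_bin path in
  Fun.protect
    ~finally:(fun () -> close_in ic)
    (fun () -> really_input_string ic (in_channel_length ic))

(* Byte offsets of every line that starts with "From ", i.e. the start of
   each message in an mboxo/mboxrd file with LF line endings. *)
let message_starts (s : string) : int list =
  let len = String.length s in
  (* Compare characters directly so no substring is allocated per line. *)
  let from_at i =
    i + 5 <= len
    && s.[i] = 'F' && s.[i + 1] = 'r' && s.[i + 2] = 'o'
    && s.[i + 3] = 'm' && s.[i + 4] = ' '
  in
  (* [i] is always the first byte of a line. *)
  let rec scan i acc =
    let acc = if from_at i then i :: acc else acc in
    match String.index_from_opt s i '\n' with
    | Some nl when nl + 1 < len -> scan (nl + 1) acc
    | _ -> List.rev acc
  in
  if len = 0 then [] else scan 0 []
```

For example, `List.length (message_starts (read_file "test.mbox"))` would count the messages in a (hypothetical) `test.mbox`. Since nothing is allocated per line and no regular-expression or lexer code runs, most of the per-byte work happens inside `String.index_from_opt`; the offsets can then be handed to a per-message parser, or stored as a simple index so that a later open doesn't have to rescan the file.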
Ciao,
Oliver
P.S.: 12 seconds for 100 MB seems quite slow...
I call the lexer very often, and that could be done
smarter.
Maybe your pages will show some useful approaches.
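One way to call the lexer less often is to parse only what threading needs. The sketch below (hypothetical names `header_end` and `parse_header`) finds the empty line that ends a message's header with a plain character scan, then splits only the header into lowercased field names and values, joining folded continuation lines; the body is never examined. It assumes LF line endings, skips the mbox `From ` separator line (whose would-be field name contains a space), and simplifies unfolding to a single joining space.

```ocaml
(* Length of the header part: everything up to and including the newline
   that precedes the first empty line, or the whole message if there is none. *)
let header_end (msg : string) : int =
  let len = String.length msg in
  let rec go i =
    match String.index_from_opt msg i '\n' with
    | Some nl when nl + 1 < len ->
      if msg.[nl + 1] = '\n' then nl + 1 else go (nl + 1)
    | _ -> len
  in
  if len = 0 then 0 else go 0

(* Parse only the header into (lowercased name, value) pairs.
   Continuation lines (starting with space or tab) are joined with one space;
   lines whose "name" contains a space, like the "From " separator, are skipped. *)
let parse_header (msg : string) : (string * string) list =
  let hdr = String.sub msg 0 (header_end msg) in
  let lines = List.filter (fun l -> l <> "") (String.split_on_char '\n' hdr) in
  let unfolded =
    List.fold_left
      (fun acc l ->
        match acc with
        | prev :: rest when l.[0] = ' ' || l.[0] = '\t' ->
          (prev ^ " " ^ String.trim l) :: rest
        | _ -> l :: acc)
      [] lines
    |> List.rev
  in
  List.filter_map
    (fun l ->
      match String.index_opt l ':' with
      | None -> None
      | Some i ->
        let name = String.sub l 0 i in
        if name = "" || String.contains name ' ' then None
        else
          let value = String.sub l (i + 1) (String.length l - i - 1) in
          Some (String.lowercase_ascii name, String.trim value))
    unfolded
```

With the header parsed this way, `List.assoc_opt "references"` and `List.assoc_opt "message-id"` give the two fields a threading step needs, while the body stays unscanned until someone actually asks for it.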