Now it's faster (addendum to "Performance-question")
Date: -- (:)
From: Oliver Bandel <oliver@f...>
Subject: Re: [Caml-list] Now it's faster (addendum to "Performance-question")
Hello Vincent,

Quoting Vincent Hanquez <tab@snarc.org>:

> On Wed, Feb 06, 2008 at 12:55:04PM +0100, Oliver Bandel wrote:
> > Hello,
> >
> > I should have changed the Subject to: "Shocking Performance!!!"
> >
> > but then possibly the spam-filter would become active ;-)
> >
> >
> > The performance dramatically increased now!
> >
> > I first had about 3min34  on my dataset.
> > After throwing out some of the "^"-using
> > functions, the time was about 1min55.
> >
> > Now, after I threw out the rest of that "^"-stuff
> > (which btw. made more of the concatenations than
> > the first thrown out functions, but was not called
> > as often as the other functions) I'm under 20 seconds!
> > (17..18 seconds!)
> >
> > That's amazing! :-)
>
>
> well i'm pretty sure you could go down even further with your own
> implementation of a buffer library.
[...]

Possibly, but I have no reason to start such an implementation,
if the current possibilities are fast enough.
IMHO optimization comes at the end. When things are working
well and fast enough, optimization is wasted time.
If the software needs optimization, it can be done then.

This is from a practical perspective.
The academic perspective might be different.
And when I have some time to do it, I may
change the data structures again, to be faster and cleaner.
But that would not really be necessary for the program that
was the reason to ask here. It would be nice to do it better,
but the program can also be used as it is now.


>
> the buffer library is actually pretty bad since it's actually just a
> simple string.

IMHO it's different, but I haven't looked at the code.


> each time the buffer need to grow, the string is
> reallocated and the previous one is copied to the new string.

Are you talking about the Buffer module or the "^" operator?

> and you got the 16mb limit (max_string_length) on 32bit.

For me that limit would be ok.
The strings I use are not that big, but bigger than expected.
And there are a lot of strings that I concatenated.
I think that is why there was so much allocation/deallocation
work to do.
With the Buffer module it was much faster.
And even the current implementation could be made more efficient,
because I call Buffer.create locally. I could create the buffer once
at module level and use Buffer.clear or Buffer.reset.
But when performance is not an issue, I choose the cleaner way,
which means: no module-global state, if possible.

In a library the decision would be different.
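
The two variants mentioned above can be sketched as follows. This is
only an illustration, not the code of the actual program: the function
names and the initial size of 64 are made up for the example.

```ocaml
(* Variant 1: a fresh Buffer per call; no shared state.
   (join_local is a hypothetical name.) *)
let join_local (parts : string list) : string =
  let buf = Buffer.create 64 in
  List.iter (Buffer.add_string buf) parts;
  Buffer.contents buf

(* Variant 2: one module-level Buffer, emptied before each use.
   Buffer.clear keeps the internal storage for reuse; Buffer.reset
   would shrink it back to the initially allocated size. *)
let shared = Buffer.create 64

let join_shared (parts : string list) : string =
  Buffer.clear shared;
  List.iter (Buffer.add_string shared) parts;
  Buffer.contents shared
```

Variant 2 avoids allocating a new buffer on every call, but it is not
re-entrant, which is the kind of global state the local variant avoids.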




>
> if you implement a growing array of fixed sized string (4K for
> example),
> you just don't need to copy data each time your buffer need to grow.
> I
> suspect it might be even faster than the normal buffer in your case
> (lots of data appending), but depends on what you do with your buffer
> afterwards.
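
To make the suggestion quoted above concrete, here is a minimal sketch
of a buffer built from fixed-size chunks. It is an assumption about
what such a library could look like, not an existing one: all names
are hypothetical, it uses a list of chunks instead of a growing array,
and it uses Bytes because strings are immutable in current OCaml.

```ocaml
let chunk_size = 4096

type chunked = {
  mutable full : Bytes.t list;  (* completed chunks, newest first *)
  mutable cur : Bytes.t;        (* chunk currently being filled *)
  mutable pos : int;            (* number of bytes used in [cur] *)
}

let create () = { full = []; cur = Bytes.create chunk_size; pos = 0 }

(* Append [s]; when the current chunk is full, start a new one.
   Data that was already stored is never copied. *)
let add_string b s =
  let len = String.length s in
  let off = ref 0 in
  while !off < len do
    if b.pos = chunk_size then begin
      b.full <- b.cur :: b.full;
      b.cur <- Bytes.create chunk_size;
      b.pos <- 0
    end;
    let n = min (len - !off) (chunk_size - b.pos) in
    Bytes.blit_string s !off b.cur b.pos n;
    b.pos <- b.pos + n;
    off := !off + n
  done

let length b = (List.length b.full * chunk_size) + b.pos

(* Flatten into one string; this is the only full copy. *)
let contents b =
  let out = Bytes.create (length b) in
  List.iteri
    (fun i c -> Bytes.blit c 0 out (i * chunk_size) chunk_size)
    (List.rev b.full);
  Bytes.blit b.cur 0 out (List.length b.full * chunk_size) b.pos;
  Bytes.to_string out
```

Whether this beats the standard Buffer depends on what happens
afterwards: if a single string is needed at the end, the final copy in
contents still has to be paid once.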

I only use that string to write it to a dbm database.
I need a certain layout of the strings, because more
than one data item must be stored for each key.
It's not a complicated format, but the strings must be concatenated.
I did this with "^" first, because I didn't expect
the string handling to need that much time. I thought my
mathematical operations (statistical things) would need the most time,
but my expectation was wrong. The calculations were done very fast.
So using the Buffer module instead of "^" for the concatenations
brought a good performance boost.
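
The difference can be sketched like this. The record layout is an
assumption made up for the example (fields separated by a tab); the
real format is not shown in this thread, and the function names are
hypothetical.

```ocaml
(* With "^": every step allocates a new string and copies the whole
   accumulator again, so the copying work grows quadratically with
   the number of fields. *)
let record_concat (fields : string list) : string =
  match fields with
  | [] -> ""
  | first :: rest ->
    List.fold_left (fun acc f -> acc ^ "\t" ^ f) first rest

(* With Buffer: fields are appended to storage that grows only
   occasionally, and the result is copied out once at the end. *)
let record_buffer (fields : string list) : string =
  let buf = Buffer.create 256 in
  List.iteri
    (fun i f ->
      if i > 0 then Buffer.add_char buf '\t';
      Buffer.add_string buf f)
    fields;
  Buffer.contents buf
```

When all fields are already available as a list, String.concat "\t"
fields from the standard library gives the same result and also
avoids the repeated copying.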

Ciao,
   Oliver

P.S.:

===============================================
# Sys.max_string_length;;
- : int = 144115188075855863
#
===============================================