[Caml-list] ocaml-3.05: a performance experience
Date: 2002-08-03 (12:34)
From: Gerd Stolpmann <info@g...>
Subject: Re: [Caml-list] ocaml-3.05: a performance experience

On 2002.08.02 05:33 Alexander V. Voinov wrote:
> Hi All,
> I have an application, which parses a huge XML file and stores resulting
> records to a database.
> The file is parsed using PXP, but in a 'pulldom' manner, by extracting
> (to a Buffer) first level tags manually with pcre, then an array insert
> of 30000 recognized and accumulated records is performed. DB access
> takes a small fraction of the run time.
> Compiled with ocaml-3.04 it took 1h40m+-5m of 'user' process time and
> occupied about 340M in RAM. With 3.05 it took 2h40m+-5m and occupied
> 250M. 
> Is this the consequence of the new GC strategy? Actually I'd tolerate
> large footprint for the sake of more speed.
> It's also interesting to note that in the case of 3.04 the footprint of
> the application starts at 330M and slowly expands to 350M. With 3.05
> it starts at 250M and then barely expands until the end.
> Sparc Solaris 2.7, gcc 3.0.4.
> A previous version of this app, written in Python with PyXML, runs 3-4
> times slower than the 3.04 version and takes 20M in RAM.

I think you are observing GC compaction. You can turn it off by setting
OCAMLRUNPARAM="O=1000000" in the environment (or by raising max_overhead
to the same value with Gc.set).
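
For the Gc.set route, a minimal sketch follows. It assumes a runtime whose
major collector honours max_overhead, where a value of 1000000 or more means
compaction is never triggered. The function name disable_compaction is only
illustrative.

```ocaml
(* Sketch: switch off heap compaction from inside the program.
   Equivalent in intent to OCAMLRUNPARAM="O=1000000".
   disable_compaction is a hypothetical helper name. *)
let disable_compaction () =
  (* Copy the current GC settings and change only max_overhead,
     so every other parameter keeps its current value. *)
  Gc.set { (Gc.get ()) with Gc.max_overhead = 1000000 }

let () =
  (* Call this once at start-up, before the large allocations begin. *)
  disable_compaction ()
```

The environment variable has the advantage that you can compare both
settings without recompiling.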

If XML validation is not needed, you could also rewrite your program
to use the new event-based parsing in PXP-1.1.90. That would completely
avoid representing the XML tree in memory (and would increase the speed,
because garbage-collecting a large heap is expensive).
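
To illustrate why event-based processing keeps the heap small, here is a
toy sketch. It does NOT use the PXP API: the event type, pull_of_list and
iter_records are hypothetical stand-ins, and a real program would obtain its
events from the parser instead of from a list.

```ocaml
(* Hypothetical event type, standing in for what a pull parser delivers. *)
type event =
  | Start_tag of string
  | Data of string
  | End_tag of string

(* Stand-in for a parser: hands out one event per call, None at the end. *)
let pull_of_list events =
  let rest = ref events in
  fun () ->
    match !rest with
    | [] -> None
    | e :: tl -> rest := tl; Some e

(* Consume events one at a time and call on_record with the text of each
   <record> element. Only the text of the current record is kept in
   memory; no document tree is ever built. *)
let iter_records pull on_record =
  let buf = Buffer.create 64 in
  let in_record = ref false in
  let rec loop () =
    match pull () with
    | None -> ()
    | Some (Start_tag "record") ->
        in_record := true; Buffer.clear buf; loop ()
    | Some (End_tag "record") ->
        in_record := false; on_record (Buffer.contents buf); loop ()
    | Some (Data s) ->
        if !in_record then Buffer.add_string buf s;
        loop ()
    | Some (Start_tag _ | End_tag _) -> loop ()
  in
  loop ()
```

In your case on_record would accumulate the record for the next array
insert, so the live data is bounded by one batch rather than by the file.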

Gerd Stolpmann      Telefon: +49 6151 997705 (privat)
Viktoriastr. 45             
64293 Darmstadt     EMail:   gerd@gerd-stolpmann.de