Date: -- (:)
From: Gabriel Kerneis <gabriel.kerneis@e...>
Subject: Re: [Caml-list] Fast XML parser

On Thu, 19 Jul 2007 00:48:07 +0200, "Till Varoquaux" wrote:
> Ouch,
> I beg to differ: if you want speed and can work in streaming mode
> (linear top-down, left-to-right exploration of the graph), you want an
> event-based XML parser. expat is probably one of the fastest (the C
> library is known to be a speed demon). PXP does everything, including
> talking Klingon and controlling the kitchen sink. It provides an
> event-based layer.

I certainly wouldn't recommend xml-light for *every* project where an
XML parser is needed, but look at the OP's requirements:
> > > I am interested in parsing Wiki markup language that has a few
> > > tags, like <pre>...</pre>, <math>...,</math>.
> > > These tags are sparse, meaning that the ratio of number of tags /
> > > number of bytes is low.
For such a simple case, xml-light (which is basically a simple ocamllex
file plus a few things to build the syntax tree) should perform quite
well. I know it doesn't handle DTDs, etc., but in *that* case, who cares?

> Ultimately, if you are parsing very simple files and are aiming for
> pure speed, you could write a simple lexer with ocamllex and use that
> as a base layer.

That could be a solution, and (provided the licence you chose for your
project is compatible) you could even use xml-light as a starting point
(stripping out the things you don't need).
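
To give an idea of how little is needed when tags are sparse, here is a
minimal sketch of such a base layer. It is hand-written with the
standard String module rather than generated by ocamllex, and it is not
taken from xml-light; the token type and the tokenize function are names
made up for this illustration. It assumes tags carry no attributes and
that every '<' followed later by a '>' starts a tag, which real wiki
text may violate.

```ocaml
(* Hypothetical token type: runs of plain text and the tags between them. *)
type token =
  | Text of string   (* anything outside a tag *)
  | Open of string   (* <name> *)
  | Close of string  (* </name> *)

(* Split [s] into tokens. Assumption: tags have no attributes.
   A '<' that is never followed by '>' is kept as plain text. *)
let tokenize (s : string) : token list =
  let n = String.length s in
  let rec go pos acc =
    if pos >= n then List.rev acc
    else
      match String.index_from_opt s pos '<' with
      | None ->
        (* No more tags: the rest of the input is one text run. *)
        List.rev (Text (String.sub s pos (n - pos)) :: acc)
      | Some lt ->
        (* Emit the text before the tag, if there is any. *)
        let acc =
          if lt > pos then Text (String.sub s pos (lt - pos)) :: acc else acc
        in
        (match String.index_from_opt s lt '>' with
         | None ->
           (* Unterminated '<': treat the remainder as text. *)
           List.rev (Text (String.sub s lt (n - lt)) :: acc)
         | Some gt ->
           let body = String.sub s (lt + 1) (gt - lt - 1) in
           let tok =
             if String.length body > 0 && body.[0] = '/'
             then Close (String.sub body 1 (String.length body - 1))
             else Open body
           in
           go (gt + 1) (tok :: acc))
  in
  go 0 []
```

Since String.index_from_opt skips over whole text runs at once, the cost
per tag stays small when the tag/byte ratio is low. Building a tree from
the resulting token list (as xml-light does on top of its lexer) would
be a separate step.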

Kind regards,
