Fast XML parser
From: Gabriel Kerneis <gabriel.kerneis@e...>
Subject: Re: [Caml-list] Fast XML parser

On Thu, 19 Jul 2007 00:48:07 +0200, "Till Varoquaux"
<till.varoquaux@gmail.com> wrote:
> Ouch,
>
> I beg to differ: if you want speed and can work on a stream (a linear,
> top-down, left-to-right exploration of the graph), you want an
> event-based XML parser. Expat is probably one of the fastest (the C
> library is known to be a speed demon). PXP does everything, including
> talking Klingon and controlling the kitchen sink, and it provides an
> event-based layer.
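
To make the event-based style concrete, here is a minimal OCaml sketch. It does not use expat's or PXP's actual API; the `event` type, `iter_events` and `count_math` are hypothetical names. The scanner walks the input once, left to right, and hands each piece to a callback instead of building a tree. It assumes every `<` starts a tag and that tags carry no attributes, comments or entities.

```ocaml
(* Hypothetical event type: not expat's or PXP's API. *)
type event =
  | Text of string   (* character data between tags *)
  | Open of string   (* <name> *)
  | Close of string  (* </name> *)

(* Walk [s] once, left to right, calling [handler] on each event.
   Assumes every '<' starts a tag and tags carry no attributes. *)
let iter_events (handler : event -> unit) (s : string) : unit =
  let len = String.length s in
  let rec loop pos =
    if pos < len then
      match String.index_from_opt s pos '<' with
      | None -> handler (Text (String.sub s pos (len - pos)))
      | Some lt ->
        if lt > pos then handler (Text (String.sub s pos (lt - pos)));
        (match String.index_from_opt s lt '>' with
         | None ->
           (* Unterminated tag: report the rest of the input as text. *)
           handler (Text (String.sub s lt (len - lt)))
         | Some gt ->
           let name = String.sub s (lt + 1) (gt - lt - 1) in
           if String.length name > 0 && name.[0] = '/' then
             handler (Close (String.sub name 1 (String.length name - 1)))
           else handler (Open name);
           loop (gt + 1))
  in
  loop 0

(* Usage: count <math> elements without building any tree. *)
let count_math s =
  let n = ref 0 in
  iter_events (function Open "math" -> incr n | _ -> ()) s;
  !n
```

The callback never sees more than one event at a time, so memory use stays flat regardless of input size; a caller interested only in `<math>` bodies can keep a small state flag inside its handler and ignore everything else.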

I certainly wouldn't recommend xml-light for *every* project where an
XML parser is needed, but look at the OP's requirements:
> > > I am interested in parsing Wiki markup language that has a few
> > > tags, like <pre>...</pre> and <math>...</math>.
> > > These tags are sparse, meaning that the ratio of number of tags /
> > > number of bytes is low.
In such a simple case, xml-light (which is basically a simple ocamllex
file plus a few functions to build the syntax tree) should perform quite
well. I know it doesn't handle DTDs, etc., but in *that* case, who cares?
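
The "build the syntax tree" part amounts to folding a flat token stream into nested nodes. The sketch below is hypothetical: the `token` and `tree` types, the `Mismatched` exception and `build` are illustrative names, not xml-light's actual types or functions. It keeps a stack of partially built elements and raises on mismatched or unclosed tags.

```ocaml
(* Hypothetical types; xml-light's own types differ. *)
type token = TText of string | TOpen of string | TClose of string

type tree =
  | Text of string
  | Element of string * tree list  (* tag name, children in order *)

exception Mismatched of string

(* Fold a flat token list into a forest. The stack holds partially built
   elements as (name, children in reverse order); the bottom frame, named
   "", collects the top-level nodes. *)
let build (tokens : token list) : tree list =
  let rec go stack tokens =
    match tokens, stack with
    | [], [ ("", top) ] -> List.rev top
    | [], (name, _) :: _ -> raise (Mismatched ("unclosed <" ^ name ^ ">"))
    | TText s :: rest, (name, kids) :: up ->
      go ((name, Text s :: kids) :: up) rest
    | TOpen n :: rest, _ -> go ((n, []) :: stack) rest
    | TClose n :: rest, (name, kids) :: (pname, pkids) :: up when n = name ->
      go ((pname, Element (name, List.rev kids) :: pkids) :: up) rest
    | TClose n :: _, _ -> raise (Mismatched ("unexpected </" ^ n ^ ">"))
    | _, [] -> assert false (* the root frame is never popped *)
  in
  go [ ("", []) ] tokens
```

For the OP's sparse, non-nested tags the stack never grows beyond two frames, so the cost of this layer is negligible next to the lexing itself.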

> Ultimately, if you are parsing very simple files and are aiming for
> pure speed, you could write a simple lexer with ocamllex and use that
> as a base layer.

That could be a solution, and (provided the licence you chose for your
project is compatible) you could even use xml-light as a starting point,
stripping out the things you don't need.
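
Here is a hand-written OCaml stand-in for such a lexer, restricted to the OP's `<pre>` and `<math>` tags. An ocamllex version would express the same cases as regular-expression rules; this sketch skips the generator so it runs as plain OCaml (it assumes OCaml 4.10 or later for `List.find_map`). The `token` type, `tag_at`, `next_token` and `tokenize` are hypothetical names, and unknown tags or stray `<` characters are simply kept as text.

```ocaml
(* Hypothetical token type for the OP's wiki markup. *)
type token =
  | Text of string
  | Open of string   (* <pre> or <math> *)
  | Close of string  (* </pre> or </math> *)
  | EOF

(* The only tags this lexer recognizes. *)
let tags = [ "pre"; "math" ]

(* Does [p] occur in [s] at offset [pos]? *)
let starts_with s pos p =
  let n = String.length p in
  pos + n <= String.length s && String.sub s pos n = p

(* If a known tag starts at [pos], return its token and its length. *)
let tag_at s pos =
  List.find_map
    (fun t ->
      if starts_with s pos ("<" ^ t ^ ">") then Some (Open t, String.length t + 2)
      else if starts_with s pos ("</" ^ t ^ ">") then
        Some (Close t, String.length t + 3)
      else None)
    tags

(* Return the token starting at [pos] and the position just after it.
   Text runs end at the next '<' that begins a known tag; any other '<'
   is kept as ordinary text. *)
let next_token s pos =
  let len = String.length s in
  if pos >= len then (EOF, len)
  else
    match tag_at s pos with
    | Some (tok, n) -> (tok, pos + n)
    | None ->
      let rec scan from =
        match String.index_from_opt s from '<' with
        | None -> len
        | Some i -> if Option.is_some (tag_at s i) then i else scan (i + 1)
      in
      let stop = scan (pos + 1) in
      (Text (String.sub s pos (stop - pos)), stop)

(* Usage: pull tokens until EOF. *)
let tokenize s =
  let rec go pos acc =
    match next_token s pos with
    | EOF, _ -> List.rev acc
    | tok, pos' -> go pos' (tok :: acc)
  in
  go 0 []
```

Because tags are sparse, most of the time is spent in `String.index_from_opt` jumping from one `<` to the next. Like an ocamllex entry point, `next_token` is pull-based: the caller asks for one token at a time, so it can feed either an event loop or a tree builder.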

Kind regards,
--
Gabriel
