Fast XML parser
| Date: | -- (:) |
| From: | Gabriel Kerneis <gabriel.kerneis@e...> |
| Subject: | Re: [Caml-list] Fast XML parser |
On Thu, 19 Jul 2007 00:48:07 +0200, "Till Varoquaux" <till.varoquaux@gmail.com> wrote:
> Ouch,
>
> I beg to differ, if you want speed and can work stream (linear
> top-down left-right exploration of the graph), you want an event based
> xml parser. expat is probably one of the fastest (the c library is
> known to be a speed demon). PXP does everything including talking
> klingon and controlling the kitchen sink. It provides an event based
> layer.

I certainly wouldn't recommend xml-light for *every* project where an XML parser is needed, but look at the OP's requirements:

> > > I am interested in parsing Wiki markup language that has a few
> > > tags, like <pre>...</pre>, <math>...,</math>.
> > > These tags are sparse, meaning that the ratio of number of tags /
> > > number of bytes is low.

For such a simple case, xml-light (which is basically a simple ocamllex file plus a few things to build the syntax tree) should perform quite well. I know it doesn't handle DTDs, etc., but in *that* case, who cares?

> Ultimately if you are parsing very simple files and are aiming for
> pure speed you could write a simple lexer with ocamllex and use that
> as base layer.

That could be a solution, and (provided the licence you chose for your project is compatible) you could even use xml-light as an example to begin with, stripping out the things you don't need.

Kind regards,
-- 
Gabriel
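
To make the "simple lexer" idea concrete, here is a minimal sketch in plain OCaml. It is a hand-written scanner standing in for an ocamllex specification, and it is not taken from xml-light. The `token` type and the `is_name` and `tokenize` names are hypothetical, and the sketch assumes tags are bare ASCII names with no attributes. Because tags are sparse, it jumps from one `<` to the next with `String.index_from_opt` and treats anything that does not look like a tag as text.

```ocaml
(* Hypothetical token type for sparse wiki-style markup. *)
type token =
  | Text of string   (* raw text between tags *)
  | Open of string   (* <name> *)
  | Close of string  (* </name> *)

(* True if [s] is a non-empty run of ASCII letters (assumed tag-name syntax). *)
let is_name s =
  let ok = ref (s <> "") in
  String.iter
    (fun c -> match c with 'a' .. 'z' | 'A' .. 'Z' -> () | _ -> ok := false)
    s;
  !ok

let tokenize (s : string) : token list =
  let n = String.length s in
  (* Prepend the text s[i..j) to [acc] unless it is empty. *)
  let text acc i j =
    if j > i then Text (String.sub s i (j - i)) :: acc else acc
  in
  (* [start]: where the pending text begins; [pos]: where to look for '<'. *)
  let rec go acc start pos =
    if pos >= n then List.rev (text acc start n)
    else
      match String.index_from_opt s pos '<' with
      | None -> List.rev (text acc start n)
      | Some lt ->
        match String.index_from_opt s (lt + 1) '>' with
        | None -> List.rev (text acc start n)
        | Some gt ->
          let inner = String.sub s (lt + 1) (gt - lt - 1) in
          let len = String.length inner in
          let tok =
            if is_name inner then Some (Open inner)
            else if len > 1 && inner.[0] = '/'
                    && is_name (String.sub inner 1 (len - 1))
            then Some (Close (String.sub inner 1 (len - 1)))
            else None
          in
          match tok with
          | Some t -> go (t :: text acc start lt) (gt + 1) (gt + 1)
          | None -> go acc start (lt + 1) (* a stray '<': keep it as text *)
  in
  go [] 0 0
```

A stray `<` inside the text, as in `a < b`, is left in the surrounding `Text` token rather than rejected, which suits wiki markup better than strict XML. An input with many stray `<` characters would make the repeated search for `>` slow, so a real lexer would want to bound that lookahead.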