[Caml-list] ocaml-3.05: a performance experience
Date: | 2002-08-04 (20:45) |
From: | Gerd Stolpmann <info@g...> |
Subject: | Re: [Caml-list] ocaml-3.05: a performance experience |
On 2002.08.04 04:50 Alexander V. Voinov wrote:
> Hi Gerd,
>
> Gerd Stolpmann wrote:
> > If XML validation is not needed, you could also rewrite your program
> > to use the new event-based parsing in PXP-1.1.90. That would completely
> > avoid representing the XML tree in memory (and increase the speed, because
> > GC of large memory footprints is expensive).
>
> thanks again, but it's not yet officially announced, is it? I managed to
> download it, but I didn't find any direct link. Also, is this parsing
> mode mentioned in the manual?

It is experimental code, but event-based parsing will definitely remain in
the parser until the next stable release. Details of the interface may
change, however. (I call a release "stable" when the interface has matured,
and when all regression tests have passed. The experimental releases usually
work, but it is more likely that there is some "overlooked case" in the code.)

The manual is not yet updated; there is only a description in the mli file,
and a small example. In particular, there is the type

    type event =
      | E_start_doc of (string * bool * dtd)
      | E_end_doc
      | E_start_tag of (string * (string * string) list * Pxp_lexer_types.entity_id)
      | E_end_tag of (string * Pxp_lexer_types.entity_id)
      | E_char_data of string
      | E_pinstr of (string * string)
      | E_comment of string
      | E_position of (string * int * int)
      | E_error of exn
      | E_end_of_stream

and a function is called back for each of these events. For example, for

    <A x="1">Q<B>R</B>S</A>

you would get the events

    E_start_doc("1.0",false,dtd)
    E_start_tag("A", ["x", "1"], ent_a)
    E_char_data "Q"
    E_start_tag("B", [], ent_b)
    E_char_data "R"
    E_end_tag("B", ent_b)
    E_char_data "S"
    E_end_tag("A", ent_a)

It is already checked that the document is well-formed, so for every
E_end_tag there is always a matching E_start_tag. Because the parser "pushes"
the events to the application, this is a so-called "push parser".
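To make the callback style concrete, here is a minimal sketch of a handler for these events. It is not PXP code: the types dtd and entity_id below are placeholder stand-ins so that the example compiles without PXP installed, and the names on_event, example_events, depth, max_depth and text are hypothetical. Instead of calling the parser, the sketch replays the event sequence listed above from a plain list.

```ocaml
(* Placeholder stand-ins for PXP's dtd and Pxp_lexer_types.entity_id.
   These are assumptions made only so the sketch is self-contained. *)
type dtd = Dtd_placeholder
type entity_id = Entity_placeholder

(* The event type as quoted from the mli file, with the stand-in types. *)
type event =
  | E_start_doc of (string * bool * dtd)
  | E_end_doc
  | E_start_tag of (string * (string * string) list * entity_id)
  | E_end_tag of (string * entity_id)
  | E_char_data of string
  | E_pinstr of (string * string)
  | E_comment of string
  | E_position of (string * int * int)
  | E_error of exn
  | E_end_of_stream

(* Hypothetical handler state: current nesting depth, deepest nesting
   seen so far, and all character data concatenated in document order. *)
let depth = ref 0
let max_depth = ref 0
let text = Buffer.create 16

(* The callback: it sees one event at a time and never holds a tree. *)
let on_event = function
  | E_start_tag _ ->
      incr depth;
      if !depth > !max_depth then max_depth := !depth
  | E_end_tag _ -> decr depth
  | E_char_data s -> Buffer.add_string text s
  | E_error e -> raise e
  | E_start_doc _ | E_end_doc | E_pinstr _ | E_comment _
  | E_position _ | E_end_of_stream -> ()

(* The events for <A x="1">Q<B>R</B>S</A>, replayed from a list. *)
let example_events =
  let ent = Entity_placeholder in
  [ E_start_doc ("1.0", false, Dtd_placeholder);
    E_start_tag ("A", ["x", "1"], ent);
    E_char_data "Q";
    E_start_tag ("B", [], ent);
    E_char_data "R";
    E_end_tag ("B", ent);
    E_char_data "S";
    E_end_tag ("A", ent) ]

let () =
  List.iter on_event example_events;
  Printf.printf "max depth = %d, text = %S\n"
    !max_depth (Buffer.contents text)
```

Because the handler keeps only a counter and a buffer, memory use does not grow with the size of the element tree, which is the point of the event-based mode.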
There are plans for a "pull parser", too (the application calls a next_event
function to get the events), as this would make it possible to create streams
of XML events.

Gerd
-- 
----------------------------------------------------------------------------
Gerd Stolpmann      Telefon: +49 6151 997705 (privat)
Viktoriastr. 45
64293 Darmstadt     EMail:   gerd@gerd-stolpmann.de
Germany
----------------------------------------------------------------------------
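The pull style mentioned in the message, where the application calls a next_event function, can be sketched as follows. This is only an illustration of the control flow, not a planned PXP interface: the event type is a simplified subset without the dtd and entity_id components, and make_next_event and tag_names are hypothetical names. A real pull parser would produce events lazily from the input; here they come from a pre-recorded list.

```ocaml
(* Simplified subset of the event type; an assumption for this sketch. *)
type event =
  | E_start_tag of string * (string * string) list
  | E_end_tag of string
  | E_char_data of string
  | E_end_of_stream

(* Hypothetical: build a next_event function over recorded events.
   Once the queue is empty it keeps returning E_end_of_stream. *)
let make_next_event events =
  let q = Queue.create () in
  List.iter (fun e -> Queue.add e q) events;
  fun () -> if Queue.is_empty q then E_end_of_stream else Queue.take q

(* The application drives the loop: it asks for the next event whenever
   it is ready, here collecting element names in document order. *)
let rec tag_names next_event acc =
  match next_event () with
  | E_end_of_stream -> List.rev acc
  | E_start_tag (name, _) -> tag_names next_event (name :: acc)
  | E_end_tag _ | E_char_data _ -> tag_names next_event acc

(* Events for <A x="1">Q<B>R</B>S</A> in the simplified form. *)
let names =
  let next_event =
    make_next_event
      [ E_start_tag ("A", ["x", "1"]);
        E_char_data "Q";
        E_start_tag ("B", []);
        E_char_data "R";
        E_end_tag "B";
        E_char_data "S";
        E_end_tag "A" ]
  in
  tag_names next_event []

let () = List.iter print_endline names
```

The difference from the push style is who owns the loop: here the consumer decides when to fetch the next event, so it can stop early or hand the next_event function to another part of the program.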