[Caml-list] ocaml-3.05: a performance experience
From: Gerd Stolpmann <info@g...>
Subject: Re: [Caml-list] ocaml-3.05: a performance experience

On 2002.08.04 04:50 Alexander V. Voinov wrote:
> Hi Gerd,
> 
> Gerd Stolpmann wrote:
> > If XML validation is not needed, you could also rewrite your program
> > to use the new event-based parsing in PXP-1.1.90. That would completely
> > avoid representing the XML tree in memory (and increase the speed, because
> > GC of large memory footprints is expensive).
> 
> thanks again, but it's not yet officially announced, is it? I managed to
> download it, but I didn't find any direct link. Also, is this parsing
> mode mentioned in the manual?

It is experimental code, but event-based parsing will definitely remain
in the parser until the next stable release. Details of the interface may
change, however. (I call a release "stable" when the interface has matured,
and when all regression tests have passed. The experimental releases usually
work, but it is more likely that there is some "overlooked case" in the
code.)

The manual has not been updated yet; there is only a description in the mli
file, and a small example. In particular, there is the type

type event =
  | E_start_doc of (string * bool * dtd)
  | E_end_doc
  | E_start_tag of (string * (string * string) list * Pxp_lexer_types.entity_id)
  | E_end_tag   of (string * Pxp_lexer_types.entity_id)
  | E_char_data of  string
  | E_pinstr of (string * string)
  | E_comment of string
  | E_position of (string * int * int)
  | E_error of exn
  | E_end_of_stream

and a function is called back for each of these events. For example, for

<A x="1">Q<B>R</B>S</A>

you would get the events

E_start_doc("1.0",false,dtd)
E_start_tag("A", ["x", "1"], ent_a)
E_char_data "Q"
E_start_tag("B", [], ent_b)
E_char_data "R"
E_end_tag("B", ent_b)
E_char_data "S"
E_end_tag("A", ent_a)

The parser has already checked that the document is well-formed, so for every
E_end_tag there is always a matching E_start_tag.
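
As a rough illustration of how an application might consume these events,
here is a minimal sketch of a callback. It is not taken from PXP: the event
type is copied from above, but dtd and entity_id are replaced by placeholder
types so that the code compiles on its own, and make_handler and
sample_events are hypothetical names. How the callback is registered with
the parser is not shown; the sketch simply feeds it the events listed above.

```ocaml
(* Placeholders so the sketch compiles without PXP. In PXP these would be
   the real dtd and Pxp_lexer_types.entity_id types. *)
type dtd = Dtd_placeholder
type entity_id = Entity_placeholder

type event =
  | E_start_doc of (string * bool * dtd)
  | E_end_doc
  | E_start_tag of (string * (string * string) list * entity_id)
  | E_end_tag   of (string * entity_id)
  | E_char_data of string
  | E_pinstr of (string * string)
  | E_comment of string
  | E_position of (string * int * int)
  | E_error of exn
  | E_end_of_stream

(* Hypothetical helper: returns a callback together with two accessors for
   the state it accumulates (maximum nesting depth, concatenated text). *)
let make_handler () =
  let depth = ref 0 in
  let max_depth = ref 0 in
  let text = Buffer.create 16 in
  let callback = function
    | E_start_tag (_, _, _) ->
        incr depth;
        if !depth > !max_depth then max_depth := !depth
    | E_end_tag (_, _) ->
        (* Safe because well-formedness was already checked. *)
        decr depth
    | E_char_data s -> Buffer.add_string text s
    | E_error e -> raise e
    | _ -> ()
  in
  (callback, (fun () -> !max_depth), (fun () -> Buffer.contents text))

(* The events listed above for <A x="1">Q<B>R</B>S</A>. *)
let sample_events = [
  E_start_doc ("1.0", false, Dtd_placeholder);
  E_start_tag ("A", [("x", "1")], Entity_placeholder);
  E_char_data "Q";
  E_start_tag ("B", [], Entity_placeholder);
  E_char_data "R";
  E_end_tag ("B", Entity_placeholder);
  E_char_data "S";
  E_end_tag ("A", Entity_placeholder);
]

let () =
  let callback, max_depth, text = make_handler () in
  List.iter callback sample_events;
  Printf.printf "max depth = %d, text = %s\n" (max_depth ()) (text ())
```

Only the counters and the text buffer are kept in memory here; no tree is
built, which is the point of the event-based mode.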

Because the parser "pushes" the events to the application, this is a so-called
"push parser". There are plans for a "pull parser", too (the application calls
a next_event function to get the events), as this would make it possible to
create streams of XML events.
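
To make the idea of a stream of events concrete, here is a small sketch of
what a pull interface could look like. Everything in it is an assumption:
the pull parser does not exist yet, the event type is a cut-down stand-in
rather than the PXP one, and make_next_event, to_seq and char_data are
hypothetical names. The "parser" is simulated by a pre-recorded list, and
the Seq module requires OCaml 4.07 or later.

```ocaml
(* Cut-down, hypothetical event type; not the PXP definition. *)
type event =
  | E_start_tag of string
  | E_end_tag of string
  | E_char_data of string
  | E_end_of_stream

(* Hypothetical pull source: every call returns the next event, and
   E_end_of_stream once the recorded input is exhausted. *)
let make_next_event (events : event list) : unit -> event =
  let remaining = ref events in
  fun () ->
    match !remaining with
    | [] -> E_end_of_stream
    | ev :: rest -> remaining := rest; ev

(* Wrap the pull function as a lazy sequence. Events are only requested
   when the consumer forces the next element. *)
let rec to_seq (next_event : unit -> event) : event Seq.t =
  fun () ->
    match next_event () with
    | E_end_of_stream -> Seq.Nil
    | ev -> Seq.Cons (ev, to_seq next_event)

(* Example consumer: keep only the character data. *)
let char_data (s : event Seq.t) : string list =
  s
  |> Seq.filter_map (function E_char_data d -> Some d | _ -> None)
  |> List.of_seq

let () =
  let next_event =
    make_next_event
      [ E_start_tag "A"; E_char_data "Q"; E_start_tag "B"; E_char_data "R";
        E_end_tag "B"; E_char_data "S"; E_end_tag "A" ]
  in
  List.iter print_endline (char_data (to_seq next_event))
```

Note that the sequence is backed by mutable state, so it can be traversed
only once; a second traversal would see an already exhausted source.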

Gerd
-- 
----------------------------------------------------------------------------
Gerd Stolpmann      Telefon: +49 6151 997705 (privat)
Viktoriastr. 45             
64293 Darmstadt     EMail:   gerd@gerd-stolpmann.de
Germany                     
----------------------------------------------------------------------------