From: Gerd Stolpmann <gerd@g...>
Subject: Re: :pxp_evpull notation (was: yet another silly question on PXP)
On Friday, 25.02.2005, at 19:14 +0300, Paul Argentoff wrote:
> Dear Gerd Stolpmann,
> 
> Let GS = "Gerd Stolpmann" in
>   written_by GS => 
> 
>  GS> See the file doc/PREPROCESSOR which is part of the distribution
>  GS> tarball.
> 
> Thanks again for the reference. My next question is about the :pxp_evpull
> notation. Can I write a construct like this:
> 
> let pile = <:pxp_evpull<
>              <foo> (: some_fun () :) >>
> 
> where some_fun generates a further "subtree" using the same pxp_evpull
> notation. 

Yes, this works. some_fun is called when the events for the children of
foo are generated. You must have

some_fun : unit -> Pxp_types.event option

and some_fun is repeatedly called until it returns None.
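As a minimal sketch of that contract, assuming a simplified local `event` type in place of `Pxp_types.event` (whose real constructors carry more arguments), here is a hypothetical `make_some_fun` that emits an `<item>` subtree for each string in a list and then keeps returning None. The hypothetical `drain` calls it repeatedly until it returns None, as the automaton does.

```ocaml
(* Simplified stand-in for Pxp_types.event (hypothetical; the real
   constructors take more arguments). *)
type event =
  | E_start_tag of string
  | E_char_data of string
  | E_end_tag of string

(* Hypothetical some_fun: a closure of type unit -> event option that
   returns the events of one <item> subtree per input string, one event
   per call, and returns None once everything has been emitted. *)
let make_some_fun (items : string list) : unit -> event option =
  let rest = ref items in   (* strings not yet turned into events *)
  let pending = ref [] in   (* events of the current <item> not yet returned *)
  let rec next () =
    match !pending, !rest with
    | ev :: evs, _ -> pending := evs; Some ev
    | [], item :: tl ->
        rest := tl;
        pending := [E_start_tag "item"; E_char_data item; E_end_tag "item"];
        next ()
    | [], [] -> None
  in
  next

(* Call a pull function until it returns None and collect its events. *)
let drain (f : unit -> event option) : event list =
  let rec loop acc =
    match f () with
    | None -> List.rev acc
    | Some ev -> loop (ev :: acc)
  in
  loop []
```

Because the closure keeps its own state, only the current subtree's events are ever pending, however long the input list is.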

pxp_evpull generates automata where every state returns an event.
External functions like some_fun are represented as loops: the
automaton stays in the same state as long as the function returns
Some _, and moves on to the following state when it returns None.

For your example, <:pxp_evpull< <foo> (: some_fun () :) >>, the
automaton is:

let _ =
  let _eid = Pxp_dtd.Entity.create_entity_id () in
  let rec _generator =
    let _state = ref 0 in
    fun _arg ->
      match !_state with
        0 ->
          let ev = Pxp_types.E_start_tag ("foo", [], None, _eid) in
          _state := 1; Some ev
      | 1 ->
          begin match some_fun () _arg with
            None -> _state := 2; _generator _arg
          | Some Pxp_types.E_end_of_stream -> _generator _arg
          | Some ev -> Some ev
          end
      | 2 ->
          let ev = Pxp_types.E_end_tag ("foo", _eid) in _state := 3; Some ev
      | 3 -> None
      | _ -> assert false
  in
  _generator

(output generated with "camlp4 -I ... pa_o.cmo pa_op.cmo pcre.cma
unix.cma netstring.cma pxp_pp.cma pr_o.cmo sample.ml")

some_fun can even be another pxp_evpull automaton.
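To experiment with this without PXP or camlp4, here is a hand-written model of the automaton above. It again assumes a simplified local `event` type, and `element`, `of_list` and `drain` are hypothetical names, not PXP functions. As in the generated code, the child receives the generator's argument and `E_end_of_stream` events from the child are skipped. Nesting one `element` inside another mimics using a second automaton as the child expression.

```ocaml
(* Simplified stand-in for the Pxp_types.event constructors used by the
   generated code (hypothetical; the real ones carry more arguments). *)
type event =
  | E_start_tag of string
  | E_char_data of string
  | E_end_tag of string
  | E_end_of_stream

(* Model of <name> (: child :): state 0 emits the start tag, state 1
   loops on [child] until it returns None, state 2 emits the end tag,
   state 3 means the generator is exhausted. *)
let element (name : string) (child : 'a -> event option) : 'a -> event option =
  let state = ref 0 in
  let rec generator arg =
    match !state with
    | 0 -> state := 1; Some (E_start_tag name)
    | 1 ->
        (match child arg with
         | None -> state := 2; generator arg
         | Some E_end_of_stream -> generator arg (* skipped, as generated *)
         | Some ev -> Some ev)
    | 2 -> state := 3; Some (E_end_tag name)
    | 3 -> None
    | _ -> assert false
  in
  generator

(* A child that returns the given events one by one, then None. *)
let of_list (evs : event list) : 'a -> event option =
  let rest = ref evs in
  fun _ ->
    match !rest with
    | [] -> None
    | ev :: tl -> rest := tl; Some ev

(* Collect all events by calling the generator until it returns None. *)
let drain (gen : unit -> event option) : event list =
  let rec loop acc =
    match gen () with
    | None -> List.rev acc
    | Some ev -> loop (ev :: acc)
  in
  loop []

(* Nesting: the child of <outer> is itself an element automaton. *)
let nested : unit -> event option =
  element "outer" (element "inner" (of_list [E_char_data "x"]))
```

Draining `nested` yields the start tags of `outer` and `inner`, the character data, and the two end tags in order; each automaton only remembers its own state number.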

> My task really is to build a converter from a huge (>100M) text file (or
> string Stream.t) to a huge XML file. Of course, I need to do the whole job
> with lazy streams to avoid out-of-memory exceptions.

Pull parsers are your friend. They were created with such applications
in mind.
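Independently of PXP, the same constant-memory idea can be sketched with the standard library alone. Input lines are read lazily as a `Seq.t`, and each XML element is written as soon as its line arrives. This swaps plain string output in for a PXP event stream. The element names `records` and `record` and the one-record-per-line input format are assumptions. It needs OCaml 4.07 or later for `Seq`.

```ocaml
(* Lazily read lines from a channel: a line is read only when the
   consumer forces the next element of the sequence. *)
let rec lines_of_channel (ic : in_channel) : string Seq.t = fun () ->
  match input_line ic with
  | line -> Seq.Cons (line, lines_of_channel ic)
  | exception End_of_file -> Seq.Nil

(* Escape the characters that are special in XML character data. *)
let escape (s : string) : string =
  let b = Buffer.create (String.length s) in
  String.iter
    (function
      | '<' -> Buffer.add_string b "&lt;"
      | '>' -> Buffer.add_string b "&gt;"
      | '&' -> Buffer.add_string b "&amp;"
      | c -> Buffer.add_char b c)
    s;
  Buffer.contents b

(* Convert one line into one <record> element (hypothetical format) and
   hand it to [out] immediately; the converter itself holds only the
   current line. *)
let convert (lines : string Seq.t) (out : string -> unit) : unit =
  out "<records>\n";
  Seq.iter (fun line -> out ("  <record>" ^ escape line ^ "</record>\n")) lines;
  out "</records>\n"
```

For a real run one might call `convert (lines_of_channel stdin) print_string`, so that neither the input nor the output is ever held in memory as a whole.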

Gerd
-- 
------------------------------------------------------------
Gerd Stolpmann * Viktoriastr. 45 * 64293 Darmstadt * Germany 
gerd@gerd-stolpmann.de          http://www.gerd-stolpmann.de
------------------------------------------------------------