[OSR] Suggested topic - XML processing API
Date: -- (:)
From: Jim Miller <gordon.j.miller@g...>
Subject: Re: [Caml-list] [OSR] Suggested topic - XML processing API
> As such I'm not sure such an interface is really feasible. Now if you
> see a common pattern or concrete type signatures that could be changed
> to make parsers more compatible do not hesitate to communicate them.
> If it benefits the users of my parser and remains in its philosophy
> I'll happily implement them. But _you_ have to make concrete proposal,
> I'm not going to research this. Please do not just initiate a
> discussion because you like the abstract idea of being able to swap
> xml parser implementations, make proposals.
> Best,

Fair enough, I'll start with a proposal on the topic, though since it's
late I'm not going to go too deep.  If it gets through a first round
of discussion, I'll start a node on the wiki and be happy to take
point on maintaining a document based on feedback.

My interest in this is based on my experience in dealing with XML.
90% of what I need to do is parse simple documents defining a known
structure that are coming from either files, strings, or the network.
It's also based on the responses I've received when attempting to
evangelize OCaml to a crowd whose first task is typically to try to
connect to the network, read some XML, do some processing on the XML,
and generate a response.

The purpose of this minimum implementation is to provide a common API
to perform the following tasks:

- Define a simple type that can be used to construct a tree
representing XML data.
- Parse an existing XML document into a simple data structure allowing
access to the data.
- Manipulate the result of parsing the XML document.
- Construct simple XML documents.

XML parser implementations are free to expand beyond this
implementation; this is merely a recommendation for a minimum:

type xmlNode =
  (* namespace, tagName, attributes, children *)
  | XmlElement of string * string * (string * string) list * xmlNode list
  (* text *)
  | XmlPCData of string

The following functions parse data from different types of
sources.  Parsing should, by default, be non-validating but must still
ensure well-formedness.

val parse_file : string -> xmlNode
val parse_string : string -> xmlNode
val parse_channel : in_channel -> xmlNode

val to_string : xmlNode -> string

val iter : (xmlNode -> unit) -> xmlNode -> unit
val map : (xmlNode -> 'a) -> xmlNode -> 'a list
val fold : ('a -> xmlNode -> 'a) -> 'a -> xmlNode -> 'a
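To make the non-parsing part of the interface concrete, here is a minimal sketch of to_string, iter, map and fold over the proposed type. Several things are assumptions rather than part of the proposal: the labels are dropped because constructor arguments are positional, the traversal is pre-order over every node, the empty string stands for "no namespace", to_string ignores the namespace entirely, and the escape helper is hypothetical.

```ocaml
(* Field order: namespace, tag name, attributes, children. *)
type xmlNode =
  | XmlElement of string * string * (string * string) list * xmlNode list
  | XmlPCData of string

(* Hypothetical helper: escape characters that are special in XML. *)
let escape s =
  let buf = Buffer.create (String.length s) in
  String.iter
    (function
      | '<' -> Buffer.add_string buf "&lt;"
      | '>' -> Buffer.add_string buf "&gt;"
      | '&' -> Buffer.add_string buf "&amp;"
      | '"' -> Buffer.add_string buf "&quot;"
      | c -> Buffer.add_char buf c)
    s;
  Buffer.contents buf

(* Serialise a tree; the namespace field is ignored in this sketch. *)
let rec to_string = function
  | XmlPCData text -> escape text
  | XmlElement (_, name, attrs, children) ->
    let attr (key, value) = Printf.sprintf " %s=\"%s\"" key (escape value) in
    let attrs = String.concat "" (List.map attr attrs) in
    (match children with
     | [] -> Printf.sprintf "<%s%s/>" name attrs
     | _ ->
       let body = String.concat "" (List.map to_string children) in
       Printf.sprintf "<%s%s>%s</%s>" name attrs body name)

(* Pre-order fold: visit a node, then its children from left to right. *)
let rec fold f acc node =
  let acc = f acc node in
  match node with
  | XmlElement (_, _, _, children) -> List.fold_left (fold f) acc children
  | XmlPCData _ -> acc

(* iter and map are both expressed through fold. *)
let iter f node = fold (fun () n -> f n) () node
let map f node = List.rev (fold (fun acc n -> f n :: acc) [] node)

(* A small document built by hand, covering the construction use case. *)
let doc =
  XmlElement ("", "note", [ ("to", "R&D") ],
    [ XmlPCData "1 < 2"; XmlElement ("", "br", [], []) ])

let node_count = fold (fun n _ -> n + 1) 0 doc
```

With these definitions, to_string doc yields <note to="R&amp;D">1 &lt; 2<br/></note> and node_count is 3. Whether iter, map and fold should visit only the direct children instead of the whole subtree is exactly the kind of detail the proposal would need to pin down.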

Additional sections/areas that would have to be defined:

o Handling errors while parsing
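
One possible shape for this, as a sketch only: an exception that carries a source position and a message, plus a wrapper for callers who prefer a result value. Every name here (position, Xml_error, try_parse) is hypothetical, and check_brackets is a stand-in for a real parser that only tracks whether a tag is left open.

```ocaml
type position = { line : int; column : int }

exception Xml_error of position * string

(* Stand-in for a real parser: it reports a '<' that appears inside an
   open tag, or a tag still open at the end of input, with its position. *)
let check_brackets (input : string) : unit =
  let line = ref 1 and column = ref 0 in
  let opened = ref None in
  String.iter
    (fun c ->
      incr column;
      let here = { line = !line; column = !column } in
      (match c with
       | '<' ->
         if !opened <> None then raise (Xml_error (here, "'<' inside a tag"));
         opened := Some here
       | '>' -> opened := None
       | '\n' -> incr line; column := 0
       | _ -> ()))
    input;
  match !opened with
  | Some pos -> raise (Xml_error (pos, "unterminated tag"))
  | None -> ()

(* Turn the exception-raising style into a result-returning one. *)
let try_parse parse input =
  match parse input with
  | value -> Ok value
  | exception Xml_error (pos, msg) -> Error (pos, msg)
```

A caller could then write try_parse check_brackets "<a <b>" and receive an Error pointing at line 1, column 4, without an exception escaping.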

o Validation.

I personally prefer having a different set of functions that perform the
parsing with validation so that it's obvious to me what is being
performed when I invoke a function.  I would be content with optional
arguments to the parse_ functions that are defined above but with the
default being to not validate.
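
The optional-argument variant could look roughly like the sketch below. It wraps an arbitrary non-validating parser, so nothing is assumed about how parsing itself works. The validator type, the Validation_error exception, and the with_validation and root_is functions are all hypothetical names, and a real validator would check a DTD or schema rather than just the root element.

```ocaml
type xmlNode =
  | XmlElement of string * string * (string * string) list * xmlNode list
  | XmlPCData of string

(* Hypothetical validator type: Ok () or a human-readable complaint. *)
type validator = xmlNode -> (unit, string) result

exception Validation_error of string

(* Wrap a non-validating parser; validation runs only when requested. *)
let with_validation (parse : string -> xmlNode) ?validate (input : string)
  : xmlNode =
  let tree = parse input in
  (match validate with
   | None -> ()
   | Some check ->
     (match check tree with
      | Ok () -> ()
      | Error msg -> raise (Validation_error msg)));
  tree

(* Toy validator: accept only documents whose root has the given name. *)
let root_is expected : validator = function
  | XmlElement (_, name, _, _) when name = expected -> Ok ()
  | XmlElement (_, name, _, _) -> Error ("unexpected root element: " ^ name)
  | XmlPCData _ -> Error "root is not an element"
```

Given some parse_string, a library could export let parse_string = with_validation parse_string, so that parse_string s stays non-validating while parse_string ~validate:(root_is "note") s opts in. Separate parse_*_validating functions could be defined on top of the same wrapper.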

o Callback/SAX style API

This is where I believe significant differences exist between XML
implementations.  I'm sure that the most that can be done here will be
to standardize the names of the functions or types that are used.
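
As a starting point for that naming discussion, here is a sketch of what a shared event vocabulary might look like. The event type and its constructor names are purely illustrative and not taken from any existing parser; emit replays an already-built tree as events so the example runs without a streaming parser behind it.

```ocaml
type xmlNode =
  | XmlElement of string * string * (string * string) list * xmlNode list
  | XmlPCData of string

(* Hypothetical event vocabulary for a callback-style interface. *)
type event =
  | Start_element of string * string * (string * string) list
    (* namespace, tag name, attributes *)
  | End_element of string * string
    (* namespace, tag name *)
  | Characters of string

(* Replay a tree as the event sequence a streaming parser might produce. *)
let rec emit (handler : event -> unit) = function
  | XmlPCData text -> handler (Characters text)
  | XmlElement (ns, name, attrs, children) ->
    handler (Start_element (ns, name, attrs));
    List.iter (emit handler) children;
    handler (End_element (ns, name))

(* Example handler: print an indented outline of the element structure. *)
let print_outline tree =
  let depth = ref 0 in
  emit
    (function
      | Start_element (_, name, _) ->
        print_endline (String.make (2 * !depth) ' ' ^ name);
        incr depth
      | End_element _ -> decr depth
      | Characters _ -> ())
    tree
```

Agreeing on a type like event would let handlers be written once and driven by any parser, even if each implementation keeps its own way of registering callbacks.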