[OSR] Suggested topic - XML processing API
Date: -- (:)
From: Alain Frisch <alain@f...>
Subject: Re: [Caml-list] [OSR] Suggested topic - XML processing API
Jim Miller wrote:
> type xmlNode =
>  | XmlElement of (namespace: string * tagName: string * attributes:
> (string * string) list * (children:xmlNode list) )
>  | XmlPCData of (text:string)

There was some discussion here a while ago about standardizing XML 
types across OCaml libraries. You might want to look up the archives.

Here are some random remarks.

First, you need to specify several things in the type above.

- the encoding of strings; if the parser cannot be configured, I guess 
that normalizing everything to utf-8 is the most natural choice.

- the handling of namespaces; does the first argument to XmlElement 
refer to the namespace prefix as used in the document (which would make 
matching impossible, because the document can use arbitrary prefixes), a 
normalized version (you'd need to provide the parser with more info), or 
the namespace URI (which makes pattern matching quite tedious)? Also, it 
is sometimes necessary to keep the [prefix->uri] dictionary available 
at every node (e.g. to deal with XML Schema documents, where prefixes 
can be used in attribute values). Moreover, some XML documents may be 
valid w.r.t. the XML spec without conforming to the XML Namespaces one.

- whether adjacent XmlPCData nodes are allowed or not.

- whether the parser performs whitespace normalization (and how).
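
To make the points above concrete, here is a rough sketch of one possible 
set of answers: names are resolved to namespace URIs, each element carries 
the prefix bindings in scope, strings are assumed to be UTF-8, and adjacent 
text nodes are merged. All type and function names (qname, ns_env, 
resolve_prefix, merge_pcdata) are made up for illustration; this is not an 
existing or agreed-upon API.

```ocaml
(* Hypothetical names throughout; a sketch, not an agreed-upon API. *)

(* A qualified name: namespace URI plus local name.
   By assumption, uri = "" means "in no namespace". *)
type qname = { uri : string; local : string }

type xml_node =
  | Element of element
  | PCData of string                 (* assumed to be UTF-8 text *)

and element = {
  name : qname;
  attributes : (qname * string) list;
  (* prefix -> URI bindings in scope at this node, innermost first *)
  ns_env : (string * string) list;
  children : xml_node list;
}

(* Resolve a prefix that appears inside an attribute value
   (e.g. type="xs:string" in an XML Schema document). *)
let resolve_prefix (e : element) (prefix : string) : string option =
  List.assoc_opt prefix e.ns_env

(* Merge adjacent PCData nodes, recursively, so that a consumer never
   sees two text nodes in a row. *)
let rec merge_pcdata (nodes : xml_node list) : xml_node list =
  match nodes with
  | PCData a :: PCData b :: rest -> merge_pcdata (PCData (a ^ b) :: rest)
  | Element e :: rest ->
      Element { e with children = merge_pcdata e.children }
      :: merge_pcdata rest
  | n :: rest -> n :: merge_pcdata rest
  | [] -> []
```

Matching on URIs is as tedious as noted above; binding the URIs you care 
about to short OCaml identifiers once helps a little, but does not make the 
patterns themselves any shorter.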


Also, in many cases, the client of the parser might want to get more 
information, like locations in the source document.
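
One way to leave room for such extra information, sketched here under my 
own assumptions, is to parametrize the tree over an annotation type. A 
parser that tracks positions would then produce a tree annotated with 
locations, and clients that don't care can use unit. The names 
(annotated_node, location, strip) are hypothetical.

```ocaml
(* Hypothetical sketch: a tree parametrized over a per-node annotation. *)
type 'a annotated_node =
  | Element of 'a * string * (string * string) list * 'a annotated_node list
  | PCData of 'a * string

(* One possible annotation: a position in the source document.
   Whether lines and columns are 0- or 1-based would need to be specified. *)
type location = { line : int; column : int }

(* Drop the annotations, e.g. before comparing two trees structurally. *)
let rec strip (n : 'a annotated_node) : unit annotated_node =
  match n with
  | Element (_, tag, attrs, children) ->
      Element ((), tag, attrs, List.map strip children)
  | PCData (_, s) -> PCData ((), s)
```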

If you intend to use the same type to produce XML documents from an 
internal representation, I think you might want to add an extra constructor:

   | XmlMany of xmlNode list

This makes it much easier to build and compose XML fragments in a 
modular way.
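
As a small illustration (a sketch only; the helper names item, items, 
optional_title and flatten are made up, and the constructor arguments are 
written as plain tuples since the labelled form above is not valid OCaml), 
functions can return zero, one or several nodes without the caller having 
to care, and the printer can splice XmlMany away before output:

```ocaml
type xmlNode =
  (* namespace, tag name, attributes, children *)
  | XmlElement of string * string * (string * string) list * xmlNode list
  | XmlPCData of string
  | XmlMany of xmlNode list

(* Fragment builders: each returns a single xmlNode, whatever the
   number of nodes it actually stands for. *)
let item (s : string) : xmlNode = XmlElement ("", "li", [], [ XmlPCData s ])

let items (l : string list) : xmlNode = XmlMany (List.map item l)

let optional_title (t : string option) : xmlNode =
  match t with
  | None -> XmlMany []                       (* the empty fragment *)
  | Some s -> XmlElement ("", "h1", [], [ XmlPCData s ])

(* Splice every XmlMany into the enclosing child list, so that the
   result contains only XmlElement and XmlPCData nodes. *)
let rec flatten (n : xmlNode) : xmlNode list =
  match n with
  | XmlMany l -> List.concat (List.map flatten l)
  | XmlElement (ns, tag, attrs, children) ->
      [ XmlElement (ns, tag, attrs, List.concat (List.map flatten children)) ]
  | XmlPCData _ -> [ n ]
```

Without XmlMany, items would have to return an xmlNode list and every 
caller would need to concatenate lists by hand.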

Also, you need to specify how the XML printer is supposed to deal with 
namespaces.



-- Alain