Date: 2008-02-06 (21:58)
From: Alain Frisch <alain@f...>
Subject: Re: [Caml-list] xmlm and names(paces)
Bünzli Daniel wrote:
> As I previously said on this list I'm adding better namespace support to 
> xmlm. Up to now xmlm just parsed qualified names into their prefix and 
> local part (prefix, local). Now I'd like to provide the client with 
> expanded names (uri, local).
> Initially I planned to give the client the choice between getting qualified 
> names or expanded names. However, the prefix of a qualified name is really 
> meaningless (it can be alpha-converted) and thus cannot be used to 
> recognize anything in a document. One of the aims of xmlm is simplicity; 
> as such, I think xmlm should only provide expanded names.

The problem with expanded names is that they make it quite tedious to 
pattern-match on element/attribute names (URIs are long!). Of course, it 
is a trivial exercise in Camlp4 to create a nice syntax for that.
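To make the tedium concrete, here is a small sketch assuming a hypothetical expanded-name type `(uri, local)`; it is not xmlm's actual API. The XHTML namespace URI has to be spelled out in the pattern, or checked in a guard, for every case:

```ocaml
(* Hypothetical expanded-name representation: (namespace URI, local name). *)
type expanded = string * string

let xhtml_ns = "http://www.w3.org/1999/xhtml"

(* Every case must repeat the full URI literally or test it in a guard. *)
let describe (name : expanded) =
  match name with
  | ("http://www.w3.org/1999/xhtml", "p") -> "paragraph"
  | (uri, "a") when uri = xhtml_ns -> "link"
  | (_, local) -> "other element: " ^ local
```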

Another option is to let the client provide a mapping from uri to fixed 
prefixes. (PXP can do that kind of prefix normalization.)
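A minimal sketch of such a client-side mapping, assuming a hypothetical association list `known_prefixes` from URI to fixed prefix (this is neither PXP's nor xmlm's interface). Once names are normalized, patterns can match on short, fixed strings:

```ocaml
(* Hypothetical client-supplied dictionary: namespace URI -> fixed prefix. *)
let known_prefixes = [
  ("http://www.w3.org/1999/xhtml", "xhtml");
  ("http://www.w3.org/2000/svg", "svg");
]

(* Map an expanded name (uri, local) to (fixed prefix, local) when the
   URI is in the dictionary; unknown URIs yield None. *)
let normalize (uri, local) =
  match List.assoc_opt uri known_prefixes with
  | Some prefix -> Some (prefix, local)
  | None -> None

(* Client code now matches on short prefixes instead of full URIs. *)
let kind name =
  match normalize name with
  | Some ("xhtml", "p") -> "xhtml paragraph"
  | Some (prefix, local) -> prefix ^ ":" ^ local
  | None -> "unknown namespace"
```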

It is also a good idea to be able to parse XML documents that conform
to the XML spec but not the XML Namespaces spec.

What about something like this:

type name = string * [`N of string * string|`U of string * string|`X]

The first component of name gives the full unparsed name from the XML 
document. The second component gives the qname decomposition: it can be 
either a known namespace, identified by its normalized prefix (relative 
to a dictionary provided by the client), or an unknown namespace, 
identified by its URI. Or it can be an error (the document does not 
conform to the XML Namespaces spec). If the client does not provide any 
dictionary of known prefixes, there will be no `N node. If the parser is 
run in a non-namespace mode, there will be only `X nodes.

   ("html:p", `N ("xhtml", "p"))
         the prefix html refers to the known xhtml namespace

   ("foo:x", `U ("http://unknownnamespaceuri", "x"))
         the prefix foo refers to an unknown URI

   ("x:y", `X)
         the prefix x is not bound to any namespace

   ("x::z", `X)
         name is ill-formed w.r.t. the XML Namespaces spec.
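The classification above can be sketched as follows. This is an illustrative assumption, not xmlm code: `classify`, the `bindings` list (in-scope prefix to URI, with "" standing for the default namespace) and the `dict` list (known URI to normalized prefix) are hypothetical. Unprefixed names with no default namespace are treated here as `` `U ("", local) ``, a case the proposal leaves open, and the reserved xml/xmlns prefixes are not special-cased.

```ocaml
type name = string * [ `N of string * string | `U of string * string | `X ]

(* Classify a raw name. [bindings]: in-scope prefix -> URI ("" = default
   namespace). [dict]: known URI -> client's normalized prefix. *)
let classify ?(namespaces = true) ~bindings ~dict raw : name =
  let resolve prefix local =
    match List.assoc_opt prefix bindings with
    | None -> if prefix = "" then `U ("", local) else `X (* unbound prefix *)
    | Some uri -> (
        match List.assoc_opt uri dict with
        | Some p -> `N (p, local) (* known namespace *)
        | None -> `U (uri, local)) (* unknown namespace *)
  in
  if not namespaces then (raw, `X)
  else
    match String.split_on_char ':' raw with
    | [ local ] when local <> "" -> (raw, resolve "" local)
    | [ prefix; local ] when prefix <> "" && local <> "" ->
        (raw, resolve prefix local)
    | _ -> (raw, `X) (* e.g. "x::z": not a valid qname *)

(* The bindings and dictionary behind the examples above. *)
let example_bindings =
  [ ("html", "http://www.w3.org/1999/xhtml");
    ("foo", "http://unknownnamespaceuri") ]

let example_dict = [ ("http://www.w3.org/1999/xhtml", "xhtml") ]
```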

Also, it is necessary to give the client a way to know the namespace 
bindings in scope at any node. Some XML languages like XML-Schema need 
this information. A possible way to do it is just to keep the xmlns 
declarations as regular attributes.
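Keeping xmlns declarations as ordinary attributes lets the client rebuild the in-scope bindings itself. A sketch, assuming attributes arrive as `(raw name, value)` pairs and that scopes are plain association lists (both hypothetical choices, not xmlm's representation):

```ocaml
let xmlns_colon = "xmlns:"

(* Extract namespace declarations from an element's attributes:
   xmlns="u" binds the default namespace (""), xmlns:p="u" binds p. *)
let bindings_of_attrs attrs =
  let k = String.length xmlns_colon in
  List.filter_map
    (fun (n, v) ->
      if n = "xmlns" then Some ("", v)
      else if String.length n > k && String.sub n 0 k = xmlns_colon then
        Some (String.sub n k (String.length n - k), v)
      else None)
    attrs

(* Bindings in scope at an element: its own declarations come first, so
   List.assoc finds them before (i.e. they shadow) the inherited ones. *)
let in_scope ~parent attrs = bindings_of_attrs attrs @ parent
```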

As a minor alternative, in order to reduce the syntactic overhead, it is 
possible to use a single string to encode the three possible cases 
(local name first, then the known prefix or the URI). E.g.:

   "p:xhtml"                        Known URI (local p, prefix xhtml)
   "x::http://unknownnamespaceuri"  Unknown URI (local x)
   ""                               Ill-formed qname

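A sketch of that single-string encoding, with hypothetical `encode`/`decode` helpers over the variant part of the proposed name type. Decoding splits at the first colon, which is unambiguous only because namespace-well-formed local names never contain a colon:

```ocaml
type qname = [ `N of string * string | `U of string * string | `X ]

(* "local:prefix" for a known URI, "local::uri" for an unknown one,
   "" for an ill-formed qname. *)
let encode : qname -> string = function
  | `N (prefix, local) -> local ^ ":" ^ prefix
  | `U (uri, local) -> local ^ "::" ^ uri
  | `X -> ""

let decode (s : string) : qname =
  match String.index_opt s ':' with
  | None -> `X
  | Some i ->
      let local = String.sub s 0 i in
      let rest = String.sub s (i + 1) (String.length s - i - 1) in
      (* A second colon right after the first marks an unknown URI. *)
      if String.length rest > 0 && rest.[0] = ':' then
        `U (String.sub rest 1 (String.length rest - 1), local)
      else `N (rest, local)
```
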
-- Alain