[OSR] Suggested topic - XML processing API
Date: -- (:)
From: Bünzli_Daniel <daniel.buenzli@e...>
Subject: Re: [Caml-list] [OSR] Suggested topic - XML processing API
On 5 Feb 2008, at 06:02, Alain Frisch wrote:

> As suggested before, you really need to say something, at least,  
> about:
[...]
- Whether character references and predefined entity references must
be resolved. Hint: yes.


On 5 Feb 2008, at 06:02, Alain Frisch wrote:

> - having a common spec for several libs makes more sense if they can  
> share common types; maybe you should use polymorphic variants  
> instead of regular ones?

Agreed. In xmlm these variants become polymorphic in the next version.
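
To illustrate the point about sharing types, here is a minimal sketch. The
modules Lib_a and Lib_b and the constructors `Element and `Pcdata are
hypothetical, they are not the types of xmlm or of any proposed spec. Since
polymorphic variant types are compared structurally, two libraries that
declare the same shape independently can exchange values without any
conversion and without depending on each other:

```ocaml
(* Hypothetical library A: declares its tree type with polymorphic variants. *)
module Lib_a = struct
  type tree =
    [ `Element of string * (string * string) list * tree list
    | `Pcdata of string ]

  let sample : tree = `Element ("p", [], [ `Pcdata "hello" ])
end

(* Hypothetical library B: declares the same shape without referring to A. *)
module Lib_b = struct
  type tree =
    [ `Element of string * (string * string) list * tree list
    | `Pcdata of string ]

  (* Concatenate all character data found in the tree, in document order. *)
  let rec text : tree -> string = function
    | `Pcdata s -> s
    | `Element (_, _, children) -> String.concat "" (List.map text children)
end

(* A value built by A is accepted by B as is. *)
let hello = Lib_b.text Lib_a.sample
```

With regular variants the two declarations would be distinct types, and one
library would have to depend on the other (or on a third one) to share them.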

Other comments.

* IMHO, do not use camel casing. Underscores are more Caml-like, e.g.
xml_node.
* Regarding naming, I would call xmlNode xml_tree and in general drop
the xml prefix from the cases.
* "combine" argument: in my opinion the parser should always combine
adjacent pcdata nodes.
* As others may know by now, I don't like to raise exceptions; the next
version of xmlm doesn't raise exceptions (but given recent discussions
it seems others do like exceptions).
* Regarding the way the parser is invoked, I don't like the way it is
done:

(1) The function "parse" can only be used with channels, which is not
good. (2) A convenience function like parse_file is always useless to
me, since it is hard to know exactly what kind of error handling such
functions perform without looking at their source.

The way I do this kind of thing is to define an input abstraction
type. First you create an input abstraction from a data source
(e.g. an in_channel, a string, or a callback) and then you invoke
the parser with the input abstraction (actually I started an OSR on
devising IO modules with non object-oriented IO sources and
destinations reflecting this view, but I'm reluctant to publish it).
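
A minimal sketch of such an input abstraction could look as follows. All
names here (input, input_of_string, input_of_channel, input_of_fun,
count_open_angles) are made up for illustration, this is neither the xmlm
API nor the unpublished OSR draft, and the "parser" is only a stand-in that
counts '<' characters:

```ocaml
(* Assumption: an input yields the next byte, or None at end of input. *)
type input = unit -> char option

(* Read from a string, keeping the current position in a private reference. *)
let input_of_string (s : string) : input =
  let pos = ref 0 in
  fun () ->
    if !pos >= String.length s then None
    else begin
      let c = s.[!pos] in
      incr pos;
      Some c
    end

(* Read from a channel; End_of_file is turned into None. *)
let input_of_channel (ic : in_channel) : input =
 fun () -> try Some (input_char ic) with End_of_file -> None

(* A callback source already has the right shape. *)
let input_of_fun (f : unit -> char option) : input = f

(* Stand-in for a parser: it only sees the abstraction, never the source. *)
let count_open_angles (i : input) : int =
  let rec loop n =
    match i () with
    | None -> n
    | Some '<' -> loop (n + 1)
    | Some _ -> loop n
  in
  loop 0
```

The client decides how the source is opened and closed and how errors on it
are handled; the parser itself needs only one entry point, whatever the
source is.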

In general I'd like to say that I'm a little bit dubious about this
effort. Actually I would refrain from formalizing the actual way the
parser is invoked; clients can do their bit of work too. I would
concentrate on defining:

1) Parsing _result_ types and a precise definition of the actual
_form_ of the data they contain. More than one form may be defined.
This is the most important thing if you would like to be able to
switch implementations; the actual input procedure can easily be
isolated from the rest of your source.

2) A minimal list of input sources (e.g. in_channel and string) from
which the parser should be able to read, without going into further
detail on how the actual input procedure should be performed. Just
specify the state in which sources are accepted for input and left
after output.
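
As a sketch of what these two points could amount to, assuming a
hypothetical result type tree and a hypothetical signature PARSER (neither
comes from the proposal under discussion): the _form_ of the result is
pinned down by an invariant, here "no two adjacent `Pcdata children", and
the signature lists only the sources, with the accepted and resulting state
given in comments. The state rules shown are examples of what a spec might
say, not a recommendation.

```ocaml
(* Hypothetical result type. *)
type tree =
  [ `Element of string * (string * string) list * tree list
  | `Pcdata of string ]

(* Bring a tree into the specified form: adjacent `Pcdata children are
   merged, recursively, so that equal documents give equal trees. *)
let rec normalize (t : tree) : tree =
  match t with
  | `Pcdata _ -> t
  | `Element (name, attrs, children) ->
      let rec merge = function
        | `Pcdata a :: `Pcdata b :: rest -> merge (`Pcdata (a ^ b) :: rest)
        | x :: rest -> normalize x :: merge rest
        | [] -> []
      in
      `Element (name, attrs, merge children)

(* Hypothetical minimal list of sources. How input is actually performed
   is left to each implementation. *)
module type PARSER = sig
  (* Example rule: the string holds one complete document. *)
  val of_string : string -> tree

  (* Example rule: the channel is open and positioned at the start of the
     document; it is left open, and the client closes it. *)
  val of_channel : in_channel -> tree
end
```

Client code written against such a result type and invariant does not
change when the implementation behind PARSER is swapped.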

Best,

Daniel