Seeking feedback on Xmlm
Date: 2008-01-18 (22:54)
From: Bünzli_Daniel <daniel.buenzli@e...>
Subject: Seeking feedback on Xmlm

I plan to make some changes to Xmlm. While it has been downloaded
approximately a hundred times, I never got any feedback on it (bots
only?). If anybody found any issues, I'd be happy to know about them.

I would also gladly take comments on the following points.

I plan to remove the persistent cursor and the tree representation ('a
Xmlm.tree and 'a Xmlm.cursor). In my experience 'a Xmlm.tree is
awkward to pattern match on, and more than once I found it much cleaner
to input documents with the sequential interface into a custom data
structure corresponding to the document's semantics. Besides, while I
really see the point of the zipper in the context in which it was
invented (UI for a structured text editor), I wasn't convinced by the
use of the cursor for "batch" tree processing, hence I don't think
it has its place at Xmlm's level. Finally, using the sequential
interface to input/output a custom tree representation takes only a few
lines of code and is provided in the documentation's sample code [1]. By
removing these types I hope Xmlm's users won't waste their time
reaching the same conclusions.
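
To illustrate the "few lines of code" claim, here is a minimal sketch of
folding a sequential stream into a custom tree. The signal type and all
names below are hypothetical stand-ins for illustration, not Xmlm's
actual interface; the documentation's sample code [1] remains the
reference.

```ocaml
(* Hypothetical signal type standing in for what a sequential
   interface delivers; this is NOT Xmlm's actual API. *)
type signal =
  | El_start of string
  | El_end
  | Data of string

(* A custom tree; a real program would pick a type that matches
   the semantics of its own documents. *)
type tree =
  | Element of string * tree list
  | Text of string

(* Consume the signals of one node, return it with the leftover signals. *)
let rec node = function
  | El_start name :: rest ->
      let kids, rest = children [] rest in
      Element (name, kids), rest
  | Data s :: rest -> Text s, rest
  | El_end :: _ | [] -> invalid_arg "node: unexpected end of element or input"

(* Collect sibling nodes until the enclosing element is closed. *)
and children acc = function
  | El_end :: rest -> List.rev acc, rest
  | signals ->
      let n, rest = node signals in
      children (n :: acc) rest

let tree_of_signals signals = fst (node signals)
```

The same shape works when the signals come from a pull function rather
than a list; only the way the next signal is obtained changes.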

Instead of raising an exception, the input function will return a
value of type [ `Value of 'a | `Error of (int * int) * error ]. Since
this will break backward compatibility, I'll take the opportunity to
also change the 'error' type from a regular variant to a polymorphic
variant, for syntactic convenience.
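
As a rough sketch of what client code could look like with such a
return type: the error cases below are invented for illustration, and
reading the (int * int) pair as a (line, column) position is an
assumption, not something stated above.

```ocaml
(* Hypothetical error cases; the real set would be defined by Xmlm. *)
type error = [ `Unexpected_eoi | `Malformed of string ]

(* The result shape described above. *)
type 'a input_result = [ `Value of 'a | `Error of (int * int) * error ]

let error_message : error -> string = function
  | `Unexpected_eoi -> "unexpected end of input"
  | `Malformed what -> "malformed " ^ what

(* Assumption: the int pair is a (line, column) position. *)
let report : string input_result -> string = function
  | `Value v -> "ok: " ^ v
  | `Error ((line, col), e) ->
      Printf.sprintf "%d:%d: %s" line col (error_message e)
```

With polymorphic variants the client can match on the error tags
directly, without qualifying constructors with a module name.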

I will implement better XML namespace support. Currently Xmlm parses
qualified names, but the client has to maintain its own structure
during parsing to know which namespace it is in. The idea is to add
the boolean label ?expand_names to the input function. When set to
true, you get expanded names instead of qualified names (an expanded
name is a pair (uri, name) where uri is the namespace URI). The
output_of_* functions will have a label ~expanded_names to indicate
that expanded names are being passed, and Xmlm will automatically take
care of the rest (though to have pretty prefixes you'll have to process
your names manually).
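
For readers unfamiliar with the distinction, the following sketch shows
the kind of bookkeeping a client currently has to do to turn a
qualified name into an expanded one. The function names and the
association-list environment are assumptions made for this example, and
it ignores details such as the default namespace not applying to
attributes.

```ocaml
(* Split "prefix:local" into (prefix, local); a name without a colon
   gets the empty prefix. *)
let split_qname q =
  match String.index_opt q ':' with
  | None -> "", q
  | Some i ->
      String.sub q 0 i, String.sub q (i + 1) (String.length q - i - 1)

(* [env] maps prefixes to namespace URIs, innermost binding first.
   The empty prefix stands for the default namespace. *)
let expand env q =
  let prefix, local = split_qname q in
  match List.assoc_opt prefix env with
  | Some uri -> Some (uri, local)
  | None -> if prefix = "" then Some ("", local) else None
```

An unbound non-empty prefix yields None here; a parser would report it
as an error instead.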

As an external add-on I committed the file test/ containing a
mapping from XHTML 1.1 entities to their corresponding UTF-8 character
sequences. This can be used to construct a function to resolve XHTML
entities and hence get XHTML parsing at your fingertips.
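
A minimal sketch of building such a resolver from a mapping, assuming
the mapping is available as a list of (entity name, UTF-8 string) pairs.
The three entries and the function names are illustrative only; the
actual file's format and the way Xmlm accepts a resolver may differ.

```ocaml
(* A tiny excerpt-style table: entity name -> UTF-8 byte sequence. *)
let entities =
  [ "nbsp", "\xC2\xA0";    (* U+00A0 no-break space *)
    "copy", "\xC2\xA9";    (* U+00A9 copyright sign *)
    "eacute", "\xC3\xA9" ] (* U+00E9 e with acute *)

(* Build a lookup function backed by a hash table. *)
let resolver table =
  let h = Hashtbl.create (List.length table) in
  List.iter (fun (name, utf8) -> Hashtbl.replace h name utf8) table;
  fun name -> Hashtbl.find_opt h name

let resolve = resolver entities
```

Unknown entity names resolve to None, which leaves the decision of how
to handle them to the caller.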

This will be all for this new version. I hope this is the last time I  
break backward compatibility.