Date: 2004-06-21 (16:18)
From: James Leifer <James.Leifer@i...>
Subject: Re: [Caml-list] Parse crazy HTML, output XML
Alain Frisch writes:

> On Mon, 21 Jun 2004, Richard Jones wrote:
>> The problem is the parsing phase.  Both PXP and XmlLight will only
>> parse valid XML (as far as I can see).  Is there any simple pure OCaml
>> library for parsing HTML and producing a DOM?

If you've got really broken documents, then perhaps "tidy" is your
friend. Yes, I know it may be outside the scope of your request
because it's an external program, but apparently it can do wonders for
syntactically dubious tag-soup HTML. I believe that it can produce
pure XHTML, for example.
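
For what it's worth, here is a rough sketch of how one might drive tidy from OCaml and capture its XHTML output as a string. It assumes a "tidy" binary is on the PATH, a Unix-like shell (for the 2>/dev/null redirection), and the unix library at link time. The function names tidy_command, read_all and tidy_to_xhtml are made up for illustration, and I have not tried this against any particular tidy version.

```ocaml
(* Sketch only: assumes `tidy` is installed and on the PATH.
   Build with the unix library, for example:
   ocamlfind ocamlopt -package unix -linkpkg tidy_xhtml.ml *)

(* Build the shell command line (hypothetical helper).
   -asxhtml  ask tidy to convert its input to XHTML
   -numeric  emit numeric rather than named entities
   -quiet    suppress nonessential output
   Diagnostics on stderr are discarded. *)
let tidy_command path =
  Printf.sprintf "tidy -asxhtml -numeric -quiet %s 2>/dev/null"
    (Filename.quote path)

(* Read a channel to end of file and return its contents. *)
let read_all ic =
  let buf = Buffer.create 4096 in
  let chunk = Bytes.create 4096 in
  let rec loop () =
    let n = input ic chunk 0 (Bytes.length chunk) in
    if n > 0 then begin
      Buffer.add_subbytes buf chunk 0 n;
      loop ()
    end
  in
  loop ();
  Buffer.contents buf

(* Run tidy on [path]. tidy exits with 0 when the input was clean,
   1 when there were warnings, and 2 when there were errors, so
   statuses 0 and 1 are both treated as success here. *)
let tidy_to_xhtml path =
  let ic = Unix.open_process_in (tidy_command path) in
  let xhtml = read_all ic in
  match Unix.close_process_in ic with
  | Unix.WEXITED (0 | 1) -> Ok xhtml
  | Unix.WEXITED n -> Error (Printf.sprintf "tidy exited with status %d" n)
  | Unix.WSIGNALED _ | Unix.WSTOPPED _ -> Error "tidy was killed or stopped"
```

The string returned in the Ok case could then be handed to an XML parser such as PXP or XmlLight. The -numeric flag is there because named HTML entities are not defined in plain XML unless the parser also loads the XHTML DTD.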
