[Caml-list] Parse crazy HTML, output XML
Date: -- (:)
From: James Leifer <James.Leifer@i...>
Subject: Re: [Caml-list] Parse crazy HTML, output XML
Alain Frisch <Alain.Frisch@ens.fr> writes:

> On Mon, 21 Jun 2004, Richard Jones wrote:
>
>> The problem is the parsing phase.  Both PXP and XmlLight will only
>> parse valid XML (as far as I can see).  Is there any simple pure OCaml
>> library for parsing HTML and producing a DOM?

If you've got really broken documents, then perhaps "tidy" is your
friend. Yes, I know it may be outside the scope of your request
because it's an external program, but apparently it can do wonders for
syntactically dubious tag-soup HTML.  I believe that it can produce
pure XHTML, for example.

-J
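
For illustration, here is a rough sketch of how tidy might be driven from
OCaml, so that its XHTML output can then be handed to an XML parser such as
PXP or XmlLight. Nothing here comes from the thread itself: the function
names (read_all, tidy_to_xhtml) are hypothetical, and the flags (-quiet,
-asxhtml, -numeric) and the reading of exit status 1 as "warnings only" are
assumptions based on HTML Tidy's usual command-line behaviour. It needs the
unix library and OCaml 4.08 or later (for Fun.protect).

```ocaml
(* Read everything from an input channel into a string. *)
let read_all ic =
  let buf = Buffer.create 4096 in
  let chunk = Bytes.create 4096 in
  let rec loop () =
    let n = input ic chunk 0 (Bytes.length chunk) in
    if n > 0 then begin
      Buffer.add_subbytes buf chunk 0 n;
      loop ()
    end
  in
  loop ();
  Buffer.contents buf

(* Hypothetical helper: write [html] to a temporary file, run an external
   clean-up program on that file, and return whatever it prints on stdout.
   The default program and flags are assumptions about the tidy CLI. *)
let tidy_to_xhtml ?(program = "tidy")
    ?(args = [ "-quiet"; "-asxhtml"; "-numeric" ]) html =
  let tmp = Filename.temp_file "tagsoup" ".html" in
  Fun.protect
    ~finally:(fun () -> Sys.remove tmp)
    (fun () ->
      let oc = open_out_bin tmp in
      output_string oc html;
      close_out oc;
      (* Quote every argument so the shell passes it through unchanged. *)
      let argv = (program :: args) @ [ tmp ] in
      let cmd = String.concat " " (List.map Filename.quote argv) in
      let ic = Unix.open_process_in cmd in
      let out = read_all ic in
      match Unix.close_process_in ic with
      (* Assumption: tidy exits with 1 when it only emitted warnings. *)
      | Unix.WEXITED (0 | 1) -> Ok out
      | Unix.WEXITED n ->
          Error (Printf.sprintf "%s exited with status %d" program n)
      | Unix.WSIGNALED _ | Unix.WSTOPPED _ ->
          Error (program ^ " was interrupted"))
```

A call such as tidy_to_xhtml "<p>unclosed <b>tag soup" would return Ok with
the cleaned-up markup if tidy is installed and accepts the input, and Error
otherwise. Going through a temporary file rather than piping the document in
on stdin avoids a possible deadlock when both pipes fill up on large inputs.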
