[Caml-list] Parse crazy HTML, output XML
| Date: | -- (:) |
| From: | James Leifer <James.Leifer@i...> |
| Subject: | Re: [Caml-list] Parse crazy HTML, output XML |
Alain Frisch <Alain.Frisch@ens.fr> writes:

> On Mon, 21 Jun 2004, Richard Jones wrote:
>
>> The problem is the parsing phase. Both PXP and XmlLight will only
>> parse valid XML (as far as I can see). Is there any simple pure OCaml
>> library for parsing HTML and producing a DOM?

If you've got really broken documents then perhaps "tidy" is your
friend. Yes, I know it may be outside the scope of your request
because it's an external program but apparently it can do wonders for
syntactically dubious tag-soup html. I believe that it can produce
pure xhtml, for example.

-J

-------------------
To unsubscribe, mail caml-list-request@inria.fr
Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs
FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
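
For what it's worth, here is a rough sketch of how the "tidy" suggestion above might be driven from OCaml. It is only an illustration under a few assumptions: that the `tidy` binary is installed and on the PATH, that OCaml 4.10 or later is available (for `Filename.quote_command`), and that going through temporary files is acceptable. The helper names (`tidy_args`, `tidy_to_xhtml`, and so on) are made up for this example.

```ocaml
(* Sketch: clean up tag-soup HTML by shelling out to the external `tidy`
   program. Assumes `tidy` is on PATH and OCaml >= 4.10. All function
   names here are hypothetical helpers, not part of any library. *)

(* Arguments passed to tidy:
   -q        suppress nonessential output
   -asxhtml  convert the input to XHTML
   -numeric  write numeric rather than named entities *)
let tidy_args input_path = [ "-q"; "-asxhtml"; "-numeric"; input_path ]

(* Read a whole file into a string. *)
let read_file path =
  let ic = open_in_bin path in
  Fun.protect
    ~finally:(fun () -> close_in_noerr ic)
    (fun () -> really_input_string ic (in_channel_length ic))

(* Write a string to a file, replacing any previous contents. *)
let write_file path contents =
  let oc = open_out_bin path in
  Fun.protect
    ~finally:(fun () -> close_out_noerr oc)
    (fun () -> output_string oc contents)

(* Delete a file, ignoring failures such as "no such file". *)
let remove_quietly path = try Sys.remove path with Sys_error _ -> ()

(* Run tidy on an HTML string. tidy exits with 0 when the input was
   clean, 1 when there were warnings, and 2 when there were errors;
   this sketch accepts 0 and 1 and reports anything else as an error. *)
let tidy_to_xhtml html =
  let input = Filename.temp_file "soup" ".html" in
  let output = Filename.temp_file "tidy" ".xhtml" in
  Fun.protect
    ~finally:(fun () -> List.iter remove_quietly [ input; output ])
    (fun () ->
      write_file input html;
      (* Send the XHTML to a temp file and discard tidy's warnings. *)
      let cmd =
        Filename.quote_command "tidy" ~stdout:output ~stderr:Filename.null
          (tidy_args input)
      in
      match Sys.command cmd with
      | 0 | 1 -> Ok (read_file output)
      | code -> Error code)
```

The `Ok` string could then be handed to an XML parser such as PXP or XmlLight, since by that point it should be well-formed. Input that tidy refuses to repair comes back as `Error` with the exit status.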