English version
Accueil     À propos     Téléchargement     Ressources     Contactez-nous    

Ce site est rarement mis à jour. Pour les informations les plus récentes, rendez-vous sur le nouveau site OCaml à l'adresse ocaml.org.

Browse thread
Which types for representing HTML documents ?
[ Home ] [ Index: by date | by threads ]
[ Search: ]

[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
Date: 2009-06-08 (11:41)
From: Martin Jambon <martin.jambon@e...>
Subject: Re: [Caml-list] Which types for representing HTML documents ?
Sébastien Hinderer wrote:
> Dear all,
> According to you, how could an HTML document best be represented in
> OCaml ?

Ocamlnet's Nethtml works fine for parsing:

type document =
    Element of
      (string * (string * string) list *
       Nethtml.document list)
  | Data of string

If your goal is to interpret arbitrary web pages, you have to allow all kinds
of standard or non-standard elements and attributes anywhere in the document.

If you are creating HTML documents, beware that you can't embed Flash objects
using standard HTML.  I'm not even speaking of javascript happily manipulating
the DOM tree with little restrictions.

Personally I use text templates and validate web pages once they are in my
browser (using the shortcut to validator.w3.org that opera provides).  For
javascript-generated nodes, I just check that it works in various browsers
(the firefox "View source chart" extension is useful for debugging the DOM tree).

I do not suffer at all from the absence of static type-checking of the HTML
tree.  I imagine that the reasons for this are:

* HTML is the final product and is trivial to debug (no need to printf
everything since everything is already printed...)

* There are no complicated conditionals that would leave certain parts of the
code untested for a long time.

* Mainstream web browsers are very tolerant.  Small accidental deviations from
the strict W3C standards usually have no visible effect.

> In particular: would you rather use classes or records, polymorphic
> variants or normal constructors ?
> There are attributes which can occur in several elements, such as id,
> class... How shold these be represented ?
> Should the types reflect the differences between inline elements and
> other types of elements ?

I know this is going to annoy a lot of people on that list, but this feels
very academic to me :-)