XML library for validating MathML
Date: -- (:)
From: Dario Teixeira <darioteixeira@y...>
Subject: Re: [Caml-list] XML library for validating MathML

Well, as it turns out, building a basic "Hello World" in PXP is relatively
simple (I followed the manual, which is very helpful in the beginning).
However, though the DTD validation works fine with the simple examples I tried,
it fails for a MathML document.  Note that I am using the DTD as provided
by the W3C, available from here:

When processing the MathML DTD, PXP outputs a few warnings about entities
declared twice, a few about names reserved for future extensions, and quite
a lot of warnings about code points that cannot be represented.  I can ignore
those for now.

When it does fail, this is the error produced:

In entity ent-isonum = PUBLIC "-//W3C//ENTITIES Numeric and Special Graphic for MathML 2.0//EN" "isonum.ent", at line 28, position 44:
Called from entity [dtd] = SYSTEM "mathml2.dtd", line 1969, position 0:
ERROR (Well-formedness constraint): The character '&' must be written as '&amp;'

Looking at the "isonum.ent" file (packaged with the W3C zip), these are
the contents of line 28, where the error occurs:

<!ENTITY amp              "&#x26;&#x00026;" ><!--=ampersand -->

Though 0x26 is indeed the codepoint for the ampersand character, I don't
get why it appears twice.  Is this a case of double escaping?  Could this
be the reason PXP chokes?
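
To make the question concrete, here is a minimal sketch that decodes the
hexadecimal character references in that literal.  It assumes that character
references inside an entity literal are expanded when the entity is declared
(my reading of the XML specification, not something I have confirmed in PXP's
source).  The function name expand_hex_char_refs is hypothetical, and it uses
only the OCaml standard library, not PXP:

```ocaml
(* Hypothetical helper, not part of PXP: expand hexadecimal character
   references of the form &#xHH; in an entity literal.  Only ASCII code
   points are decoded; anything else is copied through unchanged. *)
let expand_hex_char_refs (s : string) : string =
  let len = String.length s in
  let buf = Buffer.create len in
  let rec go i =
    if i >= len then ()
    else if i + 2 < len && s.[i] = '&' && s.[i + 1] = '#' && s.[i + 2] = 'x'
    then begin
      match String.index_from_opt s (i + 3) ';' with
      | Some j ->
          let hex = String.sub s (i + 3) (j - i - 3) in
          begin match int_of_string_opt ("0x" ^ hex) with
          | Some code when code >= 0 && code < 128 ->
              (* A reference we can decode: emit the character itself. *)
              Buffer.add_char buf (Char.chr code);
              go (j + 1)
          | _ ->
              (* Not a reference we handle: copy the '&' verbatim. *)
              Buffer.add_char buf s.[i];
              go (i + 1)
          end
      | None ->
          (* No terminating ';': copy the '&' verbatim. *)
          Buffer.add_char buf s.[i];
          go (i + 1)
    end
    else begin
      Buffer.add_char buf s.[i];
      go (i + 1)
    end
  in
  go 0;
  Buffer.contents buf

let () =
  (* The literal from line 28 of isonum.ent. *)
  print_endline (expand_hex_char_refs "&#x26;&#x00026;")
```

Under that assumption, the literal expands to two bare ampersands, which
would at least be consistent with the well-formedness error above.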

Any thoughts?

Best regards,
Dario Teixeira

P.S.  This is the programme I used for testing.  Its code is pretty much
      lifted from the PXP manual:

open Pxp_document
open Pxp_yacc

class warner =
        object
                (* Prints every warning raised by the parser. *)
                method warn w = print_endline ("WARNING: " ^ w)
        end

let rec print_structure n =
        let ntype = n#node_type
        in match ntype with
                | T_element name ->
                        print_endline ("Element of type " ^ name);
                        let children = n # sub_nodes
                        in List.iter print_structure children
                | T_data ->
                        print_endline "Data"
                | _ ->
                        assert false

let () =
        try
                let config = {default_config with warner = new warner} in
                let doc = parse_document_entity config (from_file "test.xml") default_spec
                in print_structure (doc#root)
        with
                (* Report any parsing or validation error as readable text. *)
                exc -> print_endline (Pxp_types.string_of_exn exc)