MSXML Conformance Update
August 30, 2000
Overview
This article is an update to previous articles by David Brownell on the conformance of the Microsoft XML Parser (MSXML). The July 2000 MSXML 3.0 Beta Release has made a significant improvement in conformance against the OASIS XML conformance test suite.
Besides reporting the latest OASIS conformance results, this article also reports on the compliance of the new Visual Basic SAX interface included in MSXML 3.0. To run compliance testing on this component I developed a brand new test harness in Visual Basic, which is also included with this article.
The Test Suite
The OASIS XML conformance test suite is a published set of tests, collected over time from various sources, which measure the conformance of XML parsers against the W3C XML 1.0 specification. It does not include any tests from Microsoft at this time.
For my test, I downloaded the updated test suite that David Brownell published in February. This updated test suite takes into account the W3C errata for the XML 1.0 specification. I made the following modifications to this suite:
- Since MSXML has built-in support for the W3C Namespaces in XML specification and there is no way to turn off this support, I changed those tests that were clearly not namespace compliant, like valid-sa-012, o-p04pass1, o-p05pass1, and o-p08pass1. (Because of this, I have also e-mailed the OASIS organization suggesting that they add a categorization for namespace compliance. I hope they do.)
- Some entities are missing in David's February version of the test suite. For example, valid-not-sa-001 contains a DOCTYPE declaration with the SYSTEM literal "001.ent", but the entity "001.ent" does not exist. The same is true for valid-not-sa-003, valid-ext-sa-003, and p31pass1.xml. So I created these missing entities (they were empty, and David accidentally omitted them due to packaging errors).
- Test "inv-not-sa03" was marked "invalid" when in fact it is not well formed. Erratum E34 says that for an entity reference that does not occur within the external subset or a parameter entity, the Name given in the entity reference must match that in an entity declaration that does not occur within the external subset or a parameter entity, so I changed the TYPE attribute on this test to "not-wf".
- David's version changes the xml:lang tests lang01 through lang06 from "error" to "invalid". I changed these to "valid" because Erratum E73 (issued since David's article was written) says that the XML processor does not deal with the value of xml:lang, it just passes it on to the application. Unfortunately, MSXML was recently changed to check these values because of the OASIS test suite itself, which was the wrong thing to do. MSXML will be changed back in the next release.
See xmlconf-20000821.zip for my updated version of the test suite.
The Test Harness
I used the same ECMAScript test harness that David Brownell published, except for one minor modification. This modification stemmed from the issue of what to do with tests marked "valid" that have no DTD (document type definition) at all. David's test harness treated this issue in a manner contrary to the design of MSXML.
For XML documents that have no DTD at all, MSXML successfully loads the documents, even when validateOnParse=true. The MSXML API designers feel that the API is more usable this way since you can still load documents that require validation, regardless of whether their DTDs are available. This is an API design issue, and I believe it should not become a conformance issue. Conformance should be about the parser implementation of the XML 1.0 specification, not about how that parser is packaged. If you really want to verify whether a document is validated against a DTD, you need to add the following extra check (the doc.doctype != null test) to your code:
var doc = new ActiveXObject("MSXML2.DOMDocument.3.0");
doc.validateOnParse = true;
if (doc.load(test) && doc.doctype != null) {
    // OK: now we know it is valid and there really was a DTD.
} else {
    // Either load failed or there was no DTD.
}
David's harness does not do this extra check. Hence, it reports a whole bunch of failures against MSXML. I added this extra check to my modified test harness so that it uses MSXML correctly.
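The check can be wrapped in a small helper so that every test goes through the same rule. This is only a sketch: the names isValidWithDtd and makeStubDoc are assumptions made for illustration and do not come from David's harness or from mine. The stub stands in for an MSXML2.DOMDocument.3.0 object so that the logic can be tried without MSXML installed.

```javascript
// Hypothetical helper (not part of the published harness): returns true only
// when the document loads AND a DOCTYPE was actually present.
// `doc` is expected to expose load(url) and doctype, as the MSXML DOM does.
function isValidWithDtd(doc, url) {
    doc.validateOnParse = true;
    var loaded = doc.load(url);
    return Boolean(loaded) && doc.doctype != null;
}

// Stub standing in for a parser object. The wellFormed and hasDtd switches
// are invented for this illustration only.
function makeStubDoc(wellFormed, hasDtd) {
    return {
        validateOnParse: false,
        doctype: null,
        load: function (url) {
            this.doctype = (wellFormed && hasDtd) ? { name: "doc" } : null;
            return wellFormed;
        }
    };
}

var withDtd = isValidWithDtd(makeStubDoc(true, true), "a.xml");     // true
var withoutDtd = isValidWithDtd(makeStubDoc(true, false), "b.xml"); // false
var loadFailed = isValidWithDtd(makeStubDoc(false, true), "c.xml"); // false
```

With a real parser, the first argument would be the ActiveXObject instance and the second the path of the test document.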
MSXML Results
Using the test suite and test harness previously described, the July 2000 MSXML Beta Release achieved the following results.
Mode | Raw Results | Pass Rate |
---|---|---|
Non-Validating | 1016/1071 | 95% |
Validating | 1042/1071 | 97% |
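As a quick check of the figures above, the pass rate and the number of failures can be derived from the raw results. The sketch below assumes the percentages are rounded to the nearest whole number, which the harness report does not state; the function names are made up for this example.

```javascript
// Assumption: percentages are rounded to the nearest whole percent.
function passRate(passed, total) {
    return Math.round(100 * passed / total);
}

// Number of tests that did not pass.
function failureCount(passed, total) {
    return total - passed;
}

var nonValidatingRate = passRate(1016, 1071);         // 95
var validatingRate = passRate(1042, 1071);            // 97
var nonValidatingFailures = failureCount(1016, 1071); // 55
var validatingFailures = failureCount(1042, 1071);    // 29
```

The failure counts of 55 and 29 are the totals that the two failure tables in the following sections break down by bucket.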
A number of failures are due to the fact that, even when validateOnParse=false, MSXML still "processes" the DTDs and external entities so that it can do entity expansion and report default attributes. Whether MSXML loads external entities or not depends on the value of the resolveExternals flag. However, setting this flag to false causes even more failures, because the OASIS tests paradoxically assume that external DTDs and entities are going to be available even in non-validating mode. The OASIS tests should provide a "standalone" indication in the test descriptions so that we could use this information to turn resolveExternals on or off accordingly.
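If the test descriptions did carry such an indication, a harness could pick the parser settings per test along the following lines. This is purely hypothetical: the standalone field does not exist in the current OASIS test descriptions, and parserSettingsFor is a name invented for this sketch. Only validateOnParse and resolveExternals are real MSXML properties.

```javascript
// Hypothetical: `test.standalone` is NOT part of the current OASIS test
// descriptions; it stands for the indication proposed in the text.
function parserSettingsFor(test, validating) {
    return {
        validateOnParse: validating,
        // Only fetch external DTDs and entities when the test needs them.
        resolveExternals: !test.standalone
    };
}

// Copy the chosen settings onto a parser object (real or stubbed).
function applySettings(doc, settings) {
    doc.validateOnParse = settings.validateOnParse;
    doc.resolveExternals = settings.resolveExternals;
    return doc;
}

var standaloneCase = parserSettingsFor({ id: "example-sa", standalone: true }, false);
var externalCase = parserSettingsFor({ id: "example-ext", standalone: false }, false);
```

The test IDs here are placeholders, not entries from the suite.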
The only way to fully pass the test suite is to silently ignore problems found in the DTDs. The MSXML team believes (from years of experience with Data Access APIs) that silent failures are generally a bad idea. Clearly, this difference in philosophy needs to be resolved.
Non-validating Mode
To test non-validating mode, you set validateOnParse=false on the MSXML2.DOMDocument.3.0 object. (Note that you don't need to specify the version-dependent ProgID if you install the MSXML Beta Release in replace mode. With MSXML installed in replace mode, you can get identical test results using MSXML.DOMDocument.)
The following command line produces the full non-validating parser report:
cscript harness.js /parser=MSXML2.DOMDocument.3.0 /defparser=MSXML2.DOMDocument.3.0 /nvreport=msxml3-nv.html /suite=e:\oasis2000\xmlconf\xmlconf.xml
The following table describes in more detail the failures reported by the test harness.
Bucket | Description |
---|---|
Bad Entities (8 Tests) | Several tests define entities that cannot legally be expanded, like <!ENTITY lt "<">, or entities that simply do not exist, like <!ENTITY noop SYSTEM "nop.ent">. MSXML always expands the entities because, even though the document instance does not use them at parse time, they could be used at run time when createEntityReference is called. I believe these tests conflict with a complete DOM implementation. (valid-sa-065, valid-sa-100, pr-xml-little, pr-xml-utf-16, pr-xml-utf-8, ext01, o-p73pass1, o-p75pass1). |
Output Tests (3 Tests) | Several output tests fail because the above input tests fail. (valid-sa-065, valid-sa-100, ext01). |
Attribute-Value Normalization (13 Tests) | MSXML 3.0 still does not perform Attribute-Value Normalization as described in the XML specification. (valid-sa-043, valid-sa-058, valid-sa-096, valid-sa-104, valid-sa-108, valid-sa-110, valid-sa-111, not-sa02, not-sa03, not-sa04, notation01, sa02, sa04). |
xml:lang (6 Tests) | These tests fail because I changed tests lang01 through lang06 from "invalid" to "valid": Erratum E73 says that the XML processor does not deal with the value of xml:lang, it just passes it on to the application. Unfortunately, MSXML was recently changed to check these values because of the OASIS test suite itself, which was the wrong thing to do. MSXML will be changed back in the next release. |
Doing Validation (25 Tests) | MSXML reports some validity constraints even when validateOnParse=false, like "Parameter entity not defined", or "The replacement text for a parameter entity must be properly nested with parenthesized groups". (not-wf-not-sa-005, invalid--001, invalid--002, invalid--003, invalid--004, invalid--005, invalid--006, inv-dtd01, inv-dtd02, inv-dtd06, el04, el05, id03, id04, id05, attr04, attr08, attr09, attr10, attr11, attr12, attr13, attr14, attr15, attr16). |
Total 55 failures |
Validating Mode
To test validating mode, you set validateOnParse=true on the MSXML2.DOMDocument.3.0 object. (Note that you don't need to specify the version-dependent ProgID if you install the MSXML Beta Release in replace mode. With MSXML installed in replace mode, you can get identical test results using MSXML.DOMDocument.)
The following command line produces the full validating parser report:
cscript harness.js /parser=MSXML2.DOMDocument.3.0 /defparser=MSXML2.DOMDocument.3.0 /vreport=msxml3-val.html /suite=e:\oasis2000\xmlconf\xmlconf.xml
The following table describes in more detail the failures reported by the test harness.
Bucket | Description |
---|---|
Bad Entities (7 tests) | The same issues as with non-validating mode. |
Output Tests (3 tests) | Several output tests fail because the above input tests fail. (valid-sa-012, valid-sa-065, valid-sa-100). |
Attribute-Value Normalization (13 Tests) | The same issues as with non-validating mode. |
xml:lang (6 Tests) | The same issues as with non-validating mode. |
Total 29 failures |
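For readers unfamiliar with the attribute-value normalization that both failure tables list, here is a simplified sketch of the whitespace part of the rule in the XML 1.0 specification (section 3.3.3). It deliberately ignores character and entity references, which the full algorithm also handles, and the function name is an assumption of this example rather than anything in MSXML or the harness.

```javascript
// Simplified sketch of XML 1.0 attribute-value normalization.
// NOT handled here: character references and entity references.
function normalizeAttributeValue(raw, isCdata) {
    // Line ends are normalized to a single LF first (CR LF or lone CR).
    var value = raw.replace(/\r\n?/g, "\n");
    // Each remaining whitespace character (tab, LF, CR, space) becomes a space.
    value = value.replace(/[\t\n\r ]/g, " ");
    if (!isCdata) {
        // Non-CDATA types: collapse runs of spaces, then strip the ends.
        value = value.replace(/ +/g, " ").replace(/^ | $/g, "");
    }
    return value;
}

var cdataValue = normalizeAttributeValue(" a\t b\r\nc ", true);   // " a  b c "
var tokenValue = normalizeAttributeValue(" a\t b\r\nc ", false);  // "a b c"
```

The isCdata flag reflects the declared attribute type: attributes declared as CDATA, or not declared at all, keep their internal and surrounding spaces.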
MSXML Conformance Via SAX
MSXML 3.0 now includes a SAX interface that you can use from Visual Basic applications. So I ported David's ECMAScript test harness to Visual Basic so that I could test the compliance of this new component.
The MSXML SAX interface only supports non-validating mode at this time. Given that, the Visual Basic harness (VbSaxTest.zip for source code and executable) produces the same kind of report as David's harness.
Mode | Raw Results | Pass Rate |
---|---|---|
Non-Validating | 1035/1071 | 97% |
The following table summarizes the detailed report generated by the test harness:
Bucket | Description |
---|---|
Non-existent entities (2 Tests) | Only one test (ext01) is affected by this, but it is counted twice because it also causes the corresponding output test to fail. |
Whitespace Normalization (21 Tests) | This is mostly the end of line handling (converting 0d 0a pairs into a single 0a). |
Doing Validation (5 Tests) | These tests fail because the SAX parser reports validity constraints when it is running in non-validating mode. These are mostly related to the "Proper Declaration/PE Nesting" validity constraint. |
xml:lang (6 Tests) | The same issues as with non-validating mode. |
Total 30 failures |
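The end-of-line handling behind most of the whitespace normalization failures is small enough to show in full. The sketch below follows the rule in the XML 1.0 specification (section 2.11), under which both a 0d 0a pair and a lone 0d are passed to the application as a single 0a. The function name is made up for this example and is not part of the SAX interface.

```javascript
// XML 1.0 end-of-line handling: CR LF pairs and lone CRs become a single LF.
function normalizeLineEnds(text) {
    return text.replace(/\r\n?/g, "\n");
}

var normalized = normalizeLineEnds("a\r\nb\rc\nd"); // "a\nb\nc\nd"
```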
Conclusion
MSXML still has some issues to resolve relating to non-existent or malformed unused entities, attribute-value normalization, end-of-line handling, and reporting validity constraints when running in non-validating mode. However, you can see from the following table that MSXML is on a steady march towards 100% compliance.
Version | Non-Validating | Validating |
---|---|---|
MSXML 2.5 | 87% (931/1067) | 83% (895/1067) |
MSXML3 May Tech Preview Release | 88% (941/1067) | 85% (913/1067) |
MSXML3 Beta Release | 95% (1016/1071) for DOM; 97% (1035/1071) for SAX | 97% (1042/1071) |