Last Word on Last Call
July 5, 2000
Schemas are now beyond "last call," soon to go to Candidate Recommendation. GCA's XML Europe conference presented the opportunity to speak with implementers about how the specification looks as it looms on the horizon. The subsequent publication of the comment archive presented a further opportunity to read the mood of the developer community, and confirmed that there are major points of contention over the draft.
While some implementers outside the W3C process do not see impediments in the last call draft, from my brief survey, these folks tend not to be looking to use the new features of schema or to write the reference implementations. For them, the schema draft represents DTD-in-XML-syntax with datatypes, and if the rest of it has issues, they don't care. Among the others, who also support the W3C effort, there are those who wish for some changes. The most-desired changes are to reduce the feature set, with an eye toward reducing complexity of design, and to do a major rewrite so that simple things are not rendered hard, and hard things not made harder. Among those who take issue with the draft, the most favored outcome is universally viewed as unlikely to happen: release of the data types part of the specification along with a DTD for schemas replicating DTD functionality, and more time to sort the rest of it out.
Background
The opening shots in the campaign for schemas were fired in December 1997. XML 1.0 itself was not yet a Recommendation, but it was already clear to careful observers that XML was going to play a crucial role in database interoperability and e-commerce, as well as the literary and publishing work targeted by most SGML applications. At GCA's XML/SGML '97, the draft specification of XML 1.0 was handed out, and François Chahuneau and Henry Thompson each delivered papers on the requirement for schemata that would transcend the limitations of DTDs. The WG was formed shortly thereafter, but didn't really get down to cases until after the entire W3C XML Activity was reorganized in August 1998. Calls for completion started even before the current WG convened.
Microsoft, with co-authors Henry Thompson of the University of Edinburgh and others from Arbortext, Inso, and DataChannel, had published XML-Data in late 1997. It was submitted as a Note to the W3C on January 5, 1998, in time to present it as a fait accompli to a standing-room only crowd at a GCA XML Conference tutorial in March, 1998. Later, CommerceOne submitted Schema for Object-Oriented XML 2.0 (SOX) as a Note. Between the submission of the first Note and the end of last call comments, barely two years have elapsed. In comparison, Charles Goldfarb cites 17 years for the development of the semantics of DTDs and markup declaration syntax, from 1969 through 1986. Ten years of experience with markup applications made it possible to narrow the scope for instance syntax and grammar, but it has greatly broadened the scope for the corresponding schema.
Consider the task of the current WG: to create a common framework for abstracting all information, regardless of the form (be it narrative, normalized data, objects, or multi-media) and regardless of domain (be it historical, literary, mathematical, financial, or strategic). Wherever you attempt to draw a line and say "schemas don't have to model this," someone else will come back and say, "no, that information and that model are mission-critical for my implementation."
Here are the milestones leading up to the current draft:
- XML-Data submission -- January 5, 1998
- SOX submission -- September 30, 1998 (updated July 7, 1999)
- Schema requirements -- February 15, 1999
- Draft specifications -- May 6, September 24, November 5, and December 17, 1999; February 25, 2000
- Last-call draft -- April 7, 2000.
The prior release of XML 1.0, the clear signs that XML was taking off for a wide range of applications, and the release of XML-Data formed the parameters within which XML schemas have been created. The pressure was on the WG from day one to produce a spec quickly that met or exceeded the functionality of the Microsoft draft, and to do so in a large committee with representation from all the heavy lifters of the database and structured document world.
The current draft has elicited over 200 comments from individuals around the world. The opening up of the comment archive indicates not only the significance of this specification, but the degree of care and responsibility with which its development has been handled both inside and outside the W3C.
Vox Populi
Based on my casual survey in Paris, and some follow-up work, the community seems to be fairly evenly divided along these lines:
Positive, will implement:
"It was a long time coming, but in my view it looks good. It puts the same representation on XML as I can do in code... It maps to the way object oriented programmers would think, which is good."
Of the positive respondents, one has the caveat that they won't do the heavy lifting, but will wait for reference/open source implementations (parsers), and as a result they expect schema to have little impact on their resources. While this group reports that customers are asking about schema, they do not expect it to impact product functionality.
Fairly indifferent, doesn't affect development plans:
This is the response of several vendors who are tracking the spec, but who feel that their product is not reliant on schema functionality. They report that their customers are more involved in "real world" issues and have their plate full with plain XML migration. In their view, schema is a refinement important to a hardcore group of XML users, but not essential for the mainstream of their market. This group, of course, is more on the document side than the data side of XML application.
Negative, wish they would fix it before release:
Of the negatives, Praxis has the greatest dependency on the final schema spec, and CTO Matthew Gertner was willing to go on the record. Their application, Schemantix, sucks in schemas and spits out HTML forms for editing conformant XML instances. Obviously, schemas are their life-blood. Right now, they use SOX, the schema language from Commerce One, but they expect to adopt the W3C Recommendation. Gertner has commented formally and informally to the W3C on their view of the complexity and overkill of the draft spec.
The best indication of how the affected developer community feels, of course, is contained in the comments on the April draft, although this archive necessarily reflects that segment that would like to see changes made. Among those, the sentiments expressed indicate that a significant segment of the community dedicated to open standards wants W3C XML schema to succeed, but would like to turn back the clock and take one more shot at a cleaner design and exposition.
The over-arching issues appear to be ones of design complexity and obtuseness of expression, with some related issues of coordination among W3C WGs. At the same time, everyone understands not only the import of the spec, but the enormous pressure to produce a stable Recommendation with dispatch. Reading between the lines of so many comments, the message is that schema is much too important to do wrong. While the commenters want to see the work completed, they are "most anxious to see this fly." Everybody appreciates the difficulty of holding down the feature set when so many competing and overlapping interests are at stake. Yet there is a strong desire to see the WG observe the medical ethic of "first, do no harm"--meaning prune all controversial aspects of Schema 1.0 that are not essential to the most elementary requirements (to many, this would look like data types and XML syntax).
Complexity
The most extensive description of the complexity issue is found in the W3C last-call comment archive itself, but there are several aspects that surface repeatedly there and in other comments. Among the issues raised are locally scoped namespaces, where multiple implementations are possible, and equivalence classes, where no added functionality above inheritance is seen, certainly not any that merits the complexity.
Philip Wadler of Bell Labs touches on several areas of potential simplification in his comments, which he prefaces in this manner:
The current Schema proposal is complex. Programmers have shown a remarkable ability to put up with complexity, but we do not yet know whether the XML community will be so forgiving. We would like to suggest that it is possible to greatly simplify XML Schema, while not unduly limiting its power. Indeed, some of the suggestions below would both simplify Schema and extend its power at the same time.
Coordination
There seem to be some issues with coordination between the various WGs in the XML Activity, coordination with Query being primary. Wadler notes in his comment that the schema draft appears to be anticipating what Query will do, when it is not at all clear that Query will take the indicated path. He recommends dropping the issues from the draft until the efforts can be coordinated.
If the schema WG feels that it cannot on its own delay the release of the CR, the coordination issues may overtake it later rather than sooner.
Density of Expression
Martin Duerst of W3C says simply, in formal comments, "The verbal complexity of the XML Schema specs, in particular part 1, is extremely high." This, I'm afraid, is an understatement. This draft spec is a sitting duck for those who would take pot shots at the art of stringing together words into meaning.
From section 2.2, the abstract data model:
[Definition:] Several kinds of component have a target namespace, which is either absent or a namespace URI, also as defined by [XML-Namespaces]. The target namespace serves to identify the namespace within which the association between the component and its name exists. In the case of declarations, this in turn determines the namespace URI of, for example, the element information items it may validate.
There are no big words or fancy concepts here, but I could read it 30 times and still not know what it is trying to say. This is what I think it means:
Components may have a target namespace (TN). Like any namespace, a target namespace
- associates a component with a name.
- may or may not have a URI.
- can validate declared items.
I probably missed a great deal in translation, but what I'm left with is a clear statement that a target namespace is a namespace. Either the essential piece that defines target namespace is buried deeper than I could dig, or there is no essential piece here.
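For readers who want something concrete, here is a minimal sketch of one reading of that definition: the target namespace is simply the namespace in which a schema's declared names live, so a declaration applies to instance elements whose namespace URI and local name both match. Everything in it is my own illustration, not text from the draft: the example.com URI, the purchaseOrder element, and the helper functions are hypothetical, and the schema namespace URI is my assumption about what the draft uses (the code does not depend on it). It compares names only and performs no real validation.

```python
import xml.etree.ElementTree as ET

# Hypothetical schema document: its declarations belong to the target namespace.
SCHEMA = """<schema xmlns="http://www.w3.org/1999/XMLSchema"
        targetNamespace="http://example.com/po">
  <element name="purchaseOrder" type="string"/>
</schema>"""

# Hypothetical instance whose root element is in that same namespace.
INSTANCE = '<po:purchaseOrder xmlns:po="http://example.com/po">42</po:purchaseOrder>'


def declared_names(schema_text):
    """Return (namespace, local name) pairs for top-level element declarations."""
    root = ET.fromstring(schema_text)
    target_ns = root.get("targetNamespace")  # None when the attribute is absent
    names = set()
    for child in root:
        # ElementTree spells tags as "{namespace}local"; keep the local part.
        local = child.tag.rsplit("}", 1)[-1]
        if local == "element" and child.get("name"):
            names.add((target_ns, child.get("name")))
    return names


def qualified_name(element):
    """Split an ElementTree tag into (namespace, local name)."""
    tag = element.tag
    if tag.startswith("{"):
        ns, local = tag[1:].split("}", 1)
        return (ns, local)
    return (None, tag)


names = declared_names(SCHEMA)
matches = qualified_name(ET.fromstring(INSTANCE)) in names
print(names, matches)
```

If this reading is right, an unqualified purchaseOrder element, or one in some other namespace, would not be matched by the declaration, which is about all the definition seems to be saying.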
I stress the density of the writing because I know from painful, first-hand experience how hard it is to write about the domain that is closest to you. Anyone who has heard Henry Thompson speak knows that he communicates clearly, cleanly, and concisely. If the schema spec can be rewritten to eliminate the unnecessary fuzz, we may just find out that it is not such a tall order as it appears from within the clouds of formless prose.
Where Should Schemas Go?
Here is where the developer community seems to be thinking this could and should go from the last call point onward:
Plausible, and not unthinkable, outcome:
- issue CR pretty much as current, but possibly/hopefully taking out some of the complexity
- get feedback from CR that will confirm
  - it solves some problems
  - it would solve more problems if rewritten
  - would be better yet if narrower in scope, simpler and re-written
- problems fixed before Rec is issued, meaning major changes after CR
Deemed less likely, but highly desirable outcome:
- release CR data type spec first, with XML syntax for direct translation of DTD functionality, which rapidly becomes a Recommendation
- take time to look at all options for extending schema, including use of Extensibility's schema extensions and cooperation with RELAX
Worst nightmare:
- plow ahead with CR
- don't receive intensive implementation and/or don't heed it
- continue to Rec and then find implementers (vs. programmers and computer scientists) can't read it and implement it
- Rec causes more problems than it fixes: deep and widespread disagreement on implementation strategies that leads to fragmentation of the schema languages supported.
There is one way forward with support both from schema detractors and supporters alike: issue the datatypes spec, and a DTD for schemas in XML syntax that duplicates DTD functionality. Roger Costello of MITRE Corporation stated the case this way in his public comments:
Simplify the schema by making it open and moving the more complex features to a non-binding portion of the schema spec. The resulting simplified version of the XML Schema spec can then gradually evolve to incorporate the more complex features (if the market dictates).
Relaxation Effect
Right now, W3C schemas and the schema draft enjoy universal support over any other schema proposal. In fact, it is a mark of the universal respect in which the editors and co-chairs are held that virtually no one with whom I spoke indicated any desire to replace the process with another, or to debunk the assumption that the W3C will produce the authoritative specification.
While no one we spoke with has made a public commitment to implementing Murata Makoto's RELAX in place of W3C Schema, the relative simplicity of design and expression in the announcement made three months ago was a wake-up call for all involved. Not only is the RELAX spec brief, but it is perceived as powerful, and is on track for ISO standardization. The RELAX Core was published in May 2000 as JIS/TR X 0029:2000, and Murata expects a formal submission to ISO SC34 on behalf of JIS this September. Meanwhile, he is cutting back even further on the RELAX core, dropping, to his regret, some "interesting features," so that the base spec will be stable at the time of submission.
While there are no public statements from non-Japanese companies on RELAX adoption, there is credible talk that it is being looked at very seriously in a number of major technology houses in parallel with schema.
I asked several implementers about the impact of a multiple-schema-language world, and the consensus seemed to be that it would add a level of abstraction and complexity to implementation, but would present no insurmountable barrier. A single schema language world is preferable, as it will drive industry growth. Without a clear, unified direction on schemas, there is some fear that some will take a wait-and-see position and slow down implementation.
Last Call on Schema Last Call
Sometimes heat and pressure make a substance harder, and sometimes they make it mushy. The schema draft has been produced within a crucible, and there is some sentiment that it is sacrificing the very simplicity that gave XML legs.
The most powerful argument for simplification is that the penalty for over-simplification is much less severe than the penalty for getting it wrong. If not specified, users will implement their own inheritance models, after their own analysis and partner negotiation. This might not be such a bad thing, because it would permit experimentation and incremental design, and hopefully avoid the situation where a solution is canonized and subsequently ignored. The marketplace, often brutal and shortsighted, will have the last say here. What everyone does agree on is that a schema language free from individual vendor influence is the only non-negotiable requirement for the continued development of XML.