
Should Atom Use RDF?
by Mark Pilgrim, August 20, 2003
Four Independent Issues
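When people argue about RDF, they can mean any of four separate things:
- The RDF model: statements are triples; use graphs, not trees.
- The RDF/XML serialization: a popular syntax for expressing individual RDF documents.
- RDF tools: software libraries (such as RDFLib for Python) for parsing and working with RDF data.
- The Semantic Web.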
And here are four related but completely independent counterarguments:
- The RDF conceptual model is overkill for specific applications, or is always overkill, or is simply the wrong model.
- The RDF/XML serialization is wretchedly complex and breaks the "view-source" principle for RDF documents.
- No RDF tools exist for my favorite language.
- The Semantic Web is an unattainable pipe dream, or is too fluidly defined to ever come about, or something.
The problem with discussing RDF (where that means, "I think this data format should be RDF") is that you can support any of these four RDF issues (model, syntax, tools, vision), in any combination, while vigorously arguing against the others. People who believe that the RDF conceptual model is a good thing may think that the RDF/XML serialization is wretched, or that there are no good RDF tools for their favorite language, or that the Semantic Web is an unattainable pipe dream, or any combination of these things. People who are familiar with robust RDF tools (such as RDFLib for Python) -- and, thus, never have to look at the RDF/XML serialization because their tools hide it from them completely -- may nonetheless think that RDF/XML is wretched. People who defend the RDF/XML syntax may have nothing polite to say about the vision of the Semantic Web. And around and around it goes...
This is a problem with "I think this format should be RDF" discussions. Many people who are thought to be pro-RDF are, in fact, against it in one or more ways (the model is limiting, the syntax is wretched, the tools are buggy or nonexistent, the vision is stupid). And many people who are perceived as anti-RDF are in fact in favor of it in one or more ways (the model is good, the serialization is no more complex than straight XML, the tools work well enough, the Semantic Web is worth the wait).
For the record, I think that the RDF model is sound, the tools work for me, the serialization is wretched, and the Semantic Web is an unattainable pipe dream. If I appear to be wavering over time, sometimes pro-RDF, sometimes anti-RDF, it may be that I'm simply arguing different facets.
RDF and Atom
Why do I bring this up? Because, as it happens, the Atom project is creating a new format for syndicating content and an API for a new web service. For the past week and a half the project has been completely engulfed in an all-out flame war over whether it should use RDF. The discussion has been almost entirely unproductive, because this question is really four questions, corresponding to the four issues:
- Can Atom benefit from the RDF conceptual model?
- Should Atom feeds use the RDF/XML syntax directly?
- Can I use RDF tools to consume Atom feeds?
- Is Atom part of the Semantic Web?
My answers? Yes, no, it depends, and I don't care.
A Wise Teacher
I sat in on an IRC chat with Sam Ruby, Shelley Powers, Sean Palmer, Joe Gregorio, and others who have contributed heavily to Atom over the past few months. About half of these people are traditionally considered pro-RDF, half anti-RDF; but as you've seen, these simplistic labels are really just another source of confusion, so I won't tell you which person is which. The focus of the chat was to come up with an RDF serialization of Atom by taking the examples from the Atom 0.2 snapshot (which are straight XML) and writing an XSLT transformation into RDF.
During the course of this chat, all four issues (model, syntax, tools, vision) came up. As you might imagine, some were more constructive than others. The model was really the most constructive, in that it taught us two key things:
- Cardinality is vitally important to figure out up front, and the RDF model forces you to figure it out up front. This is a good thing. For example, an Atom <feed> can contain one or more <entry> elements. If you had a feed with one entry, it would look like this in XML:

    <feed version="0.2" xmlns="http://purl.org/atom/ns#">
      <!-- some feed-level metadata omitted for brevity -->
      <entry>
        <title>Atom 0.2 snapshot</title>
        <link>http://diveintomark.org/2003/08/05/atom02</link>
        <id>tag:diveintomark.org,2003:3.2397</id>
        <issued>2003-08-05T08:29:29-04:00</issued>
        <modified>2003-08-05T18:30:02Z</modified>
        <summary>The Atom 0.2 snapshot is out. Here are some sample feeds.</summary>
      </entry>
    </feed>

Now suppose you wanted to add a second entry. You just add a second <entry> element:

    <feed version="0.2" xmlns="http://purl.org/atom/ns#">
      <!-- ... -->
      <entry>
        <title>Atom 0.2 snapshot</title>
        <link>http://diveintomark.org/2003/08/05/atom02</link>
        <!-- ... -->
      </entry>
      <entry>
        <title>Atom API primer</title>
        <!-- ... -->
      </entry>
    </feed>

In other words, straight XML doesn't force you to think about cardinality until it's too late. If you looked at the first example (with only one entry) and said "Aha! A feed has an entry in it!" and went off to write code based on that assumption, you'd be borked when your code hit the second example (with two entries).
But in RDF, collections of things are always explicit, so a feed with one entry would look like this:
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:atom="http://purl.org/atom/ns#"
             xmlns:dc="http://purl.org/dc/elements/1.1/"
             xmlns:dcterms="http://purl.org/dc/terms/">
      <atom:Feed rdf:about="tag:diveintomark.org,2003:3">
        <!-- ... -->
        <atom:entries rdf:parseType="Collection">
          <atom:Entry rdf:about="tag:diveintomark.org,2003:3.2397">
            <dc:title>Atom 0.2 snapshot</dc:title>
            <atom:link rdf:resource="http://diveintomark.org/2003/08/05/atom02"/>
            <dcterms:issued>2003-08-05T08:29:29-04:00</dcterms:issued>
            <dcterms:modified>2003-08-05T18:30:02Z</dcterms:modified>
            <dcterms:created>2003-08-05T12:29:29Z</dcterms:created>
            <dc:description>The Atom 0.2 snapshot is out. Here are some sample feeds.</dc:description>
          </atom:Entry>
        </atom:entries>
      </atom:Feed>
    </rdf:RDF>

See the difference? Entries are always wrapped in an <atom:entries rdf:parseType="Collection"> container element. If there's one entry, you get a collection of one; if there are two entries, you get a collection of two. But you know up front that it's a collection.

- The other big thing that the RDF model forced us to clarify was the concept of ordering. In XML, entries within a feed appear in a particular order. Is that order accidental or intentional? This, honestly, is not something we'd given any thought to. The primary use case for syndicated feeds is that the client parses a number of feeds from different sources and puts all the entries in chronological (or reverse chronological) order. Each entry has a required <modified> date for this purpose, so the structural order of entries within an individual feed wasn't a big concern.

However, RDF forces it to be a concern, because RDF has different container types for ordered and unordered lists (rdf:Seq and rdf:Bag, for example). Once again the rigorous RDF model forced us to consider this up front, exposing an ambiguity in our current specification. The process of converting Atom-XML into Atom-RDF made us clarify these issues in our conceptual model.
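Both lessons also show up in ordinary code that consumes Atom as plain XML. The following is a minimal Python sketch using only the standard library (xml.etree.ElementTree and datetime). The feeds, titles, dates, and function names are hypothetical, made up for illustration. It shows a consumer that assumed one entry per feed silently losing data, and a client that, as described above, ignores document order and sorts merged entries by the required <modified> date.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Atom 0.2 namespace, as declared in the examples above.
ATOM = "{http://purl.org/atom/ns#}"

# Hypothetical, trimmed-down feeds for illustration only.
FEED_A = """<feed version="0.2" xmlns="http://purl.org/atom/ns#">
  <entry><title>Older post</title><modified>2003-08-01T12:00:00Z</modified></entry>
  <entry><title>Newer post</title><modified>2003-08-05T18:30:02Z</modified></entry>
</feed>"""
FEED_B = """<feed version="0.2" xmlns="http://purl.org/atom/ns#">
  <entry><title>Post from another site</title><modified>2003-08-03T09:15:00-04:00</modified></entry>
</feed>"""


def titles_first_only(text):
    """Cardinality bug: assumes a feed holds exactly one entry."""
    entry = ET.fromstring(text).find(ATOM + "entry")  # returns the first match only
    return [entry.findtext(ATOM + "title")]


def titles_all(text):
    """Treats entries as a collection from the start."""
    feed = ET.fromstring(text)
    return [e.findtext(ATOM + "title") for e in feed.findall(ATOM + "entry")]


def parse_date(value):
    """Parse timestamps like 2003-08-05T18:30:02Z.

    fromisoformat() in older Pythons rejects a bare 'Z', so rewrite it
    as an explicit UTC offset first.
    """
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)


def merge_newest_first(*texts):
    """Merge entries from several feeds, ordered only by <modified>.

    Document order within each feed is deliberately ignored.
    """
    dated = []
    for text in texts:
        for e in ET.fromstring(text).findall(ATOM + "entry"):
            modified = parse_date(e.findtext(ATOM + "modified"))
            dated.append((modified, e.findtext(ATOM + "title")))
    dated.sort(key=lambda pair: pair[0], reverse=True)
    return [title for _, title in dated]


print(titles_first_only(FEED_A))  # ['Older post']  (second entry silently lost)
print(titles_all(FEED_A))         # ['Older post', 'Newer post']
print(merge_newest_first(FEED_A, FEED_B))
# ['Newer post', 'Post from another site', 'Older post']
```

Note that FEED_A lists its older entry first; the merged result does not care, because only the timestamps decide placement. Whether a feed's own ordering should ever mean something is exactly the question the RDF container types forced us to ask.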
So is the RDF model a good thing? I think that it is; considering it made our format better, regardless of the syntax.
But the Syntax...
However, as you can see from the above snippets and the full final Atom-RDF prototype, the RDF/XML syntax is far more complex than the equivalent just-XML version. (Depending on your browser, you may need to view the source of either or both of those examples.)
Part of the problem stems from the very thing that RDF is supposed to be good at, namely, reusing and combining ontologies in a single document. You see, we kind of cheated when we created Atom-XML. The specification defines a number of elements (such as <title>) in terms of Dublin Core, but when you look at the actual Atom-XML document, you can see that we really redefined them in the Atom namespace. As a result, the XML version looks simpler at first glance because all the elements are in a single namespace, which is declared as the default namespace.
In theory, you could cheat in this same way in RDF and put everything in a single namespace. But then you've pretty much negated one of the main benefits of RDF because you've redefined parts of existing ontologies and made it harder for people to integrate your RDF documents with other RDF data. Now they'll need to transform or map all your redefined elements back to their original ontologies. Since we were creating an XSLT transformation and could make the RDF look like whatever we wanted, we all agreed that we should do the right thing and reuse existing ontologies as much as possible. (This was actually the bulk of the discussion time, bickering about which ontologies to use.)
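Here is roughly what that mapping burden looks like for someone consuming single-namespace Atom-XML. This minimal Python sketch (standard library only) converts one Atom 0.2 <entry> into plain (subject, predicate, object) tuples, using the element-to-vocabulary pairings from the Atom-RDF example above: title to dc:title, summary to dc:description, issued and modified to dcterms, and link kept as atom:link. The function name, the PREDICATES table, and the tuple representation are assumptions for illustration; a real RDF toolkit such as RDFLib would also distinguish URIs from literals, which this sketch does not.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://purl.org/atom/ns#"
DC = "http://purl.org/dc/elements/1.1/"
DCTERMS = "http://purl.org/dc/terms/"

# Hypothetical mapping: Atom-XML local names -> the existing vocabulary
# terms used for them in the Atom-RDF example.
PREDICATES = {
    "title": DC + "title",
    "summary": DC + "description",
    "issued": DCTERMS + "issued",
    "modified": DCTERMS + "modified",
    "link": ATOM_NS + "link",
}


def entry_to_triples(entry):
    """Map one Atom 0.2 <entry> element to (subject, predicate, object) tuples.

    The entry's <id> becomes the subject, like rdf:about in the RDF version.
    Objects stay plain strings; unmapped elements (including <id>) are skipped.
    """
    prefix = "{%s}" % ATOM_NS
    subject = entry.findtext(prefix + "id")
    triples = []
    for child in entry:
        if not child.tag.startswith(prefix):
            continue  # element from some other namespace: no mapping known
        local_name = child.tag[len(prefix):]
        if local_name in PREDICATES:
            triples.append((subject, PREDICATES[local_name], child.text))
    return triples


FEED = """<feed version="0.2" xmlns="http://purl.org/atom/ns#">
  <entry>
    <title>Atom 0.2 snapshot</title>
    <link>http://diveintomark.org/2003/08/05/atom02</link>
    <id>tag:diveintomark.org,2003:3.2397</id>
    <modified>2003-08-05T18:30:02Z</modified>
  </entry>
</feed>"""

entry = ET.fromstring(FEED).find("{%s}entry" % ATOM_NS)
for triple in entry_to_triples(entry):
    print(triple)
```

Every Atom element that shadows an existing term needs a row in a table like PREDICATES, and someone has to keep that table in step with the format as it changes.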
This highlights the crux of the perennial flame wars about RDF/XML: it can be almost as simple as pure XML. In fact, with a few DTD tricks to default the parseType attributes, it can look virtually identical, but only if you cheat and redefine everything in your own ontology and force everyone else to map it back to other ontologies later. Or you can do the right thing and reuse existing ontologies from the beginning, and then the syntax gets hellishly complex. There's always an additional cost; you can put it wherever you want, but you can't get rid of it.
So should Atom use the RDF/XML syntax directly? I vote "NO".
The Best of Both Worlds
RDF (the model) is a good thing; RDF (the syntax) is a bad thing. "But," I hear you cry, "I don't care about the syntax because I have good RDF tools!" How can we allow you to use your RDF tools on Atom, and do the right thing with reusing existing ontologies, and keep the syntax simple for people who simply want to parse Atom feeds in isolation, as XML?
We can make the XSLT transformation normative. Here it is, the result of a 4-hour IRC chat. We should include it in the specification, maintain it as the format changes, and mandate that it is the One True Way to use Atom syndicated feeds as RDF.
Is this more work for the RDF folk? Sure. Now they need an XSLT processor as well as their favorite RDF tool. But every platform that has robust RDF tools (a small but growing number) also has robust XSLT tools.
But Atom-as-RDF is not the primary mode of consuming Atom feeds. There are dozens of tools, perhaps more than 100, that consume syndication feeds now. Some of them have already been updated to consume Atom feeds, and the format hasn't even been finalized yet. Most will be updated once the format is stable. And, to my knowledge, only one (NewsMonster) handles them as RDF, and it already has the infrastructure to transform XML because it does this for six of the seven formats called "RSS" (the seventh is already RDF).
In other words, we're hedging our bets. Whether a vocal minority likes it or not, RDF is very much a minority camp right now. It has a lot to offer -- I saw that first-hand as it forced us to clarify our model -- but it hasn't hit the mainstream yet. On the other hand, it seems perpetually poised to spring into the mainstream. Tool support is obviously critical here (since tools help hide the wretched syntax), and the tools are definitely maturing.
So should Atom be consumed as RDF? It depends. If you want to, and have the right tools, you can. You'll need to transform it into RDF first, but we'll provide a normative way to do that. If you don't want to, then you don't have to worry about it. Atom is XML.
What About the Semantic Web?
I don't care about the Semantic Web. Next question?
What do you think about using RDF in Atom? Share your opinion in our forum.
Comment on this Article
- Recent developments
2006-01-10 03:44:15 Danny Ayers
Worth repeating: Atom format is now RFC 4287!
Work on expressing Atom data in RDF (with OWL) continues, see:
http://atomowl.org/
- Schema languages as a wise teacher instead
2003-08-24 05:08:01 Martijn Faassen
In the section 'A Wise Teacher' it is claimed that thinking in terms of the RDF model is useful as it helps one think about cardinality and order. Doesn't an XML schema language help you think about the same thing? If you're going to write a schema for your XML format you will have to capture cardinality and order aspects in it. So if these examples are the main benefits of considering the RDF model, then I don't really see the case for using RDF as opposed to a schema here.
Of course there's a case for reusing RDF tools. There's also probably a case to think in terms of the RDF model, which I don't know much about. It's just that the supplied examples didn't really work for me -- writing a schema seems to be a more natural way to be made to consider such issues.
The nicest part of the article to me is the analysis of the 'related but completely independent' (sic) issues. I'm "interesting" on the model, "ugh" on the serialization, "would like to play with it using some Python tool one day" on tools and "small directly useful steps are good and we'll see where they'll lead us" on the semantic web.
Martijn
- Closer than you think
2003-08-23 01:45:13 Carl Garland
I think, as Avdi states, you are actually one of the leading advocates of the Semantic Web, and as the other xml.com article states, the Semantic Web is closer to appearing than anyone realizes. As I stated over at my lame blog, I often think of the Semantic Web the same way as CSS. While initially buggy and underimplemented, I think a large portion of tomorrow's WWW will be SW-enabled, largely through a few applications that will make their way into the mainstream, and I think Atom will be the highway that many of the tools use.
- Semantic Web
2003-08-22 13:26:46 Avdi Grimm
I find it amusing to read these protestations of "I don't care about the semantic web!" from people working on things like Atom. The fact is, if you are working on Atom you *do* care, because Atom is as much the semantic web as anything else. Atom is about exposing semantic information on the web for consumption by software tools. That is, by definition, the Semantic Web; claiming otherwise is just wishful thinking.
Ironically, with his work on making the web more accessible to those with disabilities, along with his recent work on Atom, Mark is currently one of the leading advocates of the real Semantic Web. It's a shame he refuses to acknowledge that fact.
- RDF "debate.".....Paradigm Lost
2003-08-21 15:38:35 Wayne Yuhasz
Nice article, and a good distinction made between different aspects of the technology, the model's character, and the larger view of its contribution to the web's fundamental value (inclusive or restrictive).
- Incorrect Assumptions
2003-08-21 11:40:33 Shelley Powers
Mark has made several incorrect assumptions. I've addressed these externally:
http://weblog.burningbird.net/fires/001550.htm
- Halfway...
2003-08-21 09:00:35 Danny Ayers
...to being convincing.
The use of a single namespace for Atom in RDF isn't really the big deal you suggest, so the syntax comparison you make isn't at all balanced. It simply isn't comparing like with like.
It's easy enough to express dc:title owl:isEquivalentTo atom:title or whatever in a schema, well out of the way of the feeds themselves. Tools smart enough to grok the RDF model shouldn't have much trouble grokking the equivalence. Simpler syntax, no extra work for the average developer.
The current web is a pipe dream. Saying that doesn't affect the constructive work anyone's doing.
- Me Too!
2003-08-21 01:35:30 Julian Bond
- RDF is actually quite neat. I just wish I could understand what they're talking about half the time.
- RDF/XML serialization sucks.
- There aren't enough mature tools, and too many platforms are badly or buggily supported.
- Semantic what?
- Namespace Cheating...
2003-08-21 00:09:13 Nasseam Elkarra
Why not use dcterms:modified instead of modified? It makes no sense to put an element in the Atom namespace if it is already defined in another namespace that people are familiar with.
It is funny how people want to make the syntax as simple as possible when at the end there will be tools to abstract it all away. I don't care about a simple syntax, I want what is right. If that means making Atom a little more complex and based on RDF and Dublin Core, so be it.
XML is great. One of the biggest problems people found with XML was that, as it was being passed around, people were inconsistently naming and structuring data. So specifications and recommendations have been released in hopes that people would start using consistent terms. RDF and Dublin Core come along and do just that, and then people complain that it is complex. Of course it is going to be complex! We are talking about trying to build consistent vocabularies and terms here!
I recently switched over one of my projects to RDF/DC and I am really pleased with the consistency it gives the metadata. I am still working out the quirks to take full advantage of RDF but once I run my documents through the RDF validator, the advantages become clear: consistent terms and lots of tools.
My final piece of advice is: don't cheat. Some XML parsers cheat and actually allow non-XML behavior to sneak in. We cannot risk cheating in implementing XML as well. Namespaces help us define our terms, so let us take advantage of this feature. Let us not be scared of complexity. Chances are when a developer talks about complexity they mean the secretary-at-my-work-won't-understand kind of complexity, which in fact isn't complex for real developers. If it is going to take complexity to do something right then don't worry, your average Joe Blogger won't be writing his Atom in Notepad; he will be using a blogging tool.
- On Tools...
2003-08-20 22:15:54 Micah Dubinko
At the point where you can only effectively work with a data format using tools, you have already lost.
And for the record, like Mark I think that the RDF model is sound, the serialization is wretched, and the Semantic Web is an unattainable pipe dream. -m
