Microformats and Web 2.0
October 19, 2005
Micah Dubinko's new column, XML Annoyances, begins this week with a look at the role of microformats, particularly with regard to Web 2.0 applications and services, as the core XML-specification era comes to a close.--Editor
We are all creatures of habit. We get set in our ways and comfortable with our working set of assumptions about the world around us. Yet sometimes those assumptions are misplaced, a sure-fire cause of annoyance. This goes for the XML community as much as any other group. For XML pros and occasional users alike, it's worth critically examining some of the "common sense" we've come to rely upon.
One of my habits--practically down to the muscle-memory level--is to check Wikipedia about any topic I encounter. As of the publication date, the first sentence under the XML topic reads:
The Extensible Markup Language (XML) is a W3C-recommended general-purpose markup language for creating special-purpose markup languages.
This has been the party line for the seven years since XML 1.0 first became a W3C recommendation. The follow-on specification of Namespaces in XML further reinforced the notion of preparing for a thousand applications of XML to bloom--enough so that a complex three-part naming system using difficult-to-remember (and troublesome) URI strings came into play.
Times change. As the final installment of the XML-Deviant column mentioned, "the classic era of core XML specifications is ending." In the early days, creating fresh XML-based languages was important and necessary; in fact it was the whole point of XML. Today, many still think that way. However, the point of diminishing returns for whole-cloth language invention lies behind us. In short, we've got enough markup languages for now, thank you. What's left is figuring out how to better use what we've got.
An informal movement called microformats embraces reusing existing XML vocabularies, most notably XHTML, rather than developing either freshly-minted vocabularies or proprietary formats. A wealth of information is available on microformats.org, as well as here on xml.com, but to get a better feeling for what's happening in the microformats space, I tracked down a few leading practitioners: Tantek Çelik of Technorati and Casey West of Socialtext.
"Those Who Ignore Standards Are Doomed to Reinvent Them"
To start off, an example: Çelik, originator of the quotation above, points to Outline Processor Markup Language (OPML) as an area where microformats could have provided a simpler, browser-friendly, lower-barrier solution to a common problem. It's true that many software packages, including nearly all feed readers, have some level of support for OPML, but wouldn't it have been better for an outline format to have been browser friendly? What purpose is served by introducing a bunch of one-off elements, with names like opml, head, body, dateModified, or (shudder) windowLeft instead of using existing, well-known XHTML elements?
That sort of question has been asked before. And the answer is XOXO, Extensible Open XHTML Outlines.
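To make the contrast with OPML concrete, here is a minimal sketch in Python that renders a nested outline as plain XHTML lists. It assumes the XOXO convention of marking the outermost list with class="xoxo"; the to_xoxo function and the (title, children) tuple layout are inventions of this sketch, not part of any specification.

```python
from xml.sax.saxutils import escape


def to_xoxo(items, top=True):
    """Render a nested outline as XHTML ordered lists.

    `items` is a list of (title, children) pairs. That layout is an
    assumption of this sketch, not something XOXO prescribes.
    """
    # Only the outermost list carries the class attribute.
    attr = ' class="xoxo"' if top else ""
    parts = ["<ol%s>" % attr]
    for title, children in items:
        parts.append("<li>" + escape(title))
        if children:
            parts.append(to_xoxo(children, top=False))
        parts.append("</li>")
    parts.append("</ol>")
    return "".join(parts)


outline = [
    ("Microformats", [("Elemental", []), ("Compound", [])]),
    ("Profiles", []),
]
xoxo_markup = to_xoxo(outline)
print(xoxo_markup)
```

The result is nothing more exotic than nested ol and li elements, so any browser can display it with no outline-specific support.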
Microformats are human readable, but still machine readable. They're simple, targeted at normal people, not just XML experts. A more detailed list of what exactly microformats are and aren't remains basically unchanged since XML.com's earlier discussion. What has changed is the emphasis on an open research and development community, centered on the microformats.org wiki and website.
Microformats can be thought of in two general classes, "elemental" and "compound." Elemental microformats consist of a minimal solution to a single problem, often in as little as a single attribute. Common examples include rel="tag", used for folksonomy tagging, rel="nofollow", used to link to a page without endorsing it, and rel="license", for attaching a specific license to a web page. Additionally, larger formats like XOXO are still considered elemental.
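Because an elemental microformat can be as small as one attribute value, consuming it takes very little code. The following is a rough sketch using Python's standard html.parser module; the RelCollector class name, the sample markup, and the example.org and example.com URLs are all made up for illustration.

```python
from html.parser import HTMLParser


class RelCollector(HTMLParser):
    """Collect (rel value, href) pairs from <a> and <link> elements."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "link"):
            return
        attributes = dict(attrs)
        # rel holds a space-separated list of values.
        for rel in (attributes.get("rel") or "").split():
            self.links.append((rel.lower(), attributes.get("href")))


# Hypothetical page fragment using three elemental microformats.
SAMPLE = """
<p>Filed under <a href="http://example.org/tags/xml" rel="tag">xml</a>.
See <a href="http://example.com/spam" rel="nofollow">this page</a>.
Licensed under <a href="http://example.org/licenses/by" rel="license">some license</a>.</p>
"""

collector = RelCollector()
collector.feed(SAMPLE)
rel_links = collector.links
for rel, href in rel_links:
    print(rel, href)
```

A real consumer would probably filter on the rel values it cares about, but the point stands: the data rides along in ordinary links.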
In contrast, compound microformats are composed of elemental microformats. Examples include hReview, hCard, and hCalendar.
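As an illustration of a compound format, here is a small sketch that pulls a few hCard properties out of well-formed XHTML with Python's standard ElementTree module. The class names vcard, fn, org, and url come from hCard; the sample markup, the helper names, and the choice to read only these three properties are assumptions of the sketch, which is nowhere near a complete hCard parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical contact entry; the person and URL are invented.
SAMPLE = """
<div class="vcard">
  <a class="url fn" href="http://example.org/jane">Jane Example</a>
  <span class="org">Example Corp</span>
</div>
"""


def has_class(element, name):
    """True if `name` is one of the element's space-separated classes."""
    return name in (element.get("class") or "").split()


def parse_hcards(xhtml):
    """Return one dict per element carrying the vcard class."""
    root = ET.fromstring(xhtml)
    cards = []
    for node in root.iter():
        if not has_class(node, "vcard"):
            continue
        card = {}
        for child in node.iter():
            for prop in ("fn", "org"):
                if has_class(child, prop):
                    card[prop] = "".join(child.itertext()).strip()
            # For url, the value lives in the href, not the link text.
            if has_class(child, "url") and child.get("href"):
                card["url"] = child.get("href")
        cards.append(card)
    return cards


cards = parse_hcards(SAMPLE)
print(cards)
```

Note that one element can carry several class names at once, which is how a single link serves as both the formatted name and the URL.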
Closely related to microformats are "profiles," or subsets of a larger markup language. Examples include XHTML mobile profile (PDF link), SVG tiny, or UBL Small Business Subset. The primary difference between a profile and a microformat is that a profile is more directly created in the image of its larger ancestor. It's reduced in size, but still more-or-less serving the same purpose. Microformats are more focused.
Microformats Community and Process
In that earlier XML.com article, I dipped my toe into the waters of microformat design, proposing a format called the Exam Markup Language, or Examl. The first time most folks ever heard of it was reading the column. Turns out that was a mistake.
A microformats wiki page titled "So you wanna develop a new microformat?" lists the steps one should follow. Note the emphasis on transparency throughout:
- Document current behavior
- Propose
- Iterate
Before starting work on a microformat, a fair amount of research and discussion needs to happen, generally in public. If only one person has ever worked on it, a microformat probably won't succeed. Research done, the next step is to propose the format in an appropriate forum, seeking copious feedback. Repeat as necessary.
The wiki asks two further guiding questions:
- If I looked at this microformat in a browser that didn't support CSS or had CSS turned off, would it still be human readable?
- Are this format's elements stylable with CSS?
To put it another way, those annoying non-semantic elements wither and fade under the scrutiny of the microformats community. This outlook enforces a proper view of markup as an intention-carrying component, not a presentational shortcut.
While I'm confessing past sins, I also wrote that "some gray areas remain. For example, is RSS a microformat? It seems to bear at least some of the characteristics of one..." It is true that RSS bears some characteristics, but analysis since that article has concluded that RSS is definitely not considered a microformat.
Most folks, though, won't need to create a microformat--they can use an existing one, secure in the knowledge of how much consideration has gone into its creation.
Influence on the Web
One possible objection is that microformats encourage "screen scraping"; instead of using a carefully crafted Web Services API, people (and their machines) will instead fetch regular pages and struggle on from there.
I asked Casey West about this. He noted that search engine crawlers will, except in very special cases, always prefer to enter a site through the same interface used by regular web browsers, because no single search engine could (or would even want to) keep up with all the third-party APIs that might exist at any given time. Microformats are thus a natural companion to REST-philosophy web services, where useful data is only a GET request away. They are human readable, but not at the expense of machine readability, so it's not exactly fair to say one would have to "scrape" a page with microformat data present--the data is structured and accessible by design. In other words, microformats tend to work better on the web.
A closely related question is, what kind of effect might microformats have on browsers and Web 2.0 applications that run in them? I liked West's answer, that basically they don't need to change beyond support for XHTML. Çelik added that a key principle involved is users controlling their own data. In several of his presentations, he asks the audience how many different email clients they've had over their lifetimes. How did the data migration go at each step? Not too well, usually, but intelligent use of microformats could perhaps improve the situation. This especially goes for calendar and address book applications, where existing microformat work is well-established.
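To give a feel for what that kind of data portability might look like, here is a hedged sketch that turns one hCalendar event into the property lines of an iCalendar VEVENT. The class names vevent, summary, dtstart, and location are hCalendar's; the event itself, the vevent_to_ical function, and the decision to skip everything a real exporter needs (the VCALENDAR wrapper, UID, text escaping, time zones) are simplifications of this sketch.

```python
import xml.etree.ElementTree as ET

# Hypothetical event; the name, time, and place are invented.
SAMPLE = """
<div class="vevent">
  <span class="summary">Microformats meetup</span>:
  <abbr class="dtstart" title="20051019T190000">October 19, 7pm</abbr>,
  at <span class="location">Example Hall</span>
</div>
"""


def has_class(element, name):
    """True if `name` is one of the element's space-separated classes."""
    return name in (element.get("class") or "").split()


def vevent_to_ical(xhtml):
    """Map a few hCalendar properties onto iCalendar property lines."""
    root = ET.fromstring(xhtml)
    lines = ["BEGIN:VEVENT"]
    for node in root.iter():
        for prop in ("summary", "dtstart", "location"):
            if has_class(node, prop):
                # An abbr keeps the machine-readable value in its title
                # attribute and the friendly form in its text.
                value = node.get("title") or "".join(node.itertext()).strip()
                lines.append("%s:%s" % (prop.upper(), value))
    lines.append("END:VEVENT")
    return "\r\n".join(lines)


ical_text = vevent_to_ical(SAMPLE)
print(ical_text)
```

The same page that a person reads in a browser doubles as the export format, which is the heart of the "users controlling their own data" argument.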
Microformat Annoyances
Like any new technology, microformats don't solve every problem, and in fact introduce a few problems of their own. One is the general problem of microcontent, that is, useful units of data at a granularity less than that of individual documents. Many existing content management systems aren't equipped to deal with, say, a single XHTML document that contains 27 hCard instances. As microformats gain prominence, though, microcontent management systems should begin to catch up.
Presently, microformat progress is almost exclusively based on XHTML. Depending on your viewpoint, this may be a strength or a weakness. We'll get to possible alternatives in a bit. In some ways, the microformats movement and community competes with consortia-based standards development, which is slower to adjust to a new, less expansionary era. On the other hand, XHTML 2.0 shows all the signs of being an excellent microformat foundation--if and when it becomes supported by browsers.
As with any highly intentional language, working with microformats can sometimes be painful; the urge to insert presentational tags can be overpowering. For this reason, working with microformats eventually requires in-depth knowledge of XHTML, CSS, and other XML best practices. Any shortfall in these skills can make it hard to understand why certain things are done the way they are, and how to effectively make use of existing tools. Fortunately, the learning curve is not too steep, and the new skills can be added on an incremental, as-needed basis.
Lastly, standards developed as a microformat exist in a more constrained environment--new elements and attributes in general can't be created as needed. This can make versioning, already a hard problem, even worse. Existing microformats are young enough--and focused enough on solving a single small problem--that versioning hasn't become a serious problem. This will be an area to keep close tabs on in the future.
Things to Watch
The new generation of browsers finally supports more than just HTML. Will new microformats arise around SVG, XForms, or other existing markup languages? It's an open question.
Another question is how tightly microformats are (or need to be) bound to browsers. Many instances of full-scale XML vocabulary development fall outside browsers. In any of these cases, would it make sense to apply the microformat treatment to, say, DocBook, OpenDocument, or UBL? Time and community interest will tell.
One more thing to keep an eye on: what is Mark Pilgrim up to? "Or do you just use your browser to browse? That's so 20th century."
The Bottom Line
Vocabulary proliferation is one of the biggest XML annoyances around. If you're like me, your brain can hold three, maybe four markup languages at a time. The microformats way of life prefers reusing existing work wherever possible. Recycled knowledge goes a long way. An active community works to continue progress on specifications, which tend to be easier reading than full-blown committee standards.
RSS is pretty successful today, but it took nearly nine years to get there. In a universe where, instead of RSS, an equivalent microformat started things off, would adoption have happened more quickly?
If you think the answer to that question might be "yes," then microformats are worth a look.