Berners-Lee Keeps WWW2004 Focused on Semantic Web

May 20, 2004

Paul Ford

New York City, May 19 -- A peculiar buzz is back in the halls of WWW2004 -- the mix of hubris and geek name dropping, cheap suits and over-eager handshakes that last prevailed in 2000. "I nearly invented the web," said a fellow with a large stack of promotional postcards advertising new social networking software. "People are downloading our new XML API almost before we upload it," said another, introducing himself to anyone who wandered within earshot.

Whether the days of the dot-com boom are back, or people simply wish they were, there is an optimistic tone to the WWW2004 conference, the 13th International World Wide Web Conference, held at the Sheraton New York. The core technologies of the Web, like XML, XHTML, CSS, and Web Services, have all been accepted by the IT world in general, and issues like accessibility, far from being fringe concerns, are now widely understood. "We've come a long way from the days of telegraphs and typewriters," said Gino P. Menchini, commissioner of the Department of Information Technology (the Mayor himself was at the 9/11 hearings, and couldn't make it). Menchini went on to name May 19 "World Wide Web day in New York City," to much applause from the several hundred assembled around tables in the Sheraton's Imperial Ballroom.

But now that the Web is unquestioned as a basic medium, of a piece with television, publishing, and radio, there is a risk of stagnation. To that end, Tim Berners-Lee, creator of the first Web browser and server, and inventor of HTML, gave a plenary talk focused on two open questions: What should we do with top level domain names (TLDs)? And what should we do with the Semantic Web? The latter question is clearly the more important to Berners-Lee. Over the course of the speech, as he encouraged developers to begin using the Semantic Web, it became clear that Berners-Lee is less than satisfied with the current state of the web -- and not entirely clear as to the best way to proceed.

Dealing with Domains

Berners-Lee pointed out that the new TLDs (like .biz, .info, and .xxx) attempt to sort domain names into a semantic tree, hopefully making it simple to identify the essence of a site (is it business? is it information? is it porn?) by its name. But he questioned the concept at its root. "When people are looking for global brands, it's a flat space. You're much more likely to look for johnstravel.com than johns.travel," he said.

According to Berners-Lee, the entire concept of TLDs as a means to expand the domain space is suspect. What, he asked, does the .xxx TLD mean? For Americans, this brings back the debates over pornography in the 80s, when judges were trying to find a balance between smut and free speech, and found it exceptionally difficult to find where art ended and porn began. "I have a high tolerance for people with no clothes on, and a low tolerance for violence," said Berners-Lee. But that level of tolerance "might be different for someone from the Christian right."

So, if this semantic ambiguity is an unavoidable part of the TLD system, and thus makes the entire TLD enterprise somewhat suspect, what are TLDs good for? Berners-Lee suggested that we use TLDs to identify content type -- we could start, for instance, by using .mobi for mobile applications, and enforce content type standards within the .mobi domain. As a result, a web of reliably mobile device-accessible content would emerge. By promoting a "device-centric" use for TLDs, he implied, we make TLDs less useful and fragment the web into multiple parts.

Challenges to the Semantic Web Community

Berners-Lee quickly moved from the discussion of TLDs to the Semantic Web. With OWL and RDF as official W3C recommendations, he told the crowd, the foundation of the Semantic Web is in place, and it is time to move on to Phase II: "a time of less constraint." He acknowledged that given multiple ontologies and different kinds of data, "[the Semantic Web is] bound to be inconsistent. Well, so is the web."

"People ask," he said, "so what's the Semantic Web killer app going to be? That's not the right question." The real proof of the Semantic Web, he said, is when new connections are made, and new links between information emerge.

Rather than concerning themselves unduly with hewing to existing ontologies, Berners-Lee pushed developers to start using RDF and triples more aggressively. In particular, he wants to see existing databases exported as RDF, with ontologies created ad hoc to match the structure of that data. Rather than using PHP scripts only to produce HTML, he suggested, create RDF as well. Then, when all of the RDF is aggregated, apply rules and see what happens. "Let's not fall back on handmade markup." Later in the talk, he described a cascade of Semantic Web connections, postulating that one day, individuals may be able to follow links from a parts catalog to order status, from location to weather to taxes.
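To make the suggestion concrete, here is a minimal sketch of exporting a database table as RDF triples with an ontology made up on the spot. It is an illustration, not anything Berners-Lee showed: the `parts` table, the `http://example.org/` namespace, and the helper names `rows_to_triples`, `to_ntriples`, and `apply_rule` are all assumptions, and the "rule" is a toy stand-in for a real rules engine. It uses only Python's standard library and writes the triples in N-Triples syntax.

```python
import sqlite3

# Hypothetical namespace for this sketch; any URI prefix you control would do.
BASE = "http://example.org/"


def rows_to_triples(conn, table, key):
    """Yield (subject, predicate, object) triples for every non-key column of each row."""
    cur = conn.execute(f"SELECT * FROM {table}")
    columns = [d[0] for d in cur.description]
    for row in cur:
        record = dict(zip(columns, row))
        subject = f"{BASE}{table}/{record[key]}"
        for column, value in record.items():
            if column != key and value is not None:
                # Ad hoc ontology: each predicate is simply named after its column.
                yield (subject, f"{BASE}{table}#{column}", str(value))


def to_ntriples(triples):
    """Serialize triples as N-Triples lines, treating every object as a plain literal."""
    lines = []
    for s, p, o in triples:
        escaped = (
            o.replace("\\", "\\\\")
            .replace('"', '\\"')
            .replace("\n", "\\n")
            .replace("\r", "\\r")
        )
        lines.append(f'<{s}> <{p}> "{escaped}" .')
    return "\n".join(lines)


def apply_rule(triples, predicate, value, new_predicate, new_value):
    """Toy rule: any subject with (predicate, value) also gets (new_predicate, new_value)."""
    derived = {
        (s, new_predicate, new_value)
        for s, p, o in triples
        if p == predicate and o == value
    }
    return set(triples) | derived


# A stand-in for an existing database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parts (id INTEGER PRIMARY KEY, name TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO parts VALUES (?, ?, ?)",
    [(1, "gasket", "shipped"), (2, "valve", "backordered")],
)

triples = list(rows_to_triples(conn, "parts", "id"))
print(to_ntriples(triples))

# "Apply rules and see what happens": flag every backordered part.
inferred = apply_rule(
    triples,
    BASE + "parts#status", "backordered",
    BASE + "parts#needsFollowUp", "true",
)
```

The same function that feeds an HTML template could feed `to_ntriples`, which is the point of the suggestion: the data is published once more, in a form that other people's rules can consume.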

Berners-Lee acknowledged that the Semantic Web framework is in opposition to the conventional wisdom regarding who controls the display of information. "The person publishing the data will feel that they have the right to tell it how to look," he said, and content producers who fund their work through advertising will be reluctant to hand over their content in a neutral RDF form that can be displayed and linked in unpredictable ways. "But that's just one side of it. It's the [user] who's really in control." He challenged the audience to create a Semantic Web browser that would address these issues, an "open application, pulling in style information from lots of different places."

He then moved away from the idea of the Semantic Web as connecting individuals to information and promoted it as an automated disambiguation layer within an operating system. In particular, he described a potential "RDF clipboard" that would automatically translate content types between applications. For instance, it would "copy a piece of SVG, which is vector graphics, and paste it into something that can only handle a bitmap graphic." The RDF clipboard, as a set of rules, would know how to translate between data types and would convert on the fly.
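One way to picture a clipboard driven by rules is as a search over conversion rules, sketched below. This is a guess at the shape of the idea, not a description of any real system: plain Python tuples stand in for RDF rules, the media types and the names `RULES`, `find_conversion`, and `paste` are hypothetical, and the converters only label the data rather than actually rasterizing anything.

```python
from collections import deque

# Each rule says: content of type `source` can become type `target` via `convert`.
# The converters are placeholders that only wrap the payload in a label.
RULES = [
    ("image/svg+xml", "image/png", lambda data: f"rasterized({data})"),
    ("image/png", "image/bmp", lambda data: f"bmp({data})"),
    ("text/html", "text/plain", lambda data: f"stripped({data})"),
]


def find_conversion(source, accepted, rules=RULES):
    """Breadth-first search for the shortest chain of rules from `source` to an accepted type."""
    queue = deque([(source, [])])
    seen = {source}
    while queue:
        current, chain = queue.popleft()
        if current in accepted:
            return current, chain
        for src, dst, convert in rules:
            if src == current and dst not in seen:
                seen.add(dst)
                queue.append((dst, chain + [convert]))
    return None


def paste(data, source, accepted):
    """Convert clipboard `data` of type `source` into some type the target application accepts."""
    found = find_conversion(source, accepted)
    if found is None:
        raise ValueError(f"no rule chain from {source} to any of {sorted(accepted)}")
    target, chain = found
    for convert in chain:
        data = convert(data)
    return target, data


# Copy SVG, paste into an application that only handles bitmaps.
print(paste("<svg/>", "image/svg+xml", {"image/bmp"}))
# prints ('image/bmp', 'bmp(rasterized(<svg/>))')
```

Because the rules are data rather than code paths, adding support for a new format means adding one rule, and every application that pastes through the clipboard benefits.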

Close to the end of the talk, Berners-Lee discussed the Semantic Web bus, the model whereby data, ontologies, rules, and logic interoperate. He then showed the audience a real Semantic Web bus, created by the W3C's Spain office. The bus will promote the W3C's agenda throughout Spain by operating as a classroom on wheels.

The Semantic Web's Uncertain Destiny

At the conclusion of Berners-Lee's speech, and by reading through the papers in the conference proceedings, it is clear that the Semantic Web has not yet entered adulthood; it is rather in a somewhat uncomfortable adolescence. While there is no shortage of suggested commercial applications, and more prototype frameworks than one can count (from HP Labs' Jena to the Haystack framework), very few have made their way to end users. The users of the Semantic Web are currently those who deeply care about the Semantic Web as a concept; the concept of total connectivity of data has yet to catch on.

Berners-Lee acknowledged this by issuing challenges to the WWW community, seeking to seed the Semantic Web in the minds of developers. But at the same time, his challenges themselves show the ambiguity inherent in the Semantic Web project. Is the SemWeb a layer above (or below) the Web, as in Berners-Lee's proposed SemWeb browser, linking people, ideas, and resources together? Or is it something fundamental to the computing experience, like the RDF clipboard?

Obviously it can be both -- there are no hard-set technological limits on the proper domain for using triples and logical rules. But in the short term, while no one can fully agree on what the Semantic Web is, a clearly articulated vision is of the essence if the Semantic Web is to move further. It is surprising to see that its leading evangelist is also uncertain as to whether the Semantic Web's immediate destiny is on the server or the desktop.