Something Useful This Way Comes

by Kendall Grant Clark
June 09, 2004

Already, Never, or Somewhere in Between

The Semantic Web is a complex human undertaking, which means, at the very least, that we should expect it to require a significant investment of time, effort, and funding. The European Union and the US Pentagon System, as well as many member companies of the W3C, have invested heavily in technologies directly related to the Semantic Web, including XML, RDF, OWL, and various rule and agent systems and languages.

Further, since the Semantic Web is a decentralized, distributed, Web-scale knowledge representation system built on the Web -- a decentralized, distributed hypermedia system -- it's arguable that we should also tally all of the investment that went into making the Web itself, and thus the Internet before it, as well as into the constituent systems and technologies of the knowledge representation part of AI.

In other words, development of the Semantic Web requires a lot of work, but there's been a lot of work done. This raises an obvious question: when will all that work pay off?

There are only three ways to answer that question -- already, never, or somewhere in between. In other words, one might say that the Semantic Web is already here; we already have a Web in which machines can retrieve, exchange, and manipulate knowledge in order to satisfy human needs and desires, in order to aid human projects and plans. There's an awful lot of RDF on the Web, in the form of RSS 1.0 feeds, FOAF files, photo annotation formats, geographic and calendaring ontologies, and so on. We might call this the weak form of the Semantic Web; yes, it's weaker than the more robust forms, but it has the advantage of actually existing, today.

There are, of course, more robust forms of the Semantic Web, ones in which OWL -- the Web Ontology Language -- and rules play a central role. This more robust form is not discontinuous with the weaker form; but it's different in that, say, we humans get more and more novel forms of aid and assistance from our machines. For example, rather than being able merely to publish my calendar in a form other people's machines can understand, in a more robust Semantic Web I will be able to publish my calendar, as well as some rules about me, my lifestyle, and my calendaring constraints, and software programs -- which are usually called "agents" in this context -- will take over much of the burden of maintaining my calendar, including making new appointments and events, from me. That will be possible, in large part, because many of the people and institutions with whom and with which I want to interact are using similar agents and technologies.

This more robust form, which we might call the Scientific American or Berners-Lee/Hendler/Lassila form, is -- as I said last August in an XML.com article ("The Semantic Web is Closer Than You Think") -- pretty much a done deal. Okay -- that's an exaggeration, but please note that it's an exaggeration in only one sense. By which I mean that the technology is a done deal. That is, we know how to implement such a system; and, in fact, at last year's WWW conference developers from University of Maryland's MIND Lab demonstrated a system that implements the Scientific American scenario.

The problem, as with any technology, is as much social as it is technical. The rest of the world -- and here I mean big corporations, legislatures, and all the other mechanisms that create new markets and warp old ones -- is playing catch-up to the technology. That's a good thing. I suspect we could see real world systems like the one described in the SciAm article in the next five to ten years. Such systems await a ubiquitous, broadband WiFi network infrastructure -- as well as, realistically, the right kind of legislation and policy changes -- as much as they await any specific web service, Semantic Web, or knowledge representation results.

So much for "already" and "sometime between already and never". What about "never"? Recent debates in the XML developer community suggest that there is a contingent of developers and software professionals that believes that the Semantic Web is all hype. Their answer to the question, "when will all that investment in the Semantic Web finally pay off?", seems to be a resounding "never".

An Ongoing Debate

In what remains of this article I will review some of this debate, trying to figure out what, if anything, is really at stake in it. And then I will say a few brief and informal words about a new W3C standardization effort that will, if it succeeds, help make the Semantic Web more likely to be realized in the real world.

As reported on XML.com two weeks ago by Paul Ford, the debate flared up recently when Elliotte Rusty Harold -- one of the XML figures I admire most, frankly -- suggested in his WWW2004 reportage that the Semantic Web was nothing but a big ol' hype balloon.

The reaction to Harold's claims is interesting in a few ways. First, it demonstrates that the W3C's influence is limited. Second, it suggests that there continues to be a lot of confusion about some of the advantages of, say, RDF over XML. Finally, it suggests that those of us who think the Semantic Web is a valuable project have failed miserably in communicating that to others.

Mike Champion offered an optimistic note, suggesting that Semantic Web technology may first flourish behind the enterprise firewall, in a way reminiscent of the earliest days of Netscape's corporate success:

The other previously missing ingredient is that real organizations have at least something approximating an implicit ontology in their database schema, standard operating procedures, official vocabularies, etc. It is at least arguable that the technologies that have emerged from the Semantic Web efforts allow all this diverse stuff to be pulled together in a useful way -- ontology editors, inferencing engines, semantic metadata repositories, etc. I'm seeing real success stories in my day job, and a coherent story is starting to be told by a number of vendors, analysts, etc.

Champion here makes a similar point to the one I argued in an article last fall ("Commercializing the Semantic Web"), namely, that there exist today several startups and fledgling ventures that are selling Semantic Web technologies to corporate clients, including Network Inference, Tucana Technologies, and others.

In response to Champion's post, Harold seemed to modify his mostly-negative appraisal of the Semantic Web. He conceded that

Part of what bothers me about the semantic web is syntax. It's too ugly to be practical. And syntax does matter. XML succeeded where SGML failed not because XML can do anything SGML can't (except maybe internationalization) but because the XML syntax story is cleaner and more approachable. The RDF syntax is just too ugly to be plausible.

The basic idea of RDF that seems useful is naming things with standard URIs. However, I simply don't see how the RDF syntax improves on XML+namespaces for that, and XML+namespaces is so much nicer a syntax than RDF.

I agree that syntax matters. A lot. But there is no "RDF syntax", unless he means by that the RDF data model. There is an admittedly rather ugly canonical serialization of RDF in XML; but there are also, at least by my latest count, five or six tractable alternatives to RDF-XML. (See my "The Courtship of Atom" for details and links to these alternatives.)

I disagree, however, that the "basic idea of RDF that seems useful is naming things with standard URIs" -- the basic idea of RDF is the formal data model, which offers the possibility of semantic interoperability which we simply do not have with XML. That the data model also offers a way to do inferencing is like icing on the cake. Some people like icing, some people like cake, and others, like me, like both, at the same time. In other words, lots of people do useful work with RDF and never use the inferencing it allows, while others are attracted to RDF precisely because of the inferencing. Where Semantic Web evangelists -- among which number I sometimes count myself -- have failed miserably is in turning that diversity into a widely perceived strength.

Joshua Allen took a similar line: "The value of RDF is the data model; not the serialization syntax". Allen, a Microsoft employee, also claimed that Microsoft's WinFS project is similar to RDF: "OSAF Chandler is based on 'triples', as is Longhorn's WinFS. Both are essentially 'personal semantic web stores'. Triples+URIs is how you bootstrap the 'personal semantic web store' and make it universal". Honestly, I don't know whether to laugh, because with WinFS Microsoft seems to be buying into the Semantic Web idea, or cry, because with WinFS Microsoft seems to be embracing-and-extending the Semantic Web idea. Oh well -- outside of the realm of unenforced US antitrust legislation, Microsoft is like gravity. Eventually, you just learn to work around it.

This conversation is ongoing as I write this column, and it spans several threads and a few hundred messages. It also covers a very wide range of ground, a good deal of which has more to do with XML than the Semantic Web per se. If you care about this stuff, you might want to review the conversation in detail.

Query, Inference, and RDF

One of the themes running through the current debate is whether RDF is more expressive than XML and namespaces. I think that it is, both because of the formal RDF model and because of the inferencing that model provides. For every XML vocabulary I encounter, I have to figure out, on my own, what implicit data model is at work. And those models may well be very different from one another. I do this by reading a schema or other documentation. Sometimes I have to ask the people who are producing it. I might even have to guess. For example, consider the simple containment relation in XML:

<foo><bar/></foo>

What does it mean that there is "bar" contained in a "foo"? Is this "bar" a kind of "foo"? Does it mean that "bar" has a "foo"? Is "bar" subordinate to or dependent on "foo" in some way, or vice versa? I can find out answers to all these questions and more, of course, by consulting a schema or documentation or asking a developer. As its evangelists have said repeatedly, XML is useful because it gives us syntactic rather than semantic interoperability. Yes, that's true.

Because there is a formal model behind RDF, however, when given a piece of RDF, I just need to figure out what the predicates mean, and I need to figure out what the URIs identify. It's not as if RDF is perfectly self-describing; nobody of any competence claims that. But what I don't have to figure out is the relations between the different parts of the graph. I don't have to figure out which of the XML elements and attributes are the subjects, predicates, and objects. I get that for free. Those relations are described for me, formally and backed by real logical and mathematical formalisms, in the RDF data model. It's like the old science fiction dictum: you may not need RDF often, but when you do, you'll need it bad.
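
To make the contrast concrete, here is a minimal Python sketch of two of the possible readings of the containment example above, each written out as an explicit subject-predicate-object triple. The `http://example.org/` URIs and the `hasPart` predicate are hypothetical names invented for illustration; only `rdfs:subClassOf` is a real RDF Schema term. It is a sketch of the idea, not of any particular RDF toolkit's API.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF statement: the roles are fixed by the data model itself."""
    subject: str
    predicate: str
    obj: str


# Hypothetical namespace, for illustration only.
EX = "http://example.org/"
RDFS_SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"

# Reading 1 of <foo><bar/></foo>: "bar" is a kind of "foo".
bar_is_a_kind_of_foo = Triple(EX + "bar", RDFS_SUBCLASS_OF, EX + "foo")

# Reading 2 of <foo><bar/></foo>: "foo" has a "bar" (hasPart is made up).
foo_has_a_bar = Triple(EX + "foo", EX + "hasPart", EX + "bar")


def describe(t: Triple) -> str:
    """Spell out which term plays which role in the statement."""
    return f"subject={t.subject} predicate={t.predicate} object={t.obj}"


print(describe(bar_is_a_kind_of_foo))
print(describe(foo_has_a_bar))
```

The two readings are indistinguishable in the XML fragment, but as triples they are different statements: the reader still has to learn what each predicate means, yet never has to guess which term is the subject and which is the object.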

One of the things RDF and Semantic Web developers have been doing with data that complies with the RDF data model is querying it. Until very recently different communities and clusters of developers have created RDF query languages on their own, which has led to some inevitable problems, including a serious deficit of interoperability. The W3C's Semantic Web Activity has, accordingly, chartered a new Working Group, called Data Access, the members of which are working on standardizing a query language and data access protocol for RDF.

This Working Group has recently released the first working draft of its first document, RDF Data Access Use Cases and Requirements (UCR). I know this because I'm a member of this WG and the editor of this document. I point this out because, well, I want you to think I'm a really smart guy -- &wink; -- but also because I want to make it clear that in this column I am speaking for myself only and not for the Working Group.

The UCR document describes, as you might guess, use cases for an RDF query language and data access protocol. It also describes a set of mandatory requirements and optional design objectives, most of which are motivated by the use cases. In order to give you some idea as to what a standard RDF query language might look like, here are the requirements and design objectives which the WG has already accepted formally:

  • The query language must include the capability to restrict matches on a queried graph by providing a graph pattern, which consists of one or more RDF triple patterns, to be satisfied in a query.
  • It must be possible for queries to return zero or more bindings of variables. Each set of bindings is one way that the query can be satisfied by the queried graph.
  • The query language must make it possible -- whether through function calls, namespaces, or in some other way -- to calculate and test values extensibly.
  • The query language must be suitable for use in accessing local RDF data -- that is, from the same machine or same system process.
  • The query language must include support for a subset of XSD datatypes and operations on those datatypes.
  • The access protocol design shall address bandwidth utilization issues; that is, it shall allow for at least one result format that does not make excessive use of network bandwidth for a given collection of results.
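
The first two requirements, graph patterns and variable bindings, can be illustrated with a small Python sketch. This is not the WG's query language, whose syntax had not been settled; it is a naive matcher under stated assumptions: triples are plain 3-tuples of strings, a term beginning with `?` is a variable, and the `ex:` and `foaf:` names are shorthand used for illustration only.

```python
from typing import Dict, Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]
Bindings = Dict[str, str]


def is_var(term: str) -> bool:
    # Assumption of this sketch: variables are written "?name".
    return term.startswith("?")


def match_triple(pattern: Triple, triple: Triple,
                 bindings: Bindings) -> Optional[Bindings]:
    """Extend bindings so that pattern equals triple, or return None."""
    result = dict(bindings)
    for p, t in zip(pattern, triple):
        if is_var(p):
            # Bind the variable, or check it against an earlier binding.
            if result.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return result


def match_pattern(patterns: List[Triple], graph: List[Triple],
                  bindings: Optional[Bindings] = None) -> Iterator[Bindings]:
    """Yield every set of bindings that satisfies all triple patterns."""
    bindings = {} if bindings is None else bindings
    if not patterns:
        yield bindings
        return
    first, rest = patterns[0], patterns[1:]
    for triple in graph:
        extended = match_triple(first, triple, bindings)
        if extended is not None:
            yield from match_pattern(rest, graph, extended)


graph = [
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:bob", "foaf:knows", "ex:carol"),
    ("ex:bob", "foaf:name", "Bob"),
]

# "Whom does alice know, and what is that person's name?"
query = [("ex:alice", "foaf:knows", "?who"), ("?who", "foaf:name", "?name")]
results = list(match_pattern(query, graph))
print(results)
```

Each dictionary in `results` is one way the query can be satisfied by the graph; a query that matches nothing simply yields zero sets of bindings.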

There are some other candidate requirements or design objectives that the WG's members are presently debating:

  • It must be possible for query results to be returned as a subgraph of the original queried graph.
  • It must be possible to express a query that does not fail when some specified part of the query fails to match. Any such triples matched by this optional part, or variable bindings caused by this optional part, can be returned in the results, if requested.
  • It must be possible to specify an upper bound on the number of query results returned.
  • It must be possible to handle large result sets of any size by iterating over the result set and fetching it in chunks.
  • It should be possible for query results to include source or provenance information.
  • It should be possible to query for the non-existence of one or more triples or triple patterns.
  • It should be possible to specify two or more RDF graphs against which a query shall be executed; that is, the result of an aggregate query is the merge of the results of executing the query on each of two or more graphs.
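
Three of these candidates (an upper bound on results, fetching results in chunks, and running one query over several graphs) can be sketched in a few lines of Python. The function names are hypothetical, and "merge" is read here in the simplest possible way, as concatenating the per-graph results; the WG may well settle on different semantics.

```python
from itertools import islice
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Triple = Tuple[str, str, str]
Bindings = Dict[str, str]
Graph = List[Triple]


def limited(results: Iterable[Bindings], limit: int) -> Iterator[Bindings]:
    """Return at most `limit` results (an upper bound on result count)."""
    return islice(results, limit)


def in_chunks(results: Iterable[Bindings],
              size: int) -> Iterator[List[Bindings]]:
    """Iterate over a result set of any size, `size` results at a time."""
    it = iter(results)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def query_each(graphs: Iterable[Graph],
               run_query: Callable[[Graph], Iterable[Bindings]]
               ) -> Iterator[Bindings]:
    """Run the same query on every graph and concatenate the results."""
    for graph in graphs:
        yield from run_query(graph)


def knows_subjects(graph: Graph) -> Iterator[Bindings]:
    # A stand-in query: bind ?s to every subject of a foaf:knows triple.
    return ({"?s": s} for s, p, o in graph if p == "foaf:knows")


g1 = [("ex:a", "foaf:knows", "ex:b"), ("ex:a", "foaf:name", "A")]
g2 = [("ex:c", "foaf:knows", "ex:d"), ("ex:e", "foaf:knows", "ex:f")]

everything = list(query_each([g1, g2], knows_subjects))
first_two = list(limited(query_each([g1, g2], knows_subjects), 2))
chunks = list(in_chunks(query_each([g1, g2], knows_subjects), 2))
print(everything, first_two, chunks)
```

Because every step works on iterators, nothing forces the whole result set into memory at once, which is the point of the chunking requirement.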

The Data Access WG is eager to hear from the XML.com audience and invites feedback to its comments mailing list. Note: the latest unreleased version of the UCR draft is publicly available, so if you want to see the latest evolution of the WG's present work, including things the WG is considering but hasn't yet formalized, that's a good place to start.


  • a catch-22?
    2004-06-14 19:06:12 acroyear [Reply]

    I see it as one of those problems of selling something before it can be useful. if you can't sell it to data providers, you won't get the data (or "knowledge") you need to make it a viable thing to sell.


    Certainly the feds (DARPA, especially) were looking at their side of it for TIA and other information-gathering technologies, but they aren't likely to share their data, so it still becomes a hard sell to everybody else.


    It's also, aside from surveillance/intelligence applications garnering a lot of undesired attention, hard to say just what you could do with all of that "knowledge". lots of little examples of tips and tricks (many that can be duplicated with the right stored procedure on oracle) hardly add up to the killer app needed to get vendors to buy into the idea and convert their data into knowledge on the SOW.


  • Toolsets needed
    2004-06-11 00:27:38 Julian Bond [Reply]

    If we had widely available easily used RDF parsers, complaints about the exact layout of the RDF-XML would be moot. You mention Jena. It may well be very good. But what if I'm working in a dotnet environment or I want to deploy on cheap webhosting where I can't use CPAN and all I've got is an out of date php?


    IMHO, widespread use of RDF is still held up by the lack of tools. Which is somewhat surprising given how long it's been around now.

    • Toolsets needed
      2004-06-11 07:15:53 bblfish [Reply]

      If you ask for your RDF in NTriples format


      http://www.w3.org/2001/sw/RDFCore/ntriples/


      then you could parse it with something as simple as awk, or with the basic C string manipulation functions. You get one triple per line and just need to cut it up into three sections
      <uri1> <uri2> <uri3>


      Could not be simpler really.

  • Value of Semantic Web
    2004-06-10 13:43:01 tpherndon [Reply]

    Allow me to frame my perspective: I am a system administrator who also programs applications for the hospital department for which I work. I work in an environment that is thoroughly mixed, with Mac, Windows and *nix machines all over the place.


    I have a fair bit of experience with relational databases. I've read Chris Date. I have some, though not extensive, experience with XML. I've used XML as a data interchange format, and of course have worked with XHTML.


    I understand the RDF data model, and appreciate greatly the fact that it *has* an underlying data model. But I don't yet see a killer app. I don't see what value RDF brings to the table. I suspect my own ignorance is the major fault, but still, I have yet to see a compelling use of RDF in the wild. Lack of marketing, maybe? I don't know.


    Okay, my bottom line is pragmatic: I use tools, and for me, in certain situations, XML has great value as a tool. That value comes primarily from the ubiquity of XML-oriented tools, and my ability to get data into and out of an XML format easily in any of the platforms in which I commonly work. Platforms, in that last sentence, should be taken in a broad sense -- I work in Java, Python and VB, on Windows and Mac clients, with Windows and Linux servers, using lots of different database engines.


    Having a good model for one's data is generally very important in my work. In my case, that model generally is a database schema and an associated object model. To my eyes, modeling something in an ERD is not much different from modeling something in RDF. Different in the details, certainly, but not in the fact that you are modeling the relationships and constraints of sets of data. (Or did I miss something really large in my cursory overview of RDF?)


    Having a model is important. Getting all the interested parties to sign off on that model is also important, and often is more work than the model. What comes after that is the pragmatic choice of tools. I can express my data model in the form of a relational schema, or I can choose to model in RDF and OWL.


    If I choose to use a relational database, I know that I can easily manipulate my data, searching, inserting, changing and deleting. There are many-many tools available to me that make my work easier, in terms of helping the end-user create, record, analyze and retrieve their data. Furthermore, general knowledge of databases is widespread in IT, and support is easy to come by. Connectivity options abound. These are all big value-adds for me choosing to use a relational database as part of my problem solution. These are parts of the wheel I don't have to reinvent. (When it comes time to program, the object-relational impedance factor causes me heartburn sometimes, but... :)


    Now, RDF and OWL provide a useful data model. Since most of my problem domains don't inherently involve document mark-up (and thus dictate an inherently XML-centric approach), what value do RDF and OWL bring to the table?


    From the looks of it, using RDF and OWL would mean reinventing a very large number of wheels. In my case, my typical use of XML involves adding XML translators to an existing application, in order to allow easier access to the services provided by the application, or to enable my application to use other apps' services. I can see that there may, somewhere down the road, be a need for something similar for the semantic world. TimBL's semantic web vision is compelling. But it isn't really here yet. And no, the weak form doesn't count.


    So, where's the value to my work? What do RDF and OWL give me that I can't get elsewhere, with less work? Why is this important to me, the scientific/corporate developer?

  • "data model"?
    2004-06-09 21:29:01 autarch [Reply]

    I wasn't aware that there was a formal RDF data model, as in:


    1. A collection of data structures. In the relational data model this is relations & tuples (also called tables & rows, bleah).


    2. A collection of operators which can be applied to those structures in order to retrieve data from those structures, in other words a query language.


    3. A collection of integrity constraints which can be used to define valid states for the data.


    The above definition is a paraphrased version of that given by Fabian Pascal in an article here: http://www.tdan.com/sms_issue28.htm


    What would these 3 things be for RDF?

    • "data model"?
      2004-06-10 05:27:34 Kendall Clark [Reply]

      I quote the RDF Concepts and Abstract Syntax standard:


      --
      3.1 Graph Data Model


      The underlying structure of any expression in RDF is a collection of triples, each consisting of a subject, a predicate and an object. A set of such triples is called an RDF graph (defined more formally in section 6). This can be illustrated by a node and directed-arc diagram, in which each triple is represented as a node-arc-node link (hence the term "graph").
      --


      However, I will say that I sorta ran together in my piece the RDF data model, which is a graph composed of triples, and the formal semantics of RDF, which is expressed in terms of model theory, which is a kind of logic of sets.


      But I think that running these together doesn't hurt my central claim, which is that, whatever one thinks of RDF's canonical XML serialization, it has a formal semantics behind it, and that's one very big difference between it and XML plus namespaces.

    • "data model"?
      2004-06-09 23:59:54 giovannitummarello [Reply]

      assuming that's the right definition of "data model" (according to me it isn't) you anyway get what you want (if that makes you happy)


      a) data structures are triples, the datamodel is a graph. let me remind you tree is simply a specific case so XML expressivity is just a subset of RDF expressivity


      b) Operators? Download JENA or any other tool kit and operate as you want, standardization (See query languages) is well on the way


      c) ever heard of OWL?


      There might be no interesting semantic web based applications NOW but that doesn't mean anything.. YOU try to make a simpler way to express relations between resources in a standard way.


      Final word of advice, people that even consider saying "but the syntax is ugly" are making clear that they're so far away from competent they should just plain remain silent.


      The rdf datamodel is relatively simple, there are many syntaxes you can choose from, the stuff is the same anyway once you load it in a computational space that has operators for it (say Jena).


      Finally, i have the feeling that XML people saying rdf is complicated clearly haven't read their own stuff.. ( http://www.w3.org/TR/xmlschema-1/ http://www.w3.org/TR/xmlschema-2/ )

      • "data model"?
        2004-06-14 07:28:42 bryan rasmussen [Reply]

        "a) data structures are triples, the datamodel is a graph. let me remind you tree is simply a specific case so XML expressivity is just a subset of RDF expressivity"


        I really love the whole "a graph is more expressive than a tree" thing, especially as regards rdf which is generally serialized as xml.


        So you see, you can express any tree structure in this graph structure which we've expressed in a tree structure.