Services and Links
January 13, 2003
I found a link in my weblog's referrers file last night that seemed emblematic of the current milieu. Here's the text of the link:
http://www.w3.org/2000/06/webdata/xslt?xslfile=http://ipwebdev.com/outliner_xsl.txt&xmlfile=http://weblog.infoworld.com/udell/rss.xml
Here's a clickable instance of the link, and here's a snapshot of its output:
Erik Benson's wonderful All Consuming book site continues to delight me. The newest feature, First Line Trivia, presents the first line of a book on each refresh of the home page. You try to guess the book, and click through to see the answer. Members, who can edit book metadata, add the first-line data, IMDb-style. Example: First Line Trivia. This could get addictive!

The first-line-trivia feature pushed me over the activation threshold, and I registered for the site. As a member, you can create a list of friends, which is seeded for you with candidates gleaned from Google's what's related and bl.ogs' related blogs. When friends add books to their All Consuming lists, you can receive them as Web (and optionally email) recommendations--and vice versa, your list can recommend books to them. In fact, I'm unlikely to maintain an explicit book list because the blog universe that All Consuming inhabits already disseminates book awareness very effectively. Bloggers mention books on their blogs; All Consuming picks up on those references; its RSS feed brings them to my attention.

I'm surprised that there isn't more chatter about All Consuming on the weblogs I read. Increasingly, when I link to a book, I offer its All Consuming URL rather than its Amazon URL. Of course, as I just realized when reading this interview, Erik is an Amazon employee. Perfect! All Consuming is, in my view, one of the cleverest imaginable marketing schemes for Amazon--and for books in general. More and more books are available, at ever higher prices, but fewer and fewer people read. Boosting demand is the only hope for publishing, and Erik's service does that magnificently well. I'm more aware of books now than I have been in years. And since All Consuming's URIs are compatible with LibraryLookup, it's easier than ever to satisfy the increased demand.

Update: When you use All Consuming, you may be surprised by unintended consequences. The other day, I was puzzled to see it attribute to me a reference to a book I hadn't mentioned on this blog. I wrote to Erik about it, and he was stumped for a while too; then he realized that it must have been something my Google box found and made available to All Consuming. It just now happened again. I put the phrase "all consuming" into my Google box. A few minutes later, All Consuming attributed a reference to Affluenza: The All-Consuming Epidemic to my blog. I find these spontaneous interactions fascinating and delightful. I can foresee, though, that a time will come when we'll want to be able to control these effects--for example, by applying robots.txt-like technology at the level of page components.
The most compelling effect in Minority Report, for me, was the visualization of active paper. Last night we watched it again, and later some friends dropped by. To put this in context, I live in small-town New Hampshire, not Silicon Valley or Silicon Alley. There's a lot of dial-up Internet use happening here, and DSL is growing, but Wi-Fi households are rare. When a topic came up in conversation, and I flipped open the TiBook to check it out, I had an epiphany. The future really is here, albeit not evenly distributed. I didn't mention, and I'm sure it didn't occur to my friends, that I was connecting wirelessly to the Internet. It seemed completely natural that "the Internet" would be "in" this little box, whether or not wires were running to it. The technology is disappearing into the woodwork, as it should. It is becoming a small-i internet.

The emergence of Wi-Fi really has to be the story of the year. I'm currently reading The Wireless Networking Starter Kit, an excellent primer. The authors, Adam Engst and Glenn Fleishman, after explaining how and why Wi-Fi is transformative, finally conclude: "It's just freaking cool." Amen to that!
The recent discussion about active intermediaries (Sam Ruby, Phil Windley) sent me in an unexpected direction. What I meant to do was revisit some earlier writing on Web proxies, email proxies, and SOAP routing, and try to draw some conclusions. Instead, I invented another bookmarklet. Here was the problem. It's nice that I can now look up a book in my local library, but what if it's not in the collection? My library's OPAC (online public access catalog) enables you to ask the library to acquire a book, but the required fill-in form creates an activation threshold that I am rarely motivated to leap over. The basic LibraryLookup bookmarklet is a kind of intermediary. It coordinates two classes of services--Amazon/BN/isbn.nu/AllConsuming and your local library's OPAC--to facilitate a lookup. I couldn't resist trying to create another intermediary that would facilitate a purchase request. The solution I'll present here is less general than the basic lookup in several ways, but also interesting in several ways. Here are the ways in which it is less general:
Nevertheless, here are the reasons I find the solution interesting.
Here is the bookmarklet you can drag to your link toolbar: Please Acquire Here is an Amazon page against which to test it: The Eighth Day of Creation: Makers of the Revolution in Biology. Clicking the bookmarklet's link should bring up a screen like the one shown here. It's OK to click the button. I've neutered the script so it will just pop up a message rather than send the request. To unneuter it, rewrite the form's action= attribute to specify your OPAC's acquisition-request URL. A few points to note in the code that follows:
All in all, an instructive little exercise. This sort of technique won't replace active intermediaries, including the local kind that work at the level of HTTP or SMTP. Rather, it will complement them. Users need to be able to see, and approve, what intermediaries propose to do on their behalf. I like the idea of an interactive intermediary that prepares a connection between two services, previews it for the user, and then makes the connection. Update The script below contains a privacy bomb which, after a few minutes of reflection, I removed from the live version invoked by the bootloader. It's a fascinating scenario, actually:
I find (2b) especially intriguing. It's not really in Amazon's interest for me to be aware of what's available in the local library, and it's not really in the library's interest for me to be made promptly aware of fines accumulating there. By yoking them together, I might be able to play the two services off against one another to my benefit--and to theirs.

The bookmarklet's bootloader:

javascript:void((function() {var%20element=document.createElement('script'); element.setAttribute('src', 'http://weblog.infoworld.com/udell/gems/acquire.js'); document.body.appendChild(element)})())

The script loaded by the bootloader:
// The "privacy bomb" fragment (removed from the live script): it defines,
// as a string, a setCookie() function that writes a cookie scoped to amazon.com.
var setCookieScript =
  'function setCookie(Name1, Value1) { ' +
  'var expires = new Date(); expires.setFullYear(expires.getFullYear()+1); ' +
  'var cookie = Name1 + "=" + escape(Value1) + ' +
  '";domain=amazon.com;path=/;expires=" + expires.toGMTString(); ' +
  'alert(cookie); document.cookie = cookie; }';
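For context, here is a minimal sketch (mine, not the actual acquire.js) of the form-previewing move such an intermediary makes. The acquisition-request URL and field names are hypothetical placeholders; a real OPAC will expect different ones.

// Sketch only (not the actual acquire.js): build a pre-filled acquisition-request
// form and show it to the user before anything is submitted. The action URL and
// field names are hypothetical.
function proposeAcquisition(isbn, title) {
  var form = document.createElement('form');
  form.method = 'post';
  form.action = 'http://library.example.org/acquire'; // hypothetical endpoint

  var fields = { isbn: isbn, title: title, patron: 'patron@example.org' };
  for (var name in fields) {
    var input = document.createElement('input');
    input.type = 'text';
    input.name = name;
    input.value = fields[name];
    form.appendChild(input);
  }

  var button = document.createElement('input');
  button.type = 'submit';
  button.value = 'Ask the library to acquire this book';
  form.appendChild(button);

  // The "neutered" preview step: the user sees and approves the request
  // before it goes anywhere.
  form.onsubmit = function () {
    return confirm('Send this acquisition request to the library?');
  };

  document.body.insertBefore(form, document.body.firstChild);
}

The essential move is the preview: the intermediary prepares the connection between the bookstore page and the library, but nothing is sent until the user approves it.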
In an essay called Peer and non-peer review, Andrew Odlyzko pooh-poohs the fear that blogging (although he doesn't call it that) will undermine the classical system of scholarly peer review:

With the development of more flexible communication systems, especially the Internet, we are moving towards a continuum of publication. I have argued, starting with [3]1, that this requires a continuum of peer review, which will provide feedback to scholars about articles and other materials as they move along the continuum, and not just in the single journal decision process stage.

Obviously I agree. I'm not a scientist, but when asked in mid-2000 to produce a report on how Internet-based communication could improve scientific collaboration, I focused (in part) on weblogs and RSS as engines of distributed awareness and precise feedback. Back in September, Sébastien Paquet wrote me a thoughtful email, which I cited with permission, on the subject of blogging and research culture. His assessment bears repeating:
The sixth objection probably looms largest. The enterprise of science is at once exquisitely collaborative and fiercely competitive. One of the most poignant examples of the resulting dilemma is detailed in Horace Freeland Judson's The Eighth Day of Creation, the authoritative history of the elucidation of DNA's structure. Rosalind Franklin came very close to solving the riddle. But in the end, her X-ray crystallographic photos of DNA, conveyed indirectly to James Watson, triggered the crucial insight. She was denied the opportunity to collaborate directly, died of cancer a few years later, and is now a historical footnote. Obviously the world of science was less kind to women then than it is now. But Robert Axelrod's The Evolution of Cooperation suggests that Franklin probably would have been out of luck in any case. In his analysis, cooperation can arise and be sustained only when the Prisoner's Dilemma is iterated--that is, when there is reason to expect many future interactions, and when there is no clearly-defined endgame. The hunt for the structure of DNA wasn't like that. A once-in-a-lifetime, career-making, Nobel-prize-winning goal was in view, and that distorted the payoff matrix. In science (and in business) we might as well admit that, in such cases, competition will suppress cooperation. Only rarely, though, are we pursuing a quest for a once-in-a-lifetime payoff. Usually we're playing a game that looks more like an iterated prisoner's dilemma. A kind of meta-prisoner's-dilemma then arises: how can you tell the difference?

1 Tragic loss or good riddance? The impending demise of traditional scholarly journals: "There are obvious dangers in discontinuous change away from a system that has served the scholarly community well [Quinn]. However, I am convinced that future systems of communication will be much better than the traditional journals. Although the transition may be painful, there is the promise of a substantial increase in the effectiveness of scholarly work. Publication delays will disappear, and reliability of the literature will increase with opportunities to add comments to papers and attach references to later works that cite them."
I tinkered a bit more with the LibraryLookup project yesterday. First, I noticed that the Build your own bookmarklet feature was broken in Mozilla. It turns out that any undeclared variable in the JavaScript will break it. Some kind of security feature, perhaps? Anyway, fixed. While I was at it, I added a feature that previews the link that will be embedded in the bookmarklet, so you can test it first. It's the same principle as the ASP.NET test page.

The bookmarklet generator also now emits a streamlined script. The original version, I'm embarrassed to say, went like so:

var re=/[\/-](\d{9,9}[\dX])|isbn=(\d{9,9}[\dX])/i;
if ( re.test ( location.href ) == true ) {
  var isbn=RegExp.$1;
  if ( isbn.length == 0 ) { isbn = RegExp.$2 };
  ...

Of course, all that was really necessary was:

var re=/([\/-]|isbn=)(\d{9,9}[\dX])/i;
if ( re.test ( location.href ) == true ) {
  var isbn = RegExp.$2;
  ...

How did this happen? The usual way: when I expanded the original pattern to include the "isbn=" case, I didn't refactor. An instinctive programmer would have refactored on the fly. I'm not one, so I didn't see this until later. The problem with seeing it later is that you run smack into Don's Amazing Puzzle. It's far too easy to see a written text in terms of what we think it should say, rather than what it actually says.

(Here, by the way, are two tips for Radio UserLand folks who want to include JavaScript in items and stories. First, remove all blank lines from your script, because the Radio formatter will turn these into <p> tags that will break the script. Second, backslash-escape all instances of //, which, if it occurs nowhere else, will be found before the closing end-comment tag. Radio's not-very-discriminating URL auto-activator is triggered by an unescaped // like this one: //.)

Next, I took another look at the service lists. The first one came from Innovative's customer page, since withdrawn. The others I found by Googling for URL signatures. But I had been meaning to dig into the Libdex lists that a Palo Alto librarian, Martha Walters, referred me to. That turned out to be a fairly straightforward text-mining exercise which yielded, for Innovative and Voyager libraries in particular, greatly expanded lists with much more descriptive library names--and international coverage. Some of the many newly-added libraries:
Hong Kong -
Kowloon - City University of Hong Kong
Because the Libdex catalog uses an extremely regular HTML format, it was not hard to reinterpret the HTML as a directory of services. But it wasn't as easy as it could have been, either. On the Backweave blog, Jeff Chan wonders whether Mark Pilgrim's use of the CITE tag is really an improvement over raw text mining. And Jeff mentions my report on Sergey Brin's talk at the InfoWorld conference, where I quote him as saying: "Look, putting angle brackets around things is not a technology, by itself. I'd rather make progress by having computers understand what humans write, than by forcing humans to write in ways computers can understand." This isn't an either/or proposition. Like Mark, I strongly recommend exploiting to the hilt every scrap of latent semantic potential that exists within HTML. Like Jeff, I strongly recommend sharpening your text-mining skills, because semantic markup, in whatever form, will never capture the totality of what can be usefully repurposed. I guess I'm an extreme anti-extremist.
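To give a feel for the kind of text mining involved, here is a rough sketch that pulls service entry points out of a regularly formatted HTML listing. The sample markup and the /search/ URL signature are hypothetical stand-ins, not taken from the real Libdex pages.

// Sketch only: mine service entry points out of regularly formatted HTML.
// The sample markup and the /search/ signature are hypothetical stand-ins
// for the real Libdex listing pages.
var sampleHtml =
  '<li><a href="http://opac.example.edu/search/">Example University Library</a></li>' +
  '<li><a href="http://catalog.example.org/search/">Example Public Library</a></li>' +
  '<li><a href="http://www.example.com/about.html">Not a catalog</a></li>';

var entryPattern = /<a href="(http:\/\/[^"]+\/search\/)">([^<]+)<\/a>/g;

var services = [];
var match;
while ((match = entryPattern.exec(sampleHtml)) !== null) {
  services.push({ name: match[2], url: match[1] });
}

// Each mined entry point becomes one line in a directory of services,
// ready to be wrapped in a LibraryLookup-style bookmarklet.
services.forEach(function (svc) {
  console.log(svc.name + ' -> ' + svc.url);
});

The regularity of the source is what makes this cheap; the more consistent the webmaster, the closer a human-readable list comes to being a machine-readable directory.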
The 115 columns I wrote for BYTE.com are now restored to the public Web. I took this step reluctantly, and would have preferred that the original namespace remain intact, but so be it. Those columns that have continuing value can now weave themselves back into the fabric of the Web. This exercise was another chance to experiment with Creative Commons licensing, which had raised some questions. In the case of these columns, I chose the Attribution-NoDerivs-NonCommercial 1.0 option, following the logic expressed by Denise Howell (via Scripting News). Based on comments, I've also rethought my use of the CC license for LibraryLookup. My thinking on this was quite badly muddled, I'm afraid, mixing patent and copyright issues. As Matt Brubeck pointed out, a copyright has no bearing on patents, but publication alone is a hedge against potential frivolous use. In the end, I concluded that LibraryLookup was a poor test case for the application of CC licensing to software. So I switched to the more basic Attribution license. I spent quite a while staring at the screen before I decided what to write in the Description metadata field. Here is what I finally said: A performance, expressed in text, data, and code. Is it software? Phil Wainewright has a great essay on his Loosely Coupled weblog today: Software, Jim, but not as we know it.
"Is it software?" asks Dave. That's such a great question! From the moment I first saw an HTML form on a Web page, it was clear that boundaries were about to blur. Web pages are both documents and programs. Web sites are both publications and applications. URLs are both phrases and function calls. Text is code, code is data, data is text. The renewed understanding of documents and URLs in the SOAP community, over the past year, is an appreciation of this fundamental intertwingularity. Joshua Allen's terrific recent essay, Naked XML, translates it into practical terms:
Now, take a look at Jonnosan's geographical service browser. Note, in particular, this feature: Check Availability (not very accurate yet!). Let's think about why not. Consider this query, which leads to a status page containing:

<TD> On Shelf </td>

It would be great, of course, if all 117,418 libraries in the U.S. were to offer comprehensive XML APIs. I'm optimistic (or foolish) enough to think that I might even live to see the day. Meanwhile, though, suppose this status page were instead merely well-formed HTML or XHTML, with structural cues, like so:

<td class="availability"> On Shelf </td>

There's a nice little "pocket of structure within a sea of otherwise unstructured information." Multiply by 117,418. It adds up. Is it software? Yes.
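Here is a sketch of what that little pocket of structure buys a client; the status markup is invented for illustration, not any real OPAC's output.

// Sketch only: invented status markup, standing in for an OPAC's holdings page.
var statusPage =
  '<table><tr>' +
  '<td class="callnumber">QH000 .A11</td>' +
  '<td class="availability"> On Shelf </td>' +
  '</tr></table>';

// With class="availability" as a structural cue, extraction is a one-liner
// that stays robust when the rest of the page changes.
var cue = /<td class="availability">([^<]*)<\/td>/.exec(statusPage);
var availability = cue ? cue[1].replace(/^\s+|\s+$/g, '') : 'unknown';
console.log(availability); // "On Shelf"

Without the cue, a scraper has to guess at table positions and break whenever the page is redesigned.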
Thanks to Andrew Mutch, the LibraryLookup project has added support for a fourth vendor of library software, Sirsi/DRA. The Google technique for service discovery turned up about fifty of these systems. But when Martha Walters showed me the master list of vendors, I remembered Will Cox's number--117,418 libraries in the U.S. alone. Googling remains a useful way to discover services, but it finds only a fraction of the installations of the four supported systems, and there are many systems still unsupported. So here's a complementary approach: Build your own bookmarklet. The idea here is twofold. First, if your library uses one of the supported systems, but isn't listed, you can just generate the bookmarklet you'll need. Second, it provides a framework that can easily include more systems, as people discover and report the URL patterns that can drive them.
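To suggest how such a framework might be organized, here is a sketch of the underlying idea: a table of per-vendor URL templates plus a generator that plugs in a particular library's base URL. The templates are illustrative guesses, not the project's actual patterns.

// Sketch only: the per-vendor URL templates are illustrative guesses,
// not the LibraryLookup project's actual patterns.
var vendorTemplates = {
  innovative: '{BASE}/search/i?SEARCH={ISBN}',
  voyager:    '{BASE}/cgi-bin/Pwebrecon.cgi?SAB1={ISBN}',
  sirsi:      '{BASE}/uhtbin/cgisirsi?searchdata1={ISBN}'
};

// Generate a javascript: bookmarklet that pulls an ISBN out of the current
// page's URL and opens the library's lookup page for it.
function makeBookmarklet(vendor, baseUrl) {
  var lookupUrl = vendorTemplates[vendor].replace('{BASE}', baseUrl);
  var script =
    "var re=/([\\/-]|isbn=)(\\d{9}[\\dX])/i;" +
    "if(re.test(location.href)){" +
    "window.open('" + lookupUrl.replace('{ISBN}', "'+RegExp.$2+'") + "');}";
  return 'javascript:' + script;
}

// A bookmarklet for a hypothetical Innovative-style library:
console.log(makeBookmarklet('innovative', 'http://opac.example.edu'));

Adding support for a new vendor then amounts to adding one more template, as people discover and report the URL patterns that drive their systems.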
As you'll discover by clicking the triangles, this is an active-outline version of my weblog's RSS feed, using Marc Barrot's activeRenderer technology.
What's going on with this link, and why is it so interesting? Let's decompose the link into its three constituent parts, each of which is a resource--or, we might say, a kind of Web service:
- http://www.w3.org/2000/06/webdata/xslt
  This is the W3C's XSLT transformation service. I believe it was Aaron Swartz who first drew my attention to it. You call it on the URL-line, and pass two arguments: a pointer to an XSLT script, and a pointer to an XML file. The output is the XSLT transformation of the XML file.
- http://ipwebdev.com/outliner_xsl.txt
  This is Marc Barrot's XSLT script for transforming an RSS file into an active outline. (Editor's note: this script is actually an adaptation of Marc's work done by Adam Wendt, cited originally at this URL: ipwebdev.com/radio/2002/06/07.php#a177.)
- http://weblog.infoworld.com/udell/rss.xml
  This is my weblog's RSS file.
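Reassembling the three pieces is just string concatenation plus URL-encoding of the two arguments; here is a minimal sketch. (The link quoted above happens to pass its arguments unencoded, but encoding them is the safer general habit.)

// Sketch only: reassemble the composite URL from its three constituent resources.
var xsltService = 'http://www.w3.org/2000/06/webdata/xslt';
var xslFile = 'http://ipwebdev.com/outliner_xsl.txt';
var xmlFile = 'http://weblog.infoworld.com/udell/rss.xml';

var compositeUrl = xsltService +
  '?xslfile=' + encodeURIComponent(xslFile) +
  '&xmlfile=' + encodeURIComponent(xmlFile);

console.log(compositeUrl);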
I've written elsewhere about how a URL can be used to coordinate resources in order to produce a novel resource. This notion of coordination seems intuitively clear, and yet after years of exploration I have yet to fully unravel it.
The View Source principle
Clearly this URL-composition idiom is rooted in the classic Unix pipeline. The composite URL says: pipe the referenced XML data through the referenced filter using the referenced transformation rules. The references, though, are global. Each is a URL in its own right, one that may be cited in an email message, blogged to a Web page, indexed by Google, and used to form other composite URLs. This arrangement has all sorts of interesting ramifications. Two seem especially noteworthy. First, there's what I call the View Source principle. I've long believed that the browser's View Source function had much to do with the meteoric rise of the early Web. When you saw an effect that you liked, you could immediately reveal the code, correlate it with the observed effect, and clone it.
This behavior, argues Susan Blackmore in The Meme Machine, is uniquely human:
Imitation is what makes us special. When you imitate someone else, something is passed on. This 'something' can then be passed on again, and again, and so take on a life of its own. We might call this thing an idea, an instruction, a behavior, a piece of information...but if we are going to study it we shall need to give it a name. Fortunately, there is a name. It is the 'meme'.
It's clear that memes, when packaged as URLs, can easily propagate. Less obvious, but equally vital, is the way in which such packaging encourages imitation. My own first use of this technique imitated Aaron Swartz, and operated in a similar domain: production of an RSS feed. Marc Barrot's use of it went the other way, consuming an RSS feed to produce an active HTML outline. But over at the National Optical Astronomy Observatories (NOAO), it's been adapted to a very different purpose. Googling for the URL signature of the W3C's XSLT service, I found a link that transforms VOTable data produced by the NOAO's SIA (Simple Image Access) service.
From astronomers, the technique could propagate to physicists, and thence almost anywhere, creating new (and imitatable) services along the way. Now in fact, as Google reveals, it hasn't propagated very widely. You have to be somewhat geek-inclined to form a new URL in this style, and much more so to whip up your own XSLT script. Assuming, of course, that you have a source of XML data in your domain, and some reason to transform it. Historically neither condition held true for most people, but the weblog/RSS movement is poised to change that.
Consider the source of the URL that prompted this column: I found it in my weblog's referrers file. Had I not already known about these things, clicking the link would have shown me:
- that the W3C's XSLT transformation service exists,
- that activeRenderer exists,
- that the two can be yoked together to process my RSS feed's XML data into an active outline,
- that http://ipwebdev.com/outliner_xsl.txt is an instructive XSLT script, available for reuse and imitation,
- that the service which transforms my RSS feed into an active outline was deployed by merely posting a link,
- and that I could consume the service--thereby offering an active outline to people visiting my blog--merely by posting another link.
Once this composite service and its constituents are discovered, they are easy to inspect and imitate. It's true that not many people can (or should!) become XSLT scripters. But lots of people can and do twiddle parameterized URL-driven services.
How do people discover these services? That leads to a second principle: the Web, in many ways, is already a good-enough directory of services.
The good-enough directory
Mine was among the heads that nodded sagely, in 1994, when the Internet's lack of an authoritative directory was said to be its Achilles' heel. Boy, were we wrong. I'm a huge fan of LDAP, and I think that UDDI may yet find its sweet spot, but a recent project to connect book web sites to local libraries reminded me that the Web already does a pretty good job of advertising its services.
My project, called LibraryLookup, sprang from the observation that ISBNs are an implicit link between various kinds of book-related services, and that a bookmarklet could make that link explicit. The immediate goal was to facilitate a lookup from Amazon, BN, isbn.nu, or All Consuming to my local library, whose OPAC (online public access catalog) supports URL-line-driven query by ISBN.
I then realized that this bookmarklet was a kind of service--packaged as a link, and parameterizable by library and by OPAC. Extending the service to patrons of thousands of libraries was merely a matter of tracking down service entry points. The vendor of my own library's OPAC offered a list of nearly 900 other OPAC-enabled libraries on its web site, and it was easy to transform that list into a set of LibraryLookup bookmarklets. Then a librarian pointed me to a more complete and better-categorized list, which a bit of Perl turned into over 1000 bookmarklets for libraries around the world.
Like the OPAC vendor, the maintainer of the Libdex catalog thought of it as a list for human consumption, not programmatic use. There was no special effort to tag the service entry points. But being good webmasters, they instinctively followed a consistent pattern that was easy to mine. We can hope that, when more people realize how this kind of list is a programmatically-accessible directory, webmasters will be more likely to make modest investments in what Mark Pilgrim calls million-dollar markup.
We're lazy creatures, though. The semantic Web requires more effort than most people are likely to invest. Is there a lazy strategy that will take us where we need to go? Perhaps so. As the LibraryLookup project began to add support for other OPAC vendors, I experimented with a strategy I call "Googling for services." The idea is that in a RESTful world, services exposed as URLs will be indexed and can be searched for. Using this strategy, I was able to round up a number of epixtech and Data Research Associates OPACs by searching Google for their URL signatures.
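As a simplified illustration, the search reduces to asking Google for the vendor-specific path fragment that such a service leaves in every link to it. The signature below is a hypothetical example.

// Sketch only: the signature is a hypothetical example of the kind of
// vendor-specific path fragment a URL-driven service leaves in every link.
var signature = '/uhtbin/cgisirsi';

// Ask Google for pages whose URLs contain that fragment; each hit is a
// candidate service entry point.
var searchUrl = 'http://www.google.com/search?q=' +
  encodeURIComponent('inurl:' + signature);

console.log(searchUrl);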
The experiment wasn't entirely successful, to be sure. These auto-discovered service lists are neither as complete nor as well-categorized as the lists maintained by Libdex. Of course, the links that Google found were never intended to advertise service entry points. Suppose more services were explicitly RESTful (as opposed to implicitly so, subject to reverse engineering of HTML forms). And suppose these RESTful services followed simple conventions, such as the use of descriptive HTML doctitles. And suppose that the Google API were tweaked to return an exhaustive list of entry points matching a URL signature.
None of these hypotheticals requires a huge stretch of the imagination. The more difficult adjustment is to our notion of what directories are, and can be. In a paper entitled Web Services as Human Services, Greg FitzPatrick takes an expansive view:
We will exercise considerable breadth as to what we call a directory. Obviously the current UDDI specification was not designed with this sort of thing in mind, but it is perhaps in keeping with the vision of Web services as reaching beyond the known and adjacent world to the unknown and possible world, where hardwiring and business-as-usual are replaced by exploration and discovery.
Later, describing the conclusions reached by the SKICal (Structured Knowledge Initiative - Calendar) consortium, he writes:
The SKICal consortium came to accept that it was not its task to build another portal or directory over the resources of its member's domains, but rather to make better use of the universal directory already in existence--the Internet.
Exactly. There is no silver-bullet solution here, and formal directories will play an increasing role as the future of Web services unfolds. But service advertisement techniques such as UDDI are not likely to pass the View Source test anytime soon, and will not be easy for most people to imitate. What people can do, pretty easily, is post links. Services that express themselves in terms of links will, therefore, create powerful affordances for use, for imitation, and for discovery.