
Putting ISBNs to Work
by Kendall Grant Clark
June 02, 2004
In addition to all the spiffy diagrams in that column, I said that the bulk of the work required to implement LCC@Home was the process of labeling the items -- I suspect books, mostly -- in your collection. One trick for streamlining that process is to use the LC Cataloging in Publication (CIP) data, which is very often printed on the verso of a book's title page.
Most books published by American university presses since around 1970, and most other books published since 1990, will have a CIP block; but many small press and older books won't. That means you have to look that book up in some kind of Library of Congress database in order to find its LC Call Number. When I did this project the first time four years ago, I used the LC database on the Web. But it's a bit annoying to use other than casually, since it employs some rather picky session timeout settings.
What I really wanted back then, but didn't really take the time to figure out, was a command line tool that let me input an ISBN -- nearly every book you're likely to own either has an ISBN or can be assigned one easily enough -- and which outputs a Library of Congress Call Number, which I could then affix to a book.
In this and next month's column, I'm going to design and implement just such a tool in Python, isbn2lccn. More specifically, in this column I'll look at ISBNs, including how we might use ISBNs in RDF, and consider some of the sources of bibliographic information available on the Web. Next month I'll walk through the Python code and talk about how we can turn it into a proper web service itself.
International Standard Book Numbers
We're going to input ISBNs and output LC Call Numbers, but what is an ISBN anyway? First, it's an international standard, ISO 2108. Second, it's a structured identification string, made up of 10 digits, that is "unique" and "machine-readable" and "which marks any book unmistakably", according to the International ISBN Agency. ISBNs have some properties that geeks like us find pretty interesting. The 10 digits of an ISBN represent four fields: a group, publisher, and title identifier, plus a check digit. In 2007 U.S. publishers will begin to transition to 13-digit ISBNs. An ISBN is a book's fingerprint, often represented by a bar-code. It's another bit of the world I call a dijalog inflection point -- that is, ISBNs, whether represented as digits, bar-codes, or in RFID tags, are points at which the digital and analog worlds synch up.
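The check digit is what makes a 10-digit ISBN machine-verifiable. A minimal sketch of the standard ISBN-10 test follows: weight the characters 10 down to 1, sum them, and require the total to be divisible by 11, with "X" standing for 10 in the final position. The function name is an assumption made for illustration; it is not part of the isbn2lccn tool.

```python
def is_valid_isbn10(isbn: str) -> bool:
    """Return True if `isbn` is a 10-character ISBN with a correct check digit."""
    # Hyphens and spaces are presentation only; drop them before checking.
    chars = isbn.replace("-", "").replace(" ", "").upper()
    if len(chars) != 10:
        return False
    total = 0
    for position, char in enumerate(chars):
        if char == "X" and position == 9:
            value = 10  # "X" means 10 and is legal only as the check digit
        elif char in "0123456789":
            value = int(char)
        else:
            return False
        # Weights run from 10 for the first character down to 1 for the last.
        total += (10 - position) * value
    return total % 11 == 0
```

A tool that takes ISBNs typed by hand could run a check like this before making any network request, so that a mistyped digit is caught locally.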
Getting Ready for RDF
Why is that point of inflection useful? Physical (or, as I persist in misnaming, analog) items that have a unique, machine-readable identifier can often become the subject of machine-readable assertions, using RDF. There are at least two possibilities: first, use the unique identifier as the basis for coining unique URIs; second, check to see if there is a URN scheme you can use instead.
Eventually in this series I will begin to explore and to use RDF to represent assertions about the items of my dijalog collections. For example, imagine that we want to make some machine-readable assertions about The Iliad. (I'm going to skirt for now some tricky data modeling issues here; but rest assured that we'll come back to them later on in this series.)
The first thing we might do is to coin a URI using a book's ISBN:
http://monkeyfist.com/kendall/dijalog/0670835102
That's a perfectly good URI: as the person who owns the domain "monkeyfist.com", I am the controlling authority (in some ways Semantic Web folks still haven't really worked out) of that URI; it's not going to clash with other URIs; it suggests a generative naming scheme, so I can easily make new ones at will. One bad thing about this URI is that it's not likely that anyone else will want to use it, which makes it somewhat harder for different parties to know that we're making assertions about the same thing -- which is part of the mojo of the Semantic Web and RDF in the first place.
What about the second possibility? As discussed in RFC 3187, we could use the URN namespace for ISBNs, in which case our URI becomes a URN:
urn:ISBN:0670835102
This URN is preferable to the URI for at least two reasons: first, I have some reasonable expectations that other people will use this identifier form; second, it's semi-structured: if I have one of these URNs, I and others know exactly how to convert it to the equivalent ISBN:
Python 2.3 (#1, Aug 5 2003, 15:19:06)
[GCC 3.2.2 20030222 (Red Hat Linux 3.2.2-5)] on linux2
>>> print "urn:ISBN:0670835102".split(":")[2]
0670835102
>>>
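The session above is Python 2.3. A rough Python 3 equivalent might look like the sketch below. It also accepts the prefix in any letter case, since URN syntax treats the leading "urn" and the namespace identifier as case-insensitive. Both function names are assumptions made for illustration.

```python
URN_PREFIX = "urn:isbn:"


def isbn_from_urn(urn: str) -> str:
    """Strip the ISBN URN prefix (matched case-insensitively) and return the rest."""
    if urn[:len(URN_PREFIX)].lower() != URN_PREFIX:
        raise ValueError(f"not an ISBN URN: {urn!r}")
    return urn[len(URN_PREFIX):]


def urn_from_isbn(isbn: str) -> str:
    """Build a URN in the same letter case the column uses."""
    return "urn:ISBN:" + isbn
```

For example, `isbn_from_urn("urn:ISBN:0670835102")` gives back the bare ISBN string, and `urn_from_isbn` reverses the conversion.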
But do I really want to make assertions about that particular realization of The Iliad, the 1990 Robert Fagles hardcover edition? Maybe I do and maybe I don't. Since I own that book, I will eventually want to make assertions about it, including the assertion that I own a copy. But what if I want to say, first, that it is a realization of the abstract entity "an English translation of Homer's Iliad"? In that case, I'd probably use an RDF blank node, which is an RDF graph-scoped variable. For example, using the Notation 3 form of RDF, I might say:
@prefix bib: <http://monkeyfist.com/kendall/dijalog/scheme/0.1/#> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
[]
    dc:title "The Iliad" ;
    dc:creator "Homer" ;
    rdfs:label "Homer's Iliad" ;
    rdf:type bib:work-of-antiquity ;
    bib:realized-by <urn:ISBN:0670835102> .
<urn:ISBN:0670835102>
    bib:translator "Robert Fagles" ;
    rdf:type bib:book ;
    dc:date "1990"^^xsd:gYear .
That is, I might want to say that there is a thing, such that it has the title "The Iliad", was written by Homer, and is a work of antiquity. Further, I might want to say that this rather abstract entity is realized by a particular book which was translated by Robert Fagles and published in 1990. If I owned several different versions of this abstract entity, I could make assertions about all of them, too. Eventually, I might use some of the properties of OWL, the W3C's Web Ontology language, to say that some of these things were the same.
That's where we're going, eventually, but first we need to talk about converting ISBNs into LC Call Numbers.
ISBN and Book Variants
Before taking up some sources of bibliographic information on the Web, let me commit an act of library and information science heresy -- when implementing LCC@Home, it's safe to consider most, if not all, of your books to be interchangeable instances of a class, rather than as classes or types themselves. That's a very bad way of saying that, for real librarians, different versions of the same book are conceptually and organizationally distinct; but for pseudo-librarians like us, we can consider different variations of the same book to be the same or similar. Consider, for example, that old copy of the Iliad you have from college days, the reissued classic 1713 translation by Alexander Pope, who rendered Homer's Greek dactylic hexameter into the poetry of my mother tongue. Consider, too, a copy of the first English prose translation of the Iliad, Samuel Butler's 1898 effort. (Both of which, by the way, are available from Project Gutenberg.) Finally, consider Robert Fagles's very idiomatic verse translation of the Iliad from 1990, the one I made some RDF assertions about earlier.
In some informal sense which may be useful, these are the same book. While they are very distinct in many ways, including in ways which matter to library and information science, to scholarship, as well as to my eventual efforts to use RDF in this domain, it's likely that many versions of the first two books don't have ISBNs, while the third one certainly does. I'm suggesting that, if you run into books in your library that don't have ISBNs -- probably because they were published before the advent of ISBN in the early 1970s -- you can use the ISBN of a newer edition or translation or version of the book. That is, feed the ISBN of Fagles's version of The Iliad into our tool in order to get an LC Call Number to affix to your copy of the Pope translation. If you have a choice, it's better to use the ISBN for a recent version of the same book than to use a newer edition of a different translation of the same book. This makes librarians shudder, of course, but it's still preferable to trying to do your own original cataloging.
Bibliography Web Services
In what remains of this column I'm going to describe four sources of bibliographic information on the Web, ending with the sources I'll be using to build my isbn2lccn tool. I'll conclude with a simple description of the tool, the full details of which I'll take up next month.
Amazon.com
Everyone knows that Amazon.com sells books. And probably everyone in the XML.com audience knows by now that Amazon provides an awful lot of information about those books by way of SOAP or REST web services. I won't say much else about Amazon's informational offerings today, except that they don't contain LC Call Numbers. But that's fine because they do provide information via ISBN, and eventually we may want to incorporate some of the information Amazon does provide -- sales price, book cover graphic, related books, etc. -- into the tool we're building.
xISBN
Next, the OCLC's xISBN is a web service that takes as input an ISBN -- that's good, our tool takes ISBNs as input -- and returns more ISBNs. What does that mean? First, let's figure out how to call xISBN, which is a plain old REST service. Simply dereference (either programmatically or from your web browser) URIs of the form:
http://labs.oclc.org/xisbn/[ISBN]
where the path segment "[ISBN]" is replaced by, you guessed it, a valid ISBN. For example,
http://labs.oclc.org/xisbn/0670835102
is the xISBN URI for Fagles's Iliad.
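Building such a URI programmatically is a one-liner. The sketch below only constructs the string and makes no request. Removing hyphens and spaces first is an assumption, made because the examples here use the unhyphenated form, and the function name is hypothetical.

```python
XISBN_BASE = "http://labs.oclc.org/xisbn/"


def xisbn_uri(isbn: str) -> str:
    """Return the xISBN URI for an ISBN, following the URI pattern given in the column."""
    # Assumption: the service expects the bare form, as in the column's examples.
    bare = isbn.replace("-", "").replace(" ", "")
    return XISBN_BASE + bare
```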
What do you get back when you dereference that URI? You get, in REST speak, an XML representation of a resource that can loosely be described as "other ISBNs associated with the one you submitted, according to OCLC's WorldCat records". The XML format is the very soul of simplicity:
<?xml version="1.0" encoding="UTF-8" ?>
<idlist>
<isbn>0670835102</isbn>
<isbn>038505940x</isbn>
<isbn>0872203522</isbn>
<isbn>0872203530</isbn>
<isbn>0226469409</isbn>
<isbn>0674995791</isbn>
...
The ISBNs returned by xISBN will most typically be different versions of the same work as the one identified by the input ISBN. That's not the information we want directly, but it could be useful in our tool. Imagine you have a book with an ISBN, but that particular ISBN hasn't been given an LC Call Number, for whatever reason. In that case, it may make sense to ask xISBN for related ISBNs, for which we could then try to find LC Call Numbers.
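That fallback can be sketched in a few lines. The sample document below reuses the first few ISBNs from the response shown earlier and closes the idlist element itself, which is an assumption because the listing above is truncated. The `lookup` argument stands in for whatever ISBN-to-call-number query the tool ends up using; it is hypothetical here.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Iterable, List, Optional, Tuple

# Shortened sample in the shape of the xISBN response shown above.
SAMPLE_RESPONSE = b"""<?xml version="1.0" encoding="UTF-8" ?>
<idlist>
<isbn>0670835102</isbn>
<isbn>038505940x</isbn>
<isbn>0872203522</isbn>
</idlist>"""


def related_isbns(xml_bytes: bytes) -> List[str]:
    """Return the text of every <isbn> element, in document order."""
    root = ET.fromstring(xml_bytes)
    return [el.text.strip() for el in root.iter("isbn") if el.text]


def first_call_number(
    isbns: Iterable[str],
    lookup: Callable[[str], Optional[str]],
) -> Optional[Tuple[str, str]]:
    """Try each ISBN in turn; return (isbn, call_number) for the first hit, else None."""
    for isbn in isbns:
        call_number = lookup(isbn)
        if call_number:
            return isbn, call_number
    return None
```

Returning the matching ISBN along with the call number lets the caller see which related edition supplied the answer, which matters if the label is meant for a different edition.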
I haven't decided whether to incorporate that bit of functionality into our isbn2lccn tool, but I'm going to consider it, especially if I hear from some librarians and information scientists about this issue. (In other words, if you care about this, either way, let me know.)
Finally, if you want to read xISBN's technical details, they're available.
Z39.50, MARC, and MARCXML
The traditional computerized means of accessing and interchanging library bibliographic records is the Z39.50 protocol (an ANSI/NISO standard) and MARC records. I should point out that the very first programming project I ever undertook was to write a MARC parser in Python -- a project that failed miserably.
I want to say a few things here about Z39.50 and MARC, but they really deserve a column of their own, which I intend to provide later this year. MARC is a bit ghastly, given contemporary standards for data interchange, though there is an XML representation, MARCXML. It's pretty ghastly, too. In short, I want to avoid mucking about too much with MARC records for now.
Of interest, however, is the fact that the LC Call Number is often present in a MARC record in data field 050. So I could write an ad-hoc tool to deal with 050 in MARC records. But it's probably smarter and easier to use a real MARC parser; there's one available in the PyZ3950 open source project. It's even easier, I suspect, to write a bit of code to extract the LC Call Number from XML versions of MARC records. If I were going to work directly with MARC, I'd probably go the XML route, though there may be advantages to using a full MARC parser that I don't yet understand. More about that in a future column.
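As a rough illustration of the XML route, the sketch below pulls field 050 out of a MARCXML record with the standard library. In field 050, subfield a holds the classification number and subfield b the item number. The sample record is hand-written for illustration and was not fetched from any catalog; the function name is an assumption.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# MARCXML elements live in this namespace.
MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Hand-written sample containing only the field of interest (illustrative values).
SAMPLE_RECORD = b"""<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="050" ind1="0" ind2="0">
    <subfield code="a">PR6066.R34</subfield>
    <subfield code="b">C6 1983</subfield>
  </datafield>
</record>"""


def lc_call_number(marcxml: bytes) -> Optional[str]:
    """Join subfields a and b of the first 050 field, or return None if absent."""
    root = ET.fromstring(marcxml)
    field = root.find(".//marc:datafield[@tag='050']", MARC_NS)
    if field is None:
        return None
    parts = [
        sub.text.strip()
        for sub in field.findall("marc:subfield", MARC_NS)
        if sub.get("code") in ("a", "b") and sub.text
    ]
    return " ".join(parts) or None
```

This handles only the first 050 field in a record; a record with several would need a policy for choosing among them.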
For now let me say that nearly all bibliographic information still zips around the world via Z39.50 and MARC -- the ultimate source of the information provided by our tool is Z39.50/MARC.
Z39.50 Gateways and XML
What I really want to use is something I already understand pretty well, namely, XML and a more popular, thus well-known, vocabulary. I've decided that for our tool I'm going to use the Library of Congress's Z39.50 web gateway and its XML message formats, which are based on the XML metadata vocabularies Dublin Core and MODS. Since Dublin Core is pretty well known by XML developers, I'll probably play with the MODS messages as a way to evangelize that metadata standard a bit. It's an interesting vocabulary. If I ever add persistence to our tool, we'll have a configuration bit that users can twiddle if they prefer Dublin Core, MODS, or MARCXML.
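For a sense of what working with MODS might involve, here is a minimal sketch that reads a call number from a MODS record. It assumes the call number arrives in a classification element whose authority attribute is "lcc", which is how MODS marks Library of Congress classification; whether the gateway's messages carry it in exactly that form is not confirmed here. The sample values are obvious placeholders and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional

MODS_NS = "http://www.loc.gov/mods/v3"

# Hand-written sample with placeholder values, not a real gateway response.
SAMPLE_MODS = b"""<mods xmlns="http://www.loc.gov/mods/v3">
  <classification authority="ddc">DDC-PLACEHOLDER</classification>
  <classification authority="lcc">LCC-PLACEHOLDER</classification>
</mods>"""


def lcc_from_mods(mods_xml: bytes) -> Optional[str]:
    """Return the text of the first classification element marked authority="lcc"."""
    root = ET.fromstring(mods_xml)
    # iter() searches the whole tree, so a wrapping collection element is fine too.
    for el in root.iter(f"{{{MODS_NS}}}classification"):
        if el.get("authority") == "lcc" and el.text:
            return el.text.strip()
    return None
```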
There's also an experimental British Library gateway that does much the same thing, so our tool will likely query both services. More about it next month.
Our isbn2lccn tool will read its command line arguments, one of which will be a required ISBN. It will then make some REST web service calls to either the Library of Congress or the British Library; in some cases it may also make calls to Amazon or to xISBN. In its regular mode it will extract the LC Call Number from the XML message returned by the BL or LC services, printing it to the output channel specified by one of the invocation arguments -- probably STDOUT by default. In a future column I'll likely add a batch mode in which the tool will store some representation of these messages, probably as raw XML, on the disk. And if I can get the Python PDF libraries to play nice, it will eventually create a PDF suitable for printing directly onto sticky labels.
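The command line shape described above could be sketched roughly as follows. This is not the column's implementation, which is promised for next month. The `--output` flag, the function names, and the injected `lookup` callable are all assumptions made for illustration, and the web service calls are deliberately left out.

```python
import argparse
import sys
from typing import Callable, List, Optional, TextIO


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        prog="isbn2lccn",
        description="Print the LC Call Number for an ISBN.",
    )
    parser.add_argument("isbn", help="ISBN to look up (required)")
    # Hypothetical flag: "-" means standard output, the default.
    parser.add_argument("-o", "--output", default="-",
                        help="file to write to; '-' means standard output")
    return parser


def run(argv: List[str],
        lookup: Callable[[str], Optional[str]],
        stdout: TextIO = sys.stdout) -> int:
    """Parse arguments, call `lookup`, and write the result; return an exit status."""
    args = build_parser().parse_args(argv)
    call_number = lookup(args.isbn)
    if call_number is None:
        print(f"no LC Call Number found for {args.isbn}", file=sys.stderr)
        return 1
    if args.output == "-":
        print(call_number, file=stdout)
    else:
        with open(args.output, "w", encoding="utf-8") as handle:
            print(call_number, file=handle)
    return 0
```

Passing the lookup in as a function keeps the argument handling testable without a network connection; a real entry point would call `run(sys.argv[1:], some_lookup)` with a function that queries one of the gateways.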
Finally, I must acknowledge two correspondents who really did all the hard work behind this month's column. First, Bill Oldroyd emailed to tell me about the British Library's experimental service, which provides an XML representation of bibliographic records, indexable by ISBN. These messages contain a stylesheet PI and so are human-readable with the right browser. That's a very neat trick. Second, I want to thank the proprietor -- whose name, alas, I neither know nor could locate -- of the RAWBRICK.NET weblog. It was a weblog posting there that provided me most of the details of the LC gateway.
Comment on this Article
- isbn2lccn code from Kendall?
2005-12-27 02:08:42 Dave Pawson
The article was published in '04, promises code.
Did it not happen?
Pity, it would have been nice to see the conclusion.
- Where's the rest?
2005-07-25 20:32:59 BHD
I was just remembering this series, having looked forward to more of it (particularly what you were going to do with RDF). What happened?
- MODS
2004-06-13 09:55:08 BHD
Nice to see you mention MODS. I've been trying to promote its use within the free software personal bibliographic realm for the past year, as well as gently push MODS itself in a direction to better support these needs.
I'm more interested in the z39.50 and SRW web services stuff, maybe hooked up to an XML DB (the new Berkeley DB XML has a Python interface, which is what the Syncato weblog system is based on). There are a couple of people from the z39.50 community on the Open Office bib project dev list, and they've made an interesting argument for a unified local/remote query interface based on ZOOM and CQL.
The OOoBib project, BTW, itself is planning to use MODS as its basic metadata standard.
http://bibliographic.openoffice.org
- LC Lookup for "equivalent" ISBNs
2004-06-11 12:29:01 sbonds
---- Kendall Wrote:
I haven't decided whether to incorporate that bit of functionality into our isbn2lccn tool, but I'm going to consider it, especially if I hear from some librarians and information scientists about this issue.
-----
Well, I can't claim to be either of those two, but as someone with too many books to continue with my current lack of organization, I can say that I'd love to see this feature in place. For example, most of the Terry Pratchett paperbacks in my collection have no LC call number associated with their ISBN-- however the hardbound versions of the same books do.
For example, one of my books is ISBN 0552124753, which doesn't have a call number. However, the labs.oclc.org site reveals the "equivalent" ISBN 0312150849. This one has the LC call number PR6066.R34 C6 1983. Done!
I have loads of books that will have this problem so it would be helpful to have the feature built in.
However, if it's not built in, so long as there is a nice command line interface, I can always parse the ISBN list and iterate over it myself. ;-)
-- Steve
- Perl-based Lookup
2004-06-14 01:02:43 sbonds
For a simple perl-based demo tool showing an ISBN-to-LCCN lookup, try a script I just put together and uploaded to CPAN:
http://www.cpan.org/authors/id/S/SB/SBONDS/
(look for the most recent "isbn2lccn-XX" script)
The hardest part about getting this running is installing the vast array of perl modules and dependencies.
-- Steve
- Printing
2004-06-11 02:45:19 Martynas
Why use PDF for printing? Wouldn't SVG be great for that? You probably could even create bar-codes. And all it takes is an XSLT stylesheet.
- more specifics on the lc gateway
2004-06-04 05:59:56 rawbrick
Thanks for the link, Kendall - and glad I could help! For the curious, I obtained the info[1] on the LC Gateway from a library listserv that unfortunately doesn't seem to have web archives, otherwise I could point you closer to the source. I've done some playing of my own with it, to generate a call number list of the books I've read.[2] Not much of a programmer so some of what you've written is a bit lost on me, but MT plugins can be powerful stuff for people like me. :)
[1] http://www.rawbrick.net/2004/03/30/13.16/
[2] http://www.rawbrick.net/books/class/
