Stuck in the Senate
October 13, 2004
Last month we created an RDF representation of the United States Senate, and this month I was going to do the same for the House of Representatives. But after looking closely at my Senate RDF, and thinking about the sort of queries I wanted to make of it, I realized that it's a mess. So in this column, we're going to (hopefully) fix it.
Let's take a look at a sample Senator again:
<USSenator rdf:about="http://kerry.senate.gov/">
  <FullName>Kerry, John F.</FullName>
  <URI>http://kerry.senate.gov/</URI>
  <Party>Democrat</Party>
  <State>MA</State>
  <Address>304 RUSSELL SENATE OFFICE BUILDING WASHINGTON DC 20510</Address>
  <Phone>(202) 224-2742</Phone>
  <SenateClass>II</SenateClass>
  <ContactURI>http://kerry.senate.gov/bandwidth/contact/email.html</ContactURI>
</USSenator>
Figure 1. A sample senator, in RDF.
Here's the problem: the RDF above describes John Kerry, a human being. But "USSenator" is not a "Human Being" -- make of that what you will. If someone were born a senator, and remained one for life (Strom Thurmond came close), then "USSenator" might be a fine subject in our RDF triple. But people are not their roles. If the human being John Kerry is elected president in a few weeks, he'll go from Senator to President, and my current ad hoc RDF schema will burst into flames. People enact many roles over their lifetimes, not just one; or, looked at the other way around, many roles are filled by more than one person. So we need to split up roles and humans.
OK, that's not too hard; we can describe humans like this:
<Human rdf:ID="JohnKerry">
  <HasRole rdf:resource="#USSenator"/>
  <!-- Description goes here -->
</Human>
Figure 2. The RDF for a human being.
And roles like this:
<Role rdf:ID="USSenator">
  <!-- Description goes here -->
</Role>
Figure 3. The RDF for a role, in this case the role of "USSenator."
And we're home free, right? Now, in our hypothetical government-browsing application, we can generate a list of Roles, and sort people by their roles, and so forth, yes? Not really.
People vs. Roles
Let's say Kerry is elected in November, and again in 2008. Despite the fact that, when I say "John Kerry" everyone knows who I'm talking about, JohnKerry is not a unique enough identifier if we're creating data that, hopefully, will be used far in the future. Sure, we could call him JohnKerry01 and the next John Kerry could be JohnKerry02, and so forth; but what if we dig into the history of the House of Representatives and find another "John Kerry" from 1850? Do we start using negative numbers? Our numbering scheme will go seriously out of whack.
There's another problem. That rdf:ID up there? When all the namespaces get resolved, that ID is actually an HTTP URI: http://www.hackingcongress.org/ns/Politics#JohnKerry. And that opens up a can of web architecture worms, because HTTP URIs look exactly like URLs. When we see them, we expect them to point to something, and we expect to be able to dereference them. In RDF, HTTP URIs don't necessarily point to anything. They may just serve as unique identifiers, sort of like logical constants. Whether HTTP URIs should point to something or not, and variations on that theme, is a constant source of debate. It all gets to be a little much, sometimes.
URNs Aren't Just for Funerals
Enter the URN. URN stands for Uniform Resource Name. URNs are legitimate URIs, but they don't point to anything. Not only do URNs not point to anything, but they obviously don't point to anything; no one will waste time putting a URN into Firefox expecting something useful to happen. A URN looks like this: myscheme:some-unique-id. If we wanted to use a religious metaphor, we could say that HTTP URIs are like Christianity -- they show you the way to another place. URNs, on the other hand, are Zen. They don't need to point anywhere. They simply bask in the light of their own uniqueness.
Of course, URNs can point to things. For instance, the LSID URN scheme describes resources specific to the life sciences, and the LSID Resolution Project is working on ways to make applications aware of LSID URNs.
URNs have one major limitation for our purposes, however: each scheme is supposed to be registered with the IETF in order to be considered a standard. Which would be a major pain, except that someone has come up with a solution: the Tag URI.
Tag URIs combine the best of both worlds: they look and act like URNs, offering a unique name for a resource that no one will try to dereference. But, unlike URNs and like HTTP URIs, you don't have to send off to the IETF gurus to be able to coin them legally. You can coin new Tag URIs as easily as you can coin HTTP URIs.
XML.com's editor Kendall Clark turned me on to the Tag URI. The Tag URI scheme is a very simple algorithm for creating unique identifiers. "It is simple enough," say its creators Tim Kindberg and Sandro Hawke, "to do in your head." Here's a sample Tag URI for John Kerry: tag:hackingcongress.info,2004-10-05:Kerry,John+F. Like all Tag URIs, it has six parts:
Order | Part of URI | What is it? |
1 | tag: | the URI scheme |
2 | hackingcongress.info | the tagging entity |
3 | , | a comma |
4 | 2004-10-05 | a date in ISO format |
5 | : | a colon |
6 | Kerry,John+F | a specific identifier |
Now if John Kerry has a great-grandson named John F. Kerry who is elected president in 2104, we can create a new Tag URI for him like this: tag:hackingcongress.info,2104-10-05:Kerry,John+F, and we're home free. The sixth part of the Tag URI, the specific identifier, only has to be unique relative to the tagging entity and the date in the Tag URI. This allows us to avoid all manner of brain-bending numbering schemes.
Taking it a bit further, here are Tag URIs for the other two candidates:
George W. Bush
tag:hackingcongress.info,2004-10-05:Bush,George+W
Ralph Nader
tag:hackingcongress.info,2004-10-05:Nader,Ralph
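The six-part layout described above is easy to mechanize. What follows is only a minimal sketch, not part of the Hacking Congress code: the names `TagURI`, `person_id`, `make_tag`, and `parse_tag` are hypothetical, and the regular expression is a deliberately loose reading of the layout in the table rather than a full validator for the Tag URI scheme.

```python
import re
from typing import NamedTuple


class TagURI(NamedTuple):
    authority: str  # the tagging entity, e.g. a domain name
    date: str       # ISO date associated with the tagging entity
    specific: str   # identifier, unique only within authority + date


# Assumption: a loose pattern that mirrors the table above
# (scheme, tagging entity, comma, date, colon, specific identifier).
TAG_PATTERN = re.compile(
    r"^tag:(?P<authority>[^,]+),"
    r"(?P<date>\d{4}(?:-\d{2}(?:-\d{2})?)?):"
    r"(?P<specific>.*)$"
)


def person_id(family: str, given: str) -> str:
    """Build a 'Family,Given+Names' identifier in the style used in this column."""
    return f"{family},{given.replace(' ', '+')}"


def make_tag(authority: str, date: str, specific: str) -> str:
    """Assemble the six parts into one Tag URI string."""
    return f"tag:{authority},{date}:{specific}"


def parse_tag(uri: str) -> TagURI:
    """Split a Tag URI back into its tagging entity, date, and identifier."""
    match = TAG_PATTERN.match(uri)
    if match is None:
        raise ValueError(f"not a Tag URI: {uri!r}")
    return TagURI(**match.groupdict())


kerry = make_tag("hackingcongress.info", "2004-10-05", person_id("Kerry", "John F"))
great_grandson = make_tag("hackingcongress.info", "2104-10-05", person_id("Kerry", "John F"))
```

Here `kerry` and `great_grandson` share the same specific identifier but differ in the date, so the two strings are distinct names, which is the property the numbering schemes were struggling to provide.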
Breaking All the Roles
OK, so now we have a way of naming people. So how do we refer to political offices, that is, roles they might enact? Since there are two senators per state, we can establish Tag URIs for the Senators in New York like so:
tag:hackingcongress.info,2004-10-05:/U.S.+Senate/108th/NY/1
tag:hackingcongress.info,2004-10-05:/U.S.+Senate/108th/NY/2
Figure 4. Two Tag URIs for New York senators.
That works for New York's two senators for the 108th Congress; simply slot in the state abbreviations for the other 49 states, and you have a way of pointing to every senator in the country. Leave out the "108th" path component of the identifier, and you've named a Senatorial seat irrespective of any particular session of Congress, which is really a separate resource. Vary the session of Congress and you have a way, for the Senate, at least, to refer to every Senator's seat in every session of Congress. It's also easy to parse by eye -- which, since I'll be editing a lot of data by hand, will be very useful. Here's a Tag URI for the presidency:
tag:hackingcongress.info,2004-10-05:/U.S.+President/
Figure 5. A Tag URI for the President.
And here's one for John Kerry, a person, in his capacity as a Senator in the 108th session of Congress, a role:
tag:hackingcongress.info,2004-10-05:/U.S.+Senate/108th/MA/Kerry,John+F
Figure 6. A Tag URI for Sen. John Kerry.
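Because the specific identifier is just a slash-separated path, generating and picking apart these office URIs takes only a few lines. This is a rough sketch under stated assumptions: `AUTHORITY`, `MINTED`, `office_tag`, `senate_seat`, and `path_components` are hypothetical names, and the short `STATES` list stands in for the full set of state abbreviations.

```python
# Assumptions: the tagging entity and date used throughout this column.
AUTHORITY = "hackingcongress.info"
MINTED = "2004-10-05"


def office_tag(*path):
    """Join path components into a Tag URI whose identifier starts with '/'."""
    return f"tag:{AUTHORITY},{MINTED}:/" + "/".join(path)


def senate_seat(state, seat, congress=None):
    """Name a Senate seat; omit `congress` to name the seat across all sessions."""
    parts = ["U.S.+Senate"]
    if congress is not None:
        parts.append(congress)
    parts += [state, str(seat)]
    return office_tag(*parts)


def path_components(tag_uri):
    """Return the slash-separated components of the specific identifier."""
    # 'tag', 'authority,date', and everything after the second colon
    specific = tag_uri.split(":", 2)[2]
    return specific.strip("/").split("/")


# Stand-in for all fifty state abbreviations.
STATES = ["MA", "NY"]
seats_108th = [senate_seat(s, n, "108th") for s in STATES for n in (1, 2)]
```

Calling `senate_seat("NY", 1, "108th")` reproduces the first URI in Figure 4, while `senate_seat("NY", 1)` names the same seat without tying it to a session of Congress.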
All right, let's put it together, and see what we have.
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns="http://www.hackingcongress.info/ns/Politics#">
  <Human rdf:about="tag:hackingcongress.info,2004-10-05:Kerry,John+F">
    <FullName>Kerry, John F.</FullName>
    <HoldsOffice rdf:resource="tag:hackingcongress.info,2004-10-05:/U.S.+Senate/MA/Kerry,John+F"/>
    <!-- Other descriptive RDF goes here -->
  </Human>
  <OfficeHolder rdf:about="tag:hackingcongress.info,2004-10-05:/U.S.+Senate/MA/Kerry,John+F">
    <HasRole rdf:resource="tag:hackingcongress.info,2004-10-05:/U.S.+Senate/MA/2"/>
    <StartDate>1985</StartDate>
  </OfficeHolder>
  <Role rdf:about="tag:hackingcongress.info,2004-10-05:/U.S.+Senate/MA/2">
    <rdfs:label>Junior Senator from Massachusetts</rdfs:label>
  </Role>
</rdf:RDF>
Figure 7. RDF incorporating Tag URIs.
Now we've broken up John Kerry into many component parts, and started to give ourselves the flexibility we need in order to model reality, if only in part, in the Semantic Web. If we did the same for all Senators, living and dead, it would be possible to issue queries like: "Who were all of the junior senators of Massachusetts?" Or "When was John Kerry elected to his Senate seat?" John Kerry was never in the House of Representatives (he ran, but lost in 1972), but many Senators were previously in the House, and, with Tag URIs, we have a way to keep the distinctions clear between people, the roles they fill, and the offices they hold in order to fill those roles.
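To make the idea of such queries concrete, here is a small sketch that walks from a person to the offices they hold, and from there to the role and start date. It is not how a real Semantic Web application would do it; an RDF library and a proper query language would be the sensible choice. It simply treats a lightly normalized copy of Figure 7 as plain XML using Python's standard library. The helper names `qname` and `offices_held` are hypothetical, and the embedded document assumes the standard `rdfs:label` property in the RDF Schema namespace.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS_NS = "http://www.w3.org/2000/01/rdf-schema#"
POL_NS = "http://www.hackingcongress.info/ns/Politics#"

# Assumption: a normalized copy of the Figure 7 data.
DOC = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns="http://www.hackingcongress.info/ns/Politics#">
  <Human rdf:about="tag:hackingcongress.info,2004-10-05:Kerry,John+F">
    <FullName>Kerry, John F.</FullName>
    <HoldsOffice rdf:resource="tag:hackingcongress.info,2004-10-05:/U.S.+Senate/MA/Kerry,John+F"/>
  </Human>
  <OfficeHolder rdf:about="tag:hackingcongress.info,2004-10-05:/U.S.+Senate/MA/Kerry,John+F">
    <HasRole rdf:resource="tag:hackingcongress.info,2004-10-05:/U.S.+Senate/MA/2"/>
    <StartDate>1985</StartDate>
  </OfficeHolder>
  <Role rdf:about="tag:hackingcongress.info,2004-10-05:/U.S.+Senate/MA/2">
    <rdfs:label>Junior Senator from Massachusetts</rdfs:label>
  </Role>
</rdf:RDF>
"""


def qname(namespace, local):
    """Build the '{namespace}local' form that ElementTree uses for names."""
    return f"{{{namespace}}}{local}"


root = ET.fromstring(DOC)
# Index every described resource by its rdf:about identifier.
by_uri = {node.get(qname(RDF_NS, "about")): node for node in root}


def offices_held(person_uri):
    """Follow Human -> OfficeHolder -> Role and return (role label, start date) pairs."""
    results = []
    for link in by_uri[person_uri].findall(qname(POL_NS, "HoldsOffice")):
        holder = by_uri[link.get(qname(RDF_NS, "resource"))]
        start = holder.findtext(qname(POL_NS, "StartDate"))
        role_uri = holder.find(qname(POL_NS, "HasRole")).get(qname(RDF_NS, "resource"))
        label = by_uri[role_uri].findtext(qname(RDFS_NS, "label"))
        results.append((label, start))
    return results


kerry_offices = offices_held("tag:hackingcongress.info,2004-10-05:Kerry,John+F")
```

The point of the sketch is the shape of the traversal: because the person, the office holder, and the role are separate resources linked by Tag URIs, each hop is a simple lookup, and adding a second office for the same person would add a result without changing any existing data.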
We do this for a couple of reasons: first, because people really aren't their roles and roles aren't just the people who fill them. But, second, because we never want to go back and change any RDF (unless it's wrong); we only want to add more over time. So we need to make way for that change, and while it's possible to get overly granular, this sort of breakdown makes sense, and should let us do the sorts of interesting sorting and searching of our data set that we'd like, in the future.
Eventually we'll want to further sharpen our machine-readable description of this chunk of the world by saying things like "people who hold political offices are politicians, which are subclasses of the concept of a FOAF person." More about that when we get to OWL.
Making Friends with the President
One of the nifty things about RDF is that you can throw in data that maps to other schemas, willy-nilly. In this case, since we're talking about human beings, using FOAF data is a natural match. Using Leigh Dodds' FOAF-a-Matic, I came up with some FOAF for John Kerry, and dropped it in:
<Human rdf:about="tag:hackingcongress.info,2004-10-05:Kerry,John+F">
  <FullName>Kerry, John F.</FullName>
  <HoldsOffice rdf:resource="tag:hackingcongress.info,2004-10-05:/U.S.+Senate/MA/Kerry,John+F"/>
  <foaf:name>John Kerry</foaf:name>
  <foaf:title>Mr.</foaf:title>
  <foaf:givenname>John</foaf:givenname>
  <foaf:family_name>Kerry</foaf:family_name>
  <foaf:homepage rdf:resource="http://johnkerry.com"/>
  <foaf:workplaceHomepage rdf:resource="http://senate.gov"/>
  <foaf:workInfoHomepage rdf:resource="http://www.slate.com/id/1006400/"/>
  <foaf:schoolHomepage rdf:resource="http://yale.edu"/>
  <foaf:knows>
    <foaf:Person>
      <foaf:name>Teresa Heinz Kerry</foaf:name>
    </foaf:Person>
  </foaf:knows>
</Human>
Figure 8. Kerry, now with FOAF.
For that RDF to parse, I'd have to add the xmlns:foaf="http://xmlns.com/foaf/0.1/" namespace declaration at the top of my XML file. Of course, FOAF isn't really intended to describe people "in the wild" -- when it's used as intended, individuals create their own FOAF files, and agents collect those files and create a map of relationships between individuals. That said, there are already a number of people working on ways to explore and visualize relationships in FOAF, so perhaps we can use a subset of FOAF's predicates in order to take advantage of the tools that have already been built.
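As a rough illustration of what such an agent does with foaf:knows, the sketch below builds a tiny "who knows whom" map from a trimmed version of the FOAF in Figure 8. It is an assumption-laden toy: `knows_map` is a hypothetical helper, the snippet is wrapped in an rdf:RDF element with the namespace declarations added so that it parses on its own, and it reads the data as plain XML with Python's standard library rather than with a FOAF-aware tool.

```python
import xml.etree.ElementTree as ET

FOAF_NS = "http://xmlns.com/foaf/0.1/"

# Assumption: a trimmed, self-contained version of the Figure 8 data.
SNIPPET = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/"
         xmlns="http://www.hackingcongress.info/ns/Politics#">
  <Human rdf:about="tag:hackingcongress.info,2004-10-05:Kerry,John+F">
    <foaf:name>John Kerry</foaf:name>
    <foaf:knows>
      <foaf:Person>
        <foaf:name>Teresa Heinz Kerry</foaf:name>
      </foaf:Person>
    </foaf:knows>
  </Human>
</rdf:RDF>
"""


def knows_map(xml_text):
    """Map each described person's foaf:name to the names of people they know."""
    name_tag = f"{{{FOAF_NS}}}name"
    known_path = f"{{{FOAF_NS}}}knows/{{{FOAF_NS}}}Person"
    relationships = {}
    for person in ET.fromstring(xml_text):
        known = [friend.findtext(name_tag) for friend in person.findall(known_path)]
        relationships[person.findtext(name_tag)] = known
    return relationships


relationships = knows_map(SNIPPET)
```

Merging the maps produced from many such files is, in miniature, how FOAF tools assemble a picture of relationships between individuals.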
Not Necessarily the Web
So now, we've taken two steps back for last month's step forward, in order to clarify the difference between people, roles, and offices. But they were good steps to take, because there's no purpose in building RDF maps of the House or Executive Branch if the data is confused.
Looking at the shift from HTTP URIs to URNs, you could reasonably ask, "Where's the Web in this Semantic Web application?" By going with Tag URIs, the Hacking Congress data has cut its link to the Web as a whole, and it is defined in terms of itself, at least for now.
There are two points to remember. First, our RDF describing the Senate will eventually be on the Web itself, where it can be retrieved, queried, and so on. Second, we will eventually enrich this data with links to other, related resources, using RDF predicates like foaf:homepage and rdfs:seeAlso.
But, looking at the bigger picture, there is, I think, an important distinction between the Semantic Web and the Web As We Know It (WAWKI). The Semantic Web is about defining data in a consistent, accurate way, so that it can be shared by machines and by humans. The WAWKI is about moving human-friendly representations of resources from one place to another, and the focus on semantic consistency (in the form of XML, XHTML, and related standards) came after the basic architecture was established. The goal of this column is not to build a "Semantic Web site" because such a thing doesn't really exist. Rather, we're aiming to build a useful knowledge base of information about a specific domain, and to publish that knowledge base on the Web, so that agents, both human and machine, can use the data in ways that aid them in accomplishing their goals and plans.
When Hacking Congress returns in November, there will be a new U.S. President, barring a repeat of 2000's electoral debacle, and we'll try to see if we can fit the executive branch into our RDF plans.