Music and Metadata
November 22, 2006
Introduction
The Semantic Web is the idea that the data used to describe information on the Web should be structured so that different people can easily reuse it in different ways. For music lovers and musicians, a down-to-earth, practical view of the Semantic Web is sorely needed. Believe me when I say I have tried to find one, I really have. In my efforts to convince people who work with or in the music industry of the merits of the Semantic Web, I have, time and time again, been simply and reasonably told, "Sounds interesting, can you send me an article on it?" For musical end-to-end examples, the answer until now was simply "no."
This article introduces the problems of Shawn, a hapless guy who has managed to get through a first date with a beautiful, charming dance music fanatic. He met her by accident when his mates suggested they all go down to a sleek new nightspot instead of their normal public house haunt, with its beer mats and weekly fix of the same down-to-earth people. How does the Semantic Web help this indie music fan through the turmoil of planning that second date, while knowing absolutely nothing about dance music and with less cash than he appeared to earn when they first went out?
The article compares solutions on the Web and on the Semantic Web, using Semantic Web tools produced by W3C, HP, and MIT under the SIMILE project. These include a Semantic Web browser, a screen scraper for producing Semantic Web data, and a means of consuming and using the data to help our friend. The article builds up to the idea of a Semantic Web music browser and discusses how multimedia description formats, such as MPEG-7, could be used to augment its functionality. This is examined both from the point of view of our hapless friend, as he attempts to find a decent dance venue in London, and from that of the musicians he ends up taking his date to see.
Music Events
There are many independent music events in London, where Shawn lives, and he often visits the website www.drownedinsound.com, which lists events for the coming week. This site also lists the performing artists' names and the kinds of artists played by the DJs. Another site, london.openguides.org, lists great places to eat in London. Both sites give the address of the music event or restaurant, and Google Maps can be used to show these places on a map. So a restaurant and a nearby music event can easily be chosen and, using the Web, the music event can be booked if the restaurant date is going well.
Now, on Shawn's first date, an artist's name came up in conversation. His date said something about trying to find out whether Zack de la Rocha's collaborations with Reprazent and DJ Shadow were going to be released or not. Shawn ran down to his local record shop the next day to find some of this music. He asked the assistant for help; the assistant heard the name de la Rocha and handed Shawn Rage Against the Machine's self-titled release from 1992. Shawn was grateful for the CD and for the clerk's help, as he knew nothing about dance music and had accidentally given the impression on his first date that he liked the artists she mentioned, too.
The website that lists local music events also lists the artists playing, the addresses, and the dates, but Shawn would also need to know related artists in case the few artists he had heard of were not listed. It would be useful to know the genres that make up dance music as well; some of the events are not listed as dance music but under genres such as House or Jungle. He could find all this information on the Web, which would solve his dilemma but require a lot of cross-referencing and screen time. However, our guy has left this all to the last minute, arranging to meet his date at 8 p.m. and getting off the phone only an hour and a half before they are due to meet.
So what is practically different about using the Semantic Web to solve this problem? Like the Web, the Semantic Web describes information using labels that help to find the web pages needed to solve a problem. These labels normally include the title of the page and other metadata, such as location information or publication date. Unlike the Web, however, the Semantic Web allows the relationships between these labels to be defined across multiple web pages, and it can also describe abstract things, such as people or places, that do not require an actual web page, only a common label. This is like the common column headings used to relate tables within a database or across multiple databases.
For example, suppose someone has labeled an event as playing Dixieland music, and someone else has defined both Dixieland and Bebop as kinds of jazz. Shawn can use the Semantic Web to look up the relationships between these labels, which explicitly state that Dixieland and Bebop are types of jazz. The Semantic Web's power lies in its ability to add labels: it uses the information that Dixieland is a type of jazz to assume that the event playing Dixieland music can also be labeled as a jazz event. This is a true statement even though the original author did not think to add this label to the event's description. This process, known as inference, is simple: it adds assumed labels to web pages or other information on the Semantic Web. An inference system can be very compact, consisting of a number of rules of the form: if this label and another label are both present, then an additional label can be assumed to be present when querying. If our indie-loving guy knew absolutely nothing about music and was looking at "The Dixie Jazz Band" in the bargain bin of his favorite local retailer to see whether the band played dance music, this would be a great bit of direct information to have.
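To make the Dixieland example concrete, here is a minimal sketch of that kind of rule-based inference in plain JavaScript. It is not how Jena or any other Semantic Web library does it; the property names `isA` and `playsGenre`, the extra "Jazz is a kind of Music" fact, and the event URI are all assumptions made up for illustration.

```javascript
// Facts are simple [subject, property, value] triples.
// All names below are illustrative assumptions, not a real vocabulary.
const facts = [
  ["Dixieland", "isA", "Jazz"],
  ["Bebop", "isA", "Jazz"],
  ["Jazz", "isA", "Music"],
  ["http://example.org/event/1", "playsGenre", "Dixieland"],
];

// Two rules are applied repeatedly until nothing new can be added:
//   1. X isA Y        and Y isA Z  =>  X isA Z
//   2. E playsGenre X and X isA Y  =>  E playsGenre Y
function infer(inputFacts) {
  const known = new Set(inputFacts.map((f) => f.join("|")));
  const all = inputFacts.map((f) => [...f]);
  let changed = true;
  while (changed) {
    changed = false;
    const subclassFacts = all.filter((f) => f[1] === "isA");
    for (const [s, p, o] of [...all]) {
      if (p !== "isA" && p !== "playsGenre") continue;
      for (const [sub, , sup] of subclassFacts) {
        if (sub !== o) continue;
        const key = [s, p, sup].join("|");
        if (!known.has(key)) {
          known.add(key);
          all.push([s, p, sup]);
          changed = true;
        }
      }
    }
  }
  return all;
}

const inferred = infer(facts);
const eventGenres = inferred
  .filter(([s, p]) => s === "http://example.org/event/1" && p === "playsGenre")
  .map(([, , genre]) => genre);
console.log(eventGenres);
```

The event's author only wrote "Dixieland", yet a query for jazz events would now find it, because the assumed labels were added by the rules rather than by hand.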
However, for our quirky but otherwise lovable hero, the direct use of this assumed information generated by the Semantic Web is hard to illustrate with such an example. This is because the Semantic Web's real advantage comes with scale. The advantage for music information comes when lots of people share lots of information about music, music events, and artists. Using the inference techniques, it can then be brought together by adding assumed labels to the original information, allowing the total body of information to be reused in ways not initially intended by its individual authors. This can generate a wealth of information that would otherwise require a lot of time to obtain. Now remember to keep thinking simply; the Semantic Web allows you to assume more labels describing web pages or other information than you would normally be able to obtain in a reasonable amount of time. This would most definitely help our panicking friend.
How can we collect and store semantic data about music events and get it into a Semantic Web format without spending hours rewriting information into compatible markup languages? When the Web started, many people wrote code to convert their existing information to the new web format. The same is true for the Semantic Web, for which information is generated from databases and numerous other formats. We can now also use web pages themselves as a source of information, which doesn't require the people running the site to give open access to their database. The way to convert these sources into Semantic Web-compatible markup is to use a screen scraper. Screen scrapers are simple programs that, in this case, use XPath queries to extract the relevant information from web pages by navigating the DOM and following external links. This raw information is then fed to a Semantic Web programming environment, which outputs documents composed of RDF or OWL, two of the Semantic Web's markup languages. These documents can hold any newly inferred labels as well as those that are explicitly defined. They are just like the metadata labels used at the top of current web pages, such as title and author, or like the table names and values in a database. Jena, an open source Java library that Piggy Bank (discussed later in this article) is built around, is the most functional of these programming libraries. However, there are also libraries written in C# for the more .NET-oriented users among you. These libraries can output Semantic Web documents and also house the inference systems that process multiple documents from different sources, adding the assumed labels.
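As a rough illustration of the last step, turning scraped values into RDF, the sketch below writes a scraped record out as N-Triples, a line-based RDF syntax. It does not use Jena or Piggy Bank. The two namespaces are the ones used in this article's scraper; the record itself (its URI and address) and the helper names are hypothetical.

```javascript
const RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type";
const DIS = "http://www.drownedinsound.com/";
const LOC = "http://simile.mit.edu/2005/05/ontologies/location#";

// Hypothetical output of a screen scraper: one record per event page.
const scraped = [
  { uri: "http://example.org/events/1", address: "1 Example Street, London" },
];

// Escape the characters that may not appear raw in an N-Triples literal.
function escapeLiteral(text) {
  return text
    .replace(/\\/g, "\\\\")
    .replace(/"/g, '\\"')
    .replace(/\n/g, "\\n")
    .replace(/\r/g, "\\r");
}

// Emit one "<subject> <property> object ." line per statement.
function toNTriples(records) {
  const lines = [];
  for (const r of records) {
    lines.push(`<${r.uri}> <${RDF_TYPE}> <${DIS}event> .`);
    lines.push(`<${r.uri}> <${LOC}address> "${escapeLiteral(r.address)}" .`);
  }
  return lines.join("\n");
}

const ntriples = toNTriples(scraped);
console.log(ntriples);
```

A real library would also validate URIs and handle datatypes and language tags; the point here is only that each scraped value becomes one labeled statement about the page it came from.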
This all sounds like a lot of work to start producing Semantic Web content – especially considering that the Semantic Web is more useful as more people use it, much like the current Web. For those of us who are less technically savvy, this could be a major barrier. Fortunately, a collaborative project being undertaken by MIT, W3C and HP has written tools to simplify this task of extracting information. Collectively, these tools are published under the SIMILE project name [1].
The first part of our screen scraper requires us to write the XPath statements that will extract the information from the relevant web pages and, in this case, the JavaScript to run those XPath queries. We can use an application called Solvent [2], from the SIMILE project, to generate these queries and code. Solvent runs under Firefox. After navigating to a web page, you can scrape information by bringing up Solvent's interface (Figure 1) and selecting an item in the web browser, such as this week's list of music events in London. If this information is part of a recurring pattern of entries, such as a table or list in the browser, Solvent writes the appropriate XPath and JavaScript to grab the recurring entries; if not, it just grabs the highlighted section(s).
Figure 1. Solvent running in Firefox
Using this automatically generated screen scraper, you can grab the raw information from the web pages. This information can then be reused in another application known as Piggy Bank [3], again from the SIMILE project, which will eventually store the scraper's raw information in an RDF document or data store. All this is done without touching a line of code. Simplified screen scraper code for the coming week's London events on drownedinsound.com is listed in Figure 2. For information on how to install the full version into Piggy Bank, please see the SIMILE project notes [4] and the Resources section in this article.
function processEntry(d, model, utilities, uris, uriToLocation) {
  // Collect the venue links found in the detail blocks of an event page.
  var elmt = d.evaluate("//div[@id='maincol']/div[@class='detail']", d, null,
                        XPathResult.ANY_TYPE, null);
  var urls1 = [];
  var aElmt = elmt.iterateNext();
  while (aElmt) {
    if (aElmt.innerHTML.indexOf("venue") != -1) {
      utilities.debugPrint(aElmt.childNodes[1]);
      urls1.unshift(aElmt.childNodes[1]);
    }
    aElmt = elmt.iterateNext();
  }

  // Namespaces used to build the RDF statements.
  var rdf = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
  var rdfs = "http://www.w3.org/2000/01/rdf-schema#";
  var dc = "http://purl.org/dc/elements/1.1/";
  var drownedinsound = "http://www.drownedinsound.com/";
  var loc = "http://simile.mit.edu/2005/05/ontologies/location#";

  // Describe the page as an event and a venue, and record its address.
  var uri = d.location.href;
  model.addStatement(uri, rdf + "type", drownedinsound + "event", false);
  model.addStatement(uri, rdf + "type", drownedinsound + "venue", false);
  // `address` is not defined in this simplified listing; the full scraper
  // is expected to read it from the venue page.
  model.addStatement(uri, loc + "address", address, true);
  uris.unshift(uri);
}

// `doc`, `browser`, `model`, `utilities`, `uriToLocation`, `done`, and `wait`
// are expected to be supplied by the Piggy Bank scraper environment or
// defined in the full version of the scraper.
var uris = [];
var urls = [];

// Collect the links to this week's event pages.
var iterator = doc.evaluate("//div[@id='maincol']/ul/li[@class='normal']/a",
                            doc, null, XPathResult.ANY_TYPE, null);
var aElement = iterator.iterateNext();
while (aElement) {
  urls.unshift(aElement.href);
  aElement = iterator.iterateNext();
}

utilities.processDocuments(
  browser, // current browser
  null,    // first document to process if any
  urls,    // array of urls to load asynchronously
  function(d, cont) { // function to process each document as it gets loaded
    try {
      processEntry(d, model, utilities, uris, uriToLocation);
    } catch (e) {
      utilities.debugPrint(e);
    }
    cont(); // continue with the iteration
  },
  done,    // what to do when all documents have been processed
  function(e, url) { // error handler
    alert("Error scraping data from " + url + "\n" + e);
  }
);
wait(); // don't navigate to the collected data just yet
Figure 2. Simplified screen scraper code to get this week's music listings from drownedinsound.com
This screen scraper requests the event pages and the venue pages to which the events link. These venue pages list the addresses at which the events are held. The code makes calls to the Semantic Web libraries, which generate the RDF data that holds the information obtained from these pages, and to one additional resource not yet mentioned: a function that uses the venue's postcode and address to get its longitude and latitude. This will come in handy later. (I have included the full version of the files used in this example, including the screen-scraper file, its .n3 file [used in its installation], and the server-side code used to obtain the U.K. geocoordinates for the venue's address.) Piggy Bank, apart from storing all this information, also displays a navigable, searchable index of the Semantic Web data, which you can share with others. Figure 3 shows how RDF information is displayed in Piggy Bank. The information shown was not obtained from a screen scraper but directly from RDF sources at london.openguides.org.
Figure 3. Piggy Bank running in Firefox showing an Italian restaurant's RDF description
What About Our Poor Suffering Hero?
The software tools available from the SIMILE project are good, but how can they help our poor Shawn? Well, unbeknownst to him, a plucky music lover has downloaded these two programs and put them to good use. A long-term fan of U.K.-based independent music and a regular visitor to sites that list a huge number of U.K. music events, this plucky developer, also living in London, has used Solvent and Piggy Bank to generate an RDF document that lists the coming week's music events. This RDF document lists all the information available on drownedinsound.com, including the addresses and the information about the artists performing or played by the DJ(s). It then uses the geolocation service to get the longitude and latitude of each event, which can be used to find its location on a mapping service such as Google Maps. This information is consolidated into an RDF document that is published on a website. Although until now we have only been talking about RDF and seeing it through Piggy Bank, RDF can also be viewed directly, as it can be written as XML markup. Example 1 illustrates an RDF description of a music event from the listing page, as produced by the screen scraper. As you can see, this lists the genre of music being played at the event.
<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:em="http://example.org/music/">
  <rdf:Description rdf:about="http://uri-of-event-1">
    <em:Genre>Rock</em:Genre>
    <em:Artists>Rage Against the Machine</em:Artists>
    <em:Artists>Beastie Boys</em:Artists>
  </rdf:Description>
</rdf:RDF>
Example 1. RDF description of a music event
Shawn, interested in restaurants in London, has browsed OpenGuides – a network of free, community-maintained information that includes a guide to good food in London. Our hero has looked at some great food and bookmarked the RDF from the web pages describing various restaurants. This was done simply by clicking on a button in Firefox that was added when he installed Piggy Bank. As easy as adding a favorite to his shortcut menu, the information about the restaurant and its web address is stored because an RDF data document was referenced in the header portion of the web page by OpenGuides. This information has been added to the information in Piggy Bank already obtained from the music lover's list of London music events. Our hero can now use his Semantic Web browser, Piggy Bank – which has consumed both feeds – to search this information. The relationships between the geocoordinate labels used in the documents have also been defined by a third person, and, using Piggy Bank's inbuilt Google Maps functionality, Shawn can have them plotted on a map showing the restaurants and events for that week. How far is this from the truth? Well, all of it – apart from the inferred connection of the geolocation labels – is absolutely doable in the tools described. The relationship mapping between the longitude and latitude labels is easily doable, just not with the graphical interface to the current version of Piggy Bank; the map visualization can be plotted once the longitude and latitude are labeled under a specific tag name. This is actually achieved with the listed screen scraper, which can be used with Piggy Bank to plot this week's music events as shown in Figure 4. Defining the relationship of other labels to this label would require additional relationships to be defined, as well as the inference step, which really makes this all so much more interesting. It should be stated that this was not Piggy Bank's primary goal. 
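The label-mapping step described above can be sketched in a few lines. This is not Piggy Bank code: the property names (`geo:lat`, `loc:latitude`, and so on), the venue names, and the coordinates are all assumptions chosen for illustration. The sketch maps each source's own latitude and longitude labels onto one common pair, then uses the haversine formula to find which event is closest to the restaurant.

```javascript
// Hypothetical mapping from each source's labels to one common label.
const labelMap = {
  "geo:lat": "lat",
  "geo:long": "lon",
  "loc:latitude": "lat",
  "loc:longitude": "lon",
};

// Copy a record, renaming any mapped coordinate label to the common one.
function normalize(record) {
  const out = { name: record.name };
  for (const [key, value] of Object.entries(record)) {
    if (labelMap[key]) out[labelMap[key]] = Number(value);
  }
  return out;
}

// Great-circle distance in kilometres between two {lat, lon} points.
function haversineKm(a, b) {
  const R = 6371; // mean Earth radius in km
  const toRad = (deg) => (deg * Math.PI) / 180;
  const dLat = toRad(b.lat - a.lat);
  const dLon = toRad(b.lon - a.lon);
  const h =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(a.lat)) * Math.cos(toRad(b.lat)) * Math.sin(dLon / 2) ** 2;
  return 2 * R * Math.asin(Math.sqrt(h));
}

// One record from each (made-up) feed, using different labels.
const restaurant = normalize({
  name: "Italian restaurant",
  "geo:lat": "51.5136",
  "geo:long": "-0.1365",
});
const events = [
  { name: "Club night A", "loc:latitude": "51.5390", "loc:longitude": "-0.1426" },
  { name: "Club night B", "loc:latitude": "51.5145", "loc:longitude": "-0.1300" },
].map(normalize);

const nearest = events.reduce((best, e) =>
  haversineKm(restaurant, e) < haversineKm(restaurant, best) ? e : best
);
console.log(nearest.name);
```

In a Semantic Web setting the mapping table would itself be published as data, so that an inference step could add the common label automatically instead of each application hard-coding it.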
Inference, as previously stated, is a simple list of rules: if this label is present, then you can assume this. In an RDF inference system, the simplest one in the Semantic Web, this consists of just 12 rules that are easily implemented in a programming language.
Figure 4. Google Maps used in Piggy Bank to plot this week's London music events
In fact, there is a much more functional inference engine in Jena, which Piggy Bank is built around. All the tools are present, and with the current versions of these tools, a complete framework is in place. However, these tools are now waiting for those plucky developers that were so crucial in our example.
Turning Up to a Hip-Hop Club Night, Thinking It's Going to be Dance
Following on from our current example, what would happen if we knew other artists who had appeared on albums with Rage Against the Machine? This might give us a reasonable indication that their music is similar. Many artists may not be simply related in this way, but this is pretty close to the playlist models used by Pandora [5] and Music Strands [6] or other collaborative filtering applications such as the ones used by Amazon for books. The relationship of artists through albums is easily expressed in RDF and obtainable from a site called MusicBrainz. How music genres are related to one another (music genre taxonomies) can be explored on many websites, such as Wikipedia, Amazon, or MP3.com. With all this information, many new applications become possible, simply by being able to reuse the information that is currently on the Web in new ways on the Semantic Web. Again, please keep this simple: web pages and information labels are being augmented with artists, genres, and events based on what can be implicitly assumed from people's structured labels of web pages and information. A C# example of using a music genre taxonomy with the Semantic Web to query music and try to improve search performance can be found here [7]. Given all this information, knowing how the genres are related does not seem so trivial a point as in our bargain bin example. Having the ability to consolidate all this information about music artists, genres, and their relationships is great, but the same could be done for documents, images, or video files. It would be better to get some information related to the recorded music files themselves.
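The "related through albums" idea is easy to sketch. The data below is entirely made up (placeholder artists, albums, and events, not MusicBrainz records), and the function names are my own; the sketch treats two artists as related if they appear on the same album, then uses that to widen an event search.

```javascript
// Hypothetical album data: which artists appear on which album.
const albums = [
  { title: "Album 1", artists: ["Artist A", "Artist B"] },
  { title: "Album 2", artists: ["Artist B", "Artist C"] },
  { title: "Album 3", artists: ["Artist D"] },
];

// Hypothetical event listings for the coming week.
const listings = [
  { name: "Monday night", artists: ["Artist C"] },
  { name: "Friday night", artists: ["Artist D"] },
];

// Artists who share at least one album with the given artist.
function relatedArtists(albumList, artist) {
  const related = new Set();
  for (const album of albumList) {
    if (!album.artists.includes(artist)) continue;
    for (const other of album.artists) {
      if (other !== artist) related.add(other);
    }
  }
  return [...related].sort();
}

// Events that list the artist or anyone related to them through an album.
function eventsForArtist(eventList, albumList, artist) {
  const wanted = new Set([artist, ...relatedArtists(albumList, artist)]);
  return eventList.filter((ev) => ev.artists.some((a) => wanted.has(a)));
}

console.log(relatedArtists(albums, "Artist B"));
console.log(eventsForArtist(listings, albums, "Artist B").map((ev) => ev.name));
```

Artist B is not playing anywhere this week, but the search still finds the Monday event because Artist C shared an album with them. As the story goes on to show, this kind of relatedness can also mislead, since it says nothing about why the artists are connected.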
At this stage, our valiant Semantic Web trusty is walking out his front door listening to the album his unsuspecting date mentioned, thinking he is all set. In his pocket, he has the address for a great, reasonably priced Italian restaurant and a club with good reviews that plays dance music. However, the club he found had listed "Rage Against the Machine" for their hip-hop roots, not for Zack de la Rocha's collaboration with dance artists.
There is a way of describing recorded music that labels music using RDF [8]. A common format for doing this is known as MPEG-7, although the principles are the same for a number of multimedia markup languages. This has the functionality of describing music using subjective labels that rely on frequency information in the music. These descriptions can be used to describe how similar-sounding two bits of recorded music are, or even what musical genre they are likely to be from [9]. If his Semantic Web browser had been equipped with this ability, Shawn would not have sat in a reasonably priced Italian restaurant trying to convince his lovely date that he greatly enjoys the type of music she's into, and casually telling her that there is a club that plays dance music nearby if she would like to go.
MPEG-7 has a number of feature descriptors that can be used to subjectively label various attributes of music such as timbre or beat. This creates a more musically tied labeling system. Examples of MPEG-7 documents can be found here [10], and they describe, among other things, the time-frequency properties of a piece of music. Given the example sound track, the Semantic Web music browser could have pointed out that Shawn was actually looking for hip-hop events, not dance as he specified, and suggested some good events in the area. These same features have also been used by people to construct music similarity (MusicIP [11]), and music fingerprinting (Shazam [12], MusicBrainz), allowing two music files to be compared or uniquely identified. If this information is fed into an inference system, like the Semantic Web, it leads to some very useful labels being added to all kinds of music information. In our case, this would mean that his date would have had a much better time that night than she did politely listening to hip-hop. The music could have been good, but the musician and DJs had rushed together a set after spending far too many hours looking for a break for their new song the record label needs next month.
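To give a feel for how such descriptors could be compared, here is a small sketch. It does not read MPEG-7 documents, and the three-number feature vectors are invented stand-ins for whatever timbre and beat descriptors a real system would extract. It compares a track to labeled reference tracks with cosine similarity and reports the genre of the closest one.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) throw new Error("vectors must have equal length");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical reference tracks with made-up feature values.
const referenceTracks = [
  { genre: "hip-hop", features: [0.9, 0.2, 0.4] },
  { genre: "dance", features: [0.3, 0.9, 0.8] },
];

// Return the genre and score of the most similar reference track.
function guessGenre(features, references) {
  let best = null;
  for (const ref of references) {
    const score = cosineSimilarity(features, ref.features);
    if (best === null || score > best.score) {
      best = { genre: ref.genre, score };
    }
  }
  return best;
}

// Made-up features for the album Shawn was handed in the record shop.
const shawnsAlbum = [0.8, 0.3, 0.5];
const guess = guessGenre(shawnsAlbum, referenceTracks);
console.log(guess.genre);
```

With these invented numbers the album lands closer to the hip-hop reference than the dance one, which is exactly the warning Shawn's browser could have given him. Once computed, the guessed genre is just another label that can be fed into the inference step alongside the hand-written ones.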
Musician Problems
We have been concentrating on the needs of someone trying to find music events. However, musicians can also benefit from the Semantic Web, especially if it is equipped with MPEG-7 style descriptors. In our example, the musicians are rushed because they had problems finding a break. Normally, this process consists of trawling vast audio sample libraries, both online and offline. However, every time a break is found that is close to the one being sought, it is invariably the right type of rhythm but the wrong type of sound, or the right type of sound but the wrong rhythm or groove. This then requires starting the search all over again. With the feature descriptors of MPEG-7, it is possible to state that you are looking for something that sounds similar to one break but has a different groove type or rhythm, narrowing your search at each step. If online sample stores published an MPEG-7 document with each sample, and labeled samples with common information (such as the music style the sample is best used for and its creator) so that the inferences possible with the Semantic Web could be applied, then our rushed musicians would have put on a much better show for our friends.
Does He Get That Third Date?
How far off are we from this really useful community-driven tool for music? Well, I'm sitting here now seeing all the building blocks and tools, wanting to implement a Semantic Web music browser, as hopefully so many of you are. However, I do not see any evidence out there of anyone doing this, so if anyone wants to give me a hand, that would be just great. If you like the ideas of the Semantic Web, have programming skills, and love music, why not join me in an open source project and get music into the Semantic Web, if only to help out our hapless friend on his third date.
References
[1] SIMILE is focused on developing robust, open source tools based on Semantic Web technologies that improve access, management, and reuse among digital assets.
[2] Solvent is a Firefox extension that helps you write screen scrapers for Piggy Bank.
[3] Piggy Bank is a Firefox extension that turns your browser into a mashup platform by allowing you to extract data from different websites and mix them together.
[4] Piggy Bank, Installing screen scrapers
[5] Pandora
[6] Music Strands
[7] C# Semantic Web and music genre taxonomies
[8] RDF MPEG-7
[9] Music Similarity application using MPEG-7
[10] MPEG-7 examples
[11] MusicIP
[12] Shazam