
Never Mind the Namespaces: An XSLT RSS Client
by Bob DuCharme, January 02, 2003
RSS is an XML-based format for summarizing and providing links to news stories. If you collect RSS feed URIs from your favorite news sites, you can easily build dynamic, customized collections of news stories. In a recent XML.com article Mark Pilgrim explained the history and formats used for RSS. He also showed a simple Python program that can read RSS files conforming to the three RSS formats still in popular use: 0.91, 1.0, and 2.0. While reading Mark's article I couldn't help but think that it would be really easy to do in XSLT.
Easy, that is, if you're familiar with the XPath local-name() function. In a past column I showed how this function retrieves the part of an element name that identifies it within its namespace. For example, an element with a qualified name of "blue:verse" has the local name "verse" (and not "blue", as I wrote in a typo in that column and only just now caught; "blue" is the namespace prefix).
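To make the distinction concrete, here is a minimal sketch (not taken from that earlier column) that reports the qualified name, local name, and namespace URI of every element in a source document. The sample element and the http://example.com/blue namespace URI mentioned in the comment are made up for illustration.

```xml
<!-- Sketch only: list name(), local-name(), and namespace-uri() for each element.
     Hypothetical input: <blue:verse xmlns:blue="http://example.com/blue">text</blue:verse> -->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="text"/>
  <xsl:template match="*">
    <xsl:text>qualified name: </xsl:text>
    <xsl:value-of select="name()"/>
    <xsl:text>, local name: </xsl:text>
    <xsl:value-of select="local-name()"/>
    <xsl:text>, namespace URI: </xsl:text>
    <xsl:value-of select="namespace-uri()"/>
    <xsl:text>&#10;</xsl:text>
    <!-- Recurse into child elements only, so text nodes are not echoed. -->
    <xsl:apply-templates select="*"/>
  </xsl:template>
</xsl:stylesheet>
```

For the hypothetical input above, the stylesheet should report a qualified name of blue:verse, a local name of verse, and the made-up namespace URI.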
Typical XSLT stylesheets care a great deal about an element's namespace. If a channel element in an RSS 1.0 file comes from the http://purl.org/rss/1.0/ namespace and a channel element from an RSS 2.0 file comes from a different namespace (or from no namespace at all), then an XSLT processor considers these two element types to be as different as a title element from a book publishing namespace and a title element from a human resources namespace. However, by basing match conditions (and, as we'll see later, the select expressions in xsl:apply-templates instructions) on the local name of source tree elements, we can explicitly tell the XSLT processor to ignore the namespace of certain elements. For example, we can have a template rule that applies to all elements with a local name of "channel," regardless of their namespace.
The following stylesheet mimics the behavior of the rss1.py Python program in Mark's article:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:dc="http://purl.org/dc/elements/1.1/" version="1.0">
  <xsl:output method="text"/>
  <xsl:template match="*[local-name()='title']">
    <xsl:text>title: </xsl:text>
    <xsl:apply-templates/>
  </xsl:template>
  <xsl:template match="*[local-name()='link']">
    <xsl:text>link: </xsl:text>
    <xsl:apply-templates/>
  </xsl:template>
  <xsl:template match="*[local-name()='description']">
    <xsl:text>description: </xsl:text>
    <xsl:apply-templates/>
  </xsl:template>
  <xsl:template match="dc:creator">
    <xsl:text>author: </xsl:text>
    <xsl:apply-templates/>
  </xsl:template>
  <xsl:template match="dc:date">
    <xsl:text>date: </xsl:text>
    <xsl:apply-templates/>
  </xsl:template>
  <xsl:template match="language"/> <!-- suppress -->
</xsl:stylesheet>
There is one slight difference from rss1.py: the stylesheet doesn't print the "date:" and "author:" headers for news items that have no dc:creator or dc:date children. RSS 0.91 doesn't use these two Dublin Core elements. The first template rule in this stylesheet has an asterisk and a predicate inside square brackets to specify that the XSLT engine should apply that rule to any element meeting the predicate condition: its local name is "title." The second and third template rules use a similar format to handle the RSS link and description elements.
I won't show the input and output for this stylesheet: they're essentially the same as the input and output in Mark's article. Instead, I'd rather take the stylesheet a few steps further to create a standalone news aggregator that requires no special software other than a web browser and an XSLT processor.
Three basic XSLT techniques make this possible:
- Reading remote documents with XSLT's document() function, which most XSLT processors support; our stylesheet will use it to retrieve the news feeds from their servers.
- Converting the RSS elements and attributes to HTML for display by the browser.
- Using the local-name() function to create template rules that don't care about the namespace of RSS elements such as channel, item, and link.
There are plenty of RSS-based news aggregating clients around: Amphetadesk, NewzCrawler, and NetNewsWire, among many others. The advantage of using one written in XSLT is that you don't have to install new software on your machine or log in to a server-based aggregator that needs to look up a list of your favorite feeds. You can also more easily integrate the XSLT-based one into other applications -- for example, to add customized news feeds to your company's intranet site without relying on any software more expensive or exotic than an XSLT processor.
Our stylesheet will transform the following XML document, which links to summaries of several news feeds and blogs:
<?xml-stylesheet href="getRSS.xsl" type="text/xsl"?>
<RSSChannels>
  <!-- RSS 0.91 feeds -->
  <RSSChannel src="http://www.xml.com/cs/xml/query/q/19"/>
  <RSSChannel src="http://xml.coverpages.org/covernews.xml"/>
  <RSSChannel src="http://www.bbc.co.uk/syndication/feeds/news/ukfs_news/world/rss091.xml"/>
  <!-- RSS 1.0 feeds -->
  <RSSChannel src="http://www.ilrt.bristol.ac.uk/discovery/rdf/resources/rss.rdf"/>
  <RSSChannel src="http://www.smartmobs.com/index.rdf"/>
  <RSSChannel src="http://www.infoworld.com/rss/news.rdf"/>
  <!-- RSS 2.0 feeds -->
  <RSSChannel src="http://www.panix.com/~jbm/snappy/index.xml"/>
  <RSSChannel src="http://www.antipixel.com/blog/index.xml"/>
  <RSSChannel src="http://revjim.net/index.xml"/>
</RSSChannels>
As the document's comments tell us, it includes feeds from the three currently popular RSS formats. For now, most feeds using RSS 2.0 come from webloggers interested in playing with the latest technology, but I'm sure we'll see more commercial sites take advantage of the richer metadata possibilities offered by the post-0.91 releases.
The processing instruction in the document's first line identifies the stylesheet to use for dynamic rendering in a web browser. Before looking at how the stylesheet works, first watch it in action: unzip this file onto your hard disk and use a recent release of Internet Explorer to open RSSChannels.xml. There are a few caveats to remember:
- This doesn't work with Mozilla, which, as of release 1.2.1, still has some kinks in its implementation of the document() function.
- I'd hoped to put the XML file and its stylesheet on a public server so that you could just link to it from this article to see it in action, but I got an "Access denied" message when the stylesheet tried to use the document() function to retrieve a document from a different server. This could be a security precaution in IE's XSLT implementation.
Using IE to open up local copies of RSSChannels.xml and its accompanying getRSS.xsl stylesheet should work fine. A batch file or shell script can also use Xalan or Saxon and these two files to create an HTML file that any web browser can read. So, these caveats won't stand in the way of anyone developing their own XSLT RSS client -- they just get in the way of the flashy demo that I had originally planned.
Let's look at the getRSS.xsl stylesheet.
<!-- getRSS.xsl: retrieve RSS feed(s) and convert to HTML. -->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:dc="http://purl.org/dc/elements/1.1/" version="1.0">
  <xsl:output method="html"/>

  <xsl:template match="RSSChannels">
    <html><head><title>Today's Headlines</title>
    <style><xsl:comment>
      p { font-size: 8pt;
          font-family: arial,helvetica; }
      h1 { font-size: 12pt;
           font-family: arial,helvetica;
           font-weight: bold; }
      a:link { color:blue;
               font-weight: bold;
               text-decoration: none; }
      a:visited { font-weight: bold;
                  color: darkblue;
                  text-decoration: none; }
    </xsl:comment></style>
    </head>
    <body>
      <xsl:apply-templates/>
    </body></html>
  </xsl:template>

  <xsl:template match="RSSChannel">
    <xsl:apply-templates select="document(@src)"/>
  </xsl:template>

  <!-- Named template outputs HTML a element with href link and RSS
       description as title to show up in mouseOver message. -->
  <xsl:template name="a-element">
    <xsl:element name="a">
      <xsl:attribute name="href">
        <xsl:apply-templates select="*[local-name()='link']"/>
      </xsl:attribute>
      <xsl:attribute name="title">
        <xsl:apply-templates select="*[local-name()='description']"/>
      </xsl:attribute>
      <xsl:value-of select="*[local-name()='title']"/>
    </xsl:element>
  </xsl:template>

  <!-- Output RSS channel name as HTML a link inside of h1 element. -->
  <xsl:template match="*[local-name()='channel']">
    <xsl:element name="h1">
      <xsl:call-template name="a-element"/>
    </xsl:element>
    <!-- Following line for RSS 0.91 -->
    <xsl:apply-templates select="*[local-name()='item']"/>
  </xsl:template>

  <!-- Output RSS item as HTML a link inside of p element. -->
  <xsl:template match="*[local-name()='item']">
    <xsl:element name="p">
      <xsl:call-template name="a-element"/>
      <xsl:text> </xsl:text>
      <xsl:if test="dc:date"> <!-- Show date if available -->
        <xsl:text>( </xsl:text>
        <xsl:value-of select="dc:date"/>
        <xsl:text>) </xsl:text>
      </xsl:if>
    </xsl:element>
  </xsl:template>
</xsl:stylesheet>
Even with whitespace and comments, the whole thing is fewer than 80 lines. It has five template rules:
- The first is for the root RSSChannels element of the main document that holds the RSS feed URIs. It does the basic setup of the result HTML document, including the addition of a CSS stylesheet.
- The short second template rule acts on an RSSChannel element, using the XSLT document() function to read in the document named by the element's src attribute. The stylesheet assumes that the document being read is an RSS document, and the stylesheet uses the remaining three template rules to transform the elements of the RSS document read in by the document() function into HTML.
- The third template rule's xsl:template element has a name attribute instead of a match attribute, making it a named template rule that must be explicitly called from a template rule. Because the fourth and fifth template rules surround their result contents with an HTML a element of a similar structure, the common code is stored in this named template. Note how the xsl:apply-templates instruction uses the local-name() function to selectively identify which element types to use for attribute values in the result.
- The fourth template rule outputs the name of an RSS channel -- typically, the title of the news channel such as "XML.com" or "InfoWorld: Top News" -- as an HTML h1 element. The h1 element wraps an a element that links back to the main page of the site using the URI named in the channel element's link child element. The a element includes the description of the channel in a title attribute so that when the resulting HTML is displayed using recent releases of Internet Explorer, Mozilla, or Opera, a mouseOver event displays that description in a pop-up box. The actual a element is output with a call to the "a-element" named template.
- The last template rule outputs an HTML p element containing a link to a particular news item. It uses the RSS item element's link and description child elements the same way that the preceding template rule does, which is why the creation of the a element with these attributes was moved to a separate template rule that these two both call. This final template rule adds one more bit of information: if a dc:date element is supplied with the news item, the template rule adds that to the result tree as plain text.
Ill-formed RSS? One word of caution: as Mark mentioned in his article, not all RSS feeds are well-formed XML, and anything that you load into a source tree for XSLT processing must be well-formed XML. To process ill-formed RSS, you'll have to go beyond XSLT, and Mark will explain some strategies for that in a follow-up piece. In my research, I found very little ill-formed RSS, so this hasn't been a problem for me.
On December 31st I used Saxon to apply this stylesheet to the RSSChannels document shown above and created an HTML result version that you can see here. (Don't forget to try the mouseOvers...) If I applied the same stylesheet to the same XML document at a later date, the result would be different, with more up-to-date news. That's the beauty of RSS.
The actual HTML and CSS that I used create a pretty stark layout. Some simple additions to the stylesheet could add some glitz to the resulting appearance, but despite its visual simplicity, this stylesheet still does a great deal: it retrieves a customized set of news feeds listed in a simple, easily customizable file, and then displays a menu of the news items where you can see their titles, read their descriptions, and then follow the links to the actual stories. You could modify the layout to make it fancier, or you could modify it to make it simpler -- slight modifications will let you convert the RSS to WML, plain text delivery, or some new markup language being developed for new output devices. XSLT helps you grab these RSS feeds; what you do with them is up to you.
Modify the stylesheet to your heart's content and change the URIs in the RSSChannels document as well. You can find a wide selection of feeds at WebReference.com, Alternative News on the Web, Yahoo's RSS News Aggregators category, and the massive news4sites list. Happy aggregating!
Have you used XSLT to manipulate RSS? Share your experience in our forums.
- Help
2007-11-15 12:43:06 vpolite
I'm new to XML and need to do a newsreader. Forgive me my errors. How do I put this client in an HTML page?
- Help
2007-11-15 12:49:38 Bob DuCharme
You don't put it in an HTML page; it creates HTML pages, so you incorporate it into whatever tool is creating your HTML pages.
- Help pt.2
2007-11-15 17:34:07 vpolite
I modified this page for my own use, well governmental use. The link to the actual xml is: http://yosemite.epa.gov/opa/admpress.nsf/RSSByLocation?open&location=Headquarters
I get an error message that there is a missing ";" in the area of the '&location'. Any idea what that could mean? Thanks for helping a newbie along.
- Help pt.2
2007-11-15 17:37:47 Bob DuCharme
It thinks that "location" is an entity reference for which you forgot the closing delimiter. If any of that doesn't make sense, then you need to learn some more basic aspects of XML before you get too far into an application like that.
I think that the http://www.mulberrytech.com/xsl/xsl-list/ mailing list would be a better place for such questions.
- Help pt.2
2007-12-17 13:41:28 SVanya
What I found to work is to reorganize the XML to make "src" a child element of RSSChannel, wrap its content in a CDATA section, and then have the XSL applied to ./src instead of @src. This has an added benefit (and it's how I found it): it can now read URIs containing ampersands, such as Google (news) searches, that don't end in .xml or .rss. Thank you, Bob, for this! Note: it also helps to drop it into a frameset and have the a-link point to the right frame. Ciao!
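A sketch of the two fixes discussed in this thread, assuming the RSSChannels.xml layout from the article. The first entry keeps the src attribute and escapes the ampersand so that the parser no longer reads "&location" as an unterminated entity reference. The second follows the CDATA approach described above; with it, the RSSChannel template rule would select document(src) instead of document(@src). Use one form or the other, not both.

```xml
<RSSChannels>
  <!-- Option 1: escape the ampersand inside the attribute value. -->
  <RSSChannel src="http://yosemite.epa.gov/opa/admpress.nsf/RSSByLocation?open&amp;location=Headquarters"/>
  <!-- Option 2: move the URI into a child element and wrap it in a CDATA section. -->
  <RSSChannel><src><![CDATA[http://yosemite.epa.gov/opa/admpress.nsf/RSSByLocation?open&location=Headquarters]]></src></RSSChannel>
</RSSChannels>
```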
- Sharepoint WSS 3.0 & RSS client
2007-08-09 02:18:09 pdigs
Hi,
Sharepoint MOSS has a built-in RSS viewer - however, the free Sharepoint (WSS 3.0) does not.
I used your xslt RSS client to make an RSS viewer for Sharepoint WSS 3.0.
I ran it using javascript on the client in IE6.
It works great.
Thanks for sharing your efforts.
-KR, Pdigs
- Saxon through the firewall
2003-10-22 13:03:31 Scott Hudson
For those trying to process the above examples from the command line behind a firewall, try:
java -Dhttp.proxyHost=servername -Dhttp.proxyPort=8080 com.icl.saxon.StyleSheet RSSChannels.xml getRSS2.xsl > headlines.html
where servername is your favorite proxy host...
- Aggregation versus merely displaying
2003-01-09 08:58:12 Damian Cugley
Displaying RSS feeds organized by their originator is simple enough; that's what my.netscape.com did in the olden days. Displaying them organized by DATE is harder. Much harder, because RSS 0.91 does not include per-item date fields at all!
The only solution to this is to poll RSS feeds periodically and see if you can spot additions; this way you can tag items with the date you discovered them, which is approximately the same as their publication date.
This does not really work if items can be expected to change, since in that case you don't have any deterministic way to work out if a given item is new or is an old item edited. Giving items their own permanent URIs helps, but that is also not an RSS 0.91 feature, alas!
On top of this, regular polling makes for bandwidth issues. Your program had better know about caching and ETags and If-Modified-Since and the various which-RSS-has-changed-recently services if it is to be a good WWW citizen.
All of this means that a really good RSS aggregator has to be a fairly tricky piece of software. Don't get me wrong, XSLT is a great tool (and I use it every day) but RSS as a format has too many features that defeat any elegant solution... :-(
- Re: RSS xslt, not quite so simple
2003-01-08 13:42:23 Bob DuCharme
Well, it's not trying to be a complete all-around RSS processor, pluggable into other apps; it pulls out information that it reasonably expects to find and uses it to create something useful.
I missed Mike Champion's Wednesday talk at XML 2002 (see http://www.xmlconference.org/xmlusa/2002/wednesday.asp#19 ; the slides are somewhere on that site, but I can't find them right now) but I really liked how his accompanying paper (on the conference CD) used RSS as an example of how much can get done by concentrating on the needs of the receiving side of an XML transmission. My needs were similar to those of a lot of RSS use, and very simple: I wanted to display, in a web browser, story titles with descriptions and links to the stories from a variety of providers. It won't work with all the RSS out there, but it works with enough RSS to make it useful.
I could have made it even simpler (left out the date, etc.) and certainly could have made it richer and more bullet-proof (use more RSS information, do more error-checking, etc.). I hope that in its present state it will give a head start to XSLT developers who want to take advantage of RSS feeds.
Bob
- RSS xslt, not quite so simple
2003-01-08 00:09:59 bryan rasmussen
Although I agree that local-name() is a help when dealing with the whole RSS mess, this stylesheet just scratches the surface. That is no doubt its purpose, but as a surface scratcher it doesn't seem to me that you can note how little code it took when there is quite a bit more to be done.
For example, when you do an if test="dc:date" under item, if you actually wanted to check about the date you would have to support pubDate in RSS 2.0, dc:date in RSS 1.0 and it looks like nothing in the earlier versions.
I suppose the best thing would be an xsl:choose: when dc:date is present use dc:date, otherwise use pubDate.
That's a quibble I suppose but the fact that your strategy here depends on skipping the description of an item can be seen as an indication of where XSLT is going to have problems, since description in certain RSS versions can have escaped html inside of it.
As can be seen here:
http://static.userland.com/gems/backend/rssTwoExample2.xml
most of the time, as in the example above this is to put in a link so it would be possible, if tedious, to parse out the a tags as text and create actual links from them, however in some RSS feeds, I'm thinking of that of Jon Udell, description is filled with so much variant html it's pretty much inconceivable that one should want to work with it.
As you indicated people can use the RSS as they want, but once one gets beyond the tagset you have chosen to work with here RSS is something that quickly can cause an increase in xslt code, despite the relative simplicity of any individual RSS spec.
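The dc:date/pubDate fallback suggested in this comment could be sketched roughly as follows. This is an illustration rather than part of getRSS.xsl: it is a standalone text-output stylesheet that prints each item's title followed by whichever date element the item carries, and it assumes that pubDate can be matched by local name in the same way as the other RSS elements.

```xml
<!-- Sketch only: print each item's title plus dc:date or pubDate, if either exists. -->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:dc="http://purl.org/dc/elements/1.1/" version="1.0">
  <xsl:output method="text"/>

  <!-- Process only the items, wherever they sit in the feed. -->
  <xsl:template match="/">
    <xsl:apply-templates select="//*[local-name()='item']"/>
  </xsl:template>

  <xsl:template match="*[local-name()='item']">
    <xsl:value-of select="*[local-name()='title']"/>
    <xsl:choose>
      <!-- Items that carry the Dublin Core date element -->
      <xsl:when test="dc:date">
        <xsl:text> (</xsl:text>
        <xsl:value-of select="dc:date"/>
        <xsl:text>)</xsl:text>
      </xsl:when>
      <!-- Items that carry the RSS 2.0 pubDate element -->
      <xsl:when test="*[local-name()='pubDate']">
        <xsl:text> (</xsl:text>
        <xsl:value-of select="*[local-name()='pubDate']"/>
        <xsl:text>)</xsl:text>
      </xsl:when>
      <!-- No date element at all: print the title alone. -->
    </xsl:choose>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

The same xsl:choose could replace the xsl:if test="dc:date" block in the item template rule of getRSS.xsl.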

