RSS and AJAX: A Simple News Reader

September 13, 2006

An Ajax RSS Parser

Ajax (Asynchronous JavaScript And XML) and RSS (Really Simple Syndication) are two technologies that have taken the Web by storm. Most commonly, RSS is used to provide news to either people or other organizations. This is done by serving an "RSS feed" from a website. An RSS feed is simply a link to an XML file that is structured in a certain way. The RSS specification tells us the expected structure of the XML file. For example, the title, link, and description tags are required for every channel, so all RSS XML files will have at least these three tags.
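To make that structure concrete, here is a minimal sketch of an RSS 2.0 document held in a JavaScript string, together with a rough check that the three required channel tags appear. The feed contents and the missingRequiredTags() helper are made up for illustration, and the substring test is only a quick sanity check, not a real XML parser.

```javascript
// A made-up, minimal RSS 2.0 feed (the contents are illustrative only).
var sampleFeed =
    '<?xml version="1.0"?>\n' +
    '<rss version="2.0">\n' +
    '  <channel>\n' +
    '    <title>Example News</title>\n' +
    '    <link>http://www.example.com/</link>\n' +
    '    <description>Headlines from an example site</description>\n' +
    '    <item>\n' +
    '      <title>First story</title>\n' +
    '      <link>http://www.example.com/first</link>\n' +
    '      <description>Summary of the first story</description>\n' +
    '    </item>\n' +
    '  </channel>\n' +
    '</rss>';

// Hypothetical helper: returns the names of the required tags that do not
// appear anywhere in the text. It does not check nesting or well-formedness.
function missingRequiredTags(xmlText) {
    var required = ["title", "link", "description"];
    var missing = [];
    for (var i = 0; i < required.length; i++) {
        if (xmlText.indexOf("<" + required[i] + ">") == -1)
            missing.push(required[i]);
    }
    return missing;
}
```

Calling missingRequiredTags(sampleFeed) returns an empty array, because all three tags are present.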

The RSS specification that we will be using is 2.0, which is both the newest and most widely used of the three specifications (0.91, 1.0, and 2.0). Fortunately, RSS 2.0 is far less complex than RSS 1.0, so you can quickly familiarize yourself with RSS 2.0 here: blogs.law.harvard.edu/tech/rss. If you want a comprehensive introduction to RSS, covering all three specifications, go here: www.xml.com/pub/a/2002/12/18/dive-into-xml.html.

Why are we using Ajax to parse our RSS? By using Ajax, we hand the work of processing the RSS XML file over to the web browser, thus reducing server load. Also, Ajax gives the user a more seamless web experience, because we can fetch the entire RSS XML file from the server without refreshing the page. Lastly, the XMLHttpRequest object at the heart of Ajax can return an XML response as a ready-parsed document, so we are able to work through the RSS in a simple and elegant way.

For the purposes of this article, you don't need to be familiar with Ajax; however, a basic understanding of JavaScript is strongly recommended.

Here's how the parser is going to work: first, the file name of the RSS feed is selected in an HTML form. Once the user clicks Submit, the getRSS() function is called. This function is responsible for fetching the specified RSS XML file from the server. Once it's fetched successfully, processRSS() converts the received XML file into a JavaScript object. Finally, showRSS(RSS) is called, which displays some of the information contained in the RSS JavaScript object to us by updating the HTML page. The diagram below summarizes these steps:

Figure 1. General design

The HTML File

To begin, we'll have a look at the HTML file. The top half (the form element) determines which RSS feed to fetch, while the bottom half (the root div element) is used to display the information contained in the RSS JavaScript object.

<html>
<head>
    <!--B-->
    <script language="javascript" src="rssajax.js"></script>
    <!--C-->
    <style type="text/css">
        #chan_items { margin: 20px; }
        #chan_items #item { margin-bottom: 10px; }
        #chan_items #item #item_title { font-weight: bold; }
    </style>
</head>
<body>
    <!--A-->
    <form name="rssform" onsubmit="getRSS(); return false;">
        <select name="rssurl">
            <option value="test-rss.xml">test RSS feed</option>
            <option value="google-rss.xml">google RSS feed</option>
        </select>
        <input type="submit" value="fetch rss feed" />
    </form>

    <div id="chan">
        <div id="chan_title"></div>
        <div id="chan_link"></div>
        <div id="chan_description"></div>
        <a id="chan_image_link" href=""></a>
        <div id="chan_items"></div>
        <div id="chan_pubDate"></div>
        <div id="chan_copyright"></div>
    </div>
</body>
</html>

For now, we will ignore most of the HTML and focus on the form element (labeled A above). The names of the RSS XML files are specified in the value attributes of the option tags of the select element. The user selects one of these files, and then submits the form. The JavaScript that starts the whole process is found in the onsubmit attribute. After calling the JavaScript function, we add return false to prevent the entire form from being sent to the server the "conventional" way. If we'd omitted return false, the entire page would refresh and we'd lose all the data that was fetched via Ajax. One last thing: note that the JavaScript code is included in the header as a reference to a separate file (labeled B). In case you're wondering, the contents of the <style> tag (labeled C) tell the browser how to display the RSS data when it's written to the HTML page by the showRSS(RSS) function.
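The same "fetch, then cancel the submit" idea can also be written as a function rather than an inline attribute. The sketch below is only an illustration: submitWithoutReload() and its fetchFeed parameter are hypothetical names, and the event object is passed in explicitly so the logic can be followed without a browser.

```javascript
// Hypothetical handler: cancels the conventional form submit, then fetches
// the feed. Cancelling first means the page is not reloaded even if
// fetchFeed() throws an error part-way through.
function submitWithoutReload(event, fetchFeed) {
    if (event && typeof event.preventDefault == "function")
        event.preventDefault();   // browsers that pass a standard event object
    fetchFeed();
    return false;                 // has the same effect in an onsubmit handler
}

// Possible wiring in a page that contains the rssform form and getRSS():
if (typeof document != "undefined" && document.rssform) {
    document.rssform.onsubmit = function (e) {
        return submitWithoutReload(e || window.event, getRSS);
    };
}
```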

Fetching the XML from the Server: The `getRSS()` Function

The getRSS() function is shown below:

function getRSS()
{
    /*A*/
    var xhr = null;
    if (window.XMLHttpRequest) //Firefox, Safari, IE 7 and later
        xhr = new XMLHttpRequest();
    else if (window.ActiveXObject) //IE 5 and 6
        xhr = new ActiveXObject("Microsoft.XMLHTTP");
    else
    {
        alert("your browser does not support AJAX");
        return; //without an XHR object there is nothing more to do
    }

    /*B*/
    xhr.open("GET",document.rssform.rssurl.value,true);

    /*C*/
    xhr.setRequestHeader("Cache-Control", "no-cache");
    xhr.setRequestHeader("Pragma", "no-cache");

    /*D*/
    xhr.onreadystatechange = function() {
        if (xhr.readyState == 4)
        {
            if (xhr.status == 200)
            {
                /*F*/
                if (xhr.responseText != null)
                    processRSS(xhr.responseXML);
                else
                {
                    alert("Failed to receive RSS file from the server - the response was empty.");
                    return false;
                }
            }
            else
                alert("Error code " + xhr.status + " received: " + xhr.statusText);
        }
    };

    /*E*/
    xhr.send(null);
}

Let's walk through this function step by step and figure out what it's doing. The labels in the code (e.g. /*A*/) refer to the corresponding explanations below.

A: In order to communicate with the server, we must first define an XMLHttpRequest object (XHR). This object is what allows us to connect to the server without refreshing the browser; it is the core of any Ajax application. As we'd expect, Internet Explorer defines the XHR object differently than other browsers, such as Firefox and Safari. We use object detection to determine what browser the JavaScript is running in, and thus define the XHR object appropriately.
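The object detection described in A can be isolated in a small factory function. This is a sketch under one assumption that is not in the original code: the window object is passed in as a parameter named win, which makes the function easy to try out with stand-in objects. The name createXHR() is hypothetical.

```javascript
// Hypothetical helper. `win` is expected to be the browser's window object;
// taking it as a parameter is an assumption of this sketch.
function createXHR(win) {
    if (win.XMLHttpRequest)       // native object, where the browser has one
        return new win.XMLHttpRequest();
    if (win.ActiveXObject)        // older versions of Internet Explorer
        return new win.ActiveXObject("Microsoft.XMLHTTP");
    return null;                  // no Ajax support at all
}
```

In a page, var xhr = createXHR(window); would give the same kind of object that getRSS() builds, or null when the browser supports neither approach.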

B: We set up the XHR object by calling xhr.open(). This function takes three arguments: the first is the method we use to fetch the file from the server, the second is the name of the file we are fetching, and the third is set to true if we want the response to be received asynchronously. As we are not going to send any request data to the server, a GET request is sufficient. The name of the RSS XML file is taken directly from the HTML form. Lastly, we specify an asynchronous response so that we don't have to "wait" for the file to be received--we know when it is available by defining a function that is called when the receiving is complete (more on this later).

C: It is important to request a fresh (non-cached) copy from the server. Here, we set two request headers, Cache-Control and Pragma, to ask for this (Pragma is included for backward compatibility).
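Request headers are one way to avoid a cached copy. Another common technique, shown here only as an alternative sketch, is to make every request URL unique by appending a changing query parameter. The helper name addCacheBuster() and the parameter name nocache are arbitrary choices for this example.

```javascript
// Hypothetical helper: appends a changing value to the URL so that the
// browser treats each request as a new resource.
function addCacheBuster(url, timestamp) {
    var separator = (url.indexOf("?") == -1) ? "?" : "&";
    return url + separator + "nocache=" + timestamp;
}
```

In getRSS(), the call to xhr.open() could then pass addCacheBuster(document.rssform.rssurl.value, new Date().getTime()) as its second argument.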

D: As we have set up an asynchronous connection with the server, the XHR object needs to know what function to call when the server response becomes available. This is the purpose of the onreadystatechange property of the XHR object--we set it equal to the function that we want to run when the readyState property of the XHR object changes. For our purposes, we only need to be concerned with the readyState of 4, because this indicates that the response is available. We know that we received a successful response if xhr.status is equal to 200. Any other status code means that we didn't receive the response properly.
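The decision made inside the onreadystatechange handler can be sketched as a small function that looks only at readyState and status. The names checkResponse() and READY_STATES are hypothetical; the state names are the constants defined for XMLHttpRequest (UNSENT through DONE).

```javascript
// The five readyState values, indexed by their numeric code.
var READY_STATES = ["UNSENT", "OPENED", "HEADERS_RECEIVED", "LOADING", "DONE"];

// Hypothetical helper: summarizes the state of an XHR-like object.
function checkResponse(xhr) {
    if (xhr.readyState != 4)      // the handler also fires for states 1 to 3
        return "pending (" + READY_STATES[xhr.readyState] + ")";
    if (xhr.status == 200)        // response complete and successful
        return "ok";
    return "error " + xhr.status; // response complete, but the request failed
}
```

The handler runs several times during one request, which is why the code in D does nothing until readyState reaches 4.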

E: We have set up the XHR object for receiving a response from the server. Now, we call xhr.send() to run the server request/response process. As we don't have any data to send, we pass an argument of null.

F: If we have reached this point, we know that a response from the server was successfully received, and we're ready to process the XML file. We just do one last check to make sure we didn't get an empty response. We do so by inspecting the responseText property (this contains the received file in textual format). Now we're ready to call the processRSS() function.
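A non-empty responseText does not by itself guarantee that the response could be parsed as XML. When the file is not well-formed, or is not served as XML, responseXML may be null or may not contain the expected root element, depending on the browser. A stricter check, sketched here with the hypothetical name isRssDocument(), looks at the parsed document instead:

```javascript
// Hypothetical helper: returns true only when the parsed document exists
// and its root element is <rss>.
function isRssDocument(xmlDoc) {
    if (xmlDoc == null || xmlDoc.documentElement == null)
        return false;
    return xmlDoc.documentElement.nodeName == "rss";
}
```

With this in place, the check in F could call isRssDocument(xhr.responseXML) before handing the document to processRSS().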

Parsing the XML: The `processRSS()` Function and the `RSS2Channel` Object

The processRSS() function is shown below:

function processRSS(rssxml)
{
    var RSS = new RSS2Channel(rssxml);
    showRSS(RSS);
}

This function simply calls the constructor of the RSS2Channel object and passes rssxml. This argument is of special significance, as it contains all of the RSS information. Moreover, JavaScript is able to recognize this as an XML object, and therefore we are able to use JavaScript's built-in DOM (Document Object Model) functions and properties on it. We can do this because we used the responseXML property of the XHR object, which holds the server response already parsed into an XML document. If we had used responseText, we would have had to parse the raw string ourselves, which is much more difficult.

Now we'll examine the RSS2Channel object. Each RSS XML file always has exactly one channel element--this element contains all of the RSS data. As you would expect, this data is organized into a number of sub-elements, or "child" elements. Strictly speaking, the root element of an RSS 2.0 file is rss, but channel is its only child, so for our purposes channel acts as the root of the data and is represented by the RSS2Channel object. This object is shown below:

function RSS2Channel(rssxml)
{
    /*A*/
    /*required string properties*/
    this.title;
    this.link;
    this.description;

    /*optional string properties*/
    this.language;
    this.copyright;
    this.managingEditor;
    this.webMaster;
    this.pubDate;
    this.lastBuildDate;
    this.generator;
    this.docs;
    this.ttl;
    this.rating;

    /*optional object properties*/
    this.category;
    this.image;

    /*array of RSS2Item objects*/
    this.items = new Array();

    /*B*/
    var chanElement = rssxml.getElementsByTagName("channel")[0];
    var itemElements = rssxml.getElementsByTagName("item");

    /*C*/
    var Item = null;
    for (var i=0; i<itemElements.length; i++)
    {
        Item = new RSS2Item(itemElements[i]);
        this.items.push(Item);
    }

    /*D*/
    var properties = new Array("title", "link", "description", "language", "copyright", "managingEditor", "webMaster", "pubDate", "lastBuildDate", "generator", "docs", "ttl", "rating");
    var tmpElement = null;
    for (var i=0; i<properties.length; i++)
    {
        tmpElement = chanElement.getElementsByTagName(properties[i])[0];
        /*skip tags that are missing or empty*/
        if (tmpElement != null && tmpElement.childNodes.length > 0)
            eval("this."+properties[i]+"=tmpElement.childNodes[0].nodeValue");
    }

    /*E*/
    this.category = new RSS2Category(chanElement.getElementsByTagName("category")[0]);
    this.image = new RSS2Image(chanElement.getElementsByTagName("image")[0]);
}

As before, we will break the code into smaller pieces and explain each one individually.

A: As a guide, we list out all of the properties that we will be assigning values to. Each of these properties corresponds to an RSS XML element. For example, we will set this.language equal to the string found inside the <language>en-us</language> XML tag--in this case, en-us. Some of these properties will be custom objects, just like RSS2Channel itself. This will be explained in more detail shortly.

B: Here, we create two variables--one to store the contents of the channel element, and another to store a list of item elements. To accomplish this, we use the getElementsByTagName() function, which returns an array-like list of all the elements in the XML document that match a specified tag name, in the order in which they appear. As previously discussed, an RSS XML file only has one channel tag, so we expect a list with one element to be returned. We add [0] to the end of the function call to get the object and assign it to chanElement. On the other hand, we need itemElements to be a list, because an RSS XML file will usually have multiple <item> tags.
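One detail worth knowing about getElementsByTagName() is that it searches all descendants, not just direct children. Calling it for "title" on the channel element therefore also matches the title tags inside each item; taking [0] yields the channel's own title only because that tag normally appears first. The sketch below shows a stricter lookup that considers direct children only. The name firstChildByTag() is hypothetical.

```javascript
// Hypothetical helper: returns the first direct child element of `parent`
// with the given tag name, or null if there is none.
function firstChildByTag(parent, tagName) {
    var children = parent.childNodes;
    for (var i = 0; i < children.length; i++) {
        // nodeType 1 marks an element node; text and comment nodes are skipped
        if (children[i].nodeType == 1 && children[i].nodeName == tagName)
            return children[i];
    }
    return null;
}
```

For example, firstChildByTag(chanElement, "title") would ignore any title that belongs to an item, wherever it appears in the file.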

C: This loop traverses the itemElements array and parses each item element individually. An <item> tag in an RSS XML file contains a number of child tags, so we need to construct an RSS2Item object that will store this data in a meaningful way. We pass the current item element to the constructor, and assign the constructed object to Item. This is then added to the this.items array. Once this loop is complete, the items property of the RSS2Channel object will contain an array of custom RSS2Item objects. We will talk about the RSS2Item object once we're done with RSS2Channel.

Use of the `eval()` Function

Before continuing, I want to briefly explain the eval() function, in case you're unfamiliar with it. This function takes a single argument, which is a string containing the JavaScript code that you want your program to run. For example, eval("x = 1 + 2") has the same effect as the statement x = 1 + 2. Because eval() runs whatever code the string contains, it should only be given strings that your own program builds, as is the case here. As you will see, this function is useful when dealing with objects that have a large number of properties.

D: We will now set all of the object properties that take simple strings as their values. As all of these properties are grabbed from the chanElement object in the same way, we define an array containing the names of all the properties we want to set, and traverse the array using a for loop. To get the actual string value of the XML tag we are examining, we access two properties: childNodes and nodeValue. The first property exposes all of the child nodes of an element as an array-like list, while the second gets the string value of a node. The tags being retrieved here contain only text and no child XML tags, so childNodes holds a single text node. Then, nodeValue gets the string value of that node from childNodes[0].
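The same loop can be written without eval() by using bracket notation, where target["title"] refers to the same property as target.title. The sketch below is an alternative under that approach, not the article's code; textOf() and copyStringProperties() are hypothetical names. It also guards against empty tags such as <description></description>, which have no child node to read.

```javascript
// Hypothetical helper: returns the text inside an element, or null when the
// element is missing or has no content.
function textOf(element) {
    if (element == null || element.childNodes.length == 0)
        return null;
    return element.childNodes[0].nodeValue;
}

// Hypothetical helper: copies the text of each named tag onto `target`.
// target[name] = value does the job of eval("target." + name + "=value").
function copyStringProperties(target, xmlElement, names) {
    for (var i = 0; i < names.length; i++) {
        var value = textOf(xmlElement.getElementsByTagName(names[i])[0]);
        if (value != null)
            target[names[i]] = value;
    }
}
```

Inside a constructor, the call would look like copyStringProperties(this, chanElement, properties).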

E: Finally, we set the this.category and this.image properties. Unlike the properties discussed in D, these do have child tags, so we have to construct custom objects for these XML elements (RSS2Category and RSS2Image, respectively). Let's have a look at the RSS2Category function to start:

function RSS2Category(catElement)
{
    if (catElement == null) {
        this.domain = null;
        this.value = null;
    } else {
        this.domain = catElement.getAttribute("domain");
        /*an empty tag has no child node to read*/
        this.value = (catElement.childNodes.length > 0) ? catElement.childNodes[0].nodeValue : null;
    }
}

This is a simple object with two properties: domain and value. The value property contains the actual contents of the XML tag, while the domain property is set to the value of the tag's domain attribute. For example, a typical category XML element looks like this: <category domain="Syndic8">1765</category>. In this case, this.domain is set to Syndic8 and this.value is set to 1765. In order to get the domain attribute from the XML tag, we use the function getAttribute() and pass the name of the attribute we want to fetch as a parameter (in this case, domain).
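RSS 2.0 allows an item to carry more than one category tag, while the constructor above keeps only the first. As a sketch of how every category could be collected, the hypothetical collectCategories() helper below builds one plain record per tag, using the same getAttribute(), childNodes, and nodeValue calls.

```javascript
// Hypothetical helper: returns an array with one {domain, value} record for
// each <category> tag found under parentElement.
function collectCategories(parentElement) {
    var found = parentElement.getElementsByTagName("category");
    var categories = [];
    for (var i = 0; i < found.length; i++) {
        var firstChild = found[i].childNodes[0];
        categories.push({
            domain: found[i].getAttribute("domain"),
            value: firstChild ? firstChild.nodeValue : null
        });
    }
    return categories;
}
```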

The image tag in an RSS 2.0 file stores its data in child tags (url, title, and link, plus the optional width, height, and description) rather than in attributes. The RSS2Image constructor therefore reads each child tag in the same way that the string properties were read in part D.

function RSS2Image(imgElement)
{
    var properties = new Array("url", "title", "link", "width", "height", "description");
    var tmpElement = null;
    for (var i=0; i<properties.length; i++)
    {
        /*default to null so that every property is always defined*/
        eval("this."+properties[i]+"=null");
        if (imgElement == null)
            continue;
        tmpElement = imgElement.getElementsByTagName(properties[i])[0];
        if (tmpElement != null && tmpElement.childNodes.length > 0)
            eval("this."+properties[i]+"=tmpElement.childNodes[0].nodeValue");
    }
}

Now we'll move on to the last remaining property in the RSS2Channel object: items, which contains an array of RSS2Item objects. The code for this object is shown below:

function RSS2Item(itemxml)
{
    /*A*/
    /*required properties (strings)*/
    this.title;
    this.link;
    this.description;

    /*optional properties (strings)*/
    this.author;
    this.comments;
    this.pubDate;

    /*optional properties (objects)*/
    this.category;
    this.enclosure;
    this.guid;
    this.source;

    /*B*/
    var properties = new Array("title", "link", "description", "author", "comments", "pubDate");
    var tmpElement = null;
    for (var i=0; i<properties.length; i++)
    {
        tmpElement = itemxml.getElementsByTagName(properties[i])[0];
        /*skip tags that are missing or empty*/
        if (tmpElement != null && tmpElement.childNodes.length > 0)
            eval("this."+properties[i]+"=tmpElement.childNodes[0].nodeValue");
    }

    /*C*/
    this.category = new RSS2Category(itemxml.getElementsByTagName("category")[0]);
    this.enclosure = new RSS2Enclosure(itemxml.getElementsByTagName("enclosure")[0]);
    this.guid = new RSS2Guid(itemxml.getElementsByTagName("guid")[0]);
    this.source = new RSS2Source(itemxml.getElementsByTagName("source")[0]);
}

The RSS2Item object is similar to RSS2Channel in many ways. We start by listing out the properties that we will be retrieving (A). We then loop through the string properties, and assign each to the contents of its associated XML tag (B). Lastly, we set object properties by calling the appropriate custom object constructor--in each case, passing the XML element that contains the relevant data (C).

The custom objects that are found in the RSS2Item object are listed below. They are similar to the RSS2Category and RSS2Image objects, and they don't use any functions or properties that haven't been discussed earlier.

function RSS2Enclosure(encElement)
{
    if (encElement == null) {
        this.url = null;
        this.length = null;
        this.type = null;
    } else {
        this.url = encElement.getAttribute("url");
        this.length = encElement.getAttribute("length");
        this.type = encElement.getAttribute("type");
    }
}

function RSS2Guid(guidElement)
{
    if (guidElement == null) {
        this.isPermaLink = null;
        this.value = null;
    } else {
        this.isPermaLink = guidElement.getAttribute("isPermaLink");
        /*an empty tag has no child node to read*/
        this.value = (guidElement.childNodes.length > 0) ? guidElement.childNodes[0].nodeValue : null;
    }
}

function RSS2Source(souElement)
{
    if (souElement == null) {
        this.url = null;
        this.value = null;
    } else {
        this.url = souElement.getAttribute("url");
        /*an empty tag has no child node to read*/
        this.value = (souElement.childNodes.length > 0) ? souElement.childNodes[0].nodeValue : null;
    }
}

Now that we've fully defined our RSS object, we can move on to the last step: displaying its actual content.

Displaying the RSS Data: The `showRSS(RSS)` Function

Before we go into the JavaScript code for the showRSS(RSS) function, let's have a look at the root div element of the HTML page mentioned earlier:

    <div class="rss" id="chan">
        <div class="rss" id="chan_title"></div>
        <div class="rss" id="chan_link"></div>
        <div class="rss" id="chan_description"></div>
        <a class="rss" id="chan_image_link" href=""></a>
        <div class="rss" id="chan_items"></div>
        <div class="rss" id="chan_pubDate"></div>
        <div class="rss" id="chan_copyright"></div>
    </div>

As you can see, the root div element has a number of child div tags. These tags will be populated with the data in the RSS object by the showRSS(RSS) function, which is shown below.

function showRSS(RSS)
{
    /*A*/
    var imageTag = "<img id='chan_image'";
    var startItemTag = "<div id='item'>";
    var startTitle = "<div id='item_title'>";
    var startLink = "<div id='item_link'>";
    var startDescription = "<div id='item_description'>";
    var endTag = "</div>";

    /*B*/
    var properties = new Array("title","link","description","pubDate","copyright");
    var curElement = null;
    var curProp = null;
    for (var i=0; i<properties.length; i++)
    {
        curElement = document.getElementById("chan_"+properties[i]);
        curElement.innerHTML = ""; /*B1*/
        curProp = eval("RSS."+properties[i]);
        if (curProp != null)
            curElement.innerHTML = curProp; /*B2*/
    }

    /*C*/
    /*show the image*/
    var imageLink = document.getElementById("chan_image_link");
    imageLink.innerHTML = "";
    if (RSS.image.url != null) //the image location is stored in url, not src
    {
        imageLink.href = RSS.image.link; /*C1*/
        imageLink.innerHTML = imageTag
            +" alt='"+RSS.image.description
            +"' width='"+RSS.image.width
            +"' height='"+RSS.image.height
            +"' src='"+RSS.image.url
            +"' "+"/>"; /*C2*/
    }

    /*D*/
    var item_html = "";
    document.getElementById("chan_items").innerHTML = "";
    for (var i=0; i<RSS.items.length; i++)
    {
        item_html = startItemTag;
        item_html += (RSS.items[i].title == null) ? "" : startTitle + RSS.items[i].title + endTag;
        item_html += (RSS.items[i].link == null) ? "" : startLink + RSS.items[i].link + endTag;
        item_html += (RSS.items[i].description == null) ? "" : startDescription + RSS.items[i].description + endTag;
        item_html += endTag;
        document.getElementById("chan_items").innerHTML += item_html; /*D1*/
    }

    return true;
}

A: As we have no way of knowing the number of channel items in the RSS feed, we must dynamically generate the HTML for the RSS items. These are the default values for the HTML tags that will contain the RSS2Item data. For compatibility, we also dynamically generate the img HTML tag.

B: We traverse the string properties in the RSS2Channel object here, similar to how we did in the constructor. In order to clear any data that may remain from an old RSS feed, we reset the innerHTML property on line B1. We are able to fetch the specific div element that we need from the HTML by calling getElementById(). Provided that the property is defined, we set the div element to its new value on line B2.

C: Again, we use the getElementById() function to get the HTML element that will contain the image from the RSS feed. As the image should be linkable, we use an anchor element (a) instead of a div element. The href attribute in the anchor element specifies what the image should link to, so we assign it to the value found in RSS.image.link (C1). The content of the element is filled in using the innerHTML property, as we have done in part B (C2).

D: Here is where we display the items in the RSS object. A div tag is defined for each RSS item, containing the title, link, and description. For the sake of clarity, the other properties have been omitted. Each div tag is appended to the contents of the chan_items parent tag using the innerHTML property (D1).
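Because innerHTML treats its value as markup, a title such as "Prices < $5" would be read as the start of a tag. One common precaution, shown here as a sketch rather than as part of the original parser, is to escape text before inserting it. The names escapeHtml() and itemTitleHtml() are hypothetical. Feed descriptions often contain HTML on purpose, so whether to escape them as well is a judgment call.

```javascript
// Hypothetical helper: replaces the characters that have a special meaning
// in HTML with their entity equivalents.
function escapeHtml(text) {
    return String(text)
        .replace(/&/g, "&amp;")
        .replace(/</g, "&lt;")
        .replace(/>/g, "&gt;")
        .replace(/"/g, "&quot;")
        .replace(/'/g, "&#39;");
}

// Hypothetical helper: builds the title markup for one item, escaping the
// text taken from the feed.
function itemTitleHtml(item) {
    if (item.title == null)
        return "";
    return "<div id='item_title'>" + escapeHtml(item.title) + "</div>";
}
```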

Wrap-Up

The Ajax RSS parser has been tested in IE 6.0 and Firefox 1.5.0.6 for Windows XP. The RSS2Channel object does not support all of the elements in the RSS 2.0 specification. The ones that have been omitted are cloud, textInput, skipHours, and skipDays. For the most part, these RSS elements are only useful on the server side, so it wouldn't make sense to include them in a client-side parser.

After noting the length of the code, you may be thinking that the same functionality could have been accomplished with half the number of lines of code. In particular, we could have completely omitted the RSS object by writing the showRSS(RSS) function in a way that reads the RSS properties directly from the XML element. Certainly, this is possible. However, showRSS() is only meant to be an example of how the RSS2Channel object can be used. By defining an RSS object that contains meaningful RSS data, we have a much more scalable application. For example, the code can be easily extended to fetch multiple feeds. The RSS objects from these feeds can then be manipulated, or compared with other feeds (you can fetch a new feed after a certain interval, and compare it with the old one). The point of a separate RSS object is to make increasingly complex applications like this easier to develop.
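As a sketch of the kind of comparison mentioned above, the function below takes two RSS2Channel-style objects and returns the items that appear only in the newer one. The names itemKey() and findNewItems() are hypothetical, and the choice to identify an item by its guid value, falling back to its link, is an assumption made for this example.

```javascript
// Hypothetical helper: picks a value that identifies an item. The guid is
// used when the feed provides one; otherwise the link is used.
function itemKey(item) {
    if (item.guid && item.guid.value != null)
        return item.guid.value;
    return item.link;
}

// Hypothetical helper: returns the items of newChannel that are not
// present in oldChannel.
function findNewItems(oldChannel, newChannel) {
    var seen = {};
    for (var i = 0; i < oldChannel.items.length; i++)
        seen["k:" + itemKey(oldChannel.items[i])] = true;  // prefix avoids clashes with built-in property names

    var fresh = [];
    for (var j = 0; j < newChannel.items.length; j++) {
        if (!seen["k:" + itemKey(newChannel.items[j])])
            fresh.push(newChannel.items[j]);
    }
    return fresh;
}
```

A page that re-fetches a feed at intervals could keep the previous RSS object and pass both to findNewItems() to highlight only what has changed.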

All of the files that were discussed are available below:
The HTML file: rssajax.html
The JavaScript file, containing the RSS parser: rssajax.js
Sample RSS file 1: test-rss.xml
Sample RSS file 2: google-rss.xml