Secure RSS Syndication

July 13, 2005

I have a problem. It's actually a pretty common problem. I have data that I want to syndicate to myself, but I don't want you to see it. It's private. Now this could be my credit card balance or internal bug reports for the day job. Either way, I want the information in a form suitable for syndication but not available to everyone.

A Solution

There is a solution. I could password-protect my feed. But that causes a problem, because my aggregator would then need to know my password. Now my aggregator of choice is Bloglines, and I'm sure they're nice folks, but I really don't want to give them my password. One security breach and my whopping credit card debt is splattered across the Web. Just for the record, for the rest of the discussion I will use Bloglines, but in fact one of my design goals is that this technique should work equally well with all web-based aggregators.

One other possible solution comes from Atom. Atom provides support for XML Encryption Syntax & Processing. This isn't really usable, for several reasons:

Atom isn't finished.
Bloglines doesn't support XML Encryption.
I want my encrypted data now!

A Different Solution

A better solution would not require me to give Bloglines my password, or some other key with which to decrypt my content. But if Bloglines isn't going to decrypt my content, who will?

How about my browser?

If I can somehow get my browser to decrypt the content of the feed, then I can continue to use Bloglines to poll the feed and present me with new items as they appear, but the decryption is done in my browser.

Enter the Greasemonkey

What we're talking about is giving Bloglines a quick upgrade and doing it ourselves. That means we're talking Greasemonkey, a Firefox extension that allows you to write scripts that modify the pages you visit. In this case, the modification is going to be decryption. We'll write a Greasemonkey script, securesyndication.user.js, that looks for encrypted content and, using the secret key we provide, decrypts the content when we view it.

So here is the whole scenario.

My content, which is going to sit inside the description element of an RSS feed, is going to be encrypted. We will actually put it inside a microformat.
That feed is syndicated.
I will subscribe to that feed in Bloglines (or any other web-based aggregator).
When I view items in that feed in Bloglines, the description is initially displayed encrypted, but my Greasemonkey script detects the encrypted content and decrypts it on the fly, and replaces the encrypted content with the decrypted content.

View Source

Here is an example of such an encrypted feed:


<?xml version="1.0" encoding="iso-8859-1"?>
<rss version="2.0" xmlns:dc='http://purl.org/dc/elements/1.1/'>
  <channel>
    <title>BitWorking Blowfish Encrypted Test Feed</title>
    <link>http://bitworking.org/projects/blowfish/</link>
    <description>Secure Syndication With 
        Blowfish and GreaseMonkey</description>
    <dc:creator>Joe Gregorio</dc:creator>
   <item>
      <title>A Test Entry</title>
      <link>http://bitworking.org/projects/blowfish</link>
      <description>

 &lt;div class="encrypted blowfish">
     &lt;p>The following data is encrypted. Please install
        the SecureSyndication Greasemonkey script to 
        view the encrypted content.&lt;/p>
     &lt;div class="encdata">WORK:C7FDDC...4AC0643B86&lt;/div>
 &lt;/div>

      </description>

      <dc:date>2005-06-20T00:17:00-05:00</dc:date>
   </item>
  </channel>
</rss>

Let's look at the microformat we're using to transport our encrypted content:


  <div class="encrypted blowfish" >
     <p>The following data is encrypted. Please install
        the SecureSyndication Greasemonkey script to 
        view the encrypted content.
     </p>
     <div class="encdata">WORK:C7FDD...15C4AC0643B86</div>
  </div>

The class value of the outer div, encrypted blowfish, states that the contents of the div are encrypted and that the cipher is Blowfish, Bruce Schneier's symmetric block cipher. The text inside <div class="encdata"> starts with a short header: a key name followed by a colon. Everything after the colon is the Blowfish-encrypted content. The key name is not the key itself, just a shorthand name for the key to use, which allows different keys to be used for different feeds. All of the other elements in the outer div are ignored by our Greasemonkey script. That lets us put a nice paragraph in there explaining what is going on to those who are unfamiliar with encrypted content. When our user script runs, it decrypts the data and replaces the innerHTML of the outer div with the decrypted content.
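To make the header format concrete, here is a minimal sketch of splitting the encdata text into a key name and the hex payload. The helper name parseEncData and its validation rules are my own assumptions for illustration; the actual user script simply splits on the colon.

```javascript
// Hypothetical helper (not part of the article's script):
// split "KEYNAME:HEXDATA" into its two parts, or return null if malformed.
function parseEncData(text) {
    var trimmed = text.trim();
    var colon = trimmed.indexOf(":");
    if (colon <= 0) {
        return null; // no key name in front of a colon
    }
    var keyName = trimmed.substring(0, colon);
    var hex = trimmed.substring(colon + 1);
    // Each Blowfish block is 64 bits, which is 16 hex digits.
    if (hex.length === 0 || hex.length % 16 !== 0 ||
            !/^[0-9A-Fa-f]+$/.test(hex)) {
        return null;
    }
    return { keyName: keyName, hex: hex };
}

var parsed = parseEncData("WORK:C7FDDC3B50FF0BE0");
console.log(parsed.keyName);         // WORK
console.log(parsed.hex.length / 16); // number of 64-bit blocks
```

Rejecting anything that is not a whole number of 16-digit blocks is a design choice of this sketch: it keeps ordinary text that merely contains a colon from being handed to the decryption code.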

Why was this representation chosen? Everything for decryption must be passed faithfully from the RSS feed description element to the web page displayed by the web-based aggregator. That rules out creating a new element, using a namespaced element, or using a custom attribute to carry the information, as all of those will be stripped by an aggregator. What we are left with is the class and rel attributes of existing HTML elements. While rather severe, these restrictions are still powerful and may look a little familiar to you. They're the same restrictions that are used in defining microformats.

The encrypted data is created by converting the source text into UTF-16, encrypting it in 64-bit chunks via Blowfish, and then converting the binary data into a hexadecimal representation. UTF-16 was chosen because that is the native encoding that JavaScript uses, and we want to make things as easy on the client as possible.
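The packing step on its own, before any encryption happens, can be sketched in JavaScript as follows. The function names packBlocks and toHex8 are hypothetical, and the Blowfish call that would sit between packing and hex formatting is deliberately left out.

```javascript
// Pack a string into 64-bit blocks, each represented as a pair [L, R]
// of unsigned 32-bit words holding two UTF-16 code units apiece.
function packBlocks(text) {
    var blocks = [];
    for (var i = 0; i < text.length; i += 4) {
        // charCodeAt returns NaN past the end of the string;
        // "|| 0" turns that into zero padding.
        var a = text.charCodeAt(i) || 0;
        var b = text.charCodeAt(i + 1) || 0;
        var c = text.charCodeAt(i + 2) || 0;
        var d = text.charCodeAt(i + 3) || 0;
        // ">>> 0" keeps each word an unsigned 32-bit value.
        blocks.push([(a | (b << 16)) >>> 0, (c | (d << 16)) >>> 0]);
    }
    return blocks;
}

// Format one 32-bit word as eight uppercase hex digits.
function toHex8(word) {
    var hex = word.toString(16).toUpperCase();
    return "00000000".substring(hex.length) + hex;
}

var blocks = packBlocks("This");
console.log(toHex8(blocks[0][0]) + toHex8(blocks[0][1])); // 0068005400730069
```

In the real pipeline each [L, R] pair would be encrypted before it is turned into hex, so the printed value here is the plaintext block, not what ends up in the feed.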

The following code assumes that you are linking against the C implementation of Blowfish written by Paul Kocher. The source for that and many other implementations of Blowfish is available from Bruce Schneier's site:


#include <stdio.h>
#include <string.h>
#include "blowfish.h"

// a, b, c, d - Four characters that together fill one 64-bit block,
// each one widened to a 16-bit code unit.
// output - Pointer to a string buffer that will be 
// concatenated with the output of the encryption. 
// The output will be formatted in hex.
void encrypt(BLOWFISH_CTX * ctx, char a, char b, 
   char c, char d, char * output) {
    unsigned long L, R;
    char buf[20];
    L = a + (b << 16);
    R = c + (d << 16);
    Blowfish_Encrypt(ctx, &L, &R);
    sprintf (buf, "%08lX%08lX", L, R);
    strcat(output, buf);
}

int main(void) {
    int i;
    int length;
    BLOWFISH_CTX ctx;
    char message[2048];
    char output[8192];
    char * p = message;

    Blowfish_Init (&ctx, (unsigned char*)"TESTKEY", 7);

    // Zero the buffer first so that the padding read by the
    // final block is well defined.
    memset(message, 0, sizeof(message));
    strcpy(message, "This is a blowfish encrypted message.");
    length = strlen(message);
    if (length & 0x01) {
        length++;
    }

    output[0] = 0;
    // Each pass encrypts four characters, one 64-bit block.
    for (i=0; i<length; i+=4) {
        encrypt(&ctx, *p, *(p+1), *(p+2), *(p+3), output);
        p += 4; 
    }
    printf("\nEncrypted: %s\n", output);
    return 0;
}

This will produce the following output, except that it will be one continuous line:


  Encrypted: C7FDDC3B50FF0BE0E6F47CBD54\
  AC149F3886B2D3D45FE7C812A55E3C660F0F6\
  4ED7454037741FDAEE5CBA49B89A2480CCD6E\
  252E87E6A134D3249ECA4AA465B39420CC85F\
  99E95FF49040685D7DA9804

A Low-Friction Hammer

I chose Blowfish because it is a low-friction solution to my encryption needs. It is fast, unpatented, and license-free, and is available free for all uses. My only problem was that a JavaScript implementation didn't seem to be available. So the first task was to port the algorithm to JavaScript.

Some Integer Friction

The port was easy, except for when it wasn't. The problem is that the Blowfish algorithm does all of its work on unsigned 32-bit integers. JavaScript numbers can hold integer values, but they aren't limited to 32 bits. That is, unless you take a value and try to shift it, AND it, OR it, or XOR it with another value. Then the operands are suddenly converted to signed 32-bit values. That behavior is rather inconvenient and required some workarounds. Besides that, the code looks very similar to the original C source. The data tables and the functions F(), blowfish_init(), blowfish_encrypt(), and blowfish_decrypt() are almost identical to the corresponding C functions. The problems with JavaScript integer math required writing three utility functions: uns32Add(), split32(), and uns32Xor(). The last bit of difference is that in the C code, the context for the encryption and decryption routines is held in a struct. In JavaScript, that becomes a class.
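A few lines of JavaScript show the coercion in action, along with one way helpers like these could be written. The bodies of uns32Add and uns32Xor below are my own sketches under the assumption that both arguments are already unsigned 32-bit values; the versions in the actual script may well differ, and split32() is not reproduced here.

```javascript
// Bitwise operators coerce their operands to signed 32-bit integers.
console.log(0xFFFFFFFF | 0);         // -1
console.log((0xFFFFFFFF | 0) >>> 0); // 4294967295
console.log(0xFFFFFFFF + 1);         // 4294967296, plain addition does not wrap

// Hypothetical sketches, not the article's implementations.
function uns32Add(a, b) {
    // The sum of two values below 2^32 is exact in a double;
    // ">>> 0" then reduces it modulo 2^32.
    return (a + b) >>> 0;
}

function uns32Xor(a, b) {
    // "^" yields a signed result; ">>> 0" reinterprets it as unsigned.
    return (a ^ b) >>> 0;
}

console.log(uns32Add(0xFFFFFFFF, 1));          // 0
console.log(uns32Xor(0xFFFF0000, 0x0000FFFF)); // 4294967295
```

The unsigned right shift by zero is the usual trick for getting back to the unsigned range after a bitwise operation.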

Line by Line

Here is the main code for the Greasemonkey script:

var keys = {"WORK": "TESTKEY"};

var enc_divs, tdiv, div_content, unres;
// Find every "encdata" div nested inside an "encrypted blowfish" div.
enc_divs = document.evaluate(
    "//div[contains(@class, 'encrypted blowfish')]" +
    "//div[contains(@class, 'encdata')]",
    document,
    null,
    XPathResult.UNORDERED_NODE_SNAPSHOT_TYPE,
    null);
for (var i = 0; i < enc_divs.snapshotLength; i++) {
    tdiv = enc_divs.snapshotItem(i);
    // The content has the form KEYNAME:HEXDATA.
    div_content = tdiv.innerHTML.split(":");
    if (div_content.length == 2 && div_content[0] in keys) {
        var context = new BlowfishCtx();
        blowfish_init(context, keys[div_content[0]]);
        unres = decrypt_string(context, div_content[1]);
        tdiv.parentNode.innerHTML = unres;
    }
}

The keys object maps the shorthand name for a key to the actual secret key.

The rest of the code loops over all the encdata divs nested inside divs whose class attribute contains encrypted blowfish. For each match, we break the content at the colon to get the key name and the encrypted data. If the key name is one we know, we create a new Blowfish context, initialize it with the indicated secret key, and decrypt the contents. After decryption, the content of the outer div is replaced with the newly decrypted content.

Here is the rest of the JavaScript, at least the parts that aren't just a port of the Blowfish algorithm:

// Decrypt the first 64-bit block (16 hex digits) of s and
// return it as up to four characters.
function decrypt(context, s) {
    var L = parseInt(s.substring(0, 8), 16);
    var R = parseInt(s.substring(8, 16), 16);
    var r = blowfish_decrypt(context, L, R);
    var out = "";
    // Each 32-bit word holds two UTF-16 code units, low half first.
    // Zero code units are padding and are skipped.
    if (r[0] & 0xFFFF) {
        out += String.fromCharCode(r[0] & 0xFFFF);
    }
    if (r[0] >>> 16) {
        out += String.fromCharCode(r[0] >>> 16);
    }
    if (r[1] & 0xFFFF) {
        out += String.fromCharCode(r[1] & 0xFFFF);
    }
    if (r[1] >>> 16) {
        out += String.fromCharCode(r[1] >>> 16);
    }
    return out;
}

// Decrypt a whole hex string, 16 hex digits at a time.
function decrypt_string(context, s) {
    var output = "";
    while (s.length) {
        output += decrypt(context, s);
        s = s.substring(16, s.length);
    }
    return output;
}

Note that we just take the hexadecimal string, decrypt it in 64-bit chunks (16 hex digits each), and treat each decrypted chunk as four UTF-16 code units. Remember that we encoded using UTF-16, which makes the client-side code easier since JavaScript uses UTF-16 natively. In the C code above, we simply assume the data is ASCII, so each byte widens directly into the matching UTF-16 code unit.
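The unpacking half can be tried out without any cipher at all. The sketch below feeds already-decrypted blocks through hypothetical helpers hexToWords and wordsToString (names of my own choosing) to show how hex digits turn back into characters; in the real script blowfish_decrypt() would sit between the two steps.

```javascript
// Split a hex string into [L, R] pairs, 16 hex digits per 64-bit block.
function hexToWords(hex) {
    var words = [];
    for (var i = 0; i + 16 <= hex.length; i += 16) {
        words.push([parseInt(hex.substring(i, i + 8), 16),
                    parseInt(hex.substring(i + 8, i + 16), 16)]);
    }
    return words;
}

// Turn one plaintext block back into up to four characters.
function wordsToString(L, R) {
    var codes = [L & 0xFFFF, L >>> 16, R & 0xFFFF, R >>> 16];
    var s = "";
    for (var i = 0; i < codes.length; i++) {
        if (codes[i] !== 0) { // zero code units are padding
            s += String.fromCharCode(codes[i]);
        }
    }
    return s;
}

// Two plaintext blocks: "This" followed by "!" plus zero padding.
var pairs = hexToWords("0068005400730069" + "0000002100000000");
var text = "";
for (var i = 0; i < pairs.length; i++) {
    text += wordsToString(pairs[i][0], pairs[i][1]);
}
console.log(text); // This!
```

Skipping zero code units means a message can never contain a literal NUL character, which seems an acceptable trade for not having to transmit the length.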

In Action

The test feed I showed earlier is available at:

http://bitworking.org/projects/securesyndication/index.rss

If we subscribe to it in Bloglines, it will show up as:

Before

If we install securesyndication.user.js, it will instead appear, after a short delay, as:

After

The "short delay" is relative. Decrypting takes a bit of work, so your mileage may vary.

Summary

This is a solution that doesn't require boiling the ocean. What are the benefits?

We don't have to wait for Atom to be finished.
We don't have to wait for our aggregator of choice to implement XML Encryption.
We never have to hand over a password, or a key, to our aggregator.
It works today.

There is obviously a lot of work to be done. My implementation of Blowfish in JavaScript is probably not optimal, to put it mildly. In addition, the Greasemonkey script in general could use some polish. One improvement would be to prompt the user for a key when a new one is encountered, instead of hand-editing the script for every key you use. On the generating side, a Python library for Blowfish would be handy too.
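As a rough idea of what the key-prompting improvement might look like, here is a small sketch. Everything in it is an assumption: lookupKey, the store object, and the ask callback are hypothetical names. In a user script the store could plausibly wrap Greasemonkey's GM_getValue and GM_setValue, and ask could wrap window.prompt, but in-memory stand-ins are used here so the example runs anywhere.

```javascript
// Hypothetical helper: return the secret for a key name, asking for it
// once if it is unknown and remembering the answer.
// "store" needs get(name) and set(name, value); "ask" receives the key name.
function lookupKey(name, store, ask) {
    var secret = store.get(name);
    if (secret === undefined || secret === null) {
        secret = ask(name);
        if (secret) {
            store.set(name, secret);
        }
    }
    return secret || null;
}

// In-memory stand-ins so the sketch runs outside a browser.
var memory = {};
var store = {
    get: function (name) { return memory[name]; },
    set: function (name, value) { memory[name] = value; }
};
var asked = 0;
function ask(name) {
    asked++;
    return "TESTKEY";
}

console.log(lookupKey("WORK", store, ask)); // TESTKEY
console.log(lookupKey("WORK", store, ask)); // TESTKEY, served from the store
console.log(asked);                         // 1
```

Passing the storage and the prompt in as arguments keeps the lookup logic testable, and leaves open the question of where secrets should actually live.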

Oh yeah, did I mention that I'm not a cryptologist? Consider the publication of this article as the beginning, not the end, of the discussion on how to do this correctly.

One observation is worthwhile at this point. Nothing about this Greasemonkey script is particular to syndication. In fact, you could use this technique to publish and read encrypted content on the Web regardless of whether it was included in a syndication feed.

All of the source code found in this article, and more, can be found at http://bitworking.org/projects/securesyndication/. Thanks to the creators of Greasemonkey, Mark Pilgrim for Dive Into Greasemonkey, Mark Fletcher and his crew for Bloglines, and, finally, Bruce Schneier for the Blowfish Encryption Algorithm.