Secure RSS Syndication
July 13, 2005
I have a problem. It's actually a pretty common problem. I have data that I want to syndicate to myself, but I don't want you to see it. It's private. Now this could be my credit card balance or internal bug reports for the day job. Either way, I want the information in a form suitable for syndication but not available to everyone.
A Solution
There is a solution. I could password-protect my feed. But that causes a problem, because my aggregator would then need to know my password. Now my aggregator of choice is Bloglines, and I'm sure they're nice folks, but I really don't want to give them my password. One security breach and my whopping credit card debt is splattered across the Web. Just for the record, for the rest of the discussion I will use Bloglines, but in fact one of my design goals is that this technique should work equally well with all web-based aggregators.
One other possible solution comes from Atom. Atom provides support for XML Encryption Syntax & Processing. This isn't really usable, for several reasons:
- Atom isn't finished.
- Bloglines doesn't support XML Encryption.
- I want my encrypted data now!
A Different Solution
A better solution would not require me to give Bloglines my password, or some other key with which to decrypt my content. But if Bloglines isn't going to decrypt my content, who will?
How about my browser?
If I can somehow get my browser to decrypt the content of the feed, then I can continue to use Bloglines to poll the feed and present me with new items as they appear, but the decryption is done in my browser.
Enter the Greasemonkey
What we're talking about is giving Bloglines a quick upgrade and doing it ourselves. That means we're talking Greasemonkey, a Firefox extension that allows you to write scripts that modify the pages you visit. In this case, the modification is going to be decryption. We'll write a Greasemonkey script, securesyndication.user.js, that looks for encrypted content and, using the private key we provide, decrypts the content when we view it.
So here is the whole scenario.
- My content, which is going to sit inside the description element of an RSS feed, is going to be encrypted. We will actually put it inside a microformat.
- That feed is syndicated.
- I will subscribe to that feed in Bloglines (or any other web-based aggregator).
- When I view items in that feed in Bloglines, the description is initially displayed encrypted, but my Greasemonkey script detects the encrypted content and decrypts it on the fly, and replaces the encrypted content with the decrypted content.
View Source
Here is an example of such an encrypted feed:
<?xml version="1.0" encoding="iso-8859-1"?>
<rss version="2.0" xmlns:dc='http://purl.org/dc/elements/1.1/'>
  <channel>
    <title>BitWorking Blowfish Encrypted Test Feed</title>
    <link>http://bitworking.org/projects/blowfish/</link>
    <description>Secure Sydication With Blowfish and GreaseMonkey</description>
    <dc:creator>Joe Gregorio</dc:creator>
    <item>
      <title>A Test Entry</title>
      <link>http://bitworking.org/projects/blowfish</link>
      <description>
        <div class="encrypted blowfish">
          <p>The following data is encrypted. Please install the SecureSyndication Greasemonkey script to view the encrypted content.</p>
          <div class="encdata">WORK:C7FDDC...4AC0643B86</div>
        </div>
      </description>
      <dc:date>2005-06-20T00:17:00-05:00</dc:date>
    </item>
  </channel>
</rss>
Let's look at the microformat we're using to transport our encrypted content:
<div class="encrypted blowfish">
  <p>The following data is encrypted. Please install the SecureSyndication Greasemonkey script to view the encrypted content.</p>
  <div class="encdata">WORK:C7FDD...15C4AC0643B86</div>
</div>
The class value of encrypted blowfish on the outer div states that the contents of the div are encrypted, with "blowfish" being Bruce Schneier's symmetric block cipher Blowfish. The value inside the <div class="encdata"/>, besides the bit of header information, is encrypted. What appears at the beginning of the text is a key name, then a colon, and then the Blowfish-encrypted content. That's not the key itself, just a shorthand name for the key to use. This allows different keys to be used for different feeds. All of the other elements in the div are ignored by our Greasemonkey script. That lets us put a nice paragraph in there explaining what is going on to those who are unfamiliar with encrypted content. When our user script is run, it will decrypt the data and replace the innerHTML of the outer div with that decrypted content.
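To make the key-name-then-colon layout concrete, here is a minimal sketch of how the text of an encdata div could be taken apart. The helper name parseEncData is hypothetical and not part of the script described in this article. It assumes that key names never contain a colon and that the payload is hex written 16 digits per 64-bit block.

```javascript
// Hypothetical helper (not from the original script): split the text of a
// <div class="encdata"> into its key name and its hex payload.
function parseEncData(text) {
  var trimmed = text.trim();
  var colon = trimmed.indexOf(":");
  if (colon <= 0) {
    return null; // no key name in front of the data
  }
  var keyName = trimmed.substring(0, colon);
  var hex = trimmed.substring(colon + 1);
  // Assumption: each 64-bit block is written as exactly 16 hex digits.
  if (hex.length === 0 || hex.length % 16 !== 0 || !/^[0-9A-Fa-f]+$/.test(hex)) {
    return null;
  }
  return { keyName: keyName, hex: hex };
}
```

Returning null for anything that doesn't fit the expected shape lets a caller leave unrecognized content untouched rather than replacing it with garbage.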
Why was this representation chosen? Everything for decryption must be passed faithfully from the RSS feed description element to the web page displayed by the web-based aggregator. That rules out creating a new element, using a namespaced element, or using a custom attribute to carry the information, as all of those will be stripped by an aggregator. What we are left with is the class and rel attributes of existing HTML elements. While rather severe, these restrictions are still powerful and may look a little familiar to you. They're the same restrictions that are used in defining microformats.
The encrypted data is created by converting the source text into UTF-16, encrypting it in 64-bit chunks via Blowfish, and then converting the binary data into a hexadecimal representation. UTF-16 was chosen because that is the native encoding that JavaScript uses, and we want to make things as easy on the client as possible.
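The packing and hex-formatting steps can be sketched in JavaScript as follows. This is an illustration only: the function names are hypothetical, and the cipher is replaced by an identity placeholder so that the block layout is visible. It follows the convention of the C code in this article, where the first character of each pair lands in the low 16 bits of a 32-bit word.

```javascript
// Format a 32-bit unsigned value as 8 uppercase hex digits.
function hex32(n) {
  var h = (n >>> 0).toString(16).toUpperCase();
  return "00000000".substring(h.length) + h;
}

// Pack a string into 64-bit blocks: four UTF-16 code units per block,
// two per 32-bit word, zero-padded at the end.
// encryptBlock is a stand-in for a real cipher: (L, R) -> [L', R'].
function packAndEncode(text, encryptBlock) {
  var out = "";
  for (var i = 0; i < text.length; i += 4) {
    // charCodeAt returns NaN past the end; "|| 0" turns that into padding.
    var a = text.charCodeAt(i) || 0;
    var b = text.charCodeAt(i + 1) || 0;
    var c = text.charCodeAt(i + 2) || 0;
    var d = text.charCodeAt(i + 3) || 0;
    var L = a + b * 0x10000;
    var R = c + d * 0x10000;
    var block = encryptBlock(L, R);
    out += hex32(block[0]) + hex32(block[1]);
  }
  return out;
}

// Identity "cipher" so the layout is visible. This is NOT encryption.
function identity(L, R) { return [L, R]; }

console.log(packAndEncode("This", identity)); // prints 0068005400730069
```

With a real Blowfish routine passed in place of identity, each 16-digit group would be the ciphertext for one block of four characters.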
This code assumes that you are linking to the C implementation of Blowfish done by Paul Kocher. The source for that and many other implementations of Blowfish is available from Bruce Schneier's site:
#include <stdio.h>
#include <string.h>
#include "blowfish.h"  /* Paul Kocher's implementation; header name assumed */

// ctx     - Initialized Blowfish context.
// a,b,c,d - Four characters, packed as 16-bit code units into one
//           64-bit block.
// output  - Pointer to a string buffer that will be
//           concatenated with the output of the encryption.
//           The output will be formatted in hex.
void encrypt(BLOWFISH_CTX *ctx, unsigned char a, unsigned char b,
             unsigned char c, unsigned char d, char *output)
{
    unsigned long L, R;
    char buf[20];

    L = a + ((unsigned long)b << 16);
    R = c + ((unsigned long)d << 16);
    Blowfish_Encrypt(ctx, &L, &R);
    sprintf(buf, "%08lX%08lX", L, R);
    strcat(output, buf);
}

int main(void)
{
    size_t i, length;
    BLOWFISH_CTX ctx;
    /* Zero-filled so that the final block is padded with NULs. */
    unsigned char message[2048] = {0};
    unsigned char *p = message;
    char output[8192];

    Blowfish_Init(&ctx, (unsigned char *)"TESTKEY", 7);
    strcpy((char *)message, "This is a blowfish encrypted message.");
    length = strlen((char *)message);

    output[0] = 0;
    /* Encrypt four characters (one 64-bit block) at a time. */
    for (i = 0; i < length; i += 4) {
        encrypt(&ctx, p[0], p[1], p[2], p[3], output);
        p += 4;
    }
    printf("\nEncrypted: %s\n", output);
    return 0;
}
This will produce the following output, except that it will be one continuous line:
Encrypted: C7FDDC3B50FF0BE0E6F47CBD54\
AC149F3886B2D3D45FE7C812A55E3C660F0F6\
4ED7454037741FDAEE5CBA49B89A2480CCD6E\
252E87E6A134D3249ECA4AA465B39420CC85F\
99E95FF49040685D7DA9804
A Low-Friction Hammer
I chose Blowfish because it is a low-friction solution to my encryption needs. It is fast, unpatented, and free for all uses. My only problem was that a JavaScript implementation didn't seem to be available. So the first task was to port the algorithm to JavaScript.
Some Integer Friction
The port was easy, except for when it wasn't. The problem is that the Blowfish algorithm does all of its work on 32-bit integers. Now JavaScript has integer values, but they aren't limited to 32 bits. That is, unless you take an integer value and try to shift it, AND it, or OR it with another value. Then it suddenly transforms into a 32-bit value, and a signed one at that. That behavior is actually rather inconvenient and required some workarounds. Besides that, the code looks very similar to the original C source. The data tables and the functions F(), blowfish_init(), blowfish_encrypt(), and blowfish_decrypt() are almost identical to the corresponding C functions. The problems with JavaScript integer math required writing three utility functions, uns32Add(), split32(), and uns32Xor(). The last bit of difference is that in the C code, the context for the encryption and decryption routines is held in a struct. That necessarily migrates into a class in JavaScript.
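The integer behavior is easier to see in code. The sketch below first shows the coercion to signed 32-bit values, then gives hypothetical implementations of the three helpers. Only their names come from this article; the bodies are one plausible way to write them and may differ from the real script. In particular, split32() is assumed here to break a word into the four bytes that Blowfish's F function uses to index its S-boxes.

```javascript
// Bitwise operators coerce their operands to *signed* 32-bit integers.
console.log(0xFFFFFFFF | 0); // prints -1
console.log(0x80000000 ^ 0); // prints -2147483648

// Hypothetical implementations; the originals may differ.

// Unsigned 32-bit addition with wraparound. The sum of two 32-bit values
// is still exact in a JavaScript number; ">>> 0" reduces it modulo 2^32.
function uns32Add(a, b) {
  return (a + b) >>> 0;
}

// Unsigned 32-bit XOR: ">>> 0" reinterprets the signed result as unsigned.
function uns32Xor(a, b) {
  return (a ^ b) >>> 0;
}

// Split a 32-bit value into four bytes, most significant first
// (an assumption about what split32 returns).
function split32(x) {
  return [(x >>> 24) & 0xFF, (x >>> 16) & 0xFF, (x >>> 8) & 0xFF, x & 0xFF];
}
```

The unsigned right shift by zero is the key trick: it is the only bitwise operator whose result is unsigned, so it converts a negative intermediate back into the range 0 to 2^32 - 1.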
Line by Line
Here is the main code for the Greasemonkey script:
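// Maps a shorthand key name to the actual secret key.
var keys = {"WORK": "TESTKEY"};

var enc_divs, tdiv, div_content, context, unres;

// Find every "encdata" div nested inside an "encrypted blowfish" div.
// contains() takes the string to search first and the substring second.
enc_divs = document.evaluate(
    "//div[contains(@class, 'encrypted') and contains(@class, 'blowfish')]" +
    "//div[contains(@class, 'encdata')]",
    document,
    null,
    XPathResult.UNORDERED_NODE_SNAPSHOT_TYPE,
    null);

for (var i = 0; i < enc_divs.snapshotLength; i++) {
    tdiv = enc_divs.snapshotItem(i);
    // The content has the form KEYNAME:HEXDATA.
    div_content = tdiv.innerHTML.split(":");
    if (div_content.length == 2 && div_content[0] in keys) {
        context = new BlowfishCtx();
        blowfish_init(context, keys[div_content[0]]);
        unres = decrypt_string(context, div_content[1]);
        // Replace the whole outer div, explanatory paragraph included.
        tdiv.parentNode.innerHTML = unres;
    }
}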
The keys object maps the shorthand name for a key to the actual private key. The rest of the code loops over all the encdata divs that sit inside a div whose class attribute contains encrypted blowfish. For each successful match, we break the content at the colon to get the key name and the encrypted data. At this point we create a new Blowfish context, initialize it with the indicated private key, and decrypt the contents. After decryption, the content of the outer div is replaced with the newly decrypted content.
Here is the rest of the JavaScript, at least the parts that aren't just a port of the Blowfish algorithm:
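// Decrypt the first 64-bit block (16 hex digits) of s and return it
// as a string of up to four characters.
function decrypt(context, s) {
    var L = parseInt(s.substring(0, 8), 16);
    var R = parseInt(s.substring(8, 16), 16);
    var r = blowfish_decrypt(context, L, R);
    var out = "";
    // Each 32-bit word holds two UTF-16 code units, low half first.
    // Zero code units are padding and are skipped.
    if (r[0] & 0xFFFF) {
        out += String.fromCharCode(r[0] & 0xFFFF);
    }
    if (r[0] >>> 16) {
        out += String.fromCharCode(r[0] >>> 16);
    }
    if (r[1] & 0xFFFF) {
        out += String.fromCharCode(r[1] & 0xFFFF);
    }
    if (r[1] >>> 16) {
        out += String.fromCharCode(r[1] >>> 16);
    }
    return out;
}

// Decrypt a whole hex string, one 16-digit block at a time.
function decrypt_string(context, s) {
    var output = "";
    while (s.length) {
        output += decrypt(context, s);
        s = s.substring(16, s.length);
    }
    return output;
}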
Note that we just take the string, decode the hexadecimal encoding, and then decrypt it in 64-bit chunks. Each decrypted chunk is then treated as four UTF-16 character codes, and zero-valued codes, which are only padding, are dropped. Remember that we encoded using UTF-16, which makes the client-side code easier since JavaScript uses UTF-16 natively. The C code above simply assumes the data is ASCII, so each byte widens directly into a UTF-16 code unit.
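As a worked example of the unpacking alone, the following sketch runs the same hex-to-characters steps with the cipher swapped out for an identity placeholder. The names decodeAndUnpack and identity are hypothetical, and nothing here is decryption. It only shows how two 16-digit blocks turn back into text and how the zero padding disappears.

```javascript
// Decode hex into 64-bit blocks and turn each block back into up to four
// UTF-16 code units. decryptBlock is a stand-in: (L, R) -> [L', R'].
function decodeAndUnpack(hex, decryptBlock) {
  var out = "";
  for (var i = 0; i + 16 <= hex.length; i += 16) {
    var L = parseInt(hex.substring(i, i + 8), 16);
    var R = parseInt(hex.substring(i + 8, i + 16), 16);
    var words = decryptBlock(L, R);
    for (var w = 0; w < 2; w++) {
      var lo = words[w] & 0xFFFF; // first character of the pair
      var hi = words[w] >>> 16;   // second character of the pair
      // Zero code units are padding added by the encoder, so skip them.
      if (lo) { out += String.fromCharCode(lo); }
      if (hi) { out += String.fromCharCode(hi); }
    }
  }
  return out;
}

// Identity placeholder so the layout is visible. This is NOT a cipher.
function identity(L, R) { return [L, R]; }

console.log(decodeAndUnpack("00680054007300690000002100000000", identity)); // prints This!
```

The second block carries a single character followed by three zero code units, which is what the last block of a message looks like whenever its length is not a multiple of four.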
In Action
The test feed I showed earlier is available at:
http://bitworking.org/projects/securesyndication/index.rss
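If we subscribe to it in Bloglines, the item initially shows up with the explanatory paragraph followed by the raw encrypted data.
If we then install securesyndication.user.js, the same item appears, after a short delay, with the decrypted message in place of the encrypted block.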
The "short delay" is relative. Decrypting takes a bit of work, so your mileage may vary.
Summary
This is a solution that doesn't require boiling the ocean. What are the benefits?
- We don't have to wait for Atom to be finished.
- We don't have to wait for our aggregator of choice to implement XML Encryption.
- We never have to hand over a password, or a key, to our aggregator.
- It works today.
There is obviously a lot of work to be done. My implementation of Blowfish in JavaScript is probably not optimal, to put it mildly. In addition, the Greasemonkey script in general could use some polish. One improvement would be to prompt the user for a key when a new one is encountered, instead of hand-editing the script for every key you use. On the generating side, a Python library for Blowfish would be handy too.
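The key-prompting improvement might look something like the sketch below. It is not part of the current script, and the function name lookupKey is hypothetical. Storage and prompting are passed in as functions so that the sketch does not depend on any particular API. In a user script one might wire these to persistent script storage and a prompt dialog, but that wiring is an assumption left to the reader.

```javascript
// Hypothetical helper: return the key for keyName, asking the user the
// first time a name is seen and remembering the answer afterwards.
//   load(name)      -> stored key, or undefined if none
//   save(name, key) -> persists the key
//   ask(message)    -> string typed by the user, or null if cancelled
function lookupKey(keyName, load, save, ask) {
  var key = load(keyName);
  if (key !== undefined && key !== null) {
    return key;
  }
  key = ask("Enter the Blowfish key named '" + keyName + "':");
  if (key === null || key === "") {
    return null; // user declined; leave the content encrypted
  }
  save(keyName, key);
  return key;
}

// Example wiring with an in-memory store (a real script would persist it).
var store = {};
var key = lookupKey("WORK",
  function (n) { return store[n]; },
  function (n, k) { store[n] = k; },
  function () { return "TESTKEY"; });
console.log(key); // prints TESTKEY
```

Wherever the keys end up being stored, they are only as safe as that storage, so this trades some convenience against keeping the key out of the script source.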
Oh yeah, did I mention that I'm not a cryptologist? Consider the publication of this article as the beginning, not the end, of the discussion on how to do this correctly.
One observation is worthwhile at this point. Nothing about this Greasemonkey script is particular to syndication. In fact, you could use this technique to publish and read encrypted content on the Web regardless of whether it was included in a syndication feed.
All of the source code found in this article, and more, can be found at http://bitworking.org/projects/securesyndication/. Thanks to the creators of Greasemonkey, Mark Pilgrim for Dive Into Greasemonkey, Mark Fletcher and his crew for Bloglines, and, finally, Bruce Schneier for the Blowfish Encryption Algorithm.