Hacking eBay: Turning Email Alerts into Atom
November 23, 2005
From our geeky perspective, Atom and RSS seem to be sweeping through the internet, changing the way people and systems get notified about events. From a broader perspective, though, they've got a long way to go — we all have plenty of computer-literate friends who've never heard of either.
This means that plenty of opportunities remain to improve systems and applications using RSS or Atom. (Because Atom is the latest and greatest in the history of RSS formats, with endorsements from key representatives of the earlier formats, I'm going to focus on using Atom, but the basic ideas here would work for any flavor of RSS.) I see two basic categories of such opportunities: as demos to show how a given system can benefit from an Atom delivery option, and as personal utilities to make your own life easier. A given application can fall into both categories; I wrote something to convert eBay saved search notifications into an Atom feed to make my own life easier, but if I were an eBay employee I'd be showing it to my boss and saying "Hey! This is easy to implement and a great new way to spread the use of our product!"
An Atom version of an email notification system makes a great demo of Atom's power for several reasons. First, a popular incentive for Atom use is that it reduces our email in a controlled fashion, reducing not only the overall email load but also the chance of spam-checker false positives. Second, the difficult part has probably already been done for you — many online products and systems already have elaborate infrastructures in place to track who wants email notification of what. These systems may let recipients specify what they want to be notified about, the email address to send the notification, the frequency of the emails, the format of the emails, and other details.
For example, when you perform a search such as "Elvis black velvet" on eBay, the results screen includes an Add to Favorite Searches link that lets you name the search and add it to a list accessible from your My eBay screen. In addition to naming the search, you can tell the eBay system to send you an email when relevant new items appear.
Figure 1. Black velvet Elvis portrait
Elvis Black Velvet
My wife and I have a room in the basement with a lot of Elvis stuff. The room has its own bathroom with more Elvis stuff, so we refer to the combination as the "Presley Suite." We're not fanatical Elvis fans (I don't think we have more than a dozen CDs and vinyl Presley albums), but he was responsible for a lot of great rock-and-roll and a lot of silly kitsch that's nearly as entertaining. Our Presley Suite lacked any painted black velvet Elvis pictures, a serious omission in a collection of Elvis kitsch, so I created an eBay saved search on the phrase "Elvis black velvet." As with a lot of eBay searches, you find some great deals that aren't as great when you consider shipping costs, and you see the same people selling the same things over and over. I eventually bought one of the recurring ones, shown above: a somewhat Native American-looking Elvis and an even less Native American-looking woman. (Elvis fans will recognize it as an allusion to Flaming Star, a 1960 western in which Presley played the son of a Native American woman and a white man.) Occasionally some real one-of-a-kind things showed up in the search results, and the serious collectors quickly drove the price out of my reach. My favorite was an old black velvet Elvis paint-by-numbers kit that hadn't been used yet: imagine white lines dividing up his black velvet face into tiny numbered sections. That would have been a highlight of our collection, right up there with D.J. Fontana's autograph and the Thai bamboo curtain of Warhol's gunslinger Elvis portrait that we found on London's Portobello Road.
Several libraries and services are available to convert email to RSS or Atom. Evaluating each of these, picking one, installing it, and configuring it on my host provider's system seemed like more trouble than just creating something myself. Doing it myself also lets me customize the input and output as much as I like. For example, the Mailbucket service, which lets you create an email address on their server and then creates an RSS feed of mail sent to that address, won't work for an eBay favorite search because eBay sends the notification emails to the email address they have on file for you, not to any address that you want to specify when you create the search. Services like Mailbucket also convert each email into a single RSS or Atom entry, but I wanted to see a single eBay email about six hits for the search on "Elvis black velvet" converted to six Atom entries.
To convert the formats, I knew that tools like Perl and XSLT make it easy to convert between plain text, HTML, and XML, but I needed a way to have the arrival of the mail trigger the conversion. This turned out to be relatively easy once I found out about procmail.
procmail
procmail is a venerable Unix utility for automating the processing of mail. (I'm not running my own mail server, but my host provider lets me create procmail configurations.) It first became popular for automated sorting of mail into different folders, but now most popular mail clients include built-in features to do that. procmail then became popular for spam detection, but mail client programs usually do that now as well. As a general-purpose tool for redirecting emails meeting certain criteria to specific locations or, better yet, for routing them to be processed by particular programs, procmail is still a great tool for adapting email workflows to newer technologies such as Atom.
Before procmail can do anything with your mail, you have to route your mail through it. This may mean creating a .forward file; in my case, I had to fill out a mail account configuration screen on my host provider's mail administration page. Mail then gets piped through the procmail program, which checks a .procmailrc file to see if anything special should be done to the email. If not, the email is passed along to your inbox untouched.
For the syntax of this .procmailrc file, a tutorial at the Ohio State University math department provides a good start, and the procmail quickstart at Infinite Ink is far more detailed than its name implies.
A typical .procmailrc file begins with the setting of some environment variables and then has a series of rules that each specify which mail they apply to and what to do to those emails. Most rules start with :0: on its own line. The following rule tells procmail to route mail that has the string sales in its subject line to the sales folder:
    :0:
    * ^Subject:.*sales
    sales
A regular expression on the rule's second line describes the mail to look for, usually by specifying a line in the mail header to check, and the third line indicates where to send the email message. Beginning the third line with a pipe symbol tells procmail to feed the email message to a program as its input, as in this procmail rule:
    :0:
    * ^From:.*savedsearches@ebay.com
    | /usr/www/users/bobd/rss/bin/ebaymail2atom.sh
This rule tells procmail that when the "From" line includes the string savedsearches@ebay.com, the contents of that message should be piped to the ebaymail2atom.sh script in the specified directory.
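To get a feel for what the condition line in that rule matches, here is a rough Python sketch; it is not part of the original setup. It assumes procmail's default behavior of matching conditions case-insensitively against the header, Python's regular expressions only approximate procmail's, and the sample header is made up for illustration.

```python
import re

# The condition from the procmail rule, without the leading "* ".
CONDITION = r"^From:.*savedsearches@ebay.com"


def header_matches(header_text, condition=CONDITION):
    """Return True if any header line matches the condition.

    Approximates procmail's default matching: case-insensitive, with ^
    anchored at the start of each line.
    """
    flags = re.IGNORECASE | re.MULTILINE
    return re.search(condition, header_text, flags) is not None


# Hypothetical header, for illustration only.
sample_header = (
    "Received: from mail.example.com\n"
    "From: eBay <savedsearches@ebay.com>\n"
    "Subject: New items matching your search\n"
)

print(header_matches(sample_header))
```

Note that the unescaped dots in the pattern match any character, so the condition is slightly looser than it looks; writing them as \. would tighten it.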
Converting Email to Atom
How should you convert your email to Atom? It depends on the format of the mail, the tools provided on the system you're using, and which you're most comfortable with. eBay emails include both text and HTML versions of the message, so I wrote a Perl script to strip the text version and add a bit of metadata, and then I used the libxslt xsltproc command-line XSLT processor to convert the Perl script's output to Atom. The xsltproc -html parameter lets you process ill-formed HTML as if it were well-formed XML so that you can apply an XSLT stylesheet to it.
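The first step, keeping only the HTML version of a two-part message, can be sketched in Python with the standard email package. This is not the Perl script described here, only an illustration of the idea; the function name and the sample message are invented for the example.

```python
import email
from email import policy


def extract_html(raw_message):
    """Return the first text/html part of a raw email, or "" if there is none."""
    msg = email.message_from_string(raw_message, policy=policy.default)
    for part in msg.walk():
        if part.get_content_type() == "text/html":
            return part.get_content()
    return ""


# Made-up message with a plain-text part and an HTML part.
SAMPLE = """\
From: savedsearches@ebay.com
Subject: New items
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="XYZ"

--XYZ
Content-Type: text/plain; charset="us-ascii"

Plain text version

--XYZ
Content-Type: text/html; charset="us-ascii"

<html><body><a href="http://example.com/item?id=1">Item</a></body></html>

--XYZ--
"""

print(extract_html(SAMPLE))
```

In a procmail pipeline, the raw message would come from standard input (for example, sys.stdin.read()) rather than from a string constant.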
The following shows my ebaymail2atom.sh shell script. A backslash ending a shell script line is a continuation character, so treat the line that follows it as more of the same line. I'll discuss the two conversion steps in the middle first (the commands under the "convert input to html" and "convert html to atom" comments) and then come back to the beginning and ending parts.
    #! /bin/csh

    set rssdir=/usr/www/users/bobd/rss

    # setup for update of newAtomFeeds.atom: backup old one
    # and check dir contents before creating ebay atom file.
    # (The feed directory is listed by full path because the current
    # directory is unknown when procmail runs this script.)
    cp $rssdir/newAtomFeeds.atom $rssdir/newAtomFeeds.atom.bkp
    ls -1tr $rssdir > /tmp/temp1.txt

    # convert input to html
    $rssdir/bin/ebaymail2html.pl > /tmp/ebaytemp.html

    # convert html to atom
    xsltproc -html --stringparam RSSFilePath "$rssdir" \
      $rssdir/bin/ebayhtml2atom.xsl /tmp/ebaytemp.html

    rm /tmp/ebaytemp.html

    # In case a new file (whose name we won't know) got created
    chmod 644 /usr/www/users/bobd/rss/*.atom

    # finish update of newAtomFeeds.atom
    ls -1tr $rssdir > /tmp/temp2.txt
    diff /tmp/temp1.txt /tmp/temp2.txt | \
      $rssdir/bin/newentryatom.pl > $rssdir/newAtomFeeds.atom
(Links to the complete XSLT style sheets and Perl scripts are provided at the end of this article.) In addition to extracting the HTML from the email, the Perl script adds the following metadata to the email in HTML meta elements (when I tried adding it in non-HTML elements, libxslt complained, because I had told it with the -html switch to expect HTML):
- An ID value for each item based on the eBay ID assigned to it. While Atom doesn't require much, it does require an ID for each entry. The Perl script pulls this from the URLs that link to the eBay items.
- The search query string used to generate the result set. This is also pulled from a URL in the email HTML, and the XSLT stylesheet uses it to create the subtitle of the Atom feed.
- The filename of the output Atom file, created from the name assigned to the search when it was added to the Favorite Searches list.
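The metadata steps in the list above might look roughly like this in Python. Everything specific here is an assumption made for illustration: the item= URL parameter, the rule for turning a search name into a filename, and the helper names are hypothetical, and the actual Perl script may work differently. Only the use of meta elements and the name value filename come from the article.

```python
import re
from html import escape

# Assumed URL shape: an "item=<digits>" query parameter. Real links may differ.
ITEM_ID_RE = re.compile(r"\bitem=(\d+)")


def item_ids(html_text):
    """Collect distinct item IDs in order of first appearance."""
    seen = []
    for item_id in ITEM_ID_RE.findall(html_text):
        if item_id not in seen:
            seen.append(item_id)
    return seen


def feed_filename(search_name):
    """Turn a saved-search name into a filename (hypothetical naming rule)."""
    slug = re.sub(r"[^A-Za-z0-9]+", "-", search_name).strip("-").lower()
    return slug + ".atom"


def meta_element(name, content):
    """Build an HTML meta element, escaping both attribute values."""
    return '<meta name="{}" content="{}">'.format(
        escape(name, quote=True), escape(content, quote=True)
    )


print(meta_element("filename", feed_filename("Elvis black velvet")))
```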
Because the output filename is stored within the data itself, there's no simple way to know at shell script execution time what that name will be. So, although a command-line execution of an XSLT processor (or of most other text processing programs) typically names the output file along with other execution parameters, the ebayhtml2atom.xsl style sheet called by ebaymail2atom.sh uses a different technique: the document extension element for XSLT 1.0, defined as part of the EXSLT project and written as exsl:document. (XSLT 2.0 has a comparable instruction, xsl:result-document, built in as part of the base spec.) The style sheet reads the filename that the Perl script stored in the meta element with a name attribute value of filename and uses that filename to assemble the output path used by the exsl:document element that builds the output file.
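The style sheet itself does this in XSLT with exsl:document. The same idea, letting the data name its own output file, can be sketched in Python as follows. The class and function names are invented for this illustration, and stripping directory components from the filename is an extra precaution added here, not something the article describes.

```python
import posixpath
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect <meta name="..." content="..."> pairs from an HTML document."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attr_map = dict(attrs)
            if "name" in attr_map and "content" in attr_map:
                self.meta[attr_map["name"]] = attr_map["content"]


def output_path(html_text, rss_dir):
    """Assemble the output path from the 'filename' meta element.

    Raises KeyError if the document has no such meta element.
    """
    collector = MetaCollector()
    collector.feed(html_text)
    # Keep only the last path component so the data cannot point elsewhere.
    filename = posixpath.basename(collector.meta["filename"])
    return posixpath.join(rss_dir, filename)


page = '<html><head><meta name="filename" content="elvis.atom"></head></html>'
print(output_path(page, "/usr/www/users/bobd/rss"))
```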
The style sheet has one more bit of tricky I/O to implement. Let's say you want a minimum of the eight most recent "Elvis black velvet" entries in your Atom feed, and two new ones arrive. You'll want to output those two and then the six most recent ones from the existing file, so the stylesheet uses the XSLT document() function to read those from the disk file.
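The padding rule is simple enough to state as a small function. This Python sketch assumes both lists are ordered newest first and that each entry is a dictionary with an "id" key; skipping existing entries whose ID reappears among the new ones is an assumption added here, not something the article specifies.

```python
def pad_entries(new_entries, existing_entries, minimum=8):
    """Return all new entries, followed by enough of the most recent
    existing entries to bring the total up to `minimum`.

    Both lists are assumed to be ordered newest first.
    """
    new_ids = {entry["id"] for entry in new_entries}
    # Assumption: an existing entry that reappears as new is not repeated.
    older = [e for e in existing_entries if e["id"] not in new_ids]
    needed = max(0, minimum - len(new_entries))
    return list(new_entries) + older[:needed]


new = [{"id": "n1"}, {"id": "n2"}]
old = [{"id": "o%d" % i} for i in range(10)]
print([entry["id"] for entry in pad_entries(new, old)])
```

With two new entries and a minimum of eight, the result holds the two new entries and then the six most recent existing ones. If more than eight new entries arrive, all of them are kept and no padding is added.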
Notification of the Notification
Once you have this all set up, the true test of whether it works is whether procmail and all the shell script pieces do their jobs when eBay sends an alert to your mailbox. If you point your RSS/Atom reader client at a file that doesn't exist yet, it will give you an error, so I was grumbling to myself about the inconvenience of waiting for that first eBay email about a given search to trigger the creation of the Atom file before I could add the feed to a reader and make sure that it worked. I thought it would be much easier if some automated system would notify me when a new feed was ready, so I wrote a Perl script to create a newAtomFeeds.atom feed!
Near the beginning of my ebaymail2atom.sh shell script, an ls command saves a one-column list of the files in the directory where I store Atom and RSS files into a file in the /tmp directory. After the conversion steps in the middle of the shell script shown above create an Atom version of the email, a similar ls command creates a second list, and the Unix diff command then compares the two lists, sending the result to a newentryatom.pl Perl script. This script creates a new entry in the newAtomFeeds.atom file if a new file showed up in that directory in between the creation of the /tmp/temp1.txt and /tmp/temp2.txt files.
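How a script might pick the new filenames out of diff's output can be sketched as follows. This is a guess at the general approach, not the logic of newentryatom.pl. It assumes diff's default output format, in which lines found only in the second file start with "> " and lines found only in the first start with "< ". Because ls -1tr sorts by modification time, a feed file that was merely rewritten changes position and shows up under both markers, so the sketch drops those names.

```python
def new_files_from_diff(diff_output):
    """Return names present only in the second listing.

    Names that appear as both removed ("< ") and added ("> ") merely
    moved within the time-sorted listing and are not new.
    """
    added, removed = [], set()
    for line in diff_output.splitlines():
        if line.startswith("> "):
            added.append(line[2:])
        elif line.startswith("< "):
            removed.add(line[2:])
    return [name for name in added if name not in removed]


# Made-up diff output: elvis.atom was rewritten, beatles.atom is new.
sample_diff = "1d0\n< elvis.atom\n3a3,4\n> elvis.atom\n> beatles.atom\n"
print(new_files_from_diff(sample_diff))
```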
As with the eBay item Atom file created by the ebayhtml2atom.xsl style sheet, we want the newentryatom.pl Perl script to pad the Atom file with the most recent existing entries if there aren't many new entries. Earlier, we saw that the xsltproc XSLT processor can use the XSLT document() function to read from an existing disk file that has the same name as the output file that it's going to write to. A Perl script whose output is being redirected to a given file, however, can't read from a version of that file that was sitting on the disk just before the Perl script was run, because the shell empties the file when it sets up the redirection. So, the ebaymail2atom.sh shell script creates a copy of newAtomFeeds.atom called newAtomFeeds.atom.bkp, and that's what newentryatom.pl reads for entries to pad the file it creates. With my RSS/Atom client pointing at the newAtomFeeds.atom file, I'll always know when a new eBay feed is ready to test.
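A rough Python sketch of this last step reads the entries from the backup copy, puts an entry for the new feed file in front of them, and serializes the result. The function names, the feed title, the author name, and the example.com URLs are all placeholders invented for the example; the real newentryatom.pl is a Perl script and may be organized quite differently.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
ET.register_namespace("", ATOM)  # serialize Atom as the default namespace


def existing_entries(backup_xml):
    """Return the entry elements from the backup copy, in document order."""
    return ET.fromstring(backup_xml).findall("{%s}entry" % ATOM)


def new_entry(filename, base_url, updated):
    """Build an entry announcing a new feed file (hypothetical layout)."""
    entry = ET.Element("{%s}entry" % ATOM)
    url = base_url + filename
    ET.SubElement(entry, "{%s}id" % ATOM).text = url
    ET.SubElement(entry, "{%s}title" % ATOM).text = "New feed: " + filename
    ET.SubElement(entry, "{%s}updated" % ATOM).text = updated
    ET.SubElement(entry, "{%s}link" % ATOM, href=url)
    return entry


def build_feed(feed_id, updated, entries, author="Feed owner"):
    """Serialize a feed with the required id, title, and updated elements."""
    feed = ET.Element("{%s}feed" % ATOM)
    ET.SubElement(feed, "{%s}id" % ATOM).text = feed_id
    ET.SubElement(feed, "{%s}title" % ATOM).text = "New Atom feeds"
    ET.SubElement(feed, "{%s}updated" % ATOM).text = updated
    author_element = ET.SubElement(feed, "{%s}author" % ATOM)
    ET.SubElement(author_element, "{%s}name" % ATOM).text = author
    feed.extend(entries)
    return ET.tostring(feed, encoding="unicode")


FEED_ID = "http://example.com/rss/newAtomFeeds.atom"
BASE = "http://example.com/rss/"
STAMP = "2005-11-23T00:00:00Z"

backup = build_feed(FEED_ID, STAMP, [new_entry("a.atom", BASE, STAMP)])
updated_feed = build_feed(
    FEED_ID, STAMP, [new_entry("b.atom", BASE, STAMP)] + existing_entries(backup)
)
print(updated_feed)
```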
Testing and Setting Up Your Script
Before you run a shell script like this, make sure that all the pieces work properly. Find out where the mail files are stored on your system, store a few messages as individual files, and run the Perl script and the XSLT style sheet on them to make sure that your script and style sheet do what you want.
The same email files will be useful to do your integration testing of the shell script that ties everything together. Send their contents to the shell script with a command line like this:
./ebaymail2atom.sh < ~/temp/ebaytest01.mail
Use an RSS/Atom reader to look at the file that gets created and make sure that it looks OK. For additional reassurance, validate it against an Atom 1.0 schema.
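Short of full schema validation, a quick sanity check can catch the most common mistakes. The Python sketch below only confirms that the file is well-formed XML, that its root is an Atom feed element, and that the feed and each entry carry the id, title, and updated elements that Atom 1.0 requires. It is not a substitute for validating against a schema, and the function name is invented for this example.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"
REQUIRED = ("id", "title", "updated")


def atom_problems(xml_text):
    """Return a list of problems found; an empty list means none were found."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        return ["not well-formed: %s" % err]
    if root.tag != ATOM + "feed":
        return ["root element is %s, expected an Atom feed" % root.tag]
    problems = []
    for name in REQUIRED:
        if root.find(ATOM + name) is None:
            problems.append("feed is missing <%s>" % name)
    for index, entry in enumerate(root.findall(ATOM + "entry"), start=1):
        for name in REQUIRED:
            if entry.find(ATOM + name) is None:
                problems.append("entry %d is missing <%s>" % (index, name))
    return problems


# Made-up feed whose only entry lacks an id element.
sample_feed = (
    '<feed xmlns="http://www.w3.org/2005/Atom">'
    "<id>urn:example:feed</id><title>Test</title>"
    "<updated>2005-11-23T00:00:00Z</updated>"
    "<entry><title>Item</title><updated>2005-11-23T00:00:00Z</updated></entry>"
    "</feed>"
)
print(atom_problems(sample_feed))
```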
Watch out for another trap that lies in wait for shell script dabblers like myself: remember that even if your scripts run properly when you send files to them as shown above, you don't know what the current directory will be when procmail runs the scripts, so you need to qualify the names of all files and scripts in your shell script with full pathnames. As you can see, I put the pathname in a variable and used that throughout.
More Atom, More Convenience
eBay will eventually understand the value of offering an Atom delivery option in addition to an email delivery option, and you won't need to use procmail and scripting to redirect emails about new Elvis black velvet items to your RSS/Atom reader. Scripting on top of the mail infrastructure is not the leanest solution to use until then, but as a proof of concept that you can actually use (as opposed to simply being a demo) it points the way for people who don't yet understand the value of this relatively new form of communication. The Unix philosophy of connecting the input and output of different tools to create new applications (a philosophy that predates the "mashup" Web 2.0 hype by several generations) makes it easy to incorporate Atom technology into existing infrastructures, so you should be able to add Atom notification features to more than just email applications. Check which scripting tools your server or host provider offers, think about which of the server's applications would benefit from Atom notification, and build it in yourself — you might surprise some people.