Friday, August 8, 2008

Google App Engine powered RSS feeds

I am a huge Schlock Mercenary fan, and I've been looking for a simple project to teach me a bit about Google App Engine (GAE). I originally thought that this might be a trivial use of GAE, but it solves a problem I have had for years. Schlock releases a new strip every night at 8 PM PDT but offers no RSS feed, partially due to the extreme regularity of the update schedule. I've been wanting to add an RSS feed for the strip, and the author, Howard Tayler, has been surprisingly receptive during our conversations over the past week. He surprised me with a request to include the images within the feed itself. A mirror of the project can be found here, though the real thing is served through FeedBurner.
  • New strips are published at 8 PM PDT.
  • Today's strip links to the front page before 5 PM PDT.
  • Today's strip links to the archive after 5 PM PDT.
  • All other links go to the archive.
  • The Atom feed does not contain images, but the RSS feed does.
  • The comic consists of 3 JPGs on Sunday.
  • The comic consists of a single PNG or JPG on other days, depending on the amount of shading.
  • Project Wonderful ad integration: I wanted to add this, but my original approach turned out not to be feasible. The feed would require a new agreement with Project Wonderful, and simply piggybacking on the main site's existing ad arrangement would be a violation.
  • AdWords ads are provided for free once the feed goes through FeedBurner.
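The linking and image rules in the list above can be sketched in Python. This is a minimal sketch, not the feed's actual code, and it uses modern Python 3 rather than the 2008 App Engine runtime. The URL formats are hypothetical placeholders (the post does not give the real ones), the current time is assumed to have already been converted to Pacific time, the Sunday panel suffixes `a`, `b`, `c` are invented for illustration, and the weekday image extension is simply passed in. Each item's description carries the strip as entity-escaped HTML `<img>` tags, which is how RSS 2.0 readers commonly display images inline.

```python
from datetime import date, datetime, time
from xml.sax.saxutils import escape

# Hypothetical URL layout: placeholders, not the site's real formats.
FRONT_PAGE_URL = "http://www.example.com/"
ARCHIVE_URL = "http://www.example.com/archive/{:%Y-%m-%d}"
IMAGE_URL = "http://www.example.com/strips/{:%Y%m%d}{}.{}"

# Before this Pacific time, today's strip links to the front page.
FRONT_PAGE_CUTOFF = time(17, 0)


def strip_link(strip_date, now_pacific):
    """Pick the item link; now_pacific is a naive datetime already in Pacific time."""
    if strip_date == now_pacific.date() and now_pacific.time() < FRONT_PAGE_CUTOFF:
        return FRONT_PAGE_URL
    return ARCHIVE_URL.format(strip_date)


def strip_images(strip_date, daily_ext):
    """Sunday: three JPG panels; other days: one image with the given extension."""
    if strip_date.weekday() == 6:  # Monday is 0, Sunday is 6
        return [IMAGE_URL.format(strip_date, suffix, "jpg") for suffix in "abc"]
    return [IMAGE_URL.format(strip_date, "", daily_ext)]


def rss_item(strip_date, now_pacific, daily_ext="png"):
    """Build one RSS 2.0 <item> whose description embeds the strip as HTML."""
    link = strip_link(strip_date, now_pacific)
    html = "".join('<img src="%s" />' % url
                   for url in strip_images(strip_date, daily_ext))
    return (
        "<item>"
        "<title>%s</title>"
        "<link>%s</link>"
        "<guid>%s</guid>"
        "<description>%s</description>"
        "</item>"
    ) % (
        escape(strip_date.strftime("%B %d, %Y")),
        escape(link),
        escape(ARCHIVE_URL.format(strip_date)),  # stable ID, independent of link
        escape(html),  # HTML goes into the description entity-escaped
    )


if __name__ == "__main__":
    print(rss_item(date(2008, 8, 8), datetime(2008, 8, 8, 16, 0)))
```

Using the archive URL as the `<guid>` keeps each item's identity stable even when its `<link>` switches from the front page to the archive at 5 PM, so feed readers should not show the same strip twice.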
This turned out to be a great learning experience for GAE, Atom, and RSS. I was disappointed to learn that Atom has some serious limitations when it comes to including images within the feed itself. What made this truly enjoyable was interacting with Howard. He was very receptive to the feed, and he was able to clarify several things about the site as well as make a few requests that turned this from a trivial project into a true learning experience.

I originally created the feed as an Atom feed, as it is more of an open standard, while the RSS format appeared to have some odd ownership quirks. This worked fine at first, until Howard asked me to put images into the feed itself. After much wrestling with the Atom format, I came to the conclusion that putting arbitrary images into an Atom feed was not going to work. Since RSS and Atom are both easy to implement, the solution was to learn more about RSS and implement an RSS image feed.

The second quirk came about when I realized that I would have to scrape data from the website itself in order to generate the image links. Howard publishes most dailies as PNG, but when he puts extra effort into shading, he likes to publish as JPG. Thankfully, this is as simple as checking the PNG and seeing if I get a 404, but caching the results gave me a good reason to learn memcache.
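The PNG-or-JPG check described above can be sketched as follows: probe the PNG URL, fall back to the JPG if the server answers 404, and remember the answer so each strip is probed only once. This is an assumption-laden sketch in modern Python rather than the original App Engine code. A plain dict stands in for memcache (on App Engine, `memcache.get` and `memcache.set` would play that role and could also give entries an expiry), the probe uses a HEAD request via `urllib`, and the rule that a strip's JPG lives at the same URL with `.jpg` in place of `.png` is hypothetical.

```python
import urllib.error
import urllib.request

# Stand-in for App Engine's memcache: maps a PNG URL to the image URL to use.
_image_cache = {}


def http_status(url):
    """Send a HEAD request and return the HTTP status code."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # 404 and other error codes arrive as HTTPError


def daily_image_url(png_url, status=http_status, cache=None):
    """Return png_url if the PNG exists, else the matching JPG URL.

    Definitive answers are cached so each strip is probed at most once.
    """
    if cache is None:
        cache = _image_cache
    if not png_url.endswith(".png"):
        raise ValueError("expected a .png URL: %r" % png_url)
    if png_url in cache:
        return cache[png_url]

    code = status(png_url)
    if code == 404:
        url = png_url[: -len(".png")] + ".jpg"  # shaded strips are JPGs
    else:
        url = png_url
    if code in (200, 404):
        cache[png_url] = url  # only remember definitive answers
    return url
```

Only a 200 or a 404 is cached; a transient server error falls back to the PNG guess without being remembered, so a later request can retry the probe instead of serving a wrong image link indefinitely.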
