In 1492, Columbus Discovered...A Feed

Sorry for the rush of posts the past few days. I’ve been feeling a bit inspired of late, and with the discovery of Jeweler, it is now easier to make a gem than it is to just let the code sit in ~/dev/ruby/ on my computer.

A few weeks ago, I showed how to follow redirects using net/http, but I didn't really give any context as to why I wanted to follow redirects. Basically, that code was one piece of some code I wrote to auto-discover the feed URLs for a given URL.

I was a little more creative with the name this time than last time, calling it Columbus. Get it? Auto-discovery and Columbus was a discoverer. Yeah, you get it. I’m sure I don’t need to explain it. Right? Yeah.

Usage

There isn’t much code in the gem and using it is even easier.

require 'rubygems'
require 'columbus'

# get the primary feed (nil if no feed is found)
primary = Columbus.new('http://railstips.org').primary
puts primary.url, primary.title, primary.body if primary

# get all the feeds (an array)
Columbus.new('http://railstips.org').all

The first returns a single feed if one is found, otherwise nil. The second returns an array of all the feeds found. That probably doesn't feel like much, but there is a lot more going on behind the scenes.

Behind the Scenes

Gets the response for the passed-in URL.
If the response is a redirect, it follows the redirect up to 5 times to find the endpoint.
Once it has the endpoint, it uses Hpricot to get all the link tags in the response body that appear to be RSS or Atom feeds.
For each link tag found, it gets the response for that tag's URL and once again follows redirects up to 5 times until it finds an endpoint.
Once the endpoint for each feed is found, it returns the URL, the title and the response body for you to fart around with.
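The redirect-following steps above can be sketched with Ruby's standard net/http library. This is a minimal sketch, not Columbus's actual code: the `follow_redirects` name, its return value and the injectable `fetch` lambda (added only so the logic can be exercised without a network) are all hypothetical. "Up to 5 times" is read here as at most five redirect hops.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper (not Columbus's API): follows HTTP redirects starting
# at +url+ and returns [final_url, final_response]. Raises after +limit+ hops.
# +fetch+ defaults to a real GET but can be swapped out for testing.
def follow_redirects(url, limit = 5, fetch: ->(uri) { Net::HTTP.get_response(uri) })
  uri = URI.parse(url)
  # One initial request plus at most +limit+ redirect hops.
  (limit + 1).times do
    response = fetch.call(uri)
    return [uri.to_s, response] unless response.is_a?(Net::HTTPRedirection)

    # The Location header may be relative, so resolve it against the current URI.
    uri = URI.join(uri.to_s, response['location'])
  end
  raise "too many redirects (more than #{limit})"
end
```

Resolving `Location` with `URI.join` matters because not every server sends an absolute URL in that header.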

Some Details

Once again, I used shoulda, matchy and fakeweb to do the testing. I didn't need HTTParty, but I did break out an old friend, Hpricot, which I haven't used since the XML parsing in the Twitter gem. It's kind of funny that this is the first time I've used Hpricot for its original purpose: parsing HTML.
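For anyone curious what the link-tag step boils down to, here is a rough approximation using only the standard library. It is a hypothetical, deliberately simplified stand-in: it swaps Hpricot for regular expressions, and the `feed_links` name and the hashes it returns are assumptions. It keeps `<link>` tags whose `rel` includes `alternate` and whose `type` is `application/rss+xml` or `application/atom+xml`, resolving relative `href` values against the page URL.

```ruby
require 'uri'

# Hypothetical: MIME types treated as feeds in this sketch.
FEED_TYPES = %w[application/rss+xml application/atom+xml].freeze

# Hypothetical regex-based stand-in for the Hpricot step (not Columbus's code).
# Returns an array of { url:, title: } hashes for feed-like <link> tags.
def feed_links(html, base_url)
  html.scan(/<link\b[^>]*>/i).map do |tag|
    # Collect quoted attributes (double or single quotes) with lowercased names.
    attrs = tag.scan(/([\w-]+)\s*=\s*(?:"([^"]*)"|'([^']*)')/)
               .map { |name, dq, sq| [name.downcase, dq || sq] }
               .to_h
    next unless attrs['rel'].to_s.downcase.split.include?('alternate')
    next unless FEED_TYPES.include?(attrs['type'].to_s.downcase)
    next if attrs['href'].to_s.empty?

    # Relative hrefs such as "/feed.xml" are resolved against the page URL.
    { url: URI.join(base_url, attrs['href']).to_s, title: attrs['title'] }
  end.compact
end
```

Regexes miss cases a real HTML parser handles, such as unquoted attributes or tags inside comments, which is exactly why a forgiving parser like Hpricot is the better tool for this job.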

Installation

For now the gem is just up on GitHub, so the usual routine will get you going.

sudo gem install jnunemaker-columbus --source http://gems.github.com

Hopefully someone finds it useful someday. :) I’ve already got my mileage out of it.

6 Comments

grosser
Mar 27, 2009

I've built (or let's say hacked) something similar; hopefully it can be replaced with Columbus :)
Nicolas
Mar 27, 2009

I think this is exactly what I'll need a week from now. Thank you very much.
Patrick Reagan
Mar 27, 2009

Very cool. I needed something like this when we were building a simple RSS aggregating app. We ended up using the FeedBag gem to do the auto-discovery and then using FeedNormalizer to grab content from both RSS and Atom feeds. I don’t think it handles the redirection though – we had to do that on our own.

I could see using this & FeedZirra together to achieve the same functionality.
John Nunemaker
Mar 27, 2009

@Patrick – Yep. I'm using this with FeedZirra right now. I've run into a few sites where FeedZirra doesn't work, but I'm sure it is an easy fix that I just haven't had time to look into.
Andy
Mar 27, 2009

Hi John. Could this be rolled into HTTParty? My first reaction was that HTTParty already does most of this, except for parsing the feed URLs. It might be nice as a separate gem if someone only wanted the feed URL to fetch the contents, but for those already using HTTParty it seems it would save a little Hpricot parsing.
John Nunemaker
Mar 27, 2009

@Andy – The only integration I foresee between the two is pulling the redirect follower out of Columbus and into a separate gem, and then pulling the redirect following code out of HTTParty and instead relying on the new redirect follower gem.

Also, I'm not sure I would want to parse the HTML using an XML parser. HTML is usually written less strictly than XML, so I'm not sure how well it would work. It might, though. Could be interesting.