July 11, 2006

Posted by John

Tagged screen scraping

Scrapi: Really Easy Screen Scraping

This isn’t Rails-specific, but wow! Scraping just got a whole lot easier with Labnotes’ new library, Scrapi. Here is an example from the Labnotes post showing how to use it:

ebay_auction = Scraper.define do
    process "h3.ens>a", :description=>:text,
                        :url=>"@href"
    process "td.ebcPr>span", :price=>:text
    process "div.ebPicture >a>img", :image=>"@src"

    result :description, :url, :price, :image
end

ebay = Scraper.define do
    array :auctions

    process "table.ebItemlist tr.single",
            :auctions => ebay_auction

    result :auctions
end

auctions = ebay.scrape(html)

Yeah. It’s that easy and that cool. The code is stable and is currently used in co.mments, a production app that does a lot of scraping (it keeps track of the comments you leave on other people’s sites).
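To give a feel for what you might do with the scraped result, here is a rough sketch in plain Ruby. It assumes that `result :description, :url, :price, :image` hands back struct-like objects with those four accessors. The `Auction` struct, the sample data, and the `price_to_f` helper are all made up for illustration and are not part of Scrapi.

```ruby
# Stand-in for the object Scrapi is assumed to build from
# `result :description, :url, :price, :image` (hypothetical, not Scrapi's own class).
Auction = Struct.new(:description, :url, :price, :image)

# Made-up sample data in place of a real `ebay.scrape(html)` call.
auctions = [
  Auction.new("Vintage lamp", "http://example.com/item/1", "$12.50", "http://example.com/1.jpg"),
  Auction.new("Used bicycle", "http://example.com/item/2", "$80.00", "http://example.com/2.jpg")
]

# Hypothetical helper: turn a scraped price string such as "$12.50" into a Float
# by deleting every character that is not a digit or a dot.
def price_to_f(price)
  price.delete("^0-9.").to_f
end

# Find the cheapest auction and build one summary line per auction.
cheapest = auctions.min_by { |a| price_to_f(a.price) }
summary  = auctions.map { |a| "#{a.description}: #{a.price} (#{a.url})" }

puts summary
puts "Cheapest: #{cheapest.description}"
```

Since scraped values come back as text, anything numeric (like the price here) needs converting before you can sort or compare it.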

You can grab the code through svn right now, and it will soon be available as a gem.

About

Authored by John Nunemaker (Noo-neh-maker), a web developer and programmer who has fallen deeply in love with Ruby. More about John.
