December 20, 2010

Posted by John

Tagged gems and mongomapper

Older: The Chain Gang

Newer: Improving Your Methods

Hunt, An Experiment in Search

I already mentioned this on Twitter, but for those of you that do not follow me there I thought I would take some time out of my day for a quick write up.

Quite often for little pet projects, I want search. While I am whole-heartedly behind Sunspot and Solr for anything serious, most of the time I will gladly sacrifice accuracy for simplicity.

This weekend I spent a couple hours whipping together Hunt, a really basic MongoMapper plugin that makes it easy to add basic search to your application.

Install/Usage

Install is obvious, but I will put it here so you can copy and paste.

gem install hunt

Once installed, you just have to declare the plugin in your model, and then tell it what you want to search on.

class Note
  include MongoMapper::Document

  key :title, String

  # Declare plugin and what to search
  plugin Hunt
  searches :title
end

With just those two lines, Hunt will start tracking terms and allow you to search through them. The handy thing is that the search method just returns a scope, allowing you to further filter it or do counting and pagination.

Note.create(:title => 'A different test')
Note.search('test').count    # 1
Note.search('testing').count # 1

Note.search('testing').all
# [#<Note searches: {"default"=>["differ", "test"]}, title: "A different test", _id: BSON::ObjectId('...')>]

Note.search('test').paginate(:page => 1)
# [#<Note searches: {"default"=>["differ", "test"]}, title: "A different test", _id: BSON::ObjectId('...')>]

Behind the Scenes

So how does Hunt work? Before save, it gets the value of each key we want to search on, mashes them together into a big string and then does the following:

  • squeezes multiple spaces together
  • splits string on space into array
  • downcases all words
  • rejects all words less than 2 in size
  • rejects all words in an ignore list (do, don’t, you, yours, etc.)
  • strips punctation from each word
  • rejects any blank words
  • stems each word
  • uniqs the remaining stems

Stemming

Probably most of that seems logical, but stemming may be unfamiliar to you. I was unfamiliar too before doing a bit of research. Stemming is the process of reducing inflected words to their root (ie: testing, tested, and tests to test).

I think of stemming as word normalization. As long as you normalize all the words you want to search on and normalize the search terms each time you query, your searches can be more intelligent than just exact matches.

I found several gems for stemming and after some cursory examination and benchmarking went with fast-stemmer.

Indexing

Hunt automatically creates a key named searches. It then stores the array of stemmed words in a key named default inside searches.

Note.create(:title => 'This is a test')
#<Note searches: {"default"=>["test"]}, title: "This is a test", _id: BSON::ObjectId('...')>

Note.create(:title => 'A different test')
#<Note searches: {"default"=>["differ", "test"]}, title: "A different test", _id: BSON::ObjectId('...')>

I chose a hash instead of an array because it is more flexible going forward (think named searches of different key combinations) and affects nothing in the short term.

Knowing this, we can create a Mongo index on this key to ensure that queries stay snappy:

Note.ensure_index :'searches.default'

If you always search scoped to a user and you have a user_id key, you could do a compound index:

Note.ensure_index [[:user_id, Mongo::Ascending], [:'searches.default', Mongo::Ascending]]

Hunt makes no assumptions on how you actually want to index the key. It merely stores the stuff you want to index.

Also, like I said before, you can index as many or as few keys as you want. I put an example of multiple fields on Github for you to peruse.

Conclusion

So what is the value of all this? It gave me a chance to play with stemming and solves my short term issue of wanting basic search without any extra infrastructure.

Would I recommend that you use it? Probably not. :) That said, feel free to hack around on it and see if you can add other interesting features, such as scoring.

If nothing else, I think it shows how flexible MongoDB can be.

5 Comments

  1. I love this! It’s brilliantly simple.

  2. Kristian Mandrup Kristian Mandrup

    Dec 22, 2010

    Really amazing stuff!!! :) I love it!

  3. This is a really nice. We are doing something very similar using a stemmer but we’ve also added a map reduce function to order the results by relevance. It’s a bit untidy at the moment but please take a look.

    https://github.com/beef/noodall-core/blob/master/lib/noodall/search.rb

    Hopefully you may find it usefull :)

  4. Steve England Steve England

    Jan 24, 2011

    D’oh. Sorry that is in a protected repo.

    I’ve added a Gist. https://gist.github.com/793165

  5. Will this work with Mongoid?

Sorry, comments are closed for this article to ease the burden of pruning spam.

About

Authored by John Nunemaker (Noo-neh-maker), a programmer who has fallen deeply in love with Ruby. Learn More.

Projects

Flipper
Release your software more often with fewer problems.
Flip your features.