  • Xapian Fu: Full Text Indexing in Ruby

    January 31st, 2010

    Xapian is an Open Source Search Engine Library written in C++. It has Ruby bindings, but they’re generated with SWIG, so they basically just mirror the C++ API, which is not very Ruby-like (and pretty ugly).

    Being a self-confessed full text indexing nerd and a Ruby-lover, I wrote Xapian Fu: a library to provide access to Xapian that is more in line with “The Ruby Way”.

    I started writing Xapian Fu exactly a year ago today but left it for a couple of months, then restarted work on it on the train on the way back from the 2009 Scotland on Rails conference.  Development was test driven, so it’s got an extensive test suite (using rspec).  Documentation is in rdoc and is quite detailed.  As of the latest version, it supports Ruby 1.9 too.

    Xapian Fu basically gives you a Hash interface to Xapian – so you get a persistent Hash with full text indexing built in (and ACID transactions!).

    Example

    The following creates a database called example.db, adds three documents to it, then searches them and prints the title of each match:

      require 'xapian-fu'
      include XapianFu
      db = XapianDb.new(:dir => 'example.db', :create => true,
                        :store => [:title, :year])
      db << { :title => 'Brokeback Mountain', :year => 2005 }
      db << { :title => 'Cold Mountain', :year => 2004 }
      db << { :title => 'Yes Man', :year => 2008 }
      db.flush
      db.search("mountain").each do |match|
        puts match.values[:title]
      end

    There are of course a whole bunch more examples in the documentation.

    Tags: active record, database, full text indexing, indexing, ruby, search, stemming, stopping, the ruby way, xapian, xapian-fu

    Posted in Ruby, Tech | 1 Comment »

  • Ruby’s case statement uses ===

    August 30th, 2009

    I’ve not found this stated clearly enough elsewhere so I’m doing so myself.

    Ruby’s case statement calls the === method on the argument of each when clause, passing in the value being tested.

    So, this example:

    case my_number
      when 6883
        :prime
    end
    

    Will execute 6883 === my_number
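
    One way to see this for yourself is a small object that records what its === method receives. This is only a sketch: LoggingMatcher is a hypothetical helper class made up for illustration, not something Ruby provides.

```ruby
# Hypothetical helper class, not part of Ruby: it records every value
# that a case statement passes to its === method.
class LoggingMatcher
  attr_reader :seen

  def initialize(expected)
    @expected = expected
    @seen = []
  end

  # case/when calls this with the value being tested as the argument.
  def ===(other)
    @seen << other
    @expected == other
  end
end

matcher = LoggingMatcher.new(6883)
my_number = 6883

result = case my_number
         when matcher
           :prime
         end

p result        # the when branch was taken, so this is :prime
p matcher.seen  # the values that were handed to matcher.===
```

    The when argument is the receiver and the value after case is the argument, never the other way round.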

    This is all fine and dandy, because the === method on a Fixnum instance does what you’d expect in this scenario.

    However, the === method on the Fixnum class itself (inherited from Module) does something different: Fixnum === x asks the same question as x.is_a?(Fixnum).
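
    The difference between the two === methods can be checked directly. A rough sketch, assuming Ruby 2.4 or later, where Fixnum and Bignum were merged into Integer, so Integer stands in for Fixnum here:

```ruby
# Integer stands in for Fixnum (assumption: Ruby 2.4 or later).
value_check  = 6883 === 6883        # === on an instance compares values
class_check  = Integer === 6883     # === on the class asks "is 6883 an Integer?"
same_as_is_a = 6883.is_a?(Integer)  # same question, receiver and argument swapped
reversed     = 6883 === Integer     # comparing a number with a class

p value_check, class_check, same_as_is_a, reversed
```

    Only the last of the four is false, which shows that === is not symmetric.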

    That is cute, because it allows you to do this:

    case my_number
      when Fixnum
        "Easy to memorize"
      when Bignum
        "Hard to memorize"
    end
    

    But it won’t work as you might expect in this scenario:

    my_type = Fixnum
    case my_type
      when Fixnum
        "Fixed number"
    end
    

    This won’t work because Fixnum === Fixnum returns false: the Fixnum class is not an instance of Fixnum (it is an instance of Class).

    My workaround for this is to convert it to a string first. Not sure if that’s the best solution, but it works for me(tm).

    my_type = Fixnum
    case my_type.to_s
      when "Fixnum"
        "Fixed number"
    end
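
    Two other possible workarounds avoid the string conversion. This is a sketch rather than a recommendation, again assuming Ruby 2.4 or later with Integer standing in for Fixnum; the variable names are made up for illustration.

```ruby
my_type = Integer  # stand-in for Fixnum on Ruby 2.4 or later

# Option 1: a case with no subject treats each when as a plain boolean test.
by_equality = case
              when my_type == Integer then "Fixed number"
              when my_type == Float   then "Floating point number"
              end

# Option 2: a lambda in the when clause. Proc#=== calls the lambda
# with the case subject (Ruby 1.9 or later).
is_integer_class = ->(type) { type == Integer }
by_lambda = case my_type
            when is_integer_class then "Fixed number"
            end

puts by_equality
puts by_lambda
```

    Both compare the class objects themselves, so they do not depend on what to_s returns for the class.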
    
    Tags: case, coding, conditional, programming, ruby, switch, when

    Posted in Ruby, Tech | 5 Comments »

  • John Leach

    • John Leach is a human being living in Leeds, UK.

The text of this blog is licensed under the Creative Commons BY-ND license