Xapian Fu: Full Text Indexing in Ruby

Xapian is an open source search engine library written in C++. It has Ruby bindings, but they’re generated with SWIG, so they basically just mirror the C++ API – not very Ruby-like (and pretty ugly).

Being a self-confessed full text indexing nerd and a Ruby-lover, I wrote Xapian Fu: a library to provide access to Xapian that is more in line with “The Ruby Way”.

I started writing Xapian Fu exactly a year ago today but left it for a couple of months, then restarted work on it on the train on the way back from the 2009 Scotland on Rails conference. Development was test driven, so it’s got an extensive test suite (using RSpec). Documentation is in RDoc and is quite detailed. As of the latest version, it supports Ruby 1.9 too.

Xapian Fu basically gives you a Hash interface to Xapian – so you get a persistent Hash with full text indexing built in (and ACID transactions!).


For example, this creates a database called example.db, puts three documents into it, then searches them and prints the titles of the results:

  require 'xapian-fu'
  include XapianFu
  # Create the database on disk and store the :title and :year fields
  db = XapianDb.new(:dir => 'example.db', :create => true,
                    :store => [:title, :year])
  # Add three documents
  db << { :title => 'Brokeback Mountain', :year => 2005 }
  db << { :title => 'Cold Mountain', :year => 2004 }
  db << { :title => 'Yes Man', :year => 2008 }
  # Search and print the stored title of each matching document
  db.search("mountain").each do |match|
    puts match.values[:title]
  end

There are of course a whole bunch more examples in the documentation.

Making a staging database with sed

Quick one – thought this was cute and useful. I take a copy of live databases once in a while for use in the staging environments, but some apps store references to the live url in there (WordPress does this and makes all its redirects using it, making it particularly difficult to test in staging).

This is a simple little way to change all the urls in the db as you clone it:

mysqldump -h live_db_host -u user -pmypass live_db | sed -e 's/www\.example\.com/staging.example.com/g' | mysql -h staging_db_host -u user -pmypass staging_db

Though, depending on your MySQL table type, you might want to dump to disk first and then pipe the file through sed, as your live tables might stay locked while the whole pipeline runs (I’m not actually sure if mysqldump will block waiting for the other processes to catch up).
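
The dump-to-disk variant could look something like the sketch below. It's only a rough outline: the host names, credentials and database names are the same placeholders as above, and the function names (rewrite_urls, clone_to_staging) and the dump file argument are made up for illustration. The last line just runs the sed rewrite on a sample line, so you can check the substitution without touching any database.

```shell
#!/bin/sh
# Rough sketch: dump to disk first, so the live server is only involved
# for as long as the dump itself takes.
# Host names, credentials and database names are placeholders.

# Rewrite live urls to staging urls. The dots in the pattern are escaped
# so they only match literal dots.
rewrite_urls() {
  sed -e 's/www\.example\.com/staging.example.com/g'
}

# Hypothetical helper: dump the live db to a file, then load the rewritten
# dump into staging. The load only runs if the dump succeeded.
clone_to_staging() {
  dump_file=$1
  mysqldump -h live_db_host -u user -pmypass live_db > "$dump_file" &&
    rewrite_urls < "$dump_file" |
      mysql -h staging_db_host -u user -pmypass staging_db
}

# Try the rewrite on a sample line (no database needed).
printf '%s\n' "INSERT INTO wp_options VALUES ('siteurl','http://www.example.com');" | rewrite_urls
```

To do the real thing you'd call something like clone_to_staging /tmp/live_db.sql (the path is just an example) and delete the dump file afterwards.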