  • Xapian Fu: Full Text Indexing in Ruby

    January 31st, 2010

    Xapian is an Open Source Search Engine Library written in C++. It has Ruby bindings, but they’re generated with SWIG, so they basically just mirror the C++ bindings – not very Ruby-like (and pretty ugly).

    Being a self-confessed full text indexing nerd and a Ruby-lover, I wrote Xapian Fu: a library to provide access to Xapian that is more in line with “The Ruby Way”.

    I started writing Xapian Fu exactly a year ago today but left it for a couple of months, then restarted work on it on the train back from the 2009 Scotland on Rails conference.  Development was test-driven, so it’s got an extensive test suite (using rspec).  Documentation is in rdoc and is quite detailed.  As of the latest version, it supports Ruby 1.9 too.

    Xapian Fu basically gives you a Hash interface to Xapian – so you get a persistent Hash with full text indexing built in (and ACID transactions!).

    Example

    For example, create a database called example.db, put three documents into it, search them and print the results:

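      require 'xapian-fu'
      include XapianFu
      # Create the database on disk, storing the title and year fields
      # so they can be read back from search results
      db = XapianDb.new(:dir => 'example.db', :create => true,
                        :store => [:title, :year])
      # Add three documents
      db << { :title => 'Brokeback Mountain', :year => 2005 }
      db << { :title => 'Cold Mountain', :year => 2004 }
      db << { :title => 'Yes Man', :year => 2008 }
      # Write the pending changes to disk
      db.flush
      # Search and print the stored title of each match
      db.search("mountain").each do |match|
        puts match.values[:title]
      end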

    There are of course a whole bunch more examples in the documentation.

    Tags: active record, database, full text indexing, indexing, ruby, search, stemming, stopping, the ruby way, xapian, xapian-fu


  • Making a staging database with sed

    April 26th, 2008

    Quick one – thought this was cute and useful.  I take a copy of live databases once in a while for use in the staging environments, but some apps store references to the live URL in there (WordPress does this and makes all its redirects using it, making it particularly difficult to test in staging).

    This is a simple little way to change all the URLs in the db as you clone it:

    mysqldump -h live_db_host -u user -pmypass live_db | sed -e 's/www\.example\.com/staging.example.com/g' | mysql -h staging_db_host -u user -pmypass staging_db

    Though depending on your MySQL table type, you might want to dump to disk first and then pipe the file through sed, as your live tables might be locked while the dump runs (I’m not actually sure if mysqldump will block waiting for the other processes to catch up).
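
The dump-to-disk variant might look something like the sketch below. It is only an illustration: the function names `rewrite_urls` and `clone_to_staging` and the file name `live_db.sql` are made up for this example, while the hosts, user and password are the same placeholders as in the one-liner. The dump is written to a file first, so the live server is only involved while mysqldump itself runs; the rewrite and the import then work from the file.

```shell
# Rewrite the live hostname to the staging one in whatever SQL arrives on stdin.
# The dots are escaped so they only match literal dots.
rewrite_urls() {
  sed -e 's/www\.example\.com/staging.example.com/g'
}

# Hypothetical wrapper: dump the live db to disk, then rewrite the dump
# and load it into staging. Nothing runs until this function is called.
clone_to_staging() {
  mysqldump -h live_db_host -u user -pmypass live_db > live_db.sql &&
    rewrite_urls < live_db.sql | mysql -h staging_db_host -u user -pmypass staging_db
}
```

Calling `clone_to_staging` does the whole thing; `rewrite_urls` can also be tried on its own against a saved dump to check what would change before anything is imported.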

    Tags: beta, database, development, mysql, sed, staging


  • John Leach

    • John Leach is a human being living in Leeds, UK.

Creative Commons License The text of this blog is licensed under the Creative Commons BY-ND license