  • Xapian Fu: Full Text Indexing in Ruby

    January 31st, 2010

    Xapian is an Open Source Search Engine Library written in C++. It has Ruby bindings, but they’re generated with SWIG, so they basically just mirror the C++ API – not very Ruby-like (and pretty ugly).

    Being a self-confessed full text indexing nerd and a Ruby-lover, I wrote Xapian Fu: a library to provide access to Xapian that is more in line with “The Ruby Way”.

    I started writing Xapian Fu exactly a year ago today but left it for a couple of months, then restarted work on it on the train on the way back from the 2009 Scotland on Rails conference. Development was test driven, so it’s got an extensive test suite (using RSpec). Documentation is in RDoc and is quite detailed. As of the latest version, it supports Ruby 1.9 too.

    Xapian Fu basically gives you a Hash interface to Xapian – so you get a persistent Hash with full text indexing built in (and ACID transactions!).

    Example

    For example, create a database called example.db, put three documents into it, then search them and print the results:

      require 'xapian-fu'
      include XapianFu
      db = XapianDb.new(:dir => 'example.db', :create => true,
                        :store => [:title, :year])
      db << { :title => 'Brokeback Mountain', :year => 2005 }
      db << { :title => 'Cold Mountain', :year => 2004 }
      db << { :title => 'Yes Man', :year => 2008 }
      db.flush
      db.search("mountain").each do |match|
        puts match.values[:title]
      end

    There are of course a whole bunch more examples in the documentation.

    Tags: active record, database, full text indexing, indexing, ruby, search, stemming, stopping, the ruby way, xapian, xapian-fu

    Posted in Ruby, Tech | 1 Comment »

  • Ruby’s case statement uses ===

    August 30th, 2009

    I’ve not found this stated clearly enough elsewhere so I’m doing so myself.

    Ruby’s case statement calls the === method on the argument to each of the when statements, passing in the value being tested.

    So, this example:

    case my_number
      when 6883
        :prime
    end
    

    Will execute 6883 === my_number

    This is all fine and dandy, because the === method on a Fixnum instance does what you’d expect in this scenario.
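
    To see the call order for yourself, here is a minimal sketch. LoggingMatcher is a made-up class for illustration only; it records whatever === gets called with, so you can check that case calls === on the when argument and hands it the value under test.

```ruby
# Hypothetical matcher class, used only to record how `case` calls ===.
class LoggingMatcher
  attr_reader :seen

  def initialize(expected)
    @expected = expected
    @seen = []
  end

  # `case` invokes this on the `when` argument, passing the `case` subject.
  def ===(other)
    @seen << other
    @expected == other
  end
end

matcher = LoggingMatcher.new(6883)
my_number = 6883

result = case my_number
         when matcher
           :prime
         end

puts result.inspect        # the branch that matched
puts matcher.seen.inspect  # every value === was called with
```

    The receiver of === is always the thing after when, never the thing after case.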

    However, the === method on the Fixnum class itself does something different. It’s Module#===, which works like is_a? with the operands swapped: Fixnum === x is true when x.is_a?(Fixnum).

    That is cute, because it allows you to do this:

    case my_number
      when Fixnum
        "Easy to memorize"
      when Bignum
        "Hard to memorize"
    end
    

    But it won’t work as you might expect in this scenario:

    my_type = Fixnum
    case my_type
      when Fixnum
        "Fixed number"
    end
    

    This won’t work because Fixnum === Fixnum returns false: the Fixnum class is not itself an instance of Fixnum.

    My workaround for this is to convert it to a string first. Not sure if that’s the best solution, but it works for me(tm).

    my_type = Fixnum
    case my_type.to_s
      when "Fixnum"
        "Fixed number"
    end
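
    Two other ways round this that avoid the string conversion, as a rough sketch rather than a recommendation. Assumptions: it uses Integer instead of Fixnum (Fixnum was folded into Integer in Ruby 2.4), and the second option relies on Proc#===, which needs Ruby 1.9 or later. The variable names are just for illustration.

```ruby
my_type = Integer

# Module#=== asks "is my_type an instance of Integer?", and a class is not.
class_match = (Integer === my_type)

# Option 1: a case with no subject, so each `when` is an ordinary
# boolean expression and we can compare with == directly.
label = case
        when my_type == Integer then "Whole number"
        when my_type == Float   then "Floating point number"
        end

# Option 2: put a lambda in the `when`. Proc#=== calls the lambda
# with the case subject, so the comparison is whatever we write.
is_integer_class = ->(klass) { klass == Integer }

label_from_lambda = case my_type
                    when is_integer_class then "Whole number"
                    end

puts class_match.inspect
puts label
puts label_from_lambda
```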
    
    Tags: case, coding, conditional, programming, ruby, switch, when

    Posted in Ruby, Tech | 5 Comments »

  • Song In Code: Ramones, I wanna be sedated

    August 21st, 2009

    Just the first verse:

    go = Proc.new { sleep 24.hours }
    self.wants :sedation
    begin ; nil ; end
    case go ; where "no" ; nil ; end
    self.wants :sedation
    self.get '/airport'
    self.put '/airport/plane'
    before self.insane? do
    3.times { hurry! }
    end
    return if self.can_control? :fingers
    return if self.can_control? :brain
    5.times { "no" }

    I recorded me singing it, which is kinda stupid tbh.

    I used mencoder to convert this to something YouTube found tasty. Like this:

    mencoder -ss 15 -endpos 1:18 -vf pp=al:f,scale=480:360 -oac mp3lame -ovc lavc -lavcopts vcodec=libx264:mbd=1:vbitrate=2000 MOV01362.MPG -o MOV01362.x264

    Also, pimp for another Geek/Ukelele project: Ukepedia, all 3 million Wikipedia articles one song at a time

    Tags: code, Music, ramones, ruby, sing, song, songsincode, ukelele

    Posted in Music, Tech | 5 Comments »

  • My NWRUG Ferret Talk

    March 24th, 2009

    I did a short talk on Ferret, the Ruby “Information Retrieval Library”, at the North West Ruby Users Group last Thursday. We had a bit of a theme too, with Will Jessop speaking about Sphinx and Asa Calow speaking about Solr.

    I got to have a bit of a nosey around the Manchester BBC building too – though I was worried I’d open the wrong door and end up on TV. Didn’t fancy having to apologise to Jeremy Paxman.

    Brightbox also sponsored some pizza, and gave away t-shirts and stickers like candy (there was no candy though).

    My slides are available here, and contain a little example file system indexer. I made my slides with webby and S6 if you’re interested.

    Tags: ferret, indexing, inverse index, rails, ruby, search, solr, sphinx, talk

    Posted in Ruby on Rails, Tech | 2 Comments »

  • Euruko Ruby Conference 2008 in Prague

    March 27th, 2008

    I’m in Prague with Brightbox for the Euruko Ruby Conference 2008 from tomorrow evening until Monday morning. I’ll post photos to the Brightbox Flickr photostream as we go along.  If anyone wants to meet up for a drink, email me at john at johnleach dotty co dotty uk.

    UPDATE: Photos here.

    Tags: brightbox, conference, prague, rails, ruby

    Posted in Personal | 3 Comments »

  • Leeds Ruby Thing #2, Thursday 6th March

    March 4th, 2008

    The Leeds offshoot of the North West Ruby User Group is meeting again this Thursday, 6th March, 7:00 PM – 11:00 PM. This time at Mr. Foley’s Cask Ale House, on The Headrow (formerly Dr. Okells).

    Expect unstructured discussion of Ruby, Ruby on Rails and other random stuff plus nice people, great beer and coffee and geeky t-shirts.

    The balcony back room of Mr Foley’s has been booked.  Announce that you’re coming on the upcoming page.

    Oh, and we now have a website: http://leedsrubything.org/

    Tags: beer, coding, geek, leeds, nwrug, programming, rails, ruby, social

    Posted in Ruby on Rails, Tech | No Comments »

  • Leeds Ruby Thing, Victoria Hotel 7th Feb 2008.

    February 3rd, 2008

    Some of the people of the North West Ruby User Group (who usually meet in Manchester) have organised the first little Leeds get together.  No real name yet, so it’s the Leeds Ruby Thing for now.

    No clear plan yet either, but expect unstructured discussion of Ruby and Ruby on Rails at least.

    Thursday 7th February 2008 at 7pm in the Victoria Hotel pub. All welcome!

    More details here: http://upcoming.yahoo.com/event/423116

    Tags: beer, coding, leeds, meetup, nwrug, programming, pub, rails, ruby, social

    Posted in Ruby on Rails | 1 Comment »

  • North West Ruby User Group Talk: Building Brightbox

    January 28th, 2008

    Oh, btw, I’m doing a talk tomorrow at the North West Ruby User Group in Manchester about how we do the Ruby on Rails hosting at Brightbox.

    I’ll be talking about SANs, CentOS, Ubuntu, Xen, Apache, Lighty, NGINX, MySQL and other goodies. Heck, I might even mention Ruby, which would be nice considering it’s a Ruby user group.

    My business partner Jeremy will be nattering about the business side and various other things.

    Update: A couple of photos here and here.

    Tags: manchester, nwrug, rails, ruby, talk

    Posted in Personal, Ruby on Rails | No Comments »

  • Rubinius multiple instances, one process

    January 15th, 2008

    Rubinius has support (as of today!) for running multiple instances of it’s VM within one process, each VM on it’s own *native* thread, each VM running many ruby green threads. Each VM has it’s own heap and so each VM could load different apps that wouldn’t interfere with each other. We have plans for a mod_rubinius for apache that takes full advantage of this feature. Stay tuned ;)

    - Ezra Zygmuntowicz, in a comment on Ruby Inside.

    Very interesting stuff. Why bother making Rails thread safe when you have an awesome Ruby VM such as Rubinius? I’d like to see Mongrel (or FastCGI! Bring back FastCGI!) make use of this somehow, running multiple Rails instances itself in one process and distributing requests between them. I’d be interested to know how it’d deal with memory leaks in external libraries though (the kind rmagick suffers from).

    You do lose finer-grained access to most of the nice UNIX process management stuff that way, like limiting memory usage with ulimits, but nobody seems to be using that for Ruby deployment anyway. It’s all fiddling around with Monit and such instead (why always with the steps backward!).

    Tags: apache, deployment, mongrel, processes, rails, rubinius, ruby, threads

    Posted in Ruby on Rails | No Comments »

  • Reliable rake task execution

    December 3rd, 2007

    My News Sniffer project needs to regularly do some back-end stuff like checking a bunch of RSS feeds and downloading web pages. I do this with some rake tasks, which I call using the cron daemon. Recently I’ve been having problems where some tasks take a bit longer than usual to complete and end up running in parallel. This slows things down, which means more tasks end up running in parallel and then my little virtual machine eventually falls on its face under memory pressure.

    I could implement some locking in my application, but it’s always good to avoid as much new code as possible so, in the good old *NIX fashion, I cobbled together a short bash script taking advantage of existing tools. It executes the given rake task in the given Rails root using the Debian/Ubuntu tool start-stop-daemon (provided by the dpkg package, which is therefore always installed). start-stop-daemon uses a pid file to keep track of the rake program for the given task, so it will never run a second concurrent instance of rake for this task. Cron just keeps trying to run it every 5 minutes or whatever, but only one instance of the task ever runs at a time.
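
    To give a rough idea of the kind of invocation involved, here is a sketch in Ruby rather than the bash script itself. Everything in it is an assumption for illustration: the guarded_rake_command helper, the pid file location under tmp/pids, the rake path and the example task name are all made up. The start-stop-daemon options used (--start, --oknodo, --pidfile, --make-pidfile, --chdir, --startas) are real, but check the man page for your dpkg version, as matching on a pid file alone has some caveats.

```ruby
require 'shellwords'

# Hypothetical helper: builds a start-stop-daemon command line that runs
# one rake task, guarded by a per-task pid file so that a second run is
# skipped while the first is still alive.
def guarded_rake_command(rails_root, task, rake: '/usr/bin/rake')
  # One pid file per task; characters such as ':' are replaced with '_'.
  safe_name = task.gsub(/[^A-Za-z0-9_]/, '_')
  pidfile = File.join(rails_root, 'tmp', 'pids', "rake-#{safe_name}.pid")

  [
    'start-stop-daemon', '--start', '--quiet',
    '--oknodo',                      # exit 0 if it is already running
    '--pidfile', pidfile,
    '--make-pidfile',                # write the pid before starting rake
    '--chdir', rails_root,
    '--startas', rake,
    '--', task                       # everything after -- goes to rake
  ]
end

cmd = guarded_rake_command('/var/www/myapp', 'feeds:update')
puts Shellwords.join(cmd)

# To actually run it (for example from a crontab entry), something like:
# system(*cmd)
```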

    Tags: cron, crontab, Debian, rails, rake, ruby, tasks, Ubuntu

    Posted in Debian, Ruby on Rails, Tech, Ubuntu | 2 Comments »

  • Segfault in Ruby Ferret query parser

    September 13th, 2007

    Whilst working with the Ruby text search engine library Ferret, I came across a segfault in the query parser. It had already been reported and fixed, but I realised it can lead to a denial of service.

    If you use Ferret anywhere that allows users to execute queries, those users can crash the Ruby process with a specially crafted query.  This was quite serious for a number of my sites (not to mention slowing development of a current app) so I applied the fix to the released 0.11.4 source and repackaged it as 0.11.4.1.

    Obviously this isn’t in any way official, but it works for me and I’m sharing here for anyone else affected. Gem, tgz and zip here and just the patch available here (derived from the author’s changeset to trunk).

    The patch is against the release source, as the Subversion repository seems to be down atm (I got the changeset from the web-based Subversion viewer).

    Get upgrading!

    Tags: crash, denial-of-service, dos, ferret, rails, ruby, segfault

    Posted in Ruby on Rails, Security | No Comments »

  • local and remote subversion repositories with Capistrano 2

    June 17th, 2007

    Peeking at the code of the upcoming Capistrano 2, I noticed you can define different scm variables for remote and local use, which is something I need (I was looking at the code in the hope it could do this :)

    So, say I have my code stored in a subversion repository on my local disk, say file:///project/trunk. That’s fine for when Capistrano is querying the latest revision, but the remote servers need to use the repository url svn+ssh://mymachine/project/trunk.

    Without modifying the code, this was impossible with Capistrano v1. With Capistrano v2, you can prefix any scm configuration variable with local_ and it will be used for local operations:

    set :repository, "svn+ssh://mymachine/project/trunk"
    set :local_repository, "file:///project/trunk"

    Tags: capistrano, deployment, rails, ruby, subversion, svn

    Posted in Ruby on Rails, Tech | No Comments »

  • John Leach

    • John Leach is a human being living in Leeds, UK.

Creative Commons License The text of this blog is licensed under the Creative Commons BY-ND license