  • Xapian Fu: Full Text Indexing in Ruby

    January 31st, 2010

    Xapian is an Open Source Search Engine Library written in C++. It has Ruby bindings, but they’re generated with SWIG, so they basically just mirror the C++ bindings – not very Ruby-like (and pretty ugly).

    Being a self-confessed full text indexing nerd and a Ruby-lover, I wrote Xapian Fu: a library to provide access to Xapian that is more in line with “The Ruby Way”.

    I started writing Xapian Fu exactly a year ago today but left it for a couple of months, then restarted work on it on the train on the way back from the 2009 Scotland on Rails conference.  Development was test driven, so it’s got an extensive test suite (using rspec).  Documentation is in rdoc and is quite detailed.  As of the latest version, it supports Ruby 1.9 too.

    Xapian Fu basically gives you a Hash interface to Xapian – so you get a persistent Hash with full text indexing built in (and ACID transactions!).

    Example

    For example, create a database called example.db, put three documents into it, then search them and print the results:

      require 'xapian-fu'
      include XapianFu
      db = XapianDb.new(:dir => 'example.db', :create => true,
                        :store => [:title, :year])
      db << { :title => 'Brokeback Mountain', :year => 2005 }
      db << { :title => 'Cold Mountain', :year => 2004 }
      db << { :title => 'Yes Man', :year => 2008 }
      db.flush
      db.search("mountain").each do |match|
        puts match.values[:title]
      end

    There are of course a whole bunch more examples in the documentation.

    Tags: active record, database, full text indexing, indexing, ruby, search, stemming, stopping, the ruby way, xapian, xapian-fu

    Posted in Ruby, Tech | 1 Comment »

  • Ruby’s case statement uses ===

    August 30th, 2009

    I’ve not found this stated clearly enough elsewhere so I’m doing so myself.

    Ruby’s case statement calls the === method on the argument of each when clause, passing in the value given to case.

    So, this example:

    case my_number
      when 6883
        :prime
    end
    

    Will execute 6883 === my_number

    This is all fine and dandy, because the === method on a Fixnum instance does what you’d expect in this scenario.
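    To see the call order for yourself, here is a minimal sketch. Spy is a made-up class for illustration (it is not part of Ruby); it just records whatever gets passed to its === method.

```ruby
# Hypothetical helper class, not part of Ruby: it records every value
# that a case statement hands to its === method.
class Spy
  attr_reader :seen

  def initialize(expected)
    @expected = expected
    @seen = []
  end

  # case calls this on the `when` value, passing in the `case` subject.
  def ===(other)
    @seen << other
    @expected == other
  end
end

spy = Spy.new(6883)
my_number = 6883

result = case my_number
         when spy
           :prime
         end

puts result.inspect   # the branch that matched
puts spy.seen.inspect # what was passed to Spy#===
```

    The receiver of === is the thing after when, and the thing after case is the argument, which is why it matters whose === you end up calling.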

    However, the === method on the Fixnum class itself does something different: Fixnum === x gives the same answer as x.is_a?(Fixnum).

    That is cute, because it allows you to do this:

    case my_number
      when Fixnum
        "Easy to memorize"
      when Bignum
        "Hard to memorize"
    end
    

    But it won’t work as you might expect in this scenario:

    my_type = Fixnum
    case my_type
      when Fixnum
        "Fixed number"
    end
    

    This won’t work because Fixnum === Fixnum returns false: the Fixnum class is an instance of Class, not an instance of Fixnum.
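    A quick way to check each of these comparisons is to evaluate them directly. This sketch uses Integer in place of Fixnum, on the assumption that you are running Ruby 2.4 or later, where Fixnum and Bignum were unified into Integer; the behaviour of === is the same.

```ruby
# Integer stands in for Fixnum here (assumption: Ruby 2.4 or later).
checks = {
  "6883 === 6883"       => (6883 === 6883),       # value compared with value
  "Integer === 6883"    => (Integer === 6883),    # class asked about an instance
  "6883.is_a?(Integer)" => 6883.is_a?(Integer),   # same question, other way round
  "Integer === Integer" => (Integer === Integer), # class asked about a class
  "Class === Integer"   => (Class === Integer)    # Integer is an instance of Class
}

checks.each { |expression, value| puts "#{expression} # => #{value}" }
```

    Only the fourth line comes out false, which is exactly the case statement above that never matches.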

    My workaround for this is to convert it to a string first. Not sure if that’s the best solution, but it works for me(tm).

    my_type = Fixnum
    case my_type.to_s
      when "Fixnum"
        "Fixed number"
    end
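    If you would rather not go through strings, here are two other ways to sketch the same thing. Neither is claimed to be the best solution either; the method names are made up for illustration, and Integer again stands in for Fixnum (assuming Ruby 2.4 or later). The second one relies on Proc#===, which calls the proc with the case subject (Ruby 1.9 and later).

```ruby
# Option 1: a case with no subject, so each `when` is a plain boolean test
# and === on the class is never involved.
def describe_type(type)
  case
  when type == Integer then "Whole number"
  when type == String  then "Text"
  else "Something else"
  end
end

# Option 2: lambdas as `when` values. Proc#=== calls the lambda with the
# case subject, so the comparison inside is ordinary ==.
def describe_type_with_lambda(type)
  case type
  when ->(t) { t == Integer } then "Whole number"
  when ->(t) { t == String }  then "Text"
  else "Something else"
  end
end

puts describe_type(Integer)
puts describe_type_with_lambda(String)
puts describe_type(Float)
```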
    
    Tags: case, coding, conditional, programming, ruby, switch, when

    Posted in Ruby, Tech | 5 Comments »

  • Song In Code: Ramones, I wanna be sedated

    August 21st, 2009

    Just the first verse:

    go = Proc.new { sleep 24.hours }
    self.wants :sedation
    begin ; nil ; end
    case go ; where "no" ; nil ; end
    self.wants :sedation
    self.get '/airport'
    self.put '/airport/plane'
    before self.insane? do
    3.times { hurry! }
    end
    return if self.can_control? :fingers
    return if self.can_control? :brain
    5.times { "no" }

    I recorded me singing it, which is kinda stupid tbh.

    I used mencoder to convert this to something Youtube found tasty. Like this:

    mencoder -ss 15 -endpos 1:18 -vf pp=al:f,scale=480:360 -oac mp3lame -ovc lavc -lavcopts vcodec=libx264:mbd=1:vbitrate=2000 MOV01362.MPG -o MOV01362.x264

    Also, pimp for another Geek/Ukelele project: Ukepedia, all 3 million Wikipedia articles one song at a time

    Tags: code, Music, ramones, ruby, sing, song, songsincode, ukelele

    Posted in Music, Tech | 5 Comments »

  • Boron Fights Grass

    July 5th, 2009

    Boron Fights Grass

    Tags: armley, bite, boron, cats, claws, fight, grass, leeds, sun

    Posted in Photoblog, Photography | No Comments »

  • Netfilter Conntrack Memory Usage

    June 17th, 2009

    On a busy Linux Netfilter-based firewall, you usually need to up the maximum number of allowed tracked connections, or new connections will be denied and you’ll see log messages from the kernel like this: nf_conntrack: table full, dropping packet.

    More connections will use more RAM, but how much?  We don’t want to overcommit, as the connection tracker uses unswappable memory and things will blow up. If we set aside 512MB for connection tracking, how many concurrent connections can we track?

    There is some Netfilter documentation on wallfire.org, but it’s quite old. How can we be sure it’s still correct without completely understanding the Netfilter code? Does it account for real life constraints such as page size, or is it just derived from looking at the code? A running Linux kernel gives us all the info we need through its slabinfo proc file.
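    As a rough sketch of the arithmetic, here is one way to turn a slabinfo line into a connection count. The sample line uses made-up, illustrative numbers (not measurements from a real firewall), the 4KB page size is an assumption, and the method name is hypothetical. It only counts the slab objects themselves and ignores any other memory the connection tracker uses, such as its hash table.

```ruby
PAGE_SIZE = 4096 # assumption: 4KB pages; check your own system

# Illustrative sample only, the numbers are invented. Field layout:
# name active_objs num_objs objsize objperslab pagesperslab : tunables ... : slabdata ...
SAMPLE_SLABINFO_LINE =
  "nf_conntrack 1024 1248 304 13 1 : tunables 0 0 0 : slabdata 96 96 0"

# Hypothetical helper: how many objects fit in budget_bytes, given that
# memory is handed out in whole slabs rather than per object.
def max_tracked_connections(slabinfo_line, budget_bytes, page_size = PAGE_SIZE)
  fields = slabinfo_line.split
  objs_per_slab  = Integer(fields[4])
  pages_per_slab = Integer(fields[5])

  slab_bytes = pages_per_slab * page_size
  whole_slabs = budget_bytes / slab_bytes # integer division: partial slabs don't count
  whole_slabs * objs_per_slab
end

budget = 512 * 1024 * 1024 # the 512MB set aside for connection tracking
puts max_tracked_connections(SAMPLE_SLABINFO_LINE, budget)
```

    On a real machine you would feed it the nf_conntrack line from /proc/slabinfo instead of the sample (reading that file usually needs root).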

    Tags: conntrack, firewall, iptables, kernel, limit, linux, max, netfilter, performance, ram, slab

    Posted in Tech | 1 Comment »

  • My Ukepedia Talk at Barcamp Leeds 2009

    June 5th, 2009

    Tim Dobson very kindly recorded and uploaded my talk on the Ukepedia at Barcamp Leeds last Saturday.

    For those of you with short attention spans, I finally get started with the talk at about 2mins 30, and start singing the first article, Otitis Media, at about 7mins.

    Tags: barcamp, bcleeds09, leeds, microsoft, otitis media, performance, sing, song, ukelele, ukepedia, wikipedia

    Posted in Personal | No Comments »

  • Live this Saturday at the Packhorse in Leeds, The Gillroyd Parade

    May 12th, 2009

    My band, The Gillroyd Parade, are hosting an evening of acoustic music at the Packhorse Pub this Saturday (7pm to 11pm, 16th May). Supported by Ukelele Bitch Slap. Do come along, it’d be just dandy to see you.  Full poster here.

    The Gillroyd Parade

    Tags: folk, gospel, guitar, harmonica, sci-fi, theremin, ukelele

    Posted in Music, Personal | No Comments »

  • April Fool: A man in Jalawla walked into a bar…

    April 1st, 2009

    Medialens spotted that the BBC attributed a bomb attack on Monday in Iraq to “al-Qaeda”, with apparently little evidence.  They wrote to the BBC’s “man in Baghdad”, Hugh Sykes, and asked him “what is the evidence that al-Qaeda, rather than some other insurgent group, were behind the attacks?”

    Hugh’s answer genuinely made me think this was an early April Fool’s joke. In fact I’m still not sure Medialens aren’t making me look like an idiot:

    No proof, but circumstantial evidence and reasonable presumption of AQI [al-Qaeda in Iraq] involvement – very much their modus operandum. Suicide attacks are their signature method, and this was a dramatic detonation suggesting a lot of explosive – again, very AQI.

    And…who else would do this?

    So, process of elimination, history of AQI attacks in Diyala etc.

    And the logic of it Sunni Arab vs Iraqi Kurds. As a man in Jalawla told Reuters:

    “Al-Qaida is targeting the Kurds because it believes that
    we are involved in the political process and collaborating
    with the Americans.”

    This blows my mind. “very AQI” and “a man in Jalawla told Reuters”. “Who else would do this?”

    As Medialens point out, the BBC claim they are “committed to evidence-based journalism”. Except they pick and choose when their commitment applies, such as when they refused to report the use of banned weapons by US forces in their November 2004 assault on Fallujah.

    Tags: al-qaeda, bbc, bomb, iraq, journalism, media, medialens, news, propaganda

    Posted in Politics | No Comments »

  • My NWRUG Ferret Talk

    March 24th, 2009

    I did a short talk on Ferret, the Ruby “Information Retrieval Library”, at the North West Ruby Users Group last Thursday.  We had a bit of a theme too, with Will Jessop speaking about Sphinx and Asa Calow speaking about Solr.

    I got to have a bit of a nosey around the Manchester BBC building too – though I was worried I’d open the wrong door and end up on TV. Didn’t fancy having to apologise to Jeremy Paxman.

    Brightbox also sponsored some pizza, and gave away t-shirts and stickers like candy (there was no candy though).

    My slides are available here, and contain a little example file system indexer. I made my slides with webby and S6 if you’re interested.

    Tags: ferret, indexing, inverse index, rails, ruby, search, solr, sphinx, talk

    Posted in Ruby on Rails, Tech | 2 Comments »

  • Women in Technology

    March 16th, 2009

    Dom kicked up a women in technology debate again recently.  I’ve seen a few responses, from one chap who thinks women have achieved equality already to a woman who doesn’t think girls’ brains are generally good for “programming” – and someone else who thinks there isn’t a problem as long as you’re thick skinned enough to put up with a sexually hostile workplace.

    The main gripe appears to be with “women only” conferences, such as the Women on the Web conference, organised by a group called Forward Ladies, or the Geek Girl dinners.

    I think a fair summary of his opinion, and that of some other commenters, is that these “women-only” events don’t help the effort to get more women involved in technology, and that in many ways they are comparable to positive discrimination.


    Tags: conference, event, forward-ladies, geek-girl, geeks, geekup, leeds, social, Tech, women, women on the web

    Posted in Personal, Tech | 13 Comments »

  • Leeds Market Big Wigs

    March 7th, 2009

    More Leeds Market photos here on my Flickr profile.

    Tags: heads, leeds, mannequins, market, photos, wigs

    Posted in Photoblog, Photography | No Comments »

  • Techietubbies live video podcast

    February 16th, 2009

    I’m joining Dom and Rahoul tonight on a live video broadcast of their Techietubbies podcast thing.

    From the site:

    “Techietubbies is a weekly podcast covering a multitude of subjects, from a round up of the week’s tech news, live callers, competitions, questions and answers… and beer :)”

    Though I’m driving, so no tech news for me. I think it’s recorded if you can’t see the live thing.  It’ll be broadcast live here via ustream.tv

    Tags:

    Posted in Personal, Tech | 2 Comments »

  • John Leach

    • John Leach is a human being living in Leeds, UK.
Creative Commons License The text of this blog is licensed under the Creative Commons BY-ND license