Category: Tech

LVM snapshot performance

The Linux Logical Volume Manager (LVM) supports creating snapshots of logical volumes (LVs) using the device mapper. Device mapper implements snapshots using a copy-on-write system, so whenever you write to either the source LV or the new snapshot LV, a copy of the original data is made first.

So a write to a normal LV is just a write, but a write to a snapshotted LV (or an LV snapshot) involves reading the original data, writing it elsewhere and then writing some metadata about it all.

This quite obviously impacts performance, and because device mapper’s implementation is very basic, the impact is particularly bad.  My tests show synchronous sequential writes to a snapshotted LV are around 90% slower than writes to a normal LV.
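To make the extra work concrete, here is a toy Ruby model of copy-on-write. It is only a sketch: the ToyVolume class and its I/O counting are made up for illustration and say nothing about how device mapper is actually implemented. It models writes to the origin volume only, and counts one operation for the read of the original data, one for the copy and one for the metadata update.

```ruby
# Toy model of a copy-on-write snapshot. Hypothetical illustration only:
# it counts I/O operations, it does not touch any real block device.
class ToyVolume
  attr_reader :io_ops

  def initialize(chunks)
    @chunks = Array.new(chunks, 0) # current data of the origin volume
    @snapshot_store = {}           # chunk index => preserved original data
    @snapshotted = false
    @io_ops = 0
  end

  def snapshot!
    @snapshotted = true
  end

  def write(index, data)
    # The first write to a chunk after a snapshot must preserve the old data.
    if @snapshotted && !@snapshot_store.key?(index)
      @snapshot_store[index] = @chunks[index]
      @io_ops += 3 # read original + write copy + write metadata
    end
    @chunks[index] = data
    @io_ops += 1 # the write that was actually asked for
  end

  # The snapshot sees preserved data if the chunk changed, else the origin's.
  def read_snapshot(index)
    @snapshot_store.fetch(index) { @chunks[index] }
  end
end

plain = ToyVolume.new(4)
plain.write(0, 42)

snapped = ToyVolume.new(4)
snapped.snapshot!
snapped.write(0, 42) # first write to the chunk: pays the copy cost
snapped.write(0, 43) # later writes to the same chunk are plain writes

puts "plain volume I/O ops:       #{plain.io_ops}"
puts "snapshotted volume I/O ops: #{snapped.io_ops}"
puts "snapshot still sees:        #{snapped.read_snapshot(0)}"
```

In this model only the first write to each chunk pays the penalty, which is why a long sequential write over fresh chunks is close to the worst case.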


Testing XML with rspec, xpath and libxml

I’m currently working with the virtualization API libvirt, which uses XML to represent virtual machines, and I’m generating this XML using Ruby.  I’m using rspec to test my code and wanted to test that my output was as I expected.  I started out with rspec-hpricot-matchers, which worked fine until I started testing slightly more complex XML, which hpricot wasn’t handling well.

So I wrote a have_xml matcher using the rspec DSL, which uses the libxml library to do the testing.  It’s so simple it’s not really worthy of a gem, so here it is (released into the public domain).  The text check is optional and, to be honest, doesn’t really belong here.  It should be a separate matcher.


require 'libxml'

Spec::Matchers.define :have_xml do |xpath, text|
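  match do |body|
    # Parse the XML string and find all nodes matching the xpath
    doc = LibXML::XML::Parser.string(body).parse
    nodes = doc.find(xpath)
    # Return a boolean rather than calling should here, so that
    # should_not works too: the xpath must match and, if text was
    # given, every matching node's content must equal it
    !nodes.empty? && (text.nil? || nodes.all? { |node| node.content == text })
  end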

  failure_message_for_should do |body|
    "expected to find xml tag #{xpath} in:\n#{body}"
  end

  failure_message_for_should_not do |body|
    "expected not to find xml tag #{xpath} in:\n#{body}"
  end

  description do
    "have xml tag #{xpath}"
  end
end

So, add that somewhere (usually spec/spec_helper.rb) and use it like this:


it "should include the xen_machine_id" do
  @xml.should have_xml('/domain/name', 'bb-example-001')
end

it "should include the network devices" do
  @xml.should have_xml "/domain/devices/interface[1]/ip[@address='1.2.3.4']"
  @xml.should have_xml "/domain/devices/interface[1]/mac[@address='aa:00:01:02:03:04']"
  @xml.should have_xml "/domain/devices/interface[1]/script[@path='/etc/xen/scripts/vif-bridge']"
  @xml.should have_xml "/domain/devices/interface[1]/source[@bridge='inetbr']"
end

Advertising and ad blocking

I’ve thought about advertising and ad-blockers a lot over the years, and the debate is getting some attention right now starting with a recent Ars Technica article, so I thought I’d put down some of my own thoughts on it.

Funding your content through advertising is hugely inefficient. Of the people who visit your site, usually only a tiny proportion click on (or notice) an advert, and only a tiny proportion of those then spend any money.  So a tiny, tiny proportion of your visitors give any money to your advertisers. So money filters down this system in tiny margins.  Then, at the bottom of the system, a tiny amount of the profits from the income covers the cost of advertising.  Then this money moves back up the system to you, usually via your advertising agent who takes a nice cut (I’ve heard Google pass as little as one twelfth on to the publisher in some cases).

And this doesn’t consider the costs of the advertiser choosing and designing the ad or the tonnes of bandwidth and gatrillions of CPU cycles used to serve the actual adverts.

It also does not consider externalities, such as pollution. Advertising is mind pollution. Advertising is designed to affect the behaviour of people for the benefit of the advertiser.  Why would anyone willingly expose themselves to something designed to steal their attention?

You might argue that advertising creates value – some viewers choose to buy when otherwise they wouldn’t have. But what of the huge proportion of people who just had their attention stolen? No value was created there.

Because not everyone is suckered in by it, advertising squanders billions of hours of attention every day to produce nothing.

Xapian Fu: Full Text Indexing in Ruby

Xapian is an Open Source Search Engine Library written in C++. It has Ruby bindings, but they’re generated with SWIG, so they basically just mirror the C++ API – not very Ruby-like (and pretty ugly).

Being a self-confessed full text indexing nerd and a Ruby-lover, I wrote Xapian Fu: a library to provide access to Xapian that is more in line with “The Ruby Way”.

I started writing Xapian Fu exactly a year ago today but left it for a couple of months, then restarted work on it on the train on the way back from the 2009 Scotland on Rails conference.  Development was test driven, so it’s got an extensive test suite (using rspec).  Documentation is in rdoc and is quite detailed.  As of the latest version, it supports Ruby 1.9 too.

Xapian Fu basically gives you a Hash interface to Xapian – so you get a persistent Hash with full text indexing built in (and ACID transactions!).

Example

For example, create a database called example.db, put three documents into it, then search them and print the results:

  require 'xapian-fu'
  include XapianFu
  db = XapianDb.new(:dir => 'example.db', :create => true,
                    :store => [:title, :year])
  db << { :title => 'Brokeback Mountain', :year => 2005 }
  db << { :title => 'Cold Mountain', :year => 2004 }
  db << { :title => 'Yes Man', :year => 2008 }
  db.flush
  db.search("mountain").each do |match|
    puts match.values[:title]
  end

There are of course a whole bunch more examples in the documentation.

Ruby’s case statement uses ===

I’ve not found this stated clearly enough elsewhere so I’m doing so myself.

Ruby’s case statement calls the === method on the argument to each of the when statements, passing it the value given to case.

So, this example:

case my_number
  when 6883
    :prime
end

Will execute 6883 === my_number
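One way to see this for yourself is to hand case an object with its own === method and record what it gets called with. This is just a sketch: the RecordingMatcher class is made up for the demonstration.

```ruby
# Hypothetical helper class: records every value that === is called with.
class RecordingMatcher
  attr_reader :calls

  def initialize(expected)
    @expected = expected
    @calls = []
  end

  # case/when calls this on the "when" argument, passing the "case" value.
  def ===(other)
    @calls << other
    @expected == other
  end
end

my_number = 6883
matcher = RecordingMatcher.new(6883)

result = case my_number
         when matcher then :prime
         end

puts result.inspect
puts matcher.calls.inspect
```

The matcher ends up being called with my_number, so the receiver of === is the when argument and not the value being tested.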

This is all fine and dandy, because the === method on a Fixnum instance does what you’d expect in this scenario.

However, the === method on the Fixnum class does something different. It’s effectively is_a? with the arguments swapped: Fixnum === x gives the same answer as x.is_a?(Fixnum).

That is cute, because it allows you to do this:

case my_number
  when Fixnum
    "Easy to memorize"
  when Bignum
    "Hard to memorize"
end

But it won’t work as you might expect in this scenario:

my_type = Fixnum
case my_type
  when Fixnum
    "Fixed number"
end

This won’t work because Fixnum === Fixnum returns false: the Fixnum class is itself not an instance of Fixnum.

My workaround for this is to convert it to a string first. Not sure if that’s the best solution, but it works for me(tm).

my_type = Fixnum
case my_type.to_s
  when "Fixnum"
    "Fixed number"
end
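Another option, sketched here rather than offered as the one true way, is to use a case with no argument and compare the classes with == yourself. Note that Fixnum and Bignum were merged into Integer in Ruby 2.4, so this sketch uses Integer to stay runnable on a current Ruby. The describe_type method name is made up for the example.

```ruby
# Hypothetical helper: describe a class by comparing it with ==
# instead of relying on the === that "case value / when Class" would use.
def describe_type(type)
  case
  when type == Integer then "Whole number"
  when type == String  then "Text"
  else "Something else"
  end
end

puts describe_type(Integer)
puts describe_type(String)
puts describe_type(Float)

# For comparison, the behaviour described above:
puts Integer === 5       # 5 is an instance of Integer
puts Integer === Integer # the Integer class is not an instance of Integer
```

This avoids the string conversion, at the cost of repeating the variable name in each when.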

Song In Code: Ramones, I wanna be sedated

Just the first verse:

go = Proc.new { sleep 24.hours }
self.wants :sedation
begin ; nil ; end
case go ; where "no" ; nil ; end
self.wants :sedation
self.get '/airport'
self.put '/airport/plane'
before self.insane? do
  3.times { hurry! }
end
return if self.can_control? :fingers
return if self.can_control? :brain
5.times { "no" }

I recorded me singing it, which is kinda stupid tbh.

I used mencoder to convert this to something YouTube found tasty. Like this:

mencoder -ss 15 -endpos 1:18 -vf pp=al:f,scale=480:360 -oac mp3lame -ovc lavc -lavcopts vcodec=libx264:mbd=1:vbitrate=2000 MOV01362.MPG -o MOV01362.x264

Also, pimp for another Geek/Ukelele project: Ukepedia, all 3 million Wikipedia articles one song at a time

Netfilter Conntrack Memory Usage

On a busy Linux Netfilter-based firewall, you usually need to up the maximum number of allowed tracked connections, or new connections will be denied and you’ll see log messages from the kernel like this: nf_conntrack: table full, dropping packet.

More connections will use more RAM, but how much?  We don’t want to overcommit, as the connection tracker uses unswappable memory and things will blow up. If we set aside 512MB for connection tracking, how many concurrent connections can we track?

There is some Netfilter documentation on wallfire.org, but it’s quite old. How can we be sure it’s still correct without completely understanding the Netfilter code? Does it account for real life constraints such as page size, or is it just derived from looking at the code? A running Linux kernel gives us all the info we need through its slabinfo proc file.
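As a rough sketch of the sort of sum slabinfo makes possible, the Ruby below parses one slabinfo line and works out how many objects fit in a memory budget, counting whole slabs so that page-size rounding is included. The sample line and its numbers are made up for illustration (real values vary by kernel and architecture), the method names are my own, a 4096 byte page is assumed, and the memory used by the conntrack hash table itself is ignored.

```ruby
# Fields of interest from a /proc/slabinfo (version 2.x) line:
# name <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : ...
SlabStats = Struct.new(:name, :objsize, :objperslab, :pagesperslab)

# Illustrative sample only, NOT measured on a real machine.
SAMPLE_LINE = "nf_conntrack 1024 2048 304 13 1 : tunables 0 0 0 : slabdata 158 158 0"

def parse_slabinfo_line(line)
  fields = line.split
  SlabStats.new(fields[0], Integer(fields[3]), Integer(fields[4]), Integer(fields[5]))
end

# How many objects fit in budget_bytes, allocating only whole slabs.
# page_size is an assumption; check it with `getconf PAGESIZE`.
def max_tracked_connections(stats, budget_bytes, page_size = 4096)
  slab_bytes = stats.pagesperslab * page_size
  (budget_bytes / slab_bytes) * stats.objperslab
end

stats = parse_slabinfo_line(SAMPLE_LINE)
budget = 512 * 1024 * 1024

puts "object size:          #{stats.objsize} bytes"
puts "naive estimate:       #{budget / stats.objsize}"
puts "whole-slab estimate:  #{max_tracked_connections(stats, budget)}"
```

On a real firewall you would read the matching line from /proc/slabinfo (as root) instead of using SAMPLE_LINE. The whole-slab figure comes out lower than the naive one because each slab wastes the space left over after its last object.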

My NWRUG Ferret Talk

I did a short talk on Ferret, the Ruby “Information Retrieval Library”, at the North West Ruby Users Group last Thursday.  We had a bit of a theme too, with Will Jessop speaking about Sphinx and Asa Calow speaking about Solr.

I got to have a bit of a nosey around the Manchester BBC building too – though I was worried I’d open the wrong door and end up on TV. Didn’t fancy having to apologise to Jeremy Paxman.

Brightbox also sponsored some pizza, and gave away t-shirts and stickers like candy (there was no candy though).

My slides are available here, and contain a little example file system indexer. I made my slides with webby and S6 if you’re interested.

Women in Technology

Dom kicked up a women in technology debate again recently.  I’ve seen a few responses, from one chap who thinks women have achieved equality already to a woman who doesn’t think girls’ brains are generally good for “programming” – and someone else who thinks there isn’t a problem as long as you’re thick skinned enough to put up with a sexually hostile workplace.

The main gripe appears to be with “women only” conferences, such as the Women on the Web conference, organised by a group called Forward Ladies, or the Geek Girl dinners.

I think a fair summary of his opinion, and that of some other commenters, is that these “women-only” events don’t help the effort to get more women involved in technology, and that they are comparable to positive discrimination in many ways.


Techietubbies live video podcast

I’m joining Dom and Rahoul tonight on a live video broadcast of their Techietubbies podcast thing.

From the site:

“Techietubbies is a weekly podcast covering a multitude of subjects, from a round up of the week’s tech news, live callers, competitions, questions and answers… and beer :)”

Though I’m driving, so no tech news for me. I think it’s recorded if you can’t see the live thing.  It’ll be broadcast live here via ustream.tv

My native language

I’m currently reading Nudge, by Richard H. Thaler and Cass R. Sunstein. It says many psychologists and neuroscientists agree that we humans have two general types of thinking, intuitive and rational, also known as automatic and reflective.  When dodging a ball thrown at you, getting nervous when your aeroplane hits turbulence or smiling when you see a cute cat, the automatic system is working.  When doing some mathematics, or writing a blog post, you (mostly) use reflective.  Speaking native, or “first”, languages uses the automatic.  Speaking a second language usually uses reflective.

I realised that having tinkered with computers heavily almost my entire life, a lot of my “computer skills” have shifted into the intuitive, automatic systems.  I obviously (hopefully) use the rational systems a great deal, but underlying it is definitely intuition – the gut feeling of where to go next to solve the problem.  I regularly come up with seemingly random avenues of investigation that lead to gold and I couldn’t say with any certainty why I thought of them.  I’m assuming this is the same for most computer geeks (and chess geeks, cooking geeks, music geeks etc. :).  It’s become a native language for us.

I don’t think the average rational system can easily deal with very complex problems.  It’s great for some more-linear concentrated work or planning, but for big stuff with lots of parts – hard work.  I think I usually research and “pre-process” a bunch of material around a problem using my rational system, then my automatic system gets to work mulling over the bigger picture.  Then when I’m making rational decisions about it, I’m heavily informed by the intuition. Or sometimes just when I’m showering.

Anyway, not sure where I was going with this other than an “aren’t I great” blog post. The summary would be, don’t rely on your rational systems so much. Give the intuitive some good mulling time. And shower regularly.

Virtualized Storage Talk at WYLUG

I’m doing a talk tonight about virtualizing your storage with LVM on Linux at the West Yorkshire Linux User Group. Sorry about the short notice here (it was announced earlier in the week elsewhere though).

My mate Paul Brook is talking about RAID on Linux too.

Come along for the talk, or the beer, or the socialising – or all three.