I did a short talk on Ferret, the Ruby “Information Retrieval Library”, at the North West Ruby Users Group last Thursday. We had a bit of a theme too, with Will Jessop speaking about Sphinx and Asa Calow speaking about Solr.
I got to have a bit of a nosey around the Manchester BBC building too – though I was worried I’d open the wrong door and end up on TV. Didn’t fancy having to apologise to Jeremy Paxman.
Brightbox also sponsored some pizza, and gave away t-shirts and stickers like candy (there was no candy though).
My slides are available here, and contain a little example file system indexer. I made my slides with webby and S6 if you’re interested.
The Leeds offshoot of the North West Ruby User Group is meeting again this Thursday, 6th March, 7:00 PM – 11:00 PM. This time at Mr. Foley’s Cask Ale House, on The Headrow (formerly Dr. Okells).
Expect unstructured discussion of Ruby, Ruby on Rails and other random stuff plus nice people, great beer and coffee and geeky tshirts.
The balcony back room of Mr Foley’s has been booked. Announce that you’re coming on the upcoming page.
Oh, and we now have a website: http://leedsrubything.org/
Some of the people of the North West Ruby User Group (who usually meet in Manchester) have organised the first little Leeds get together. No real name yet, so it’s the Leeds Ruby Thing for now.
No clear plan yet either, but expect unstructured discussion of Ruby and Ruby on Rails at least.
Thursday 7th February 2008 at 7pm in the Victoria Hotel pub. All welcome!
More details here: http://upcoming.yahoo.com/event/423116
Oh, btw, I’m doing a talk tomorrow at the North West Ruby User Group in Manchester about how we do the Ruby on Rails hosting at Brightbox.
I’ll be talking about SANs, Centos, Ubuntu, Xen, Apache, Lighty, NGINX, MySQL and other goodies. Heck, I might even mention Ruby, which would be nice considering it’s a Ruby user group.
My business partner Jeremy will be nattering about the business side and various other things.
Update: A couple of photos here and here.
Rubinius has support (as of today!) for running multiple instances of it’s VM within one process, each VM on it’s own *native* thread, each VM running many ruby green threads. Each VM has it’s own heap and so each VM could load different apps that wouldn’t interfere with each other. We have plans for a mod_rubinius for apache that takes full advantage of this feature. Stay tuned ;)
– Ezra Zygmuntowicz in a comment on Ruby Inside.
Very interesting stuff. Why bother making Rails thread safe when you have an awesome Ruby VM such as Rubinius? I’d like to see Mongrel (or FastCGI! Bring back FastCGI!) make use of this somehow, running multiple Rails instances itself in one process and distributing requests between them. I’d be interested to know how it would deal with memory leaks in external libraries though (the kind RMagick suffers from).
Still, you’d lose finer-grained access to most of the nice UNIX process management stuff, like limiting memory usage with ulimits, but nobody seems to be using that for Ruby deployment anyway. It’s all fiddling around with Monit and such instead (why always with the steps backward!).
My News Sniffer project needs to regularly do some back-end stuff like checking a bunch of RSS feeds and downloading web pages. I do this with some rake tasks, which I call using the cron daemon. Recently I’ve been having problems where some tasks take a bit longer than usual to complete and end up running in parallel. This slows things down, which means more tasks end up running in parallel and then my little virtual machine eventually falls on its face under memory pressure.
I could implement some locking in my application, but it’s always good to avoid as much new code as possible so, in the good old *NIX fashion, I cobbled together a short bash script taking advantage of existing tools. It executes the given rake task in the given Rails root using the Debian/Ubuntu tool start-stop-daemon (provided by the dpkg package, which is therefore always installed).
start-stop-daemon uses a pid file to keep track of the rake program for the given task, so it will never run a second concurrent instance of rake for this task. Cron just keeps trying to run it every 5 minutes or whatever, but only one instance of the task ever runs at a time.
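For comparison, here is a rough sketch of the in-application locking I decided against. This is not the bash script, and it swaps the pid file for an advisory file lock (flock), which is a different technique with the same aim: a second run that starts while the first is still going simply gives up. The method name run_exclusively and the lock path are made up for illustration.

```ruby
# Runs the given block only if no other run currently holds the lock file.
# Returns :ran if the block was executed, :skipped if another run is active.
def run_exclusively(lock_path)
  File.open(lock_path, File::RDWR | File::CREAT, 0o644) do |lock|
    # LOCK_NB makes flock return false straight away instead of waiting
    # for the other holder to finish.
    return :skipped unless lock.flock(File::LOCK_EX | File::LOCK_NB)

    yield
    :ran
  end
  # The lock is released when the file is closed at the end of the block,
  # including when the process dies part way through.
end
```

Inside a rake task you would wrap the body, something like run_exclusively("/tmp/check_feeds.lock") { ... }, and cron could then fire as often as it likes. It is still new code to maintain in every task though, which is exactly what the start-stop-daemon approach avoids.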
I’ve just tried out the Eclipse Integrated Development Environment. It appears to be named Eclipse due to how it consumes all your CPU and RAM, overshadowing anything else you might want to do.
I tried the RadRails addon thing out for developing Rails. It has some nice features but is rather a big jump from vim, which I’ve been using up until now. And to the best of my memory, vim has never crashed once. Whereas Eclipse has already crashed about 10 times in 24 hours.
UPDATE: I used Eclipse for all my Ruby on Rails development for almost two weeks but I’ve now given up. Even with leaky old Firefox and the monolith that is OpenOffice running concurrently I would rarely notice swapping, but Eclipse has decimated my swap partition. Its resource requirements (mostly RAM) have turned my brand new laptop into a 486 DX66 with 4MB RAM and a broken CPU fan. It would regularly crash too. Any time saving its features might have offered was well cancelled out by all the lost work.
I am using the Aptana Ruby on Rails Eclipse addons, so maybe you can blame that, though the Haskell addons were misbehaving too. Ridiculous stuff.
I’m just going to learn how to use some of the more advanced features of VIM. I’m giving GVIM a go too. Oh VIM, how I missed you so.
Whilst working with the Ruby text search engine library Ferret, I came across a segfault in the query parser. It had already been reported and fixed, but I realised it can lead to a denial of service.
If you use Ferret anywhere that allows users to execute queries, those users can crash the Ruby process with a specially crafted query. This was quite serious for a number of my sites (not to mention slowing development of a current app) so I applied the fix to the released 0.11.4 source and repackaged it as 0.11.4.1.
Obviously this isn’t in any way official, but it works for me and I’m sharing here for anyone else affected. Gem, tgz and zip here and just the patch available here (derived from the author’s changeset to trunk).
The patch is against the release source, as the subversion repository seems to be down at the moment (I got the changeset from the web-based subversion viewer).
Peeking at the code of the upcoming Capistrano 2, I noticed you can define different scm variables for remote and local use, which is something I need (I was looking at the code in the hope it could do this :)
So, say I have my code stored in a subversion repository on my local disk, say file:///project/trunk. That’s fine for when Capistrano is querying the latest revision, but the remote servers need to use a different repository URL.
Without modifying the code, this was impossible with Capistrano v1. With Capistrano v2, you can prefix any scm configuration variable with local_ and it will be used for local operations:
set :repository, "svn+ssh://mymachine/project/trunk"
set :local_repository, "file:///project/trunk"
I’m talking about Ruby on Rails at the West Yorkshire Linux User Group on Monday 11th June 2007. I’ll be covering what Rails is, how it works, and how you use it. Starts at 1900hrs at the E.C Stoner (snigger) Building at the University of Leeds. There follows a talk about Sun’s ZFS file system by Tom Hall, then we retire to The Victoria Hotel pub for some real ale and whatnot.
I’ll be the tall one with the curly hair… stood at the front… talking about Ruby on Rails.
Directions and stuff to be found on the WYLUG website.
I’ve been working on my News Sniffer project for the last few days, finishing up a two month experiment with using the Ruby Lucene implementation, Ferret, to index news articles and comments. More info on the News Sniffer blog. The project spanned two months due to some instability in the newer versions of Ferret, but the author responded to the bug reports and managed to fix all the problems so I decided to deploy.
Ferret offers huge improvements over the original MySQL full-text search method, and I’m looking forward to adding some fancy keyword statistics graphs in the future – perhaps showing censorship patterns in BBC comments with certain keywords.
Because News Sniffer is distributed across a number of servers, I used DRb (distributed Ruby) to allow them all to update one central Ferret index. DRb seems to work very well generally, and is amazingly simple to use, but I ran into a few problems with recycled objects and invalid references whilst using Ferret across it, apparently due to the garbage collector on the server side collecting things still in use on the client side. I think I eliminated most of them but they still crop up once in a while – I’ll be looking into this further.
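To give a flavour of the setup, here is a minimal sketch of one process serving a shared index over DRb. It is not the News Sniffer code: CentralIndex is a made-up in-memory stand-in for a Ferret index, and the address is just a local one picked for the demo. One assumption worth calling out is the design choice: the front object only hands back plain strings and arrays, which DRb copies to the caller, so the client never holds a reference to a temporary object that the server’s garbage collector could recycle. That is one way of side-stepping the problem, not necessarily how I ended up dealing with it.

```ruby
require 'drb/drb'

# Hypothetical stand-in for the central index. A real deployment would
# wrap a Ferret index here instead of an array.
class CentralIndex
  def initialize
    @docs = []
    @lock = Mutex.new # DRb serves each request in its own thread
  end

  # Adds a document and returns a plain value rather than an index object.
  def add(title, body)
    @lock.synchronize { @docs << { title: title, body: body } }
    true
  end

  # Returns an array of title strings, which DRb can copy to the caller.
  def search(term)
    @lock.synchronize do
      @docs.select { |doc| doc[:body].include?(term) }.map { |doc| doc[:title] }
    end
  end
end

# Server side: export a single long-lived front object. Port 0 asks the
# OS for any free port; DRb.uri reports the address actually in use.
DRb.start_service('druby://127.0.0.1:0', CentralIndex.new)

# Client side: normally a different process or machine, given the URI.
remote = DRbObject.new_with_uri(DRb.uri)
remote.add('First post', 'ferret indexes news articles')
remote.add('Second post', 'nothing to see here')
MATCHES = remote.search('ferret')

DRb.stop_service
```

Because client and server share a process in this demo, DRb can call the object directly; with separate processes the same calls go over the wire and the return values are marshalled.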
Dan J Bernstein’s (djb) daemontools is a set of programs to help you manage Unix services. It provides a flexible, secure and convenient way of starting, stopping and sending signals to background processes. Combined with his ucspi-tcp tools, it can be used as an awesome replacement for inetd (it’s most often used in this way to run qmail, a secure and high-performance MTA). It can be fiddly to set up and has a bit of a steep learning curve but I already use daemontools for various other stuff, so it was just natural for me to use it for Ruby on Rails deployment.