Reliable rake task execution

My News Sniffer project needs to regularly do some back-end stuff like checking a bunch of RSS feeds and downloading web pages. I do this with some rake tasks, which I call using the cron daemon. Recently I’ve been having problems where some tasks take a bit longer than usual to complete and end up running in parallel. This slows things down, which means more tasks end up running in parallel, and then my little virtual machine eventually falls on its face under memory pressure.

I could implement some locking in my application, but it’s always good to avoid writing new code where possible so, in good old *NIX fashion, I cobbled together a short shell script that takes advantage of existing tools. It executes the given rake task in the given Rails root using the Debian/Ubuntu tool start-stop-daemon (provided by the dpkg package, so it is always installed). start-stop-daemon uses a pid file to keep track of the rake process for the given task, so it will never run a second concurrent instance of rake for this task. Cron just keeps trying to run it every 5 minutes or whatever, but only one instance ever runs at a time.

Before running the rake task, my script also uses the ulimit command to cap the CPU time the task can use at the given number of seconds. So if anything goes a bit mad, taking ages to complete and using loads of CPU time, it will be killed, and cron will start it again on the next period. I should do the same for RAM usage too, but haven’t done so yet.
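To show the CPU limit in isolation, here is a minimal sketch. The run_limited helper is a name I made up for this example and is not part of my script; the busy loop just stands in for a rake task that has gone mad.

```shell
#!/bin/sh
# Sketch only: run a command under a CPU-time cap.
# run_limited is a hypothetical helper, not part of run-rake.sh.
run_limited() {
    cpuseconds=$1
    shift
    (
        # The limit is set inside a subshell so it does not affect the caller.
        # It counts CPU seconds, not wall-clock seconds.
        ulimit -t "$cpuseconds"
        exec "$@"
    )
}

# A busy loop that would otherwise spin forever is killed after
# roughly one second of CPU time, so the exit status is non-zero.
run_limited 1 sh -c 'while :; do :; done'
echo "exit status: $?"
```

A memory cap could presumably be added in the same place with ulimit -v (virtual memory, in kilobytes), but I have not tried that with rake, so treat it as an untested idea.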

It’s not completely foolproof: for one, if a rake task hangs while using no CPU, it will never end and never be started again. This has not happened so far, though, and it is still better than all tasks failing to run because the machine is unresponsive.
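One possible way to cover the hang case would be a wall-clock limit using the timeout command from GNU coreutils. My script does not do this; the sketch below only shows how timeout behaves, with a 2-second limit and a sleep standing in for a hung rake task.

```shell
#!/bin/sh
# Sketch only: cap wall-clock time with GNU coreutils `timeout`.
# This catches a task that hangs while using no CPU, which ulimit -t cannot.
# The 2-second limit and the sleep command are stand-ins for illustration.
timeout 2 sleep 10
status=$?

# timeout exits with status 124 when it had to stop the command.
if [ "$status" -eq 124 ]; then
    echo "task exceeded its wall-clock limit and was terminated"
fi
```

Wiring this into the script would probably mean having start-stop-daemon start timeout instead of rake, with the limit and the rake command as its arguments. I have not tested that combination.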

It’s executed like this (I set different nice values for different tasks by calling nice directly in my crontab):

run-rake.sh /home/john/railsroot taskname:whatever maxcpuseconds
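For illustration, crontab entries along these lines would run two tasks every five minutes at different priorities. The script location, task names, nice values and CPU limits are all made up for the example and are not copied from my real crontab.

```
# m   h  dom mon dow  command
*/5 * * * * nice -n 10 /home/john/bin/run-rake.sh /home/john/railsroot feeds:check 300
*/5 * * * * nice -n 19 /home/john/bin/run-rake.sh /home/john/railsroot pages:download 600
```

The nice value is inherited by the rake process, so the second task would give way to the first when the machine is busy.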

And here is the script:

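#!/bin/sh
# Usage: run-rake.sh <railsdir> <raketask> <maxcpuseconds>
railsdir=$1
raketask=$2
pidfile=$railsdir/log/$raketask.pid
cpuseconds=$3
export RAILS_ENV=production

# Cap the CPU time (in seconds) of everything started from this shell
ulimit -t "$cpuseconds"
# -d: chdir to the rails root, -b: run in the background,
# -o: exit 0 if already running, -m/-p: write and check the pid file
exec /sbin/start-stop-daemon -d "$railsdir" -b -o -m -p "$pidfile" --start --startas /usr/bin/rake -- "$raketask"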

Comments

matt says:

Nice write up. One thought on the endless hang. When the second task tries to start up, and sees that it can’t, I have it shoot me an email, so I can diagnose the problem.

Bill says:

I like the style! I am a BackgroundRB refugee seeking more predictable waters, just finished rewriting all our BDRB tasks to be rake-based, and this script seems like just the ticket to keep those tasks running and playing nicely.

Thanks!

Matt says:

I know this post is hella-old, but the script still works for me running a rake script in Rails 3 under Ubuntu-something-something-recent. Nice!
