<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>John Leach's Blog</title>
	<atom:link href="http://johnleach.co.uk/words/feed" rel="self" type="application/rss+xml" />
	<link>http://johnleach.co.uk/words</link>
	<description>Stuff I think, see and do</description>
	<lastBuildDate>Wed, 16 May 2012 23:51:01 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.2</generator>
		<item>
		<title>Heaton Royds School</title>
		<link>http://johnleach.co.uk/words/1149/heaton-royds-school</link>
		<comments>http://johnleach.co.uk/words/1149/heaton-royds-school#comments</comments>
		<pubDate>Wed, 16 May 2012 23:51:01 +0000</pubDate>
		<dc:creator>john</dc:creator>
				<category><![CDATA[Photography]]></category>
		<category><![CDATA[abandoned]]></category>
		<category><![CDATA[closed]]></category>
		<category><![CDATA[heaton]]></category>
		<category><![CDATA[heaton royd school]]></category>
		<category><![CDATA[heaton woods]]></category>
		<category><![CDATA[rotten city]]></category>
		<category><![CDATA[school]]></category>

		<guid isPermaLink="false">http://johnleach.co.uk/words/?p=1149</guid>
		<description><![CDATA[Louisa, Lily the dog and I went for a walk in the woods near my parents&#8217; home in Heaton, Bradford and we came across the now closed down Heaton Royds school. It seems to have closed down around March 2009 &#8230; <a href="http://johnleach.co.uk/words/1149/heaton-royds-school">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Louisa, Lily the dog and I went for a walk in the woods near my parents&#8217; home in Heaton, Bradford and we came across the now closed down <a href="http://maps.google.co.uk/maps?ll=53.822691,-1.785501&#038;spn=0.001126,0.00284&#038;sll=53.800651,-4.064941&#038;sspn=18.514185,46.538086&#038;t=h">Heaton Royds school</a>. It seems to have closed down around March 2009 and has been vandalised a number of times. I took a few photos and scared a fox who had apparently moved in.</p>

<div class="ngg-galleryoverview" id="ngg-gallery-21-1149">


	
	<!-- Thumbnails -->
		
	<div id="ngg-image-317" class="ngg-gallery-thumbnail-box"  >
		<div class="ngg-gallery-thumbnail" >
			<a href="http://johnleach.co.uk/words/wp-content/photos/heaton-royds-school/img_0738.jpg" title="Heaton Royds School is OUTSTANDING" class="thickbox" rel="set_21" >
								<img title="Heaton Royds School is OUTSTANDING" alt="Heaton Royds School is OUTSTANDING" src="http://johnleach.co.uk/words/wp-content/photos/heaton-royds-school/thumbs/thumbs_img_0738.jpg" width="100" height="75" />
							</a>
		</div>
	</div>
	
		
 		
	<div id="ngg-image-319" class="ngg-gallery-thumbnail-box"  >
		<div class="ngg-gallery-thumbnail" >
			<a href="http://johnleach.co.uk/words/wp-content/photos/heaton-royds-school/img_0740.jpg" title=" " class="thickbox" rel="set_21" >
								<img title="img_0740" alt="img_0740" src="http://johnleach.co.uk/words/wp-content/photos/heaton-royds-school/thumbs/thumbs_img_0740.jpg" width="100" height="75" />
							</a>
		</div>
	</div>
	
		
 		
	<div id="ngg-image-321" class="ngg-gallery-thumbnail-box"  >
		<div class="ngg-gallery-thumbnail" >
			<a href="http://johnleach.co.uk/words/wp-content/photos/heaton-royds-school/img_0741.jpg" title=" " class="thickbox" rel="set_21" >
								<img title="img_0741" alt="img_0741" src="http://johnleach.co.uk/words/wp-content/photos/heaton-royds-school/thumbs/thumbs_img_0741.jpg" width="100" height="75" />
							</a>
		</div>
	</div>
	
		
 		
	<div id="ngg-image-323" class="ngg-gallery-thumbnail-box"  >
		<div class="ngg-gallery-thumbnail" >
			<a href="http://johnleach.co.uk/words/wp-content/photos/heaton-royds-school/img_0744.jpg" title=" " class="thickbox" rel="set_21" >
								<img title="img_0744" alt="img_0744" src="http://johnleach.co.uk/words/wp-content/photos/heaton-royds-school/thumbs/thumbs_img_0744.jpg" width="100" height="75" />
							</a>
		</div>
	</div>
	
		
 		
	<div id="ngg-image-325" class="ngg-gallery-thumbnail-box"  >
		<div class="ngg-gallery-thumbnail" >
			<a href="http://johnleach.co.uk/words/wp-content/photos/heaton-royds-school/img_0745.jpg" title=" " class="thickbox" rel="set_21" >
								<img title="img_0745" alt="img_0745" src="http://johnleach.co.uk/words/wp-content/photos/heaton-royds-school/thumbs/thumbs_img_0745.jpg" width="100" height="75" />
							</a>
		</div>
	</div>
	
		
 		
	<div id="ngg-image-327" class="ngg-gallery-thumbnail-box"  >
		<div class="ngg-gallery-thumbnail" >
			<a href="http://johnleach.co.uk/words/wp-content/photos/heaton-royds-school/img_0746.jpg" title=" " class="thickbox" rel="set_21" >
								<img title="img_0746" alt="img_0746" src="http://johnleach.co.uk/words/wp-content/photos/heaton-royds-school/thumbs/thumbs_img_0746.jpg" width="100" height="75" />
							</a>
		</div>
	</div>
	
		
 		
	<div id="ngg-image-329" class="ngg-gallery-thumbnail-box"  >
		<div class="ngg-gallery-thumbnail" >
			<a href="http://johnleach.co.uk/words/wp-content/photos/heaton-royds-school/img_0747.jpg" title=" " class="thickbox" rel="set_21" >
								<img title="img_0747" alt="img_0747" src="http://johnleach.co.uk/words/wp-content/photos/heaton-royds-school/thumbs/thumbs_img_0747.jpg" width="100" height="75" />
							</a>
		</div>
	</div>
	
		
 		
	<div id="ngg-image-331" class="ngg-gallery-thumbnail-box"  >
		<div class="ngg-gallery-thumbnail" >
			<a href="http://johnleach.co.uk/words/wp-content/photos/heaton-royds-school/img_0748.jpg" title=" " class="thickbox" rel="set_21" >
								<img title="img_0748" alt="img_0748" src="http://johnleach.co.uk/words/wp-content/photos/heaton-royds-school/thumbs/thumbs_img_0748.jpg" width="100" height="75" />
							</a>
		</div>
	</div>
	
		
 		
	<div id="ngg-image-333" class="ngg-gallery-thumbnail-box"  >
		<div class="ngg-gallery-thumbnail" >
			<a href="http://johnleach.co.uk/words/wp-content/photos/heaton-royds-school/img_0749.jpg" title=" " class="thickbox" rel="set_21" >
								<img title="img_0749" alt="img_0749" src="http://johnleach.co.uk/words/wp-content/photos/heaton-royds-school/thumbs/thumbs_img_0749.jpg" width="100" height="75" />
							</a>
		</div>
	</div>
	
		
 		
	<div id="ngg-image-335" class="ngg-gallery-thumbnail-box"  >
		<div class="ngg-gallery-thumbnail" >
			<a href="http://johnleach.co.uk/words/wp-content/photos/heaton-royds-school/img_0750.jpg" title=" " class="thickbox" rel="set_21" >
								<img title="img_0750" alt="img_0750" src="http://johnleach.co.uk/words/wp-content/photos/heaton-royds-school/thumbs/thumbs_img_0750.jpg" width="100" height="75" />
							</a>
		</div>
	</div>
	
		
 		
	<div id="ngg-image-337" class="ngg-gallery-thumbnail-box"  >
		<div class="ngg-gallery-thumbnail" >
			<a href="http://johnleach.co.uk/words/wp-content/photos/heaton-royds-school/img_0751.jpg" title=" " class="thickbox" rel="set_21" >
								<img title="img_0751" alt="img_0751" src="http://johnleach.co.uk/words/wp-content/photos/heaton-royds-school/thumbs/thumbs_img_0751.jpg" width="100" height="75" />
							</a>
		</div>
	</div>
	
		
 		
	<div id="ngg-image-339" class="ngg-gallery-thumbnail-box"  >
		<div class="ngg-gallery-thumbnail" >
			<a href="http://johnleach.co.uk/words/wp-content/photos/heaton-royds-school/img_0752.jpg" title=" " class="thickbox" rel="set_21" >
								<img title="img_0752" alt="img_0752" src="http://johnleach.co.uk/words/wp-content/photos/heaton-royds-school/thumbs/thumbs_img_0752.jpg" width="100" height="75" />
							</a>
		</div>
	</div>
	
		
 		
	<div id="ngg-image-341" class="ngg-gallery-thumbnail-box"  >
		<div class="ngg-gallery-thumbnail" >
			<a href="http://johnleach.co.uk/words/wp-content/photos/heaton-royds-school/img_0753.jpg" title=" " class="thickbox" rel="set_21" >
								<img title="img_0753" alt="img_0753" src="http://johnleach.co.uk/words/wp-content/photos/heaton-royds-school/thumbs/thumbs_img_0753.jpg" width="100" height="75" />
							</a>
		</div>
	</div>
	
		
 		
	<div id="ngg-image-343" class="ngg-gallery-thumbnail-box"  >
		<div class="ngg-gallery-thumbnail" >
			<a href="http://johnleach.co.uk/words/wp-content/photos/heaton-royds-school/img_0754.jpg" title="Mary Winter Headteacher" class="thickbox" rel="set_21" >
								<img title="Mary Winter Headteacher" alt="Mary Winter Headteacher" src="http://johnleach.co.uk/words/wp-content/photos/heaton-royds-school/thumbs/thumbs_img_0754.jpg" width="100" height="75" />
							</a>
		</div>
	</div>
	
		
 		
	<div id="ngg-image-345" class="ngg-gallery-thumbnail-box"  >
		<div class="ngg-gallery-thumbnail" >
			<a href="http://johnleach.co.uk/words/wp-content/photos/heaton-royds-school/img_0755.jpg" title=" " class="thickbox" rel="set_21" >
								<img title="img_0755" alt="img_0755" src="http://johnleach.co.uk/words/wp-content/photos/heaton-royds-school/thumbs/thumbs_img_0755.jpg" width="100" height="75" />
							</a>
		</div>
	</div>
	
		
 		
	<div id="ngg-image-347" class="ngg-gallery-thumbnail-box"  >
		<div class="ngg-gallery-thumbnail" >
			<a href="http://johnleach.co.uk/words/wp-content/photos/heaton-royds-school/img_0756.jpg" title=" " class="thickbox" rel="set_21" >
								<img title="img_0756" alt="img_0756" src="http://johnleach.co.uk/words/wp-content/photos/heaton-royds-school/thumbs/thumbs_img_0756.jpg" width="100" height="75" />
							</a>
		</div>
	</div>
	
		
 		
	<div id="ngg-image-349" class="ngg-gallery-thumbnail-box"  >
		<div class="ngg-gallery-thumbnail" >
			<a href="http://johnleach.co.uk/words/wp-content/photos/heaton-royds-school/img_0757.jpg" title=" " class="thickbox" rel="set_21" >
								<img title="img_0757" alt="img_0757" src="http://johnleach.co.uk/words/wp-content/photos/heaton-royds-school/thumbs/thumbs_img_0757.jpg" width="100" height="75" />
							</a>
		</div>
	</div>
	
		
 		
	<div id="ngg-image-351" class="ngg-gallery-thumbnail-box"  >
		<div class="ngg-gallery-thumbnail" >
			<a href="http://johnleach.co.uk/words/wp-content/photos/heaton-royds-school/img_0758.jpg" title=" " class="thickbox" rel="set_21" >
								<img title="img_0758" alt="img_0758" src="http://johnleach.co.uk/words/wp-content/photos/heaton-royds-school/thumbs/thumbs_img_0758.jpg" width="100" height="75" />
							</a>
		</div>
	</div>
	
		
 		
	<div id="ngg-image-353" class="ngg-gallery-thumbnail-box"  >
		<div class="ngg-gallery-thumbnail" >
			<a href="http://johnleach.co.uk/words/wp-content/photos/heaton-royds-school/img_0759.jpg" title=" " class="thickbox" rel="set_21" >
								<img title="img_0759" alt="img_0759" src="http://johnleach.co.uk/words/wp-content/photos/heaton-royds-school/thumbs/thumbs_img_0759.jpg" width="100" height="75" />
							</a>
		</div>
	</div>
	
		
 		
	<div id="ngg-image-355" class="ngg-gallery-thumbnail-box"  >
		<div class="ngg-gallery-thumbnail" >
			<a href="http://johnleach.co.uk/words/wp-content/photos/heaton-royds-school/img_0760.jpg" title=" " class="thickbox" rel="set_21" >
								<img title="img_0760" alt="img_0760" src="http://johnleach.co.uk/words/wp-content/photos/heaton-royds-school/thumbs/thumbs_img_0760.jpg" width="100" height="75" />
							</a>
		</div>
	</div>
	
		
 		
	<div id="ngg-image-357" class="ngg-gallery-thumbnail-box"  >
		<div class="ngg-gallery-thumbnail" >
			<a href="http://johnleach.co.uk/words/wp-content/photos/heaton-royds-school/img_0761.jpg" title=" " class="thickbox" rel="set_21" >
								<img title="img_0761" alt="img_0761" src="http://johnleach.co.uk/words/wp-content/photos/heaton-royds-school/thumbs/thumbs_img_0761.jpg" width="100" height="75" />
							</a>
		</div>
	</div>
	
		
 		
	<div id="ngg-image-359" class="ngg-gallery-thumbnail-box"  >
		<div class="ngg-gallery-thumbnail" >
			<a href="http://johnleach.co.uk/words/wp-content/photos/heaton-royds-school/img_0762.jpg" title=" " class="thickbox" rel="set_21" >
								<img title="img_0762" alt="img_0762" src="http://johnleach.co.uk/words/wp-content/photos/heaton-royds-school/thumbs/thumbs_img_0762.jpg" width="100" height="75" />
							</a>
		</div>
	</div>
	
		
 		
	<div id="ngg-image-361" class="ngg-gallery-thumbnail-box"  >
		<div class="ngg-gallery-thumbnail" >
			<a href="http://johnleach.co.uk/words/wp-content/photos/heaton-royds-school/img_0764.jpg" title=" " class="thickbox" rel="set_21" >
								<img title="img_0764" alt="img_0764" src="http://johnleach.co.uk/words/wp-content/photos/heaton-royds-school/thumbs/thumbs_img_0764.jpg" width="100" height="75" />
							</a>
		</div>
	</div>
	
		
 	 	
	<!-- Pagination -->
 	<div class="ngg-clear"></div> 	
</div>


]]></content:encoded>
			<wfw:commentRss>http://johnleach.co.uk/words/1149/heaton-royds-school/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Rate limiting with Apache and mod-security</title>
		<link>http://johnleach.co.uk/words/1073/rate-limiting-with-apache-and-mod-security</link>
		<comments>http://johnleach.co.uk/words/1073/rate-limiting-with-apache-and-mod-security#comments</comments>
		<pubDate>Tue, 15 May 2012 09:46:02 +0000</pubDate>
		<dc:creator>john</dc:creator>
				<category><![CDATA[Security]]></category>
		<category><![CDATA[Tech]]></category>
		<category><![CDATA[apache]]></category>
		<category><![CDATA[api]]></category>
		<category><![CDATA[dos]]></category>
		<category><![CDATA[http]]></category>
		<category><![CDATA[limit]]></category>
		<category><![CDATA[mod-security]]></category>
		<category><![CDATA[netfilter]]></category>
		<category><![CDATA[nginx]]></category>
		<category><![CDATA[rate]]></category>
		<category><![CDATA[rate limiting]]></category>

		<guid isPermaLink="false">http://johnleach.co.uk/words/?p=1073</guid>
		<description><![CDATA[Rate limiting by request in Apache isn&#8217;t easy, but I finally figured out a satisfactory way of doing it using the mod-security Apache module. We&#8217;re using it at Brightbox to prevent buggy scripts rinsing our metadata service. In particular, we &#8230; <a href="http://johnleach.co.uk/words/1073/rate-limiting-with-apache-and-mod-security">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Rate limiting by request in Apache isn&#8217;t easy, but I finally figured out a satisfactory way of doing it using the <a href="http://www.modsecurity.org/">mod-security</a> Apache module. We&#8217;re using it at <a href="http://brightbox.com">Brightbox</a> to prevent buggy scripts rinsing our <a href="http://docs.brightbox.com/reference/metadata-service/">metadata service</a>. In particular, we needed the ability to allow a high burst of initial requests, as that&#8217;s our normal usage pattern. So here&#8217;s how to do it.</p>
<p>Install mod-security (on Debian/Ubuntu, just install the <code>libapache2-modsecurity</code> package) and configure it in your virtual host definition like this:</p>
<pre><code>SecRuleEngine On

&lt;LocationMatch "^/somepath"&gt;
  SecAction initcol:ip=%{REMOTE_ADDR},pass,nolog
  SecAction "phase:5,deprecatevar:ip.somepathcounter=1/1,pass,nolog"
  SecRule IP:SOMEPATHCOUNTER "@gt 60" "phase:2,pause:300,deny,status:509,setenv:RATELIMITED,skip:1,nolog"
  SecAction "phase:2,pass,setvar:ip.somepathcounter=+1,nolog"
  Header always set Retry-After "10" env=RATELIMITED
&lt;/LocationMatch&gt;

ErrorDocument 509 "Rate Limit Exceeded"</code></pre>
<p><span id="more-1073"></span><br />
This does a few things and has a couple of knobs for you to tweak depending on your requirements. The first <code>SecAction</code> initializes the state, in this case by IP address. You can do this by a cookie if you like, but I needed it done by IP address (if you&#8217;re using a reverse proxy of some kind then get the IP from the X-Forwarded-For header here instead).</p>
<p>The second <code>SecAction</code> decrements the counter by 1 every second (mod-security calls this <code>deprecatevar</code>). This sets the base rate of our rate limit: one request per second. I&#8217;ve named the counter <code>somepathcounter</code> here; feel free to call it what you want, and use different names to rate limit different parts of your site.</p>
<p>The <code>SecRule</code> checks to see if the counter is greater than 60 and if so it sleeps 300ms, sets the <code>RATELIMITED</code> environment variable and returns a 509 response.  This sets the burst size of our rate limit to 60. Any IP can do a burst of 60 requests as fast as it likes and is then limited to 1 per second. If it makes no further requests for 60 seconds then the counter is decremented back to 0, which means its burst has been fully recharged.</p>
<p>The last <code>SecAction</code> increments the counter for every successful request (the previous <code>SecRule</code> skips this line if the request was rate limited).</p>
<p>Then the <code>Header</code> definition ensures a header is set whenever a request is rate limited, giving a hint to the client that they shouldn&#8217;t try again for 10 seconds. This is obviously just a guide and a lot of clients don&#8217;t implement it (and it&#8217;s really only valid on a 503 status anyway) so it&#8217;s a little bit of wishful thinking really.</p>
<p>Then we define a neat <code>ErrorDocument</code> for the 509 response to give a better clue to the client about what is happening.</p>
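<p>As a rough sketch of using separate counter names for separate parts of a site, a second block can sit alongside the one above. The <code>/otherpath</code> location, the <code>otherpathcounter</code> name and the burst of 10 are made-up values for illustration; the directives themselves are the same ones already used above.</p>
<pre><code># Hypothetical second limit: burst of 10, then 1 request per second
&lt;LocationMatch "^/otherpath"&gt;
  SecAction initcol:ip=%{REMOTE_ADDR},pass,nolog
  SecAction "phase:5,deprecatevar:ip.otherpathcounter=1/1,pass,nolog"
  SecRule IP:OTHERPATHCOUNTER "@gt 10" "phase:2,pause:300,deny,status:509,setenv:RATELIMITED,skip:1,nolog"
  SecAction "phase:2,pass,setvar:ip.otherpathcounter=+1,nolog"
  Header always set Retry-After "10" env=RATELIMITED
&lt;/LocationMatch&gt;</code></pre>
<p>Because each counter is a separate variable in the same per-IP collection, a client that exhausts its burst on one path should still have its full burst available on the other.</p>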
<h3>509 HTTP Response</h3>
<p>I chose to use the 509 HTTP code which isn&#8217;t a standard (it&#8217;s Apache&#8217;s own &#8220;Bandwidth Limit Exceeded&#8221; code).  Technically a 503 (with the <code>Retry-After</code> header) is perhaps a better choice but I already use 503 for maintenance pages and I wanted to differentiate between the two responses in the logs. Twitter invented their own status 420 &#8220;Enhance Your Calm&#8221; code, but that appears to mostly be an elaborate marijuana reference. <a href="http://tools.ietf.org/html/rfc6585#section-4">RFC 6585</a> defines 429 &#8220;Too Many Requests&#8221; for this purpose but it&#8217;s very new and Apache won&#8217;t let you set an <code>ErrorDocument</code> for it yet. And it&#8217;s <a href="http://mehack.com/inventing-a-http-response-code-aka-seriously#pcomment_commentunit_5524453">debatable</a> whether this is a 4xx error mode or a 5xx error mode.</p>
<h3>mod-security state data</h3>
<p>mod-security needs to store the rate limit state between restarts so you need to tell it where to write that data. I create a directory in <code>/var/lib/mod-security</code> but you can just stick it in <code>/tmp</code> if you like:</p>
<pre><code>
SecDataDir /tmp
</code></pre>
<p>If you don&#8217;t define this, mod-security just seems to silently not apply the rate limiting, so you need it even if you don&#8217;t care about state between restarts.</p>
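<p>For the dedicated directory variant mentioned above, a minimal sketch might look like this. It assumes the directory already exists and is writable by the user Apache runs as (typically <code>www-data</code> on Debian/Ubuntu, though that is an assumption about your setup).</p>
<pre><code># Assumes /var/lib/mod-security exists and is writable by the Apache user
SecDataDir /var/lib/mod-security
</code></pre>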
<p>It doesn&#8217;t appear possible to store the mod-security state in a shared database so you can&#8217;t rate limit <em>accurately</em> when you have multiple load balanced web servers.</p>
<h3>Sleeping considered bad</h3>
<p>This implementation sleeps 300ms per request when the rate has been exceeded but sleeping here is generally not a great idea. It ties up the Apache worker for the duration of the sleep, so whilst it will relieve the load on your backend app (which would otherwise likely use lots of CPU) it arguably makes it easier to tie up all of Apache&#8217;s workers and take your site offline.</p>
<p>There are easier ways to tie up a web server though, and without a sleep many clients will just immediately retry the request, spamming the logs with useless messages and burning CPU on HTTP parsing. And without the rate limiting at all, your app might be even slower and could tie things up just as easily. So feel free to tune this as you like.</p>
<p>Basically, remember that this is not a malicious denial of service protection system &#8211; we&#8217;re just using it to enforce a basic policy, most usually breached due to a mistake rather than an attack.</p>
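<p>As a sketch of that tuning, the rule from the configuration above with the sleep removed would look like this; only the <code>pause:300</code> action is dropped and everything else is unchanged.</p>
<pre><code># Same rule as before, minus pause:300, so rate limited requests are rejected immediately
SecRule IP:SOMEPATHCOUNTER "@gt 60" "phase:2,deny,status:509,setenv:RATELIMITED,skip:1,nolog"</code></pre>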
<h3>Other options</h3>
<p><a href="http://dembol.org/blog/mod_cband/">mod_cband</a> sounds pretty good but I can&#8217;t find documentation on how to rate limit requests, just bandwidth, and it&#8217;s not packaged for Debian/Ubuntu.</p>
<p><a href="http://www.zdziarski.com/blog/?page_id=442">mod_evasive</a> sounds like something you might want to use for DoS protection.</p>
<p>If you&#8217;re using NGINX then the <a href="http://wiki.nginx.org/HttpLimitReqModule">HTTP limit req module</a> sounds nice and simple.</p>
<p><a href="http://www.netfilter.org/">Netfilter</a>&#8217;s hashlimit matcher is very powerful and has the benefit of being in a layer before Apache entirely, so you can save even more CPU cycles. Netfilter can also rate limit logging, so you won&#8217;t fill your disk with useless logs.  If you want to return a 509 response, instead of just dropping or rejecting the SYN packet, then you can redirect rate limited connections to an Apache virtual host that just returns 509s (put it on a different port and use the DNAT Netfilter target).</p>
]]></content:encoded>
			<wfw:commentRss>http://johnleach.co.uk/words/1073/rate-limiting-with-apache-and-mod-security/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Full text indexing of syslog messages with Riak</title>
		<link>http://johnleach.co.uk/words/1063/riak-syslog</link>
		<comments>http://johnleach.co.uk/words/1063/riak-syslog#comments</comments>
		<pubDate>Wed, 25 Apr 2012 10:37:55 +0000</pubDate>
		<dc:creator>john</dc:creator>
				<category><![CDATA[Tech]]></category>
		<category><![CDATA[full text]]></category>
		<category><![CDATA[index]]></category>
		<category><![CDATA[riak]]></category>
		<category><![CDATA[search]]></category>
		<category><![CDATA[solr]]></category>
		<category><![CDATA[splunk]]></category>
		<category><![CDATA[syslog]]></category>

		<guid isPermaLink="false">http://johnleach.co.uk/words/?p=1063</guid>
		<description><![CDATA[I&#8217;ve just released a little tool I wrote called riak-syslog which takes your syslog messages and puts them into a Riak cluster and then lets you search them using Riak&#8217;s full text search. Rather than re-implement the wheel, riak-syslog expects that a &#8230; <a href="http://johnleach.co.uk/words/1063/riak-syslog">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve just released a little tool I wrote called <a href="https://github.com/johnl/riak-syslog">riak-syslog</a> which takes your syslog messages and puts them into a <a href="http://wiki.basho.com/Riak.html">Riak cluster</a> and then lets you search them using <a href="http://wiki.basho.com/Riak-Search.html">Riak&#8217;s full text search</a>.</p>
<p>Rather than re-implement the wheel, riak-syslog expects that a syslog daemon will handle receiving syslog messages and will be able to provide them in a specific format &#8211; there is documentation on getting this running with <a href="http://en.wikipedia.org/wiki/Rsyslog">rsyslog</a> on Ubuntu.</p>
<p>I&#8217;ve used it to gather and store a few hundred gig of syslogs over the last several months on a small internal Riak cluster on <a href="http://brightbox.com/">Brightbox Cloud</a> and it&#8217;s working well (which can&#8217;t be said of a similar setup I did with Solr which caved in after a while and needed some fine tuning!)</p>
<p>There is documentation on getting it set up in the <a href="https://github.com/johnl/riak-syslog/blob/master/README.md">README</a>, and some examples of how to conduct searches too.</p>
<p>If you want to play with Riak, you can build a four node cluster spanning two data-centres in five minutes on <a href="http://brightbox.com/blog/2012/01/04/riak-cluster/">Brightbox Cloud</a>.</p>
<p>You might also be interested in my post about <a href="http://johnleach.co.uk/words/744/indexing-syslog-messages-with-solr">indexing syslog messages with Solr</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://johnleach.co.uk/words/1063/riak-syslog/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>My dog Lily, asleeping</title>
		<link>http://johnleach.co.uk/words/1023/lily-asleeping</link>
		<comments>http://johnleach.co.uk/words/1023/lily-asleeping#comments</comments>
		<pubDate>Wed, 04 Apr 2012 22:09:02 +0000</pubDate>
		<dc:creator>john</dc:creator>
				<category><![CDATA[Photoblog]]></category>
		<category><![CDATA[Photography]]></category>
		<category><![CDATA[dog]]></category>
		<category><![CDATA[lily]]></category>
		<category><![CDATA[sleeping]]></category>

		<guid isPermaLink="false">http://johnleach.co.uk/words/?p=1023</guid>
		<description><![CDATA[Understandably tired after a long day of sleeping.]]></description>
			<content:encoded><![CDATA[
<a href="http://johnleach.co.uk/words/wp-content/gallery/random/img_0035.jpg" title="" class="thickbox" rel="singlepic307" >
	<img class="ngg-singlepic" src="http://johnleach.co.uk/words/wp-content/photos/cache/307__550x_img_0035.jpg" alt="Lily asleeping" title="Lily asleeping" />
</a>

<p>Understandably tired after a long day of sleeping.</p>
]]></content:encoded>
			<wfw:commentRss>http://johnleach.co.uk/words/1023/lily-asleeping/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Documentation that tells a story</title>
		<link>http://johnleach.co.uk/words/997/documentation-that-tells-a-story</link>
		<comments>http://johnleach.co.uk/words/997/documentation-that-tells-a-story#comments</comments>
		<pubDate>Sat, 11 Feb 2012 11:51:45 +0000</pubDate>
		<dc:creator>john</dc:creator>
				<category><![CDATA[Tech]]></category>
		<category><![CDATA[documentation]]></category>
		<category><![CDATA[examples]]></category>
		<category><![CDATA[testing]]></category>
		<category><![CDATA[writing]]></category>

		<guid isPermaLink="false">http://johnleach.co.uk/words/?p=997</guid>
		<description><![CDATA[When reading technical documentation I too often come across examples like this: let&#8217;s assume you have a client called foo and a server called bar or command examples like: mysqldump -h server1 &#124; mysql -h server2 When I write documentation, &#8230; <a href="http://johnleach.co.uk/words/997/documentation-that-tells-a-story">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>When reading technical documentation I too often come across examples like this:</p>
<blockquote><p>let&#8217;s assume you have a client called foo and a server called bar</p></blockquote>
<p>or command examples like:</p>
<blockquote><p><code>mysqldump -h server1 | mysql -h server2</code></p></blockquote>
<p>When I write documentation, I prefer to tell a story. What is the client called, Steven? Are we taking a mysqldump of a production server and writing it to a staging server?</p>
<p>Human brains like stories. It&#8217;s much easier to keep track of facts if they have some kind of meaning. Many memory improvement techniques use stories to link things together. And when you&#8217;re reading documentation, you&#8217;re usually learning some new concept anyway &#8211; so you&#8217;re adding unnecessary cognitive load by using abstract labels like Foo and Bar or A and B.</p>
<p>In my <a href="http://johnleach.co.uk/words/323/git-submodules-in-n-easy-steps">Git submodules post</a> I name the two example projects <code>your_project</code> and <code>other_project</code> and use them consistently throughout. You never have to remember whether &#8220;Foo&#8221; is the remote project or not.</p>
<p>One of my own favourites is an <a href="http://johnleach.co.uk/documents/heartbeat/">old heartbeat cluster guide</a> I wrote. It involves two clusters, each of which consists of two servers working together. I named the first cluster <code>JuliusCaesar</code>, naming the two nodes <code>Julius</code> and <code>Caesar</code>. The second cluster is called <code>MarcusAurelius</code>. Throughout the documentation, I&#8217;m able to refer to any server just by its name and you can know where it is in the network.</p>
<p>It&#8217;s part of why I like using <a href="http://rspec.info/">rspec</a> to do testing, because it encourages you to tell a story rather than to just test arbitrary values.</p>
<p>So, put some thought into your examples. Tell a story. Make it easier for the reader to keep track of all this new stuff they&#8217;re learning.</p>
]]></content:encoded>
			<wfw:commentRss>http://johnleach.co.uk/words/997/documentation-that-tells-a-story/feed</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Killing and butchering a chicken</title>
		<link>http://johnleach.co.uk/words/984/killing-and-butchering-a-chicken</link>
		<comments>http://johnleach.co.uk/words/984/killing-and-butchering-a-chicken#comments</comments>
		<pubDate>Thu, 22 Dec 2011 00:24:49 +0000</pubDate>
		<dc:creator>john</dc:creator>
				<category><![CDATA[Personal]]></category>
		<category><![CDATA[Photography]]></category>
		<category><![CDATA[chickens]]></category>
		<category><![CDATA[food]]></category>
		<category><![CDATA[slaughter]]></category>

		<guid isPermaLink="false">http://johnleach.co.uk/words/?p=984</guid>
		<description><![CDATA[At the weekend I killed a chicken, and Louisa and I plucked it and then Louisa butchered it. It was one of a few chicks that turned out to be male, so his fate was to be killed and eaten &#8230; <a href="http://johnleach.co.uk/words/984/killing-and-butchering-a-chicken">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>At the weekend I killed a chicken, and Louisa and I plucked it and then Louisa butchered it. It was one of a few chicks that turned out to be male, so his fate was to be killed and eaten by us.</p>
<p>This was the second chicken I&#8217;ve killed myself (for food or otherwise) &#8211; the first one took a bit of mental preparation but this one was a bit easier.</p>
<p>He had a pretty good free range life, was killed quickly and we&#8217;ll waste very little of him (we&#8217;ve already had a soup made from cooking his carcass in the slowcooker).</p>
<p>Louisa has written up the experience in more detail <a href="http://www.thereallygoodlife.com/6619/killing-butchering-a-chicken/">on her blog</a>.</p>
<p>Here are some photos &#8211; you might consider them a little grisly.</p>

<div class="ngg-galleryoverview" id="ngg-gallery-18-984">


	
	<!-- Thumbnails -->
		
	<div id="ngg-image-290" class="ngg-gallery-thumbnail-box"  >
		<div class="ngg-gallery-thumbnail" >
			<a href="http://johnleach.co.uk/words/wp-content/photos/killing-and-butchering-a-chicken/dsc_8572.jpg" title=" " class="thickbox" rel="set_18" >
								<img title="dsc_8572" alt="dsc_8572" src="http://johnleach.co.uk/words/wp-content/photos/killing-and-butchering-a-chicken/thumbs/thumbs_dsc_8572.jpg" width="100" height="74" />
							</a>
		</div>
	</div>
	
		
 		
	<div id="ngg-image-292" class="ngg-gallery-thumbnail-box"  >
		<div class="ngg-gallery-thumbnail" >
			<a href="http://johnleach.co.uk/words/wp-content/photos/killing-and-butchering-a-chicken/dsc_8579.jpg" title=" " class="thickbox" rel="set_18" >
								<img title="dsc_8579" alt="dsc_8579" src="http://johnleach.co.uk/words/wp-content/photos/killing-and-butchering-a-chicken/thumbs/thumbs_dsc_8579.jpg" width="100" height="75" />
							</a>
		</div>
	</div>
	
		
 		
	<div id="ngg-image-294" class="ngg-gallery-thumbnail-box"  >
		<div class="ngg-gallery-thumbnail" >
			<a href="http://johnleach.co.uk/words/wp-content/photos/killing-and-butchering-a-chicken/dsc_8583.jpg" title=" " class="thickbox" rel="set_18" >
								<img title="dsc_8583" alt="dsc_8583" src="http://johnleach.co.uk/words/wp-content/photos/killing-and-butchering-a-chicken/thumbs/thumbs_dsc_8583.jpg" width="100" height="75" />
							</a>
		</div>
	</div>
	
		
 		
	<div id="ngg-image-296" class="ngg-gallery-thumbnail-box"  >
		<div class="ngg-gallery-thumbnail" >
			<a href="http://johnleach.co.uk/words/wp-content/photos/killing-and-butchering-a-chicken/dsc_8586.jpg" title=" " class="thickbox" rel="set_18" >
								<img title="dsc_8586" alt="dsc_8586" src="http://johnleach.co.uk/words/wp-content/photos/killing-and-butchering-a-chicken/thumbs/thumbs_dsc_8586.jpg" width="100" height="75" />
							</a>
		</div>
	</div>
	
		
 		
	<div id="ngg-image-298" class="ngg-gallery-thumbnail-box"  >
		<div class="ngg-gallery-thumbnail" >
			<a href="http://johnleach.co.uk/words/wp-content/photos/killing-and-butchering-a-chicken/dsc_8592.jpg" title=" " class="thickbox" rel="set_18" >
								<img title="dsc_8592" alt="dsc_8592" src="http://johnleach.co.uk/words/wp-content/photos/killing-and-butchering-a-chicken/thumbs/thumbs_dsc_8592.jpg" width="100" height="74" />
							</a>
		</div>
	</div>
	
		
 		
	<div id="ngg-image-300" class="ngg-gallery-thumbnail-box"  >
		<div class="ngg-gallery-thumbnail" >
			<a href="http://johnleach.co.uk/words/wp-content/photos/killing-and-butchering-a-chicken/dsc_8604.jpg" title=" " class="thickbox" rel="set_18" >
								<img title="dsc_8604" alt="dsc_8604" src="http://johnleach.co.uk/words/wp-content/photos/killing-and-butchering-a-chicken/thumbs/thumbs_dsc_8604.jpg" width="100" height="74" />
							</a>
		</div>
	</div>
	
		
 	 	
	<!-- Pagination -->
 	<div class="ngg-clear"></div> 	
</div>


<p>&nbsp;</p>
]]></content:encoded>
			<wfw:commentRss>http://johnleach.co.uk/words/984/killing-and-butchering-a-chicken/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Inside Google Plus</title>
		<link>http://johnleach.co.uk/words/843/inside-google-plus</link>
		<comments>http://johnleach.co.uk/words/843/inside-google-plus#comments</comments>
		<pubDate>Wed, 12 Oct 2011 09:38:08 +0000</pubDate>
		<dc:creator>john</dc:creator>
				<category><![CDATA[Politics]]></category>
		<category><![CDATA[Tech]]></category>

		<guid isPermaLink="false">http://johnleach.co.uk/words/?p=843</guid>
		<description><![CDATA[Steven Levy interviewed Google&#8217;s Bradley Horowitz about Google+: Wired: Some users are chafing at Google’s insistence that they provide real names. Explain the policy against pseudonyms. Horowitz: Google believes in three modes of usage—anonymous, pseudonymous, and identified, and we have a &#8230; <a href="http://johnleach.co.uk/words/843/inside-google-plus">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Steven Levy <a href="http://www.wired.com/magazine/2011/09/ff_google_horowitz/">interviewed Google&#8217;s Bradley Horowitz about Google+</a>:</p>
<blockquote><p>Wired: Some users are chafing at Google’s insistence that they provide real names. Explain the policy against pseudonyms.</p>
<p>Horowitz: Google believes in three modes of usage—anonymous, pseudonymous, and identified, and we have a spectrum of products that use all three. For anonymity, you can go into incognito mode in Chrome and the information associated with using the browser is not retained. Gmail and Blogger are pseudonymous—you can go be captainblackjack@gmail.com. But with products like Google Checkout, you’re doing a financial transaction and you have to use your real name. For now, Google+ falls into that last category. There are great debates going on about this—I saw one comment yesterday that claimed that pseudonyms protect the experience of women in the system. I felt compelled to respond, because I’ve gotten feedback from women who say that the accountability of real names makes them feel much more comfortable in Google+.</p></blockquote>
<p>Notice that Horowitz did not answer the question, and what he did say was just ridiculous nonsense. Steven Levy at Wired didn&#8217;t seem to notice, or care.</p>
<p>Horowitz tries to make us think that we need our real name when making a financial transaction.  Thousands of years of currency proves that is not the case.</p>
<p>Horowitz then goes on to blurrily equate making a financial transaction with sharing videos of cats on Google+.</p>
<p>And then the cherry on the top: Google+ protects women.</p>
<p>This was the closest there was to a serious question in the whole interview and Horowitz just laughed out of his arse at it.</p>
<p><span id="more-843"></span></p>
<p>I emailed Steven Levy and asked him why, given the opportunity of interviewing Horowitz, he didn&#8217;t ask anything close to a serious question. It&#8217;s been over a week and I&#8217;ve had no response.</p>
<p>TL;DR: Don&#8217;t trust Steven Levy to report honestly about Google.</p>
<pre><tt><span style="color: #3a3935;">Hi Steven,</span></tt>

<tt><span style="color: #3a3935;">I hope you're well.</span></tt>

<tt><span style="color: #3a3935;">I read your article "Inside Google Plus"[1] the other day and I have a</span></tt>
<tt><span style="color: #3a3935;">couple of questions I hope you'll have time to answer,</span></tt>

<tt><span style="color: #3a3935;">I was surprised to notice that you didn't ask any questions about</span></tt>
<tt><span style="color: #3a3935;">privacy - Google+ unequivocally raises numerous serious privacy</span></tt>
<tt><span style="color: #3a3935;">concerns.  Why did you decide not to put any of these questions to</span></tt>
<tt><span style="color: #3a3935;">Horowitz?</span></tt>

<tt><span style="color: #3a3935;">Regarding your question about pseudonyms, Horowitz didn't really answer</span></tt>
<tt><span style="color: #3a3935;">your question. He kind of claimed that financial transactions cannot be</span></tt>
<tt><span style="color: #3a3935;">anonymous and that Google+ is somehow like a financial transaction.  To</span></tt>
<tt><span style="color: #3a3935;">be frank, his answer seemed just hand waving - why didn't you challenge</span></tt>
<tt><span style="color: #3a3935;">him on this harder?</span></tt>

<tt><span style="color: #3a3935;">Best regards,</span></tt>

<tt><span style="color: #3a3935;">John Leach</span></tt>

<tt><span style="color: #3a3935;">[1] <a href="http://www.wired.com/magazine/2011/09/ff_google_horowitz/">http://www.wired.com/magazine/2011/09/ff_google_horowitz/</a></span></tt></pre>
]]></content:encoded>
			<wfw:commentRss>http://johnleach.co.uk/words/843/inside-google-plus/feed</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Ceph at London Devops, 25th July 2011</title>
		<link>http://johnleach.co.uk/words/815/london-devops-july-2011</link>
		<comments>http://johnleach.co.uk/words/815/london-devops-july-2011#comments</comments>
		<pubDate>Fri, 22 Jul 2011 08:00:06 +0000</pubDate>
		<dc:creator>john</dc:creator>
				<category><![CDATA[Tech]]></category>
		<category><![CDATA[ceph]]></category>
		<category><![CDATA[cluster]]></category>
		<category><![CDATA[distributed]]></category>
		<category><![CDATA[filesystem]]></category>
		<category><![CDATA[filesystems]]></category>
		<category><![CDATA[london]]></category>
		<category><![CDATA[petabyte]]></category>
		<category><![CDATA[speaker]]></category>
		<category><![CDATA[talk]]></category>

		<guid isPermaLink="false">http://johnleach.co.uk/words/?p=815</guid>
		<description><![CDATA[I&#8217;ll be down London way on Monday 25th July giving a talk about Ceph at the London Devops meetup. Come along and learn about petabyte scale distributed filesystems, or just come along and drink beer with us!]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ll be down London way on Monday 25th July <a href="http://lanyrd.com/2011/london-devops-july/sggdk/">giving a talk about Ceph</a> at the <a href="http://lanyrd.com/2011/london-devops-july/">London Devops meetup</a>.  Come along and learn about petabyte scale distributed filesystems, or just come along and drink beer with us!</p>
]]></content:encoded>
			<wfw:commentRss>http://johnleach.co.uk/words/815/london-devops-july-2011/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Redirecting outgoing mail with Postfix</title>
		<link>http://johnleach.co.uk/words/797/redirecting-outgoing-mail-with-postfix</link>
		<comments>http://johnleach.co.uk/words/797/redirecting-outgoing-mail-with-postfix#comments</comments>
		<pubDate>Sat, 11 Jun 2011 16:48:11 +0000</pubDate>
		<dc:creator>john</dc:creator>
				<category><![CDATA[GNU/Linux]]></category>
		<category><![CDATA[Tech]]></category>
		<category><![CDATA[email]]></category>
		<category><![CDATA[mail]]></category>
		<category><![CDATA[postfix]]></category>
		<category><![CDATA[redirect]]></category>
		<category><![CDATA[rewrite]]></category>
		<category><![CDATA[smtp]]></category>
		<category><![CDATA[staging]]></category>

		<guid isPermaLink="false">http://johnleach.co.uk/words/?p=797</guid>
		<description><![CDATA[We have various staging deployments of our systems at Brightbox and need to test that the emails they send are correct. We have a bunch of test accounts registered with various email addresses and we wanted them all to &#8230; <a href="http://johnleach.co.uk/words/797/redirecting-outgoing-mail-with-postfix">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>We have various staging deployments of our systems at <a href="http://brightbox.com">Brightbox</a> and need to test that the emails they send are correct. We have a bunch of test accounts registered with various email addresses and we wanted them all to go to our dev team, rather than the original recipient.</p>
<p>Rather than write support for this into our apps, we used Postfix to redirect the mail to our devs.</p>
<p>In our case, our staging deployments use a local installation of Postfix and the systems are generally not used by anything else, which makes this dead easy.</p>
<p>Firstly, write a rewrite map file, with the following one line of content. Call it <code>/etc/postfix/recipient_canonical_map</code>:</p>
<pre><code>/./ devteam@example.com </code></pre>
<p>Then configure Postfix like this (in <code>/etc/postfix/main.cf</code>):</p>
<pre><code>recipient_canonical_classes = envelope_recipient
recipient_canonical_maps = regexp:/etc/postfix/recipient_canonical_map
</code></pre>
<p>Now all mail going through this relay will be redirected to <code>devteam@example.com</code>. It rewrites only the envelope, so the important headers are not changed.</p>
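<p>To see why a single <code>/./</code> pattern catches everything, here is a minimal Ruby sketch of how a regexp lookup table behaves: patterns are tried in order and the first one that matches the address wins. This is only an illustration and not Postfix code; <code>REDIRECT_RULES</code> and <code>rewrite_recipient</code> are made-up names.</p>
<pre><code>
# Illustrative only: mimics a first-match-wins regexp table in plain Ruby.
# REDIRECT_RULES and rewrite_recipient are hypothetical names, not part of Postfix.
REDIRECT_RULES = [
  [/./, 'devteam@example.com']
].freeze

# Returns the replacement for the first matching pattern,
# or the original address when nothing matches.
def rewrite_recipient(address, rules = REDIRECT_RULES)
  rules.each do |pattern, replacement|
    return replacement if pattern.match(address)
  end
  address
end

puts rewrite_recipient('customer@example.org')
puts rewrite_recipient('someone.else@example.net')
</code></pre>
<p>Since <code>/./</code> matches any string with at least one character, every recipient ends up at the dev team address. On a real system, <code>postmap -q "customer@example.org" regexp:/etc/postfix/recipient_canonical_map</code> should let you query the map directly.</p>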
]]></content:encoded>
			<wfw:commentRss>http://johnleach.co.uk/words/797/redirecting-outgoing-mail-with-postfix/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Puppet dependencies and run stages</title>
		<link>http://johnleach.co.uk/words/771/puppet-dependencies-and-run-stages</link>
		<comments>http://johnleach.co.uk/words/771/puppet-dependencies-and-run-stages#comments</comments>
		<pubDate>Sun, 29 May 2011 15:24:58 +0000</pubDate>
		<dc:creator>john</dc:creator>
				<category><![CDATA[Tech]]></category>
		<category><![CDATA[apt]]></category>
		<category><![CDATA[Debian]]></category>
		<category><![CDATA[dependencies]]></category>
		<category><![CDATA[puppet]]></category>
		<category><![CDATA[Ubuntu]]></category>

		<guid isPermaLink="false">http://johnleach.co.uk/words/?p=771</guid>
		<description><![CDATA[I&#8217;m using Puppet to manage some apt repositories on Ubuntu and have had a dependency problem. I want to write the source configs before running apt-get update and I want to run that before installing any packages.  Otherwise, a manifest &#8230; <a href="http://johnleach.co.uk/words/771/puppet-dependencies-and-run-stages">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m using Puppet to manage some apt repositories on Ubuntu and have had a dependency problem. I want to write the source configs before running <code>apt-get update</code> and I want to run that before installing any packages.  Otherwise, a manifest that tries to install a package from a custom repository will fail, either because the repository is not configured or the apt metadata hasn&#8217;t been retrieved yet.</p>
<p>Due to Puppet changes being idempotent, this is usually solvable by running puppet a few times (ew). Or you can do this properly by diligently setting all the dependencies for all of your packages on your <code>apt-get update</code> command, and having that depend on your source configs, but that&#8217;s pretty fiddly.</p>
<p><span id="more-771"></span></p>
<p>New versions of Puppet now have a feature called <a href="http://projects.puppetlabs.com/projects/1/wiki/Release_Notes#Run-Stages">run stages</a> that can be used to solve this problem. You put your source configs and apt update command in a run stage and tell puppet to run everything in that run stage before the other stages (there is an implicit run stage called main where everything goes by default).</p>
<p>This seems quite neat at first, but only the new parameterized classes can be set to be in certain run stages &#8211; so you end up putting things into a class just to put it into a run stage. It&#8217;s really not much better than being diligent with your dependencies for this problem &#8211; worse in many ways. (There is a <a href="http://projects.puppetlabs.com/issues/2658">long discussion about the implementation of run stages</a> in the Puppet issue tracker that might help you understand the use case for them).</p>
<p>But there is a <a href="http://projects.puppetlabs.com/projects/1/wiki/Release_Notes#New-relationship-syntax">new relationship syntax</a> to make setting dependencies much easier which I&#8217;m using to mass-depend all packages on my <code>apt-get update</code> command:</p>
<pre>class apt {
  exec { "apt-update":
    command =&gt; "/usr/bin/apt-get update"
  }

  # Ensure apt is setup before running apt-get update
  Apt::Key &lt;| |&gt; -&gt; Exec["apt-update"]
  Apt::Source &lt;| |&gt; -&gt; Exec["apt-update"]

  # Ensure apt-get update has been run before installing any packages
  Exec["apt-update"] -&gt; Package &lt;| |&gt;
}</pre>
<p>Remember, Puppet is declarative &#8211; the dependencies set there get applied to all <code>Apt::Key</code>, <code>Apt::Source</code> and <code>Package</code> resources even if they&#8217;ve not yet been defined. (This example also assumes the only packages you&#8217;re defining are apt ones btw!)</p>
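<p>The ordering those three relationship lines produce can be pictured as a topological sort. Here is a rough Ruby sketch using the standard library <code>TSort</code> module. It is not how Puppet itself is implemented, and the <code>ResourceGraph</code> class and the resource names are made up for illustration.</p>
<pre><code>
require 'tsort'

# Hypothetical model: each resource maps to the resources it depends on.
class ResourceGraph
  include TSort

  def initialize(dependencies)
    @dependencies = dependencies
  end

  # TSort hook: yield every resource in the graph.
  def tsort_each_node
    @dependencies.each_key { |resource| yield resource }
  end

  # TSort hook: yield the resources that must be applied first.
  def tsort_each_child(resource)
    @dependencies.fetch(resource, []).each { |dependency| yield dependency }
  end
end

# Mirrors the manifest: keys and sources before apt-update,
# apt-update before any package.
APT_DEPENDENCIES = {
  package_nginx: [:exec_apt_update],
  exec_apt_update: [:apt_key, :apt_source],
  apt_key: [],
  apt_source: []
}.freeze

# Dependencies always come out before the resources that need them.
p ResourceGraph.new(APT_DEPENDENCIES).tsort
</code></pre>
<p>However the hash is ordered, the package only appears after the apt-update step, which in turn only appears after the key and the source.</p>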
<p>In summary, run stages are hard and fiddly. You probably don&#8217;t need them. Learn how to use the new relationship syntax.</p>
]]></content:encoded>
			<wfw:commentRss>http://johnleach.co.uk/words/771/puppet-dependencies-and-run-stages/feed</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Indexing syslog messages with solr</title>
		<link>http://johnleach.co.uk/words/744/indexing-syslog-messages-with-solr</link>
		<comments>http://johnleach.co.uk/words/744/indexing-syslog-messages-with-solr#comments</comments>
		<pubDate>Thu, 03 Mar 2011 23:10:09 +0000</pubDate>
		<dc:creator>john</dc:creator>
				<category><![CDATA[Tech]]></category>
		<category><![CDATA[audit]]></category>
		<category><![CDATA[centralized]]></category>
		<category><![CDATA[graylog]]></category>
		<category><![CDATA[indexing]]></category>
		<category><![CDATA[intrusion detection]]></category>
		<category><![CDATA[logging]]></category>
		<category><![CDATA[lucene]]></category>
		<category><![CDATA[rsyslog]]></category>
		<category><![CDATA[search]]></category>
		<category><![CDATA[solr]]></category>
		<category><![CDATA[syslog]]></category>

		<guid isPermaLink="false">http://johnleach.co.uk/words/?p=744</guid>
		<description><![CDATA[I&#8217;ve been thinking about centralized indexing and searching of logs for a while and the other day I came across a project called Graylog2 that does just that. It provides a service to receive messages over the network (in a couple &#8230; <a href="http://johnleach.co.uk/words/744/indexing-syslog-messages-with-solr">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve been thinking about centralized indexing and searching of logs for a while and the other day I came across a project called <a href="http://www.graylog2.org/">Graylog2</a> that does just that. It provides a service to receive messages over the network (in a couple of formats, including syslog) and writes them into mongodb. It then has a rails application that lets you browse and search the logs.</p>
<p>It&#8217;s neat but I wasn&#8217;t quite happy with the search options &#8211; I&#8217;ve always thought logs should be indexed with a real full text indexer. So, I knocked up a couple of scripts to do just that, as a proof of concept.</p>
<p>It uses <a href="http://www.rsyslog.com/">rsyslog</a> to receive the messages and write them to a named pipe.  A small ruby script called rsyslog-solr reads from the other end of the pipe and writes batches of the incoming messages to the full text indexer. I chose <a href="http://lucene.apache.org/solr/">solr</a> as the full text indexer as it has some very good options for scaling up, which will be necessary when indexing lots of logs.</p>
<p>Solr indexes, compresses and stores the messages sent to it, so we can retrieve the full text without having to store the original log. I wrote a custom schema definition optimized for this.</p>
<p>Then another script, rsyslog-solr-search, is used to query Solr and display the matching messages.</p>
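<p>The batching idea can be sketched in a few lines of Ruby. This is not the actual rsyslog-solr code: <code>BATCH_SIZE</code>, <code>line_to_document</code>, <code>each_batch</code> and the document fields are all assumptions, and a <code>StringIO</code> stands in for the named pipe.</p>
<pre><code>
require 'stringio'

# Illustrative sketch only. BATCH_SIZE and the document fields are assumptions.
BATCH_SIZE = 3

# Turns one raw log line into a hash that could be sent to an indexer.
def line_to_document(line, id)
  { id: id, message: line.chomp }
end

# Reads lines from any IO (a named pipe in the real setup) and yields
# them in batches so the indexer is not hit once per message.
def each_batch(io, batch_size = BATCH_SIZE)
  io.each_line.each_slice(batch_size).with_index do |lines, batch_no|
    docs = lines.each_with_index.map do |line, offset|
      line_to_document(line, batch_no * batch_size + offset)
    end
    yield docs
  end
end

pipe = StringIO.new("sshd: failed login\ncron: job started\nsshd: accepted key\nkernel: eth0 up\n")
each_batch(pipe) { |docs| puts "would index #{docs.size} messages" }
</code></pre>
<p>One caveat with this simple approach: on a quiet pipe a partial batch waits until enough lines arrive, so a real script would probably also want to flush on a timer.</p>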
<p>Querying is fun, for example I&#8217;ve searched all ssh authentication failures across all hosts and then searched on the originating IPs to see what other probes they made.</p>
<p>You don&#8217;t have to do advanced searches though, you can just display all logs from the last hour, or day or whatever.</p>
<p>One important note, any user that can generate logs that are sent to the system can cause a denial of service attack by sending specially malformed messages. This can be fixed by moving the formatting of the log entries from rsyslog into the ruby script, but I&#8217;ve not done it yet.</p>
<p>I&#8217;ve <a href="https://github.com/johnl/rsyslog-solr">pushed the code to github</a> under the MIT license. Feel free to improve it.</p>
]]></content:encoded>
			<wfw:commentRss>http://johnleach.co.uk/words/744/indexing-syslog-messages-with-solr/feed</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Breaking my blog embargo</title>
		<link>http://johnleach.co.uk/words/740/breaking-m-blog-embargo</link>
		<comments>http://johnleach.co.uk/words/740/breaking-m-blog-embargo#comments</comments>
		<pubDate>Sat, 19 Feb 2011 18:48:38 +0000</pubDate>
		<dc:creator>john</dc:creator>
				<category><![CDATA[Personal]]></category>
		<category><![CDATA[blog]]></category>
		<category><![CDATA[blogging]]></category>

		<guid isPermaLink="false">http://johnleach.co.uk/words/?p=740</guid>
		<description><![CDATA[I often find myself in the situation where I&#8217;ve not blogged for a long time which makes it difficult to write a new one, as if the long delay means the next blog has to be weighty and impressive. I&#8217;ve &#8230; <a href="http://johnleach.co.uk/words/740/breaking-m-blog-embargo">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>I often find myself in the situation where I&#8217;ve not blogged for a long time which makes it difficult to write a new one, as if the long delay means the next blog has to be weighty and impressive.</p>
<p>I&#8217;ve realised this is an illusion. Nobody is sitting in wait for my next blog entry. Nobody else has noticed I&#8217;ve not blogged in ages. Only I know I&#8217;ve not blogged in ages. And I certainly shouldn&#8217;t care about what I think.</p>
<p>So I&#8217;m breaking my accidental self-imposed blog embargo with this mundane entry which says nothing of import.</p>
<p>If you were hoping for something of more consequence then I make no apology, though your hopes disprove my above realisation, which is irksome to say the least.</p>
]]></content:encoded>
			<wfw:commentRss>http://johnleach.co.uk/words/740/breaking-m-blog-embargo/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>The cost of free</title>
		<link>http://johnleach.co.uk/words/835/the-cost-of-free</link>
		<comments>http://johnleach.co.uk/words/835/the-cost-of-free#comments</comments>
		<pubDate>Mon, 27 Sep 2010 17:48:21 +0000</pubDate>
		<dc:creator>john</dc:creator>
				<category><![CDATA[Tech]]></category>
		<category><![CDATA[creative commons]]></category>
		<category><![CDATA[free]]></category>
		<category><![CDATA[free culture]]></category>
		<category><![CDATA[guardian]]></category>

		<guid isPermaLink="false">http://johnleach.co.uk/words/?p=835</guid>
		<description><![CDATA[Helienne Lindvall writes in the Guardian: Cory Doctorow [will] cost you $25,000 (£15,800) to get him to speak at your conference&#8230; But what does Doctorow speak about? Well, ironically, he&#8217;s a proponent of giving away content for free as a &#8230; <a href="http://johnleach.co.uk/words/835/the-cost-of-free">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Helienne Lindvall <a href="http://www.guardian.co.uk/media/pda/2010/sep/27/free-online-content">writes in the Guardian</a>:</p>
<blockquote><p>Cory Doctorow [will] cost you $25,000 (£15,800) to get him to speak at your conference&#8230;</p>
<p>But what does Doctorow speak about? Well, ironically, he&#8217;s a proponent of giving away content for free as a business model – and for years he&#8217;s been telling the music industry to adapt to it. Am I the only one to see the irony in this?</p></blockquote>
<p>I don&#8217;t see the irony. This is exactly what Doctorow recommends. Give your content away and charge to perform it. Give your music away and charge for your gigs. I bet the content of his slides is creative commons, and I bet the recordings of his talks are creative commons even. But if watching a video of him isn&#8217;t enough for you and you want him in person, then you pay for it.</p>
<p>It seems that Helienne Lindvall does not understand even the basic ideas of free culture.</p>
<p>UPDATE: Helienne Lindvall seems to have been misinformed anyway, as per this tweet from Doctorow himself:</p>
<blockquote><p>@helienne, I&#8217;m afraid you were badly misinformed. I don&#8217;t have a &#8220;booker&#8221;, I don&#8217;t charge anything like the sum quoted, most talks are free</p></blockquote>
<p>UPDATE: Doctorow has since <a href="http://www.guardian.co.uk/technology/blog/2010/oct/05/free-online-content-cory-doctorow">written an interesting article rebutting Lindvall</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://johnleach.co.uk/words/835/the-cost-of-free/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>ipq.co: create dns records instantly</title>
		<link>http://johnleach.co.uk/words/646/ipq-co-intstant-dns-records</link>
		<comments>http://johnleach.co.uk/words/646/ipq-co-intstant-dns-records#comments</comments>
		<pubDate>Fri, 10 Sep 2010 21:38:55 +0000</pubDate>
		<dc:creator>john</dc:creator>
				<category><![CDATA[Tech]]></category>
		<category><![CDATA[dns]]></category>
		<category><![CDATA[hostname]]></category>
		<category><![CDATA[instant]]></category>
		<category><![CDATA[ip address]]></category>
		<category><![CDATA[quick]]></category>
		<category><![CDATA[records]]></category>

		<guid isPermaLink="false">http://johnleach.co.uk/words/?p=646</guid>
		<description><![CDATA[ipq.co is a new service I put together to lower the barrier for dns management. It&#8217;s the tinyurl of the dns world &#8211; provide an IP address and you get a random dns record for it (or you can choose &#8230; <a href="http://johnleach.co.uk/words/646/ipq-co-intstant-dns-records">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p><a href="http://ipq.co">ipq.co</a> is a new service I put together to lower the barrier for dns management.  It&#8217;s the <a href="http://tinyurl.com">tinyurl</a> of the dns world &#8211; provide an IP address and you get a random dns record for it (or you can choose your own, if it&#8217;s available).  Looking at other dns management systems, I was surprised this hadn&#8217;t been done before (and by how awful most of the dns interfaces are out there!)</p>
<p>I wrote it in Ruby using the <a href="http://rubyonrails.org/">Rails 3</a> framework, with the dns records being served by the <a href="http://www.powerdns.com/">PowerDNS</a> MySQL back end (though I&#8217;ll likely be switching it to use a custom back end using my <a href="http://github.com/johnl/powerdns_pipe">powerdns_pipe library</a> for more flexibility).</p>
<p>We&#8217;re building a big new cloud system over at <a href="http://www.brightbox.co.uk">Brightbox</a> and we&#8217;ve been thinking how to provide convenient dns records for our customers.  We already have some basic integration but the resulting records are quite a mouthful. ipq.co is just a bit of an experiment to explore other ways of solving the problem.  There has already been <a href="http://news.ycombinator.com/item?id=1678324">some discussion over on Hacker News</a> about possible applications (and implications) of the service &#8211; I&#8217;m interested in how people will use it.</p>
<p>I&#8217;ve got some plans for other features which I&#8217;ll be adding over the next few weeks, and then I&#8217;ll be selling it to Google for low 7 figures, so watch this space.</p>
<p>UPDATE: Some ipq.co records were used to point at some phishing sites and Google <a href="http://www.google.com/safebrowsing/diagnostic?site=http://ipq.co/">blacklisted the entire site</a>.  I&#8217;ve requested a delisting with Google but that might take some time.  Any thoughts on how to avoid this in future?  I&#8217;m thinking of checking the IP against some well established banlists on create (and possibly checking them all regularly after that too).</p>
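<p>One way such a banlist check might look, assuming the list is published as a DNS blocklist: these are queried by reversing the IPv4 octets and appending the zone of the list, and any answer means the address is listed. The zone below is a placeholder rather than a real banlist, and <code>dnsbl_query_name</code> and <code>listed?</code> are hypothetical names.</p>
<pre><code>
require 'resolv'

# Sketch of a DNS blocklist check. EXAMPLE_ZONE is a placeholder,
# not a real banlist; the method names are made up for illustration.
EXAMPLE_ZONE = 'dnsbl.example.org'

# Builds the name to look up: the IPv4 octets reversed, then the zone.
def dnsbl_query_name(ip, zone = EXAMPLE_ZONE)
  raise ArgumentError, "not an IPv4 address: #{ip}" unless ip.match(Resolv::IPv4::Regex)
  (ip.split('.').reverse + [zone]).join('.')
end

# An address counts as listed when the query name resolves to anything.
# This does a real DNS lookup, so it is not called in the demo below.
def listed?(ip, zone = EXAMPLE_ZONE, resolver = Resolv::DNS.new)
  !resolver.getaddresses(dnsbl_query_name(ip, zone)).empty?
end

puts dnsbl_query_name('192.0.2.1')
</code></pre>
<p>Running <code>listed?</code> when a record is created, and again from a periodic job, would cover both of the cases mentioned above.</p>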
]]></content:encoded>
			<wfw:commentRss>http://johnleach.co.uk/words/646/ipq-co-intstant-dns-records/feed</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>LVM snapshot performance</title>
		<link>http://johnleach.co.uk/words/613/lvm-snapshot-performance</link>
		<comments>http://johnleach.co.uk/words/613/lvm-snapshot-performance#comments</comments>
		<pubDate>Fri, 18 Jun 2010 22:57:41 +0000</pubDate>
		<dc:creator>john</dc:creator>
				<category><![CDATA[GNU/Linux]]></category>
		<category><![CDATA[Tech]]></category>
		<category><![CDATA[benchmark]]></category>
		<category><![CDATA[btrfs]]></category>
		<category><![CDATA[device-mapper]]></category>
		<category><![CDATA[linux]]></category>
		<category><![CDATA[lvm]]></category>
		<category><![CDATA[snapshot]]></category>
		<category><![CDATA[speed]]></category>
		<category><![CDATA[storage]]></category>
		<category><![CDATA[zfs]]></category>

		<guid isPermaLink="false">http://johnleach.co.uk/words/?p=613</guid>
		<description><![CDATA[The Linux Logical Volume Manager (LVM) supports creating snapshots of logical volumes (LV) using the device mapper. Device mapper implements snapshots using a copy on write system, so whenever you write to either the source LV or the new snapshot &#8230; <a href="http://johnleach.co.uk/words/613/lvm-snapshot-performance">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>The <a href="http://en.wikipedia.org/wiki/Logical_Volume_Manager_(Linux)">Linux Logical Volume Manager</a> (LVM) supports creating snapshots of logical volumes (LV) using the device mapper. Device mapper implements snapshots using a copy on write system, so whenever you write to either the source LV or the new snapshot LV, a copy is made first.</p>
<p>So a write to a normal LV is just a write, but a write to a snapshotted LV (or an LV snapshot) involves reading the original data, writing it elsewhere and then writing some metadata about it all.</p>
<p>This quite obviously impacts performance, and due to device mapper having a very basic implementation, it is particularly bad.  My tests show <em>synchronous sequential writes to a snapshotted LV are around 90% slower than writes to a normal LV</em>.</p>
<p><span id="more-613"></span><br />
Once copied and written, writes to the same chunk are only 15% slower.  Reads are super fast, only a 5% speed impact.</p>
<p>Still, not many usage patterns involve huge full speed sequential writes to a filesystem, so LVM is still useful in most circumstances.</p>
<p>I did some tests to see how writes to one snapshotted LV impacted the performance of writes to a completely separate normal LV. Does a snapshotted LV ruin the performance of all your other LVs? Yes, especially if you&#8217;re using the cfq disk scheduler. Switching to the deadline scheduler made things considerably better for the normal LV (but slowed writes to the snapshotted LV a little further).</p>
<p>I did these tests on a 12 disk hardware RAID10 system. The test is a synthetic benchmark so I urge you to do your own tests, but it&#8217;s safe to say that <em>device mapper does not implement clever snapshotting like btrfs or zfs &#8211; don&#8217;t expect great performance from it.</em></p>
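<p>To make the copy-on-write cost concrete, here is a toy Ruby model of the write path described above. The operation counts are a simplification of that description (read, copy, metadata, then the write itself), not measurements of device mapper, and <code>SnapshottedVolume</code> is a made-up class.</p>
<pre><code>
# Toy model of copy-on-write snapshot writes. The counts are a
# simplification for illustration, not measurements of device mapper.
class SnapshottedVolume
  attr_reader :io_operations

  def initialize
    @copied_chunks = {}
    @io_operations = 0
  end

  # First write to a chunk: read the original data, write the copy and
  # write the metadata, then do the actual write. Later writes to the
  # same chunk only cost the write itself.
  def write(chunk)
    unless @copied_chunks[chunk]
      @io_operations += 3
      @copied_chunks[chunk] = true
    end
    @io_operations += 1
  end
end

volume = SnapshottedVolume.new
10.times { |chunk| volume.write(chunk) } # first pass: every chunk gets copied
first_pass = volume.io_operations
10.times { |chunk| volume.write(chunk) } # second pass: chunks already copied
second_pass = volume.io_operations - first_pass
puts "first pass: #{first_pass} ops, second pass: #{second_pass} ops"
</code></pre>
<p>The model ignores seeks and queuing, which are a large part of the real slowdown, but it shows why the first write to each chunk is so much more expensive than the ones that follow.</p>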
<h3>Improving LVM Snapshot performance</h3>
<p>There are a few ways to improve performance of LVM snapshots.  The most obvious one is the chunk size, which can be tweaked when creating the snapshot.  This controls the size of the data that will be copied and written on write operations.  The best setting will depend on lots of stuff, such as your RAID stripe size and your usage patterns.</p>
<p>There is an <a href="http://lkml.org/lkml/2008/9/17/40">as-yet uncommitted patch</a> that improves snapshot write performance a bit by being a bit clever about the disk queuing, but it&#8217;s still slow.</p>
<p>Also, device mapper supports non-persistent snapshots (i.e. lost after reboot), which should avoid having to write the change metadata to disk (which will save a lot of seeks and writes) but LVM doesn&#8217;t seem to support creating these yet.</p>
<p>Putting the snapshot device on a separate disk would help too &#8211; I&#8217;m not sure it&#8217;s possible with LVM, but device mapper does support it.</p>
]]></content:encoded>
			<wfw:commentRss>http://johnleach.co.uk/words/613/lvm-snapshot-performance/feed</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Testing XML with rspec, xpath and libxml</title>
		<link>http://johnleach.co.uk/words/585/testing-xml-with-rspec-xpath-and-libxml</link>
		<comments>http://johnleach.co.uk/words/585/testing-xml-with-rspec-xpath-and-libxml#comments</comments>
		<pubDate>Tue, 06 Apr 2010 12:28:46 +0000</pubDate>
		<dc:creator>john</dc:creator>
				<category><![CDATA[Ruby]]></category>
		<category><![CDATA[Tech]]></category>
		<category><![CDATA[rspec]]></category>
		<category><![CDATA[ruby]]></category>
		<category><![CDATA[spec]]></category>
		<category><![CDATA[tdd]]></category>
		<category><![CDATA[testing]]></category>
		<category><![CDATA[xpath]]></category>

		<guid isPermaLink="false">http://johnleach.co.uk/words/?p=585</guid>
		<description><![CDATA[I&#8217;m currently working with the virtualization API libvirt which uses XML to represent virtual machines and I&#8217;m generating this XML using Ruby.  I&#8217;m using rspec to test my code and wanted to test that my output was as I expected. &#8230; <a href="http://johnleach.co.uk/words/585/testing-xml-with-rspec-xpath-and-libxml">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m currently working with the virtualization API <a href="http://libvirt.org/">libvirt</a> which uses XML to represent virtual machines and I&#8217;m generating this XML using Ruby.  I&#8217;m using <a href="http://rspec.info/">rspec</a> to test my code and wanted to test that my output was as I expected.  I started out with <a href="http://github.com/fnando/rspec-hpricot-matchers">rspec-hpricot-matchers</a> which worked fine until I started testing slightly more complex XML, which hpricot wasn&#8217;t handling well.</p>
<p>So I wrote a have_xml matcher using the rspec DSL which uses the <a href="http://libxml.rubyforge.org/">libxml</a> library to do the testing.  It&#8217;s so simple it&#8217;s not really worthy of a gem, so here it is (released into the public domain).  The text check is optional and, to be honest, <a href="http://blog.thecodewhisperer.com/post/398226883/rspec-have-tag-spec-matcher-and-nokogiri">doesn&#8217;t really belong here</a>.  It should be a separate matcher.</p>
<pre><code>
require 'libxml'

Spec::Matchers.define :have_xml do |xpath, text|
  match do |body|
    parser = LibXML::XML::Parser.string body
    doc = parser.parse
    nodes = doc.find(xpath)
    # Return a boolean rather than raising, so should_not works too
    if nodes.empty?
      false
    elsif text
      nodes.all? { |node| node.content == text }
    else
      true
    end
  end

  failure_message_for_should do |body|
    "expected to find xml tag #{xpath} in:\n#{body}"
  end

  failure_message_for_should_not do |body|
    "expected not to find xml tag #{xpath} in:\n#{body}"
  end

  description do
    "have xml tag #{xpath}"
  end
end
</code></pre>
<p>So, add that somewhere (usually spec/spec_helper.rb) and use it like this:</p>
<pre><code>
it "should include the xen_machine_id" do
  @xml.should have_xml('/domain/name', 'bb-example-001')
end

it "should include the network devices" do
  @xml.should have_xml "/domain/devices/interface[1]/ip[@address='1.2.3.4']"
  @xml.should have_xml "/domain/devices/interface[1]/mac[@address='aa:00:01:02:03:04']"
  @xml.should have_xml "/domain/devices/interface[1]/script[@path='/etc/xen/scripts/vif-bridge']"
  @xml.should have_xml "/domain/devices/interface[1]/source[@bridge='inetbr']"
end
</code></pre>
]]></content:encoded>
			<wfw:commentRss>http://johnleach.co.uk/words/585/testing-xml-with-rspec-xpath-and-libxml/feed</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Chat Roulette: Eye Vagina</title>
		<link>http://johnleach.co.uk/words/561/chat-roulette-eye-vagina</link>
		<comments>http://johnleach.co.uk/words/561/chat-roulette-eye-vagina#comments</comments>
		<pubDate>Wed, 10 Mar 2010 21:46:52 +0000</pubDate>
		<dc:creator>john</dc:creator>
				<category><![CDATA[Personal]]></category>
		<category><![CDATA[chat roulette]]></category>
		<category><![CDATA[chatroulette]]></category>
		<category><![CDATA[eye]]></category>
		<category><![CDATA[flash]]></category>
		<category><![CDATA[prank]]></category>
		<category><![CDATA[vagina]]></category>

		<guid isPermaLink="false">http://johnleach.co.uk/words/?p=561</guid>
		<description><![CDATA[Chat Roulette is a web site that hooks you up to a random person. It streams their webcam video and audio to you, and yours to them.  When you&#8217;re done, you click next and get another random person. That&#8217;s the &#8230; <a href="http://johnleach.co.uk/words/561/chat-roulette-eye-vagina">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.chatroulette.com/">Chat Roulette</a> is a web site that hooks you up to a random person. It streams their webcam video and audio to you, and yours to them.  When you&#8217;re done, you click next and get another random person. That&#8217;s the whole thing.  It&#8217;s fun, for a short period of time.</p>
<p>Anyway, whilst holding my webcam to different parts of my body (if you ever use my webcam, wash your hands) I discovered that my eye, on its side, with the right lighting, and right shadows, and bad focus, through a webcam&#8230; looks kinda, possibly, a bit like girl bits.</p>
<p>It&#8217;s probably fair to say that, for a large proportion of the random strangers on Chat Roulette, the &#8220;Next&#8221; button is usually clicked in the hope of seeing a girl flashing some part of her body.</p>
<p>Combine these two seemingly unconnected facts, and you get some of the reactions you see in my Eye Vagina video!  The music is &#8220;My Vagina&#8221; by NOFX. I edited out roughly 300 people jerking off.  The vid has had more than half a million hits on YouTube. I&#8217;m expecting my share of their fat advertising profits any day now.</p>
<p>I recorded it using <a href="http://recordmydesktop.sourceforge.net">recordmydesktop</a> and edited it using <a href="http://www.pitivi.org/">Pitivi</a> (which actually had some very annoying audio sync problems I had to jump through hoops to avoid, which was a shame).</p>
<p><object classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="480" height="385" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="allowFullScreen" value="true" /><param name="allowscriptaccess" value="always" /><param name="src" value="http://www.youtube.com/v/Bq6xjTyw7zM&amp;hl=en_GB&amp;fs=1&amp;rel=0" /><param name="allowfullscreen" value="true" /><embed type="application/x-shockwave-flash" width="480" height="385" src="http://www.youtube.com/v/Bq6xjTyw7zM&amp;hl=en_GB&amp;fs=1&amp;rel=0" allowscriptaccess="always" allowfullscreen="true"></embed></object></p>
]]></content:encoded>
			<wfw:commentRss>http://johnleach.co.uk/words/561/chat-roulette-eye-vagina/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Advertising and ad blocking</title>
		<link>http://johnleach.co.uk/words/497/advertising-and-ad-blocking</link>
		<comments>http://johnleach.co.uk/words/497/advertising-and-ad-blocking#comments</comments>
		<pubDate>Sun, 07 Mar 2010 16:19:10 +0000</pubDate>
		<dc:creator>john</dc:creator>
				<category><![CDATA[Tech]]></category>
		<category><![CDATA[ad-blocking]]></category>
		<category><![CDATA[adblock]]></category>
		<category><![CDATA[adblocking]]></category>
		<category><![CDATA[advertising]]></category>
		<category><![CDATA[externalities]]></category>
		<category><![CDATA[money]]></category>
		<category><![CDATA[pollution]]></category>

		<guid isPermaLink="false">http://johnleach.co.uk/words/?p=497</guid>
		<description><![CDATA[I&#8217;ve thought about advertising and ad-blockers a lot over the years, and the debate is getting some attention right now starting with a recent Ars Technica article, so I thought I&#8217;d put down some of my own thoughts on it. &#8230; <a href="http://johnleach.co.uk/words/497/advertising-and-ad-blocking">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve thought about advertising and ad-blockers a lot over the years, and the debate is getting some attention right now starting with a recent <a href="http://arstechnica.com/business/news/2010/03/why-ad-blocking-is-devastating-to-the-sites-you-love.ars">Ars Technica article</a>, so I thought I&#8217;d put down some of my own thoughts on it.</p>
<p>Funding your content through advertising is hugely inefficient. Of the people who visit your site, usually only a tiny proportion click on (or notice) an advert, and only a tiny proportion of those then spend any money.  So a tiny, tiny proportion of your visitors give any money to your advertisers. So money filters down this system in tiny margins.  Then, at the bottom of the system, a tiny amount of the profits from the income covers the cost of advertising.  Then this money moves back up the system to you, usually via your advertising agent who takes a nice cut (I&#8217;ve heard Google pass as little as one twelfth onto the publisher in some cases).</p>
<p>And this doesn&#8217;t consider the costs of the advertiser choosing and designing the ad or the tonnes of bandwidth and gatrillions of CPU cycles used to serve the actual adverts.</p>
<p>It also does not consider externalities, such as pollution. <strong>Advertising is mind pollution.</strong> Advertising is designed to affect the behaviour of people for the benefit of the advertiser.  Why would anyone willingly expose themselves to something designed to steal their attention?</p>
<p>You might argue that advertising creates value &#8211; some viewers choose to buy when otherwise they wouldn&#8217;t have. But what of the huge proportion of people who just had their attention stolen? No value was created there.</p>
<p>Because not everyone is suckered in by it, advertising squanders billions of hours of attention every day to produce nothing.<br />
<span id="more-497"></span></p>
<h3>Begging</h3>
<p>Walking down a street in town I might get approached by a beggar who needs money to eat.  In the UK we have the <a href="http://www.statutelaw.gov.uk/content.aspx?LegType=All+Primary&amp;PageNumber=94&amp;NavFrom=2&amp;parentActiveTextDocId=1029462&amp;ActiveTextDocId=1029462&amp;filesize=54565">Vagrancy Act of 1824</a> which prevents these people from <em>harassing</em> me. They can be punished by detention, hard labour and whipping apparently.</p>
<p>Advertising is just a company harassing me for money to eat.  They&#8217;re just better funded.  I believe we should be able to detain and whip their advertising departments.</p>
<h3>Unobtrusive payment</h3>
<p>But seriously, advertising is a broken method of paying for stuff.  If we could unobtrusively pay for content on the Internet, I&#8217;m sure enough people would do so to more than cover costs of production.</p>
<p>We need good, unobtrusive payment methods and a change in culture to pay for good content would follow.  We can work on changing the culture now though: support sites that are ad-free (or have ad-free subscriptions).  I pay for a couple of ad-free subscriptions myself &#8211; be the change you want to see in the world.</p>
<p>I look forward to the day when a site with advertising is a clear signal that nobody would pay for it otherwise. My ad-blocker could then just block the whole site to save me wasting my attention.</p>
<h3>Full Disclosure</h3>
<p>My company does a (very small) amount of advertising, though I&#8217;m not involved directly in it. I don&#8217;t know how well it performs. We also sponsor conferences and events, which is of course advertising.  I also help run a couple of web sites that make money from advertising.</p>
<p>From a viewer&#8217;s perspective, I hate advertising. From a publisher&#8217;s perspective, I can make a few quid from it and kinda just hope the pollution isn&#8217;t so bad (we do not allow annoying flashing adverts, and block ads from particularly evil corporations whenever we can but frankly, I&#8217;m mostly getting by on cognitive dissonance).  I&#8217;ve not thought about it until now, but from an advertiser&#8217;s perspective, it&#8217;s of course nice to get new customers (though I&#8217;m not sure of the &#8220;quality&#8221; of the custom we get via advertising &#8211; I&#8217;m now interested in investigating this further).</p>
<p>I&#8217;m in no way dependent on any of my income from advertising, so it&#8217;s hard to speak from these perspectives.</p>
<h3>Update: Poor people</h3>
<p>I&#8217;m basically saying that because adverts are not well targeted, the majority of advert views are wasted. They&#8217;re mind pollution.</p>
<p>But in order for adverts to get more accurate, the ad companies need to collect personal information about us: what we do online, what we like etc.  So we&#8217;re supposed to hand over our privacy, just so we can ethically view &#8220;free&#8221; stuff on the Internet?</p>
<p>Let&#8217;s suppose advertising becomes perfectly targeted. Every advert you see is something you really can&#8217;t do without and something you can afford. Wouldn&#8217;t this mean you buy everything you get shown? Wouldn&#8217;t this mean you&#8217;d run out of money?</p>
<p>Is it unethical for poor people to view ad-supported online content if they can&#8217;t afford anything being advertised? However well targeted the ads are, they have no money to spend so it&#8217;s completely fruitless.  Perhaps ad-supported websites should ban public library Internet addresses &#8211; poor people are reading for free!</p>
<h3>Discussions Elsewhere</h3>
<ul>
<li><a href="http://blog.mozilla.com/rob-sayre/2010/03/06/why-ad-blockers-work/">Rob Sayre: Why adblockers work</a></li>
<li><a href="http://briancarper.net/blog/advertising-is-devastating-to-my-well-being">Brian Carper: Advertising is devastating to my well being</a></li>
<li><a href="http://news.ycombinator.com/item?id=1173582">Comments on Hacker News</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://johnleach.co.uk/words/497/advertising-and-ad-blocking/feed</wfw:commentRss>
		<slash:comments>20</slash:comments>
		</item>
		<item>
		<title>Old sycamore tree</title>
		<link>http://johnleach.co.uk/words/889/old-sycamore-tree</link>
		<comments>http://johnleach.co.uk/words/889/old-sycamore-tree#comments</comments>
		<pubDate>Sat, 13 Feb 2010 01:11:54 +0000</pubDate>
		<dc:creator>john</dc:creator>
				<category><![CDATA[Photography]]></category>
		<category><![CDATA[bradford]]></category>
		<category><![CDATA[greengates]]></category>
		<category><![CDATA[sycamore]]></category>
		<category><![CDATA[tree]]></category>
		<category><![CDATA[tree surgeon]]></category>

		<guid isPermaLink="false">http://johnleach.co.uk/words/?p=889</guid>
		<description><![CDATA[The sycamore tree in the garden of our new house had to come down after some of its huge branches started coming down in high winds.]]></description>
			<content:encoded><![CDATA[<p>The sycamore tree in the garden of our new house had to come down after some of its huge branches started coming down in high winds.</p>

<div class="ngg-galleryoverview" id="ngg-gallery-5-889">


	
	<!-- Thumbnails -->
		
	<div id="ngg-image-123" class="ngg-gallery-thumbnail-box"  >
		<div class="ngg-gallery-thumbnail" >
			<a href="http://johnleach.co.uk/words/wp-content/photos/tree-surgeon/DSC_1082.JPG" title=" " class="thickbox" rel="set_5" >
								<img title="Tree surgeon" alt="Tree surgeon" src="http://johnleach.co.uk/words/wp-content/photos/tree-surgeon/thumbs/thumbs_DSC_1082.JPG" width="100" height="75" />
							</a>
		</div>
	</div>
	
		
 		
	<div id="ngg-image-125" class="ngg-gallery-thumbnail-box"  >
		<div class="ngg-gallery-thumbnail" >
			<a href="http://johnleach.co.uk/words/wp-content/photos/tree-surgeon/DSC_1093.JPG" title=" " class="thickbox" rel="set_5" >
								<img title="Tree surgeon" alt="Tree surgeon" src="http://johnleach.co.uk/words/wp-content/photos/tree-surgeon/thumbs/thumbs_DSC_1093.JPG" width="100" height="75" />
							</a>
		</div>
	</div>
	
		
 		
	<div id="ngg-image-127" class="ngg-gallery-thumbnail-box"  >
		<div class="ngg-gallery-thumbnail" >
			<a href="http://johnleach.co.uk/words/wp-content/photos/tree-surgeon/DSC_1095.JPG" title=" " class="thickbox" rel="set_5" >
								<img title="Tree surgeon" alt="Tree surgeon" src="http://johnleach.co.uk/words/wp-content/photos/tree-surgeon/thumbs/thumbs_DSC_1095.JPG" width="100" height="75" />
							</a>
		</div>
	</div>
	
		
 		
	<div id="ngg-image-129" class="ngg-gallery-thumbnail-box"  >
		<div class="ngg-gallery-thumbnail" >
			<a href="http://johnleach.co.uk/words/wp-content/photos/tree-surgeon/DSC_1107.JPG" title=" " class="thickbox" rel="set_5" >
								<img title="Tree surgeon" alt="Tree surgeon" src="http://johnleach.co.uk/words/wp-content/photos/tree-surgeon/thumbs/thumbs_DSC_1107.JPG" width="100" height="75" />
							</a>
		</div>
	</div>
	
		
 		
	<div id="ngg-image-131" class="ngg-gallery-thumbnail-box"  >
		<div class="ngg-gallery-thumbnail" >
			<a href="http://johnleach.co.uk/words/wp-content/photos/tree-surgeon/DSC_1112.JPG" title=" " class="thickbox" rel="set_5" >
								<img title="Tree surgeon" alt="Tree surgeon" src="http://johnleach.co.uk/words/wp-content/photos/tree-surgeon/thumbs/thumbs_DSC_1112.JPG" width="100" height="75" />
							</a>
		</div>
	</div>
	
		
 		
	<div id="ngg-image-133" class="ngg-gallery-thumbnail-box"  >
		<div class="ngg-gallery-thumbnail" >
			<a href="http://johnleach.co.uk/words/wp-content/photos/tree-surgeon/DSC_1118.JPG" title=" " class="thickbox" rel="set_5" >
								<img title="Tree surgeon" alt="Tree surgeon" src="http://johnleach.co.uk/words/wp-content/photos/tree-surgeon/thumbs/thumbs_DSC_1118.JPG" width="100" height="75" />
							</a>
		</div>
	</div>
	
		
 		
	<div id="ngg-image-135" class="ngg-gallery-thumbnail-box"  >
		<div class="ngg-gallery-thumbnail" >
			<a href="http://johnleach.co.uk/words/wp-content/photos/tree-surgeon/DSC_1123.JPG" title=" " class="thickbox" rel="set_5" >
								<img title="Tree surgeon's dog" alt="Tree surgeon's dog" src="http://johnleach.co.uk/words/wp-content/photos/tree-surgeon/thumbs/thumbs_DSC_1123.JPG" width="100" height="75" />
							</a>
		</div>
	</div>
	
		
 		
	<div id="ngg-image-137" class="ngg-gallery-thumbnail-box"  >
		<div class="ngg-gallery-thumbnail" >
			<a href="http://johnleach.co.uk/words/wp-content/photos/tree-surgeon/DSC_1127.JPG" title=" " class="thickbox" rel="set_5" >
								<img title="Tree surgeon's dog" alt="Tree surgeon's dog" src="http://johnleach.co.uk/words/wp-content/photos/tree-surgeon/thumbs/thumbs_DSC_1127.JPG" width="100" height="75" />
							</a>
		</div>
	</div>
	
		
 	 	
	<!-- Pagination -->
 	<div class="ngg-clear"></div> 	
</div>


]]></content:encoded>
			<wfw:commentRss>http://johnleach.co.uk/words/889/old-sycamore-tree/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Xapian Fu: Full Text Indexing in Ruby</title>
		<link>http://johnleach.co.uk/words/445/xapian-fu-full-text-indexing-in-ruby</link>
		<comments>http://johnleach.co.uk/words/445/xapian-fu-full-text-indexing-in-ruby#comments</comments>
		<pubDate>Sun, 31 Jan 2010 23:28:48 +0000</pubDate>
		<dc:creator>john</dc:creator>
				<category><![CDATA[Ruby]]></category>
		<category><![CDATA[Tech]]></category>
		<category><![CDATA[active record]]></category>
		<category><![CDATA[database]]></category>
		<category><![CDATA[full text indexing]]></category>
		<category><![CDATA[indexing]]></category>
		<category><![CDATA[ruby]]></category>
		<category><![CDATA[search]]></category>
		<category><![CDATA[stemming]]></category>
		<category><![CDATA[stopping]]></category>
		<category><![CDATA[the ruby way]]></category>
		<category><![CDATA[xapian]]></category>
		<category><![CDATA[xapian-fu]]></category>

		<guid isPermaLink="false">http://johnleach.co.uk/words/?p=445</guid>
		<description><![CDATA[Xapian is an Open Source Search Engine Library written in C++. It has Ruby bindings, but they&#8217;re generated with SWIG, so they basically just mirror the C++ bindings &#8211; not very Ruby-like (and pretty ugly). Being a self-confessed full text &#8230; <a href="http://johnleach.co.uk/words/445/xapian-fu-full-text-indexing-in-ruby">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p><a href="http://xapian.org/">Xapian</a> is an Open Source Search Engine Library written in C++.  It has <a href="http://xapian.org/docs/bindings/ruby/">Ruby bindings</a>, but they&#8217;re generated with <a href="http://www.swig.org/">SWIG</a>, so they basically just mirror the C++ bindings &#8211; not very Ruby-like (and pretty ugly).</p>
<p>Being a self-confessed full text indexing nerd and a Ruby-lover, I wrote <a href="http://github.com/johnl/xapian-fu">Xapian Fu</a>: a library to provide access to Xapian that is more in line with &#8220;The Ruby Way&#8221;.</p>
<p>I started writing Xapian Fu exactly a year ago today but left it for a couple of months, then restarted work on it on the train on the way back from the 2009 <a href="http://scottishrubyconference.com/">Scotland on Rails</a> conference.  Development was <a href="http://en.wikipedia.org/wiki/Test-driven_development">test driven</a>, so it&#8217;s got an extensive test suite (using <a href="http://rspec.info/">rspec</a>).  <a href="http://rdoc.info/projects/johnl/xapian-fu">Documentation is in rdoc</a> and is quite detailed.  As of the latest version, it supports Ruby 1.9 too.</p>
<p>Xapian Fu basically gives you a Hash interface to Xapian &#8211; so <em>you get a persistent Hash with full text indexing built in</em> (and ACID transactions!).</p>
<h3>Example</h3>
<p>For example, this creates a database called example.db, puts three documents into it, then searches them and prints the results:</p>
<pre><code>  require 'xapian-fu'
  include XapianFu
  db = XapianDb.new(:dir =&gt; 'example.db', :create =&gt; true,
                    :store =&gt; [:title, :year])
  db &lt;&lt; { :title =&gt; 'Brokeback Mountain', :year =&gt; 2005 }
  db &lt;&lt; { :title =&gt; 'Cold Mountain', :year =&gt; 2004 }
  db &lt;&lt; { :title =&gt; 'Yes Man', :year =&gt; 2008 }
  db.flush
  db.search("mountain").each do |match|
    puts match.values[:title]
  end</code></pre>
<p>There are of course a whole bunch more examples in <a href="http://rdoc.info/projects/johnl/xapian-fu">the documentation</a>.<br />
<span id="more-445"></span></p>
<h3>Schema-less</h3>
<p>The hard work of full text indexing and storage is of course done by the Xapian library, but I have added a couple of useful features.  One in particular is the ability to use symbols (or strings) as field names. Xapian has no real concept of fields, but you can store arbitrary data that it calls values in a numbered slot alongside each document.  Instead of making you deal with field numbers, Xapian Fu uses a hash function to convert your field names into numbers.  This means <em>Xapian Fu is schema-less</em> &#8211; you can add or omit fields whenever you like.  It&#8217;s useful to define fields when opening databases so that Xapian Fu can recognise them in searches or to give Xapian Fu some clues on the type of data you&#8217;ll be using, but it&#8217;s not necessary.</p>
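<p>To give a rough idea of how a field name can become a slot number, here is a minimal sketch.  The slot_for helper is hypothetical and the choice of CRC32 is only an assumption for illustration; I&#8217;m not claiming this is the exact hash function Xapian Fu uses.</p>

```ruby
require 'zlib'

# Hypothetical helper: map a field name (symbol or string) to a numeric
# value slot. CRC32 is used here only as an example of a stable hash.
def slot_for(field_name)
  Zlib.crc32(field_name.to_s)
end

puts slot_for(:title)
puts slot_for('title') == slot_for(:title)   # symbol and string share a slot
```

<p>Because the number is derived from the name itself, nothing needs to be declared up front, which is what makes the schema-less approach possible.</p>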
<h3>Efficient storage of fields for ordering</h3>
<p>If you tell Xapian Fu what type of data you&#8217;ll be storing in your fields, it can store them more efficiently.  For example, if you don&#8217;t specify the type, integers will be converted to strings as is, so 354,441,945,266,899 becomes &#8220;354441945266899&#8221; &#8211; that&#8217;s fifteen bytes!  When you tell Xapian Fu that your field is going to be an Integer, it will store the values in double precision floating point format which is 8 bytes and can represent up to about 16 decimal digits.  Also, it&#8217;s stored in big-endian format, so Xapian can still use the field for ordering results. Xapian Fu will store Time objects like this too, so again, it&#8217;s size efficient and can be used for ordering results.</p>
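<p>The size difference is easy to see in plain Ruby.  This is only a sketch using Array#pack with the G directive (double precision, big-endian); whether Xapian Fu makes exactly this call internally is an assumption on my part.  The ordering check only covers non-negative numbers.</p>

```ruby
number = 354_441_945_266_899

as_string = number.to_s          # decimal digits, one byte each
as_double = [number].pack('G')   # 'G' = double precision, big-endian

puts as_string.bytesize   # 15
puts as_double.bytesize   # 8

# Unpacking gives the original value back (as a Float).
puts as_double.unpack('G').first.to_i == number

# For non-negative values, byte-wise order of the packed form matches
# numeric order, whereas the decimal strings sort as text.
packed = [3, 20, 100].map { |n| [n].pack('G') }
puts packed.sort == packed
p %w[3 20 100].sort       # ["100", "20", "3"]
```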
<p>Since Xapian Fu now knows what type the field is, it can convert it back when you access it too, so you get an Integer or a Time object (rather than a String, which is how Xapian represents it internally). It currently supports Integer, Fixnum, Bignum (up to a certain size), Float, Time and Date.  You can add your own types easily by decorating your instances with special methods.</p>
<h3>Stemming and stopping</h3>
<p>Xapian has <a href="http://xapian.org/docs/stemming.html">stemming</a> support for loads of languages (via the <a href="http://snowball.tartarus.org">Snowball</a> library), but no <a href="http://en.wikipedia.org/wiki/Stop_words">stop word</a> lists.  Xapian Fu uses the appropriate stemmer when you specify a language for your document or database and comes with stop word lists for 13 languages (also automatically used).  This means Xapian doesn&#8217;t have to index these common stop words, so you get faster indexing and search times, a smaller database and more relevant search results.</p>
<p>Xapian Fu also knows that your searches won&#8217;t work right unless you stem them too! It automatically stems queries using the database language.  This does fall down a bit at the moment if your database holds documents in different languages, but it can be disabled (and support for this wouldn&#8217;t be too difficult to add).</p>
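<p>As a toy illustration of what stopping means (this is not Xapian Fu&#8217;s implementation, and the stop word list and index_terms helper are made up for the example), dropping stop words before indexing looks something like this.  Stemming is not shown.</p>

```ruby
require 'set'

# Tiny, made-up stop word list for illustration only.
STOP_WORDS = Set.new(%w[a an and in of the])

# Split text into lowercase terms and drop the stop words.
def index_terms(text)
  text.downcase.scan(/[a-z0-9]+/).reject { |term| STOP_WORDS.include?(term) }
end

p index_terms('The Lord of the Rings')   # ["lord", "rings"]
```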
<h3>Will Paginate support</h3>
<p><a href="http://wiki.github.com/mislav/will_paginate/">will_paginate</a> is a pagination library for ActiveRecord (and other DB abstraction layers).  It has helpers for drawing page list interfaces.  Xapian Fu supports will_paginate by using the same method names in result sets (such as :current_page and :total_pages).  You can pass a Xapian Fu result set to the will_paginate helpers and you&#8217;ll get lovely page list interfaces (you need to handle accepting the parameters in your action and setting up the next search of course!)</p>
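<p>The idea is just duck typing: anything that answers the right methods can be handed to the helpers.  Here is a minimal sketch of such an object.  The PagedResults class is hypothetical and is not Xapian Fu&#8217;s result set class; beyond :current_page and :total_pages, the per_page and total_entries names are my assumption about what a pagination helper would ask for.</p>

```ruby
# Hypothetical result set; only the method names matter to the helpers.
class PagedResults
  attr_reader :current_page, :per_page, :total_entries

  def initialize(current_page, per_page, total_entries)
    @current_page = current_page
    @per_page = per_page
    @total_entries = total_entries
  end

  # Number of pages needed to show every entry, rounding up.
  def total_pages
    (total_entries / per_page.to_f).ceil
  end
end

results = PagedResults.new(2, 10, 45)
puts results.total_pages   # 5
```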
<h3>Active Record support</h3>
<p>Xapian Fu does not yet have an Active Record plugin (I&#8217;ll add one soon) but as Xapian Fu uses the :id field as the Xapian primary key by default, it&#8217;s trivial to use it in your Rails app right now. See the &#8220;ActiveRecord Integration&#8221; section of the README for code examples.  In this case, you probably don&#8217;t need to actually store any data in the Xapian database, just the index information (and the :id field of course, but that&#8217;s stored by default) &#8211; so you get a smaller database (though you still need to store fields that you want to group by (collapse) or order results with).</p>
<h3>Multi-master replicated full text indexing service</h3>
<p>Xapian Fu doesn&#8217;t do this. I&#8217;m designing something that might though :)</p>
<h3>Getting Xapian Fu</h3>
<p>It&#8217;s available in gem form from Rubyforge/cutter.  The code is <a href="http://github.com/johnl/xapian-fu">on github here</a>.  You&#8217;ll need the Xapian Ruby bindings installed &#8211; on Debian/Ubuntu this is the libxapian-ruby1.8 package.  The gem named <a href="http://gemcutter.org/gems/xapian">xapian</a> claims to provide Xapian and the Ruby bindings but it failed for me on 64-bit.  The gem named <a href="http://gemcutter.org/gems/xapian-full">xapian-full</a> claims to provide the Ruby bindings without Xapian (you&#8217;ll obviously need to build and install Xapian yourself) but I&#8217;ve not used that either.  RPMs, source files and other downloads are listed on the <a href="http://xapian.org/download">Xapian downloads page</a>.</p>
<p>There is also this <a href="http://johnleach.co.uk/documents/xapian-fu/">weird kinda splash page</a> I made, in some kind of attempt to host <em>something</em> about Xapian Fu on my own domain. Not really sure what real purpose it serves.</p>
]]></content:encoded>
			<wfw:commentRss>http://johnleach.co.uk/words/445/xapian-fu-full-text-indexing-in-ruby/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>


