<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>John Leach's Blog &#187; ferret</title>
	<atom:link href="http://johnleach.co.uk/words/archives/tag/ferret/feed" rel="self" type="application/rss+xml" />
	<link>http://johnleach.co.uk/words</link>
	<description>Stuff I think, see and do</description>
	<lastBuildDate>Fri, 18 Jun 2010 22:57:41 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0</generator>
		<item>
		<title>My NWRUG Ferret Talk</title>
		<link>http://johnleach.co.uk/words/archives/2009/03/24/362/my-nwrug-ferret-talk</link>
		<comments>http://johnleach.co.uk/words/archives/2009/03/24/362/my-nwrug-ferret-talk#comments</comments>
		<pubDate>Tue, 24 Mar 2009 17:12:07 +0000</pubDate>
		<dc:creator>john</dc:creator>
				<category><![CDATA[Ruby on Rails]]></category>
		<category><![CDATA[Tech]]></category>
		<category><![CDATA[ferret]]></category>
		<category><![CDATA[indexing]]></category>
		<category><![CDATA[inverse index]]></category>
		<category><![CDATA[rails]]></category>
		<category><![CDATA[ruby]]></category>
		<category><![CDATA[search]]></category>
		<category><![CDATA[solr]]></category>
		<category><![CDATA[sphinx]]></category>
		<category><![CDATA[talk]]></category>

		<guid isPermaLink="false">http://johnleach.co.uk/words/?p=362</guid>
		<description><![CDATA[I did a short talk on Ferret, the Ruby &#8220;Information Retrieval Library&#8221;, at the North West Ruby Users Group last Thursday.  We had a bit of a theme too, with Will Jessop speaking about Sphinx and Asa Calow speaking about Solr. I got to have a bit of a nosey around the Manchester BBC building [...]]]></description>
			<content:encoded><![CDATA[<p>I did a short talk on <a href="http://ferret.davebalmain.com/">Ferret</a>, the Ruby &#8220;Information Retrieval Library&#8221;, at the <a href="http://nwrug.org/events/march09/">North West Ruby Users Group</a> last Thursday.  We had a bit of a theme too, with Will Jessop speaking about Sphinx and Asa Calow speaking about Solr.</p>
<p>I got to have a bit of a nosey around the Manchester BBC building too &#8211; though I was worried I&#8217;d open the wrong door and end up on TV. Didn&#8217;t fancy having to apologise to Jeremy Paxman.</p>
<p><a href="http://www.brightbox.co.uk">Brightbox</a> also sponsored some pizza, and gave away t-shirts and stickers like candy (there was no candy though).</p>
<p>My <a href="http://johnleach.co.uk/documents/talks/090319-ruby-ferret-nwrug/">slides are available here</a>, and contain a little example file system indexer. I made my slides with <a href="http://webby.rubyforge.org/">webby</a> and <a href="http://github.com/geraldb/s6/tree/master">S6</a> if you&#8217;re interested.</p>
]]></content:encoded>
			<wfw:commentRss>http://johnleach.co.uk/words/archives/2009/03/24/362/my-nwrug-ferret-talk/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Segfault in Ruby Ferret query parser</title>
		<link>http://johnleach.co.uk/words/archives/2007/09/13/278/segfault-in-ruby-ferret-query-parser</link>
		<comments>http://johnleach.co.uk/words/archives/2007/09/13/278/segfault-in-ruby-ferret-query-parser#comments</comments>
		<pubDate>Thu, 13 Sep 2007 20:24:28 +0000</pubDate>
		<dc:creator>john</dc:creator>
				<category><![CDATA[Ruby on Rails]]></category>
		<category><![CDATA[Security]]></category>
		<category><![CDATA[crash]]></category>
		<category><![CDATA[denial-of-service]]></category>
		<category><![CDATA[dos]]></category>
		<category><![CDATA[ferret]]></category>
		<category><![CDATA[rails]]></category>
		<category><![CDATA[ruby]]></category>
		<category><![CDATA[segfault]]></category>

		<guid isPermaLink="false">http://johnleach.co.uk/words/archives/2007/09/13/278/</guid>
		<description><![CDATA[Whilst working with the Ruby text search engine library Ferret, I came across a segfault in the query parser. It had already been reported and fixed, but I realised it can lead to a denial of service. If you use Ferret anywhere that allows users to execute queries, those users can crash the Ruby process [...]]]></description>
			<content:encoded><![CDATA[<p>Whilst working with the Ruby text search engine library Ferret, I came across a segfault in the query parser.  It had already <a href="http://ferret.davebalmain.com/trac/ticket/208">been reported</a> and <a href="http://ferret.davebalmain.com/trac/changeset/773">fixed</a>, but I realised it can lead to a denial of service.</p>
<p>If you use Ferret anywhere that allows users to execute queries, those users can crash the Ruby process with a specially crafted query.  This was quite serious for a number of my sites (not to mention slowing development of a current app) so I applied the fix to the released 0.11.4 source and repackaged it as 0.11.4.1.</p>
<p>Obviously this isn&#8217;t in any way official, but it works for me and I&#8217;m sharing it here for anyone else affected.  <a href="http://johnleach.co.uk/downloads/ruby/ferret/ferret-0.11.4.1/" title="Ferret 0.11.4.1">Gem, tgz and zip here</a> and just the <a href="http://johnleach.co.uk/downloads/ruby/ferret/ferret-0.11.4-fix-multiterm-segfault.patch" title="Ferret 0.11.4.1 segfault fix patch">patch available here</a> (derived from the author&#8217;s changeset to trunk).</p>
<p>The patch is against the release source, as the Subversion repository seems to be down at the moment (I got the changeset from the web-based Subversion viewer).</p>
<p>Get upgrading!</p>
]]></content:encoded>
			<wfw:commentRss>http://johnleach.co.uk/words/archives/2007/09/13/278/segfault-in-ruby-ferret-query-parser/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>News Sniffer, Ferret and Rails</title>
		<link>http://johnleach.co.uk/words/archives/2007/04/20/263/news-sniffer-ferret-and-rails</link>
		<comments>http://johnleach.co.uk/words/archives/2007/04/20/263/news-sniffer-ferret-and-rails#comments</comments>
		<pubDate>Fri, 20 Apr 2007 12:37:41 +0000</pubDate>
		<dc:creator>john</dc:creator>
				<category><![CDATA[Ruby on Rails]]></category>
		<category><![CDATA[ferret]]></category>
		<category><![CDATA[indexing]]></category>
		<category><![CDATA[memcached]]></category>
		<category><![CDATA[newssniffer]]></category>
		<category><![CDATA[rails]]></category>
		<category><![CDATA[ruby]]></category>
		<category><![CDATA[searching]]></category>

		<guid isPermaLink="false">http://johnleach.co.uk/words/archives/2007/04/20/263/</guid>
		<description><![CDATA[I&#8217;ve been working on my News Sniffer project for the last few days, finishing up a two-month experiment with using the Ruby Lucene implementation, Ferret, to index news articles and comments.  More info on the News Sniffer blog.  The project spanned two months due to some instability in the newer versions of Ferret, but [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve been working on my <a href="http://www.newssniffer.co.uk">News Sniffer</a> project for the last few days, finishing up a two-month experiment with using the Ruby Lucene implementation, Ferret, to index news articles and comments.  More info on the <a href="http://www.newssniffer.co.uk/blog/2007/04/20/upgrade-wym-comment-cleanup-downtime-and-improved-search/">News Sniffer blog</a>.  The project spanned two months due to some instability in the newer versions of Ferret, but the author responded to the bug reports and managed to fix all the problems so I decided to deploy.</p>
<p>Ferret offers huge improvements over the original MySQL full-text search method, and I&#8217;m looking forward to adding some fancy keyword statistics graphs in the future &#8211; perhaps showing censorship patterns in BBC comments with certain keywords.</p>
<p>Because News Sniffer is distributed across a number of servers, I used DRb (distributed Ruby) to allow them all to update one central Ferret index.  DRb seems to work very well generally, and is amazingly simple to use, but I ran into a few problems with recycled objects and invalid references whilst using Ferret across it, apparently due to the garbage collector on the server side collecting things still in use on the client side.  I think I eliminated most of them but they still crop up once in a while &#8211; I&#8217;ll be looking into this further.</p>
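<p>As a rough illustration of that setup, here is a minimal sketch using only the Ruby standard library.  The <code>CentralIndex</code> class is a hypothetical toy stand-in for a Ferret index, not Ferret&#8217;s API, and the address is an assumption.  It passes only plain strings, integers and arrays across the wire, so the server never hands out references to short-lived objects that its garbage collector might reclaim.</p>

```ruby
require 'drb/drb'

# Hypothetical toy stand-in for a Ferret index: maps each term to document ids.
class CentralIndex
  def initialize
    @postings = Hash.new { |hash, term| hash[term] = [] }
    @lock = Mutex.new   # DRb serves each request in its own thread
  end

  # Index the words of text under doc_id.
  def add(doc_id, text)
    @lock.synchronize do
      text.downcase.scan(/\w+/).uniq.each { |term| @postings[term].push(doc_id) }
    end
    true
  end

  # Return a copy of the ids of documents containing the term.
  def search(term)
    @lock.synchronize { @postings.fetch(term.downcase, []).dup }
  end
end

# Server side: the DRb service holds on to the front object for its lifetime.
index = CentralIndex.new
server = DRb.start_service('druby://127.0.0.1:0', index)   # port 0 picks a free port

# Client side (normally a different process or machine).
remote = DRbObject.new_with_uri(server.uri)
remote.add(1, 'BBC comment about ferrets')
remote.add(2, 'Another comment')
matches = remote.search('comment')
```

<p>Each worker would connect to the same URI and call <code>add</code>, while the web servers call <code>search</code>.  Returning copies of plain data is one way to sidestep recycled-object errors; it is not necessarily how the problems described above were eliminated.</p>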
<p><span id="more-263"></span>I also moved from using memcached for cache fragment storage to FileStore.  This allows me to expire fragments using regular expressions, which lets me use fragment caching more easily and more often (such as with paged listings).  FileStore is rather slower than memcached, especially when expiring using these regular expressions, but being able to use it more often outweighed the performance hit.  FileStore is obviously not distributed unless you have a shared file system, so I used DRb here too.</p>
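<p>To show why regular expression expiry fits a file-based store, here is a minimal sketch.  <code>FileFragmentStore</code> and its method names are hypothetical and are not the Rails FileStore API; the fragment keys are made up for the example.  Expiry has to walk every stored file and test its key, which is also why it is slow compared with a single memcached delete.</p>

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical minimal file-backed fragment store (not the Rails FileStore).
class FileFragmentStore
  def initialize(root)
    @root = root
  end

  def write(key, content)
    path = path_for(key)
    FileUtils.mkdir_p(File.dirname(path))
    File.write(path, content)
  end

  def read(key)
    path = path_for(key)
    File.file?(path) ? File.read(path) : nil
  end

  # Delete every fragment whose key matches pattern; returns the deleted keys.
  def delete_matched(pattern)
    deleted = []
    Dir.glob('**/*', base: @root).sort.each do |key|
      path = path_for(key)
      next unless File.file?(path)      # skip directories
      next unless pattern.match?(key)
      File.delete(path)
      deleted.push(key)
    end
    deleted
  end

  private

  def path_for(key)
    File.join(@root, key)
  end
end

store = FileFragmentStore.new(Dir.mktmpdir('fragments'))
store.write('articles/list/page/1', 'page one')
store.write('articles/list/page/2', 'page two')
store.write('articles/show/42', 'an article')

# Expire every page of the listing in one call, leaving other fragments alone.
expired = store.delete_matched(%r{\Aarticles/list/page/\d+\z})
```

<p>A paged listing can then be cached one fragment per page and expired as a group whenever a new article arrives.</p>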
<p>It would be nice to add regular expression expiry to memcached, but I think this goes against the original design spec for memcached.  I&#8217;m considering adding configurable memory limits to the Rails MemoryStore fragment store, where it&#8217;ll remove least recently used fragments when the limit is approached (currently it would just keep allocating RAM until your OS killed your Ruby process).</p>
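<p>The memory limit idea could look something like the sketch below.  <code>LruFragmentStore</code> is a hypothetical class, not the Rails MemoryStore, and it makes two simplifying assumptions: size is counted as the byte size of the stored strings only, and eviction happens once the limit is exceeded rather than as it is approached.</p>

```ruby
# Hypothetical size-limited in-memory fragment store with LRU eviction.
class LruFragmentStore
  attr_reader :bytes_used

  def initialize(max_bytes)
    @max_bytes = max_bytes
    @data = {}        # Ruby hashes keep insertion order: oldest entry first
    @bytes_used = 0
  end

  def read(key)
    return nil unless @data.key?(key)
    value = @data.delete(key)   # remove and re-insert to mark as most recent
    @data[key] = value
  end

  def write(key, value)
    delete(key)                 # replace any existing entry and its byte count
    @data[key] = value
    @bytes_used += value.bytesize
    evict_until_within_limit
    value
  end

  def delete(key)
    old = @data.delete(key)
    @bytes_used -= old.bytesize if old
    old
  end

  def keys
    @data.keys
  end

  private

  # Drop least recently used fragments until the store fits the limit again.
  def evict_until_within_limit
    while @bytes_used > @max_bytes
      oldest_key = @data.each_key.first
      break if oldest_key.nil?
      delete(oldest_key)
    end
  end
end

store = LruFragmentStore.new(10)
store.write('a', '1234')
store.write('b', '1234')
store.read('a')             # 'a' becomes the most recently used fragment
store.write('c', '1234')    # 12 bytes exceeds the limit, so 'b' is evicted
```

<p>Relying on hash insertion order keeps the sketch short; a real store would also need locking if several threads shared it.</p>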
<p>I also found a race condition in FileStore (easily fixable on Linux/BSD) where you could theoretically retrieve a corrupted fragment when it&#8217;s used in a multi-process shared storage setup (though not a multi-thread setup, so my DRb&#8217;ed FileStore should be safe).</p>
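<p>The details of the fix are not given here, so the following is only a sketch of one common way to avoid this kind of race, and it assumes the problem is a reader seeing a half-written file.  The hypothetical <code>atomic_write</code> helper writes to a temporary file in the same directory and then renames it over the target, which POSIX systems perform atomically when both paths are on the same file system.</p>

```ruby
require 'tmpdir'

# Hypothetical helper: readers see either the old file or the complete new
# one, never a partially written fragment.
def atomic_write(path, content)
  temp_path = "#{path}.#{Process.pid}.#{rand(1_000_000)}.tmp"
  File.open(temp_path, 'wb') { |file| file.write(content) }
  File.rename(temp_path, path)   # atomic replace on POSIX, same file system
ensure
  # Clean up the temporary file if the write or rename failed part way.
  File.delete(temp_path) if temp_path and File.exist?(temp_path)
end

dir = Dir.mktmpdir('fragments')
path = File.join(dir, 'fragment.cache')
atomic_write(path, 'old fragment')
atomic_write(path, 'new fragment')
```

<p>Because the temporary file lives next to the target, the rename never crosses file systems, which is the condition the atomicity guarantee depends on.</p>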
<p>Hopefully, with the improved searching due to Ferret and the higher performance due to FileStore, News Sniffer will now be more useful.</p>
]]></content:encoded>
			<wfw:commentRss>http://johnleach.co.uk/words/archives/2007/04/20/263/news-sniffer-ferret-and-rails/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Active Resource not in Rails 1.2!</title>
		<link>http://johnleach.co.uk/words/archives/2007/02/05/252/active-resource-not-in-rails-12</link>
		<comments>http://johnleach.co.uk/words/archives/2007/02/05/252/active-resource-not-in-rails-12#comments</comments>
		<pubDate>Mon, 05 Feb 2007 12:31:26 +0000</pubDate>
		<dc:creator>john</dc:creator>
				<category><![CDATA[Ruby on Rails]]></category>
		<category><![CDATA[active resource]]></category>
		<category><![CDATA[activeresource]]></category>
		<category><![CDATA[distributed ruby]]></category>
		<category><![CDATA[drb]]></category>
		<category><![CDATA[ferret]]></category>
		<category><![CDATA[news sniffer]]></category>
		<category><![CDATA[newssniffer]]></category>
		<category><![CDATA[rails]]></category>
		<category><![CDATA[ruby]]></category>

		<guid isPermaLink="false">http://johnleach.co.uk/words/archives/2007/02/05/252/</guid>
		<description><![CDATA[Whilst planning some changes to my News Sniffer project, I thought I&#8217;d have a play with Active Resource. Currently, all the forum and news article downloading and scraping happens on a different machine to the web server. It has a VPN connection to the database and memcache servers, but I&#8217;d like to integrate the Ferret [...]]]></description>
			<content:encoded><![CDATA[<p>Whilst planning some changes to my <a href="http://newssniffer.newworldodour.co.uk" title="News Sniffer" target="_blank">News Sniffer project</a>, I thought I&#8217;d have a play with Active Resource.</p>
<p>Currently, all the forum and news article downloading and scraping happens on a different machine to the web server.  It has a VPN connection to the database and memcache servers, but I&#8217;d like to integrate the <a href="http://ferret.davebalmain.com/trac/" target="_blank">Ferret</a> text indexing system for better searching capabilities.  To centralise Ferret, I have three options:</p>
<ol>
<li>regularly reindex new content from the database on the web server;</li>
<li><a href="http://chadfowler.com/ruby/drb.html" title="Distributed Ruby" target="_blank">DRb</a> a Ferret Object;</li>
<li>or use ActiveResource to access the models via the web service.</li>
</ol>
<p>DRb-ing a Ferret Object would be quite elegant, but using ActiveResource would also replace the need for a database and memcache connection (and I could do much better fragment caching actually).</p>
<p>Anyway, I searched high and low for some docs &#8211; lots of blog entries about how great it is, but no real API docs.  When I searched through the Rails code and found nothing either, I got suspicious.  Finally I found a couple of blog entries stating that <strong>ActiveResource was dropped from Rails 1.2</strong>.  It seems to be planned for Rails 2.0.  Not sure how I missed this.  I guess my search-fu is lacking.</p>
<p>I&#8217;ll be investigating other options.  I&#8217;d much prefer not to build a SOAP or XMLRPC interface. Ugh.</p>
]]></content:encoded>
			<wfw:commentRss>http://johnleach.co.uk/words/archives/2007/02/05/252/active-resource-not-in-rails-12/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>
