Tag: search

Full text indexing of syslog messages with Riak

I’ve just released a little tool I wrote called riak-syslog which takes your syslog messages and puts them into a Riak cluster and then lets you search them using Riak’s full text search.

Rather than re-implement the wheel, riak-syslog expects that a syslog daemon will handle receiving syslog messages and will be able to provide them in a specific format – there is documentation on getting this running with rsyslog on Ubuntu.

I’ve used it to gather and store a few hundred gig of syslogs over the last several months on an small internal Riak cluster on Brightbox Cloud and it’s working well (which can’t be said of a similar setup I did with Solr which caved in after a while and needed some fine tuning!)

There is documentation on getting it set up in the README, and some examples of how to conduct searches too.

If you want to play with Riak, you can build a four node cluster spanning two data-centres in five minutes on Brightbox Cloud.

You might also be interested in my post about indexing syslog messages with Solr.

Indexing syslog messages with solr

I’ve been thinking about centralized indexing and searching of logs for a while and the other day I came across a project called Graylog2 that does just that. It provides a service to receive messages over the network (in a couple of formats, including syslog) and writes them into mongodb. It then has a rails application that lets you browse and search the logs.

It’s neat but I wasn’t quite happy with the search options – I’ve always thought logs should be indexed with a real full text indexer. So, I knocked up a couple of scripts to do just that, as a proof of concept.

It uses rsyslog to receive the messages and write them to a named pipe.  A small ruby script called rsyslog-solr reads from the other end of the pipe and writes batches of the incoming messages to the full text indexer. I chose solr as the full text indexer as it has some very good options for scaling up, which will be necessary when indexing lots of logs.

Solr indexes, compresses and stores the messages sent to it, so we can retrieve the full text without having to store the original log. I wrote a custom schema definition optimized for this.

Then another script, rsyslog-solr-search, is used to query Solr and display the matching messages.

Querying is fun, for example I’ve searched all ssh authentication failures across all hosts and then searched on the originating IPs to see what other probes they made.

You don’t have to do advanced searches though, you can just display all logs from the last hour, or day or whatever.

One important note, any user that can generate logs that are sent to the system can cause a denial of service attack by sending specially malformed messages. This can be fixed by moving the formatting of the log entries from rsyslog into the ruby script, but I’ve not done it yet.

I’ve pushed the code to github under the MIT license. Feel free to improve it.

Xapian Fu: Full Text Indexing in Ruby

Xapian is an Open Source Search Engine Library written in C++. It has Ruby bindings, but they’re generated with SWIG, so they basically just mirror the C++ bindings – not very Ruby-like (and pretty ugly).

Being a self-confessed full text indexing nerd and a Ruby-lover, I wrote Xapian Fu: a library to provide access to Xapian that is more in line with “The Ruby Way”.

I started writing Xapian Fu exactly a year ago today but left it for a couple of months, then restarted work on it on the train on the way back from the 2009 Scotland on Rails conference.  Development was test driven, so it’s got an extensive test suite (using rspec).  Documentation is in rdoc and is quite detailed.  As of the latest version, it supports Ruby 1.9 too.

Xapian Fu basically gives you a Hash interface to Xapian – so you get a persistent Hash with full text indexing built in (and ACID transactions!).

Example

For example, create a database called example.db, put three documents into it and search them and print the results:

  require 'xapian-fu'
  include XapianFu
  db = XapianDb.new(:dir => 'example.db', :create => true,
                    :store => [:title, :year])
  db << { :title => 'Brokeback Mountain', :year => 2005 }
  db << { :title => 'Cold Mountain', :year => 2004 }
  db << { :title => 'Yes Man', :year => 2008 }
  db.flush
  db.search("mountain").each do |match|
    puts match.values[:title]
  end

There are of course a whole bunch more examples in the documentation.
(more…)

My NWRUG Ferret Talk

I did a short talk on Ferret, the Ruby “Information Retreival Library”, at the North West Ruby Users Group last Thursday.  We had a bit of a theme too, with Will Jessop speaking about Sphinx and Asa Calow speaking about Solr.

I got to have a bit of a nosey around the Manchester BBC building too – though I was worried I’d open the wrong door and end up on TV. Didn’t fancy having to apologise to Jeremy Paxman.

Brightbox also sponsored some pizza, and gave away t-shirts and stickers like candy (there was no candy though).

My slides are available here, and contain a little example file system indexer. I made my slides with webby and S6 if you’re interested.