Indexing syslog messages with Solr

I’ve been thinking about centralized indexing and searching of logs for a while, and the other day I came across a project called Graylog2 that does just that. It provides a service that receives messages over the network (in a couple of formats, including syslog) and writes them into MongoDB. It then has a Rails application that lets you browse and search the logs.

It’s neat but I wasn’t quite happy with the search options – I’ve always thought logs should be indexed with a real full text indexer. So, I knocked up a couple of scripts to do just that, as a proof of concept.

The proof of concept uses rsyslog to receive the messages and write them to a named pipe. A small Ruby script called rsyslog-solr reads from the other end of the pipe and writes batches of the incoming messages to the full text indexer. I chose Solr as the indexer because it has some very good options for scaling up, which will be necessary when indexing lots of logs.
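To make the pipe-to-indexer step concrete, here is a rough sketch of what such a reader could look like. It is not the actual rsyslog-solr code: the pipe path, the Solr URL (the JSON update handler path varies between Solr versions), the batch size, the tab-separated line format and the field names are all assumptions made for illustration.

```ruby
require "json"
require "net/http"
require "uri"

# Hypothetical defaults: none of these values come from rsyslog-solr itself.
PIPE_PATH  = "/var/run/rsyslog-solr.pipe"
SOLR_URL   = "http://localhost:8983/solr/update/json?commit=true"
BATCH_SIZE = 100

# Turn one line from the pipe into a Solr document. Assumes rsyslog has been
# configured to write tab-separated "timestamp, host, program, message" lines.
def parse_line(line)
  timestamp, host, program, message = line.chomp.split("\t", 4)
  { "timestamp" => timestamp, "host" => host,
    "program" => program, "message" => message }
end

# Read lines from any IO and yield them as arrays of at most `size` documents.
def each_batch(io, size = BATCH_SIZE)
  batch = []
  io.each_line do |line|
    batch << parse_line(line)
    if batch.size >= size
      yield batch
      batch = []
    end
  end
  yield batch unless batch.empty? # flush the remainder when the writer closes
end

# POST one batch of documents to Solr as a JSON array.
def post_batch(docs, url = SOLR_URL)
  uri = URI(url)
  request = Net::HTTP::Post.new(uri, "Content-Type" => "application/json")
  request.body = JSON.generate(docs)
  Net::HTTP.start(uri.host, uri.port) { |http| http.request(request) }
end

# Usage (needs a running Solr and an existing FIFO):
#   File.open(PIPE_PATH, "r") do |pipe|
#     each_batch(pipe) { |docs| post_batch(docs) }
#   end
```

Batching keeps the number of HTTP requests down when logs arrive quickly. This sketch only flushes when a batch fills up or the pipe is closed, so on a quiet system a real reader would probably also want to flush on a timer.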

Solr indexes, compresses and stores the messages sent to it, so we can retrieve the full text without having to store the original log. I wrote a custom schema definition optimized for this.

Then another script, rsyslog-solr-search, is used to query Solr and display the matching messages.

Querying is fun. For example, I’ve searched for all ssh authentication failures across all hosts and then searched on the originating IPs to see what other probes they made.
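As an illustration of that kind of two-step search, the sketch below builds Solr select URLs by hand. The endpoint URL and the field names (message, timestamp) are hypothetical, the phrase "Failed password" assumes OpenSSH-style log lines, and the IP address is a documentation-range placeholder. It is not how rsyslog-solr-search is necessarily written.

```ruby
require "uri"

# Hypothetical select endpoint; adjust to match the real Solr install.
SOLR_SELECT = "http://localhost:8983/solr/select"

# Quote a value as a phrase so that spaces and characters such as ':' are
# not read as query syntax. Embedded quotes and backslashes are escaped.
def phrase(value)
  '"' + value.gsub(/["\\]/) { |char| "\\" + char } + '"'
end

# Build a select URL for a Lucene-syntax query, newest messages first.
# `timestamp` is an assumed field name.
def search_url(query, rows: 50)
  params = { "q" => query, "rows" => rows,
             "sort" => "timestamp desc", "wt" => "json" }
  "#{SOLR_SELECT}?#{URI.encode_www_form(params)}"
end

# Step 1: ssh authentication failures on every host.
failures = search_url("message:#{phrase('Failed password')}")

# Step 2: everything else logged about one of the originating addresses.
follow_up = search_url("message:#{phrase('203.0.113.7')}")
```

Either URL can be fetched with Net::HTTP or curl; the second query is simply the first one repeated with an address pulled out of its results.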

You don’t have to do advanced searches, though: you can just display all logs from the last hour, or day, or whatever.
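A "last hour" or "last day" view maps neatly onto Solr's date math in a range query. The helper below is a small sketch of that idea; the `recent` function and the `timestamp` field name are made up for illustration and assume the timestamp is stored in a Solr date field.

```ruby
# Build a Solr range clause covering the last `amount` units up to now.
# `timestamp` is an assumed date field; MINUTE, HOUR and DAY are units
# understood by Solr's date math (for example NOW-1HOUR).
def recent(amount, unit, field: "timestamp")
  unit = unit.to_s.upcase
  unless %w[MINUTE HOUR DAY].include?(unit)
    raise ArgumentError, "unsupported unit #{unit}"
  end
  "#{field}:[NOW-#{Integer(amount)}#{unit} TO NOW]"
end

last_hour = recent(1, :hour) # "timestamp:[NOW-1HOUR TO NOW]"
last_week = recent(7, :day)  # "timestamp:[NOW-7DAY TO NOW]"
```

The resulting clause can be used as the whole query or passed as a filter query alongside a text search.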

One important note: any user who can generate logs that are sent to the system can cause a denial of service by sending specially malformed messages. This can be fixed by moving the formatting of the log entries from rsyslog into the Ruby script, but I’ve not done it yet.
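For what moving the formatting into Ruby might look like, here is a sketch under a stated assumption: that the trouble comes from rsyslog emitting the update document itself, so hostile bytes in a message can corrupt it. In this version rsyslog would hand over raw field values only, and the script cleans them and lets a JSON serializer do the escaping. The function names, the size cap and the field names are all hypothetical.

```ruby
require "json"

# Hypothetical cap on the size of a single field, in bytes.
MAX_FIELD_BYTES = 8192

# Make an untrusted value safe to serialize: truncate it, replace invalid
# UTF-8 byte sequences with "?", and drop ASCII control characters other
# than tab, newline and carriage return.
def sanitize(text, limit = MAX_FIELD_BYTES)
  clean = text.to_s.dup.force_encoding(Encoding::UTF_8)
  clean = clean.byteslice(0, limit).scrub("?")
  clean.gsub(/[\x00-\x08\x0B\x0C\x0E-\x1F]/, "")
end

# Build a document from raw field values; nothing is trusted as markup.
def build_doc(raw_fields)
  raw_fields.each_with_object({}) do |(name, value), doc|
    doc[name.to_s] = sanitize(value)
  end
end

# Serialize a batch. JSON.generate escapes quotes, backslashes and any
# remaining control characters, so message text cannot break the structure.
def update_body(docs)
  JSON.generate(docs)
end
```

The point of the design is that the update format is produced in one place by a serializer, and message content only ever travels as data.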

I’ve pushed the code to GitHub under the MIT license. Feel free to improve it.

Comments

[…] you are in a hurry, you can pay a visit to John Leach, whose work may be […]

tgiles says:

I think we both did the same sort of thing after looking at Graylog2, but I ended up with a Python-based listener service injecting logs into a Postgres database.

I’ve been wanting to work more with solr for some time now. Will review what you have and will see where I can expand it. Thanks for the post!

tom

[…] might also be interested in my post about indexing syslog messages with Solr. This entry was posted in Tech and tagged full text, index, riak, search, solr, splunk, syslog. […]
