I’ve been thinking about centralized indexing and searching of logs for a while and the other day I came across a project called Graylog2 that does just that. It provides a service to receive messages over the network (in a couple of formats, including syslog) and writes them into mongodb. It then has a rails application that lets you browse and search the logs.
It’s neat but I wasn’t quite happy with the search options – I’ve always thought logs should be indexed with a real full text indexer. So, I knocked up a couple of scripts to do just that, as a proof of concept.
It uses rsyslog to receive the messages and write them to a named pipe. A small ruby script called rsyslog-solr reads from the other end of the pipe and writes batches of the incoming messages to the full text indexer. I chose solr as the full text indexer as it has some very good options for scaling up, which will be necessary when indexing lots of logs.
Solr indexes, compresses and stores the messages sent to it, so we can retrieve the full text without having to store the original log. I wrote a custom schema definition optimized for this.
Then another script, rsyslog-solr-search, is used to query Solr and display the matching messages.
Querying is fun, for example I’ve searched all ssh authentication failures across all hosts and then searched on the originating IPs to see what other probes they made.
You don’t have to do advanced searches though, you can just display all logs from the last hour, or day or whatever.
One important note, any user that can generate logs that are sent to the system can cause a denial of service attack by sending specially malformed messages. This can be fixed by moving the formatting of the log entries from rsyslog into the ruby script, but I’ve not done it yet.
I’ve pushed the code to github under the MIT license. Feel free to improve it.