How to provide failover for Logstash or other log collectors using keepalived
When planning a system we take possible failures into account (Design for Failure). For log aggregation we use solutions such as Elasticsearch or Splunk, often together with a queue like Apache Kafka. Kafka runs as a cluster, acts as a buffer and allows many consumers, such as Logstash or Fluentd, to read from it. Sometimes, however, we forget about protecting the collector that feeds the queue. In this story I will show you how to use keepalived to provide Logstash failover.
Plan
Let’s assume that we collect syslog messages (over UDP) and push them to Apache Kafka. Everything seems well designed. Unfortunately, if the collector fails (e.g. a single instance of Logstash), we have a problem.
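For reference, such a pipeline could look roughly like this minimal sketch (the port, broker addresses and topic name are assumptions for illustration, not values from this story):

```
# Logstash pipeline sketch: syslog over UDP in, Kafka out
input {
  udp {
    port => 5514                 # assumed port; 514 would require root privileges
  }
}
output {
  kafka {
    bootstrap_servers => "kafka1:9092,kafka2:9092"  # assumed broker addresses
    topic_id => "syslog"                            # assumed target topic
  }
}
```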
So we need more than one Logstash. With Beats there would be no problem: we can simply list multiple Logstash instances as outputs. In the case of UDP, however, we will use keepalived, which implements VRRP, on two VMs.
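As an aside, the Beats variant could be as simple as this filebeat.yml fragment (the host names and the loadbalance choice are assumptions):

```
# filebeat.yml - sketch of load balancing across two Logstash instances
output.logstash:
  hosts: ["logstash1:5044", "logstash2:5044"]  # both collectors listed
  loadbalance: true   # distribute event batches across all listed hosts
```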
Logs will be sent to a virtual/floating IP. The first machine will be the main one (master), the second one will be the backup. There may be more backups if necessary. Sometimes…
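A minimal keepalived configuration for the master node might look like the sketch below; the interface name, virtual_router_id, password and virtual IP are assumptions, and the backup node would use `state BACKUP` with a lower priority:

```
# /etc/keepalived/keepalived.conf on the master node (sketch)
vrrp_instance LOGSTASH_VIP {
    state MASTER            # use BACKUP on the second node
    interface eth0          # NIC that should carry the floating IP
    virtual_router_id 51    # must match on all nodes of this VRRP group
    priority 150            # highest priority wins the master role
    advert_int 1            # VRRP advertisement interval in seconds
    authentication {
        auth_type PASS
        auth_pass s3cret    # assumed shared secret, same on all nodes
    }
    virtual_ipaddress {
        192.0.2.100/24      # the floating IP that syslog senders target
    }
}
```

When the master stops sending VRRP advertisements (for example, the VM dies), the backup with the next-highest priority takes over the virtual IP, so syslog senders keep using the same address without any reconfiguration.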