Backup System for Apache Kafka (Part 1)

Honestly, there aren't many good backup tools for Apache Kafka. But why do we need a backup tool in the first place? Apache Kafka has no built-in offsite backup or disaster recovery (DR) solution; a few tricks can approximate one, but none of them are 100% reliable.

Example Trick:

Create a MirrorMaker 1 (MM1) setup that syncs data one-way from the source cluster to a second Apache Kafka cluster. A rough sketch of this setup follows.
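As a hedged illustration (the hostnames and file names here are placeholders, not from the original post), MM1 ships with Kafka as kafka-mirror-maker.sh and takes a consumer config pointing at the source cluster and a producer config pointing at the target cluster:

# consumer.properties (source cluster)
bootstrap.servers=source-kafka01:9092
group.id=mm1-sync-group

# producer.properties (target/backup cluster)
bootstrap.servers=target-kafka01:9092

$ kafka-mirror-maker.sh \
    --consumer.config consumer.properties \
    --producer.config producer.properties \
    --whitelist "davinder.test"

Note that this only gives you a second live cluster, not point-in-time, offsite backups.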


My project/utility creates offsite backups on local and cloud storage. Currently, it supports any locally mounted storage and AWS S3.

Project Source Code: 116davinder/apache-kafka-backup-and-restore

Here is a sample input configuration for AWS S3. Note that only one topic is currently supported, so TOPIC_NAMES should hold a single entry (JSON does not allow comments, so keep the file itself comment-free).

{
  "BOOTSTRAP_SERVERS": "kafka01:9092,kafka02:9092,kafka03:9092",
  "TOPIC_NAMES": ["davinder.test"],
  "GROUP_ID": "Kafka-BackUp-Consumer-Group",
  "FILESYSTEM_TYPE": "S3",
  "FILESYSTEM_BACKUP_DIR": "/tmp/",
  "NUMBER_OF_MESSAGE_PER_BACKUP_FILE": 1000,
  "RETRY_UPLOAD_SECONDS": 100,
  "NUMBER_OF_KAFKA_THREADS": 2,
  "LOG_LEVEL": 20
}
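To make the batching behaviour concrete, here is a rough conceptual sketch of what one backup cycle does (this is not the project's actual code; function and variable names are made up for illustration): buffer NUMBER_OF_MESSAGE_PER_BACKUP_FILE messages, archive them under <topic>/<partition>/, and write a companion .sha256 file like the ones in the logs below.

import hashlib
import json
import os
import tarfile
import time

def write_backup(messages, backup_dir, topic, partition):
    # Hypothetical helper, mirroring the <timestamp>.tar.gz naming in the logs.
    ts = time.strftime("%Y%m%d-%H%M%S")
    out_dir = os.path.join(backup_dir, topic, str(partition))
    os.makedirs(out_dir, exist_ok=True)

    # Dump the batch to a plain file, one JSON message per line.
    data_path = os.path.join(out_dir, ts)
    with open(data_path, "w") as f:
        for msg in messages:
            f.write(json.dumps(msg) + "\n")

    # Pack the batch into a tar.gz and drop the raw file.
    archive_path = data_path + ".tar.gz"
    with tarfile.open(archive_path, "w:gz") as tar:
        tar.add(data_path, arcname=os.path.basename(data_path))
    os.remove(data_path)

    # Write a companion checksum file so a restore can verify integrity.
    with open(archive_path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    with open(archive_path + ".sha256", "w") as f:
        f.write(digest + "\n")
    return archive_path

print(write_backup([{"key": "k", "value": "v"}], "/tmp", "davinder.test", 0))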

How to Run the Backup Application?

$ git clone https://github.com/116davinder/apache-kafka-backup-and-restore.git
$ cd apache-kafka-backup-and-restore
$ export AWS_SECRET_ACCESS_KEY=XXXXXXXXXXXXXXX
$ export AWS_ACCESS_KEY_ID=XXXXXXXXXXXXX
$ python3 backup.py <path_to_backup.json>

Expected Output:

$ python3 backup.py backup.json
{ "@timestamp": "2020-06-10 12:49:43,871","level": "INFO","thread": "S3 Upload","name": "botocore.credentials","message": "Found credentials in environment variables." }
{ "@timestamp": "2020-06-10 12:49:43,912","level": "INFO","thread": "Kafka Consumer 1","name": "root","message": "started polling on davinder.test" }
{ "@timestamp": "2020-06-10 12:49:43,915","level": "INFO","thread": "Kafka Consumer 0","name": "root","message": "started polling on davinder.test" }
{ "@timestamp": "2020-06-10 12:49:43,916","level": "INFO","thread": "Kafka Consumer 2","name": "root","message": "started polling on davinder.test" }
{ "@timestamp": "2020-06-10 12:49:44,307","level": "INFO","thread": "S3 Upload","name": "root","message": "upload successful at s3://davinder-test-kafka-backup/davinder.test/0/20200608-102909.tar.gz" }
{ "@timestamp": "2020-06-10 12:49:45,996","level": "INFO","thread": "S3 Upload","name": "root","message": "waiting for new files to be generated" }
{ "@timestamp": "2020-06-10 12:52:33,130","level": "INFO","thread": "Kafka Consumer 0","name": "root","message": "Created Successful Backupfile /tmp/davinder.test/0/20200610-125233.tar.gz" }
{ "@timestamp": "2020-06-10 12:52:33,155","level": "INFO","thread": "Kafka Consumer 0","name": "root","message": "Created Successful Backup sha256 file of /tmp/davinder.test/0/20200610-125233.tar.gz.sha256" }
....
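The .sha256 companion files make it easy to spot-check a backup. Assuming the file holds just the hex digest of the archive (an assumption; check the tool's actual output format), you can compare it against a freshly computed checksum:

$ cd /tmp/davinder.test/0
$ sha256sum 20200610-125233.tar.gz
$ cat 20200610-125233.tar.gz.sha256

The two digests should match; if they don't, the archive was corrupted after creation.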

If you want continuous backups, leave the application running; otherwise, stop it once the backup is complete. You can also add lifecycle policies on the S3 bucket to migrate older backups to S3 Glacier.
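As a hedged example (the bucket name is taken from the log output above; the 30-day threshold and rule ID are arbitrary), a lifecycle rule like this one transitions backups to Glacier via the AWS CLI:

$ aws s3api put-bucket-lifecycle-configuration \
    --bucket davinder-test-kafka-backup \
    --lifecycle-configuration '{
      "Rules": [{
        "ID": "archive-kafka-backups",
        "Status": "Enabled",
        "Filter": {"Prefix": "davinder.test/"},
        "Transitions": [{"Days": 30, "StorageClass": "GLACIER"}]
      }]
    }'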

Notes

  1. This application can also be run in a container (Docker); a minimal Dockerfile sketch follows this list.
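A minimal sketch, assuming the repository ships a requirements.txt (adjust to the project's actual dependencies) and that AWS credentials are passed in at runtime:

# Hypothetical Dockerfile, not part of the project.
FROM python:3.8-slim
WORKDIR /app
COPY . .
RUN pip install --no-cache-dir -r requirements.txt
# Pass credentials at runtime, e.g.:
#   docker run -e AWS_ACCESS_KEY_ID=... -e AWS_SECRET_ACCESS_KEY=... <image>
CMD ["python3", "backup.py", "backup.json"]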

Hopefully, this saves someone else from having to build a similar tool from scratch :).

