Using Kafka Connect and the S3 Sink Connector to copy data from a Kafka topic to S3 for archival

Apollo Software Labs
Dec 5, 2019

The Kafka Connect worker, together with the S3 Sink Connector, provides out-of-the-box functionality for copying data out of Kafka topics into S3. But if your Kafka broker requires SSL authentication, configuring the Kafka Connect worker can be trickier than you expect. Here is how to create a custom Docker image, configured for SSL communication with your Kafka broker, so that the Kafka Connect worker can be deployed to any Docker orchestration platform such as OCP (OpenShift Container Platform).

Steps involved
Create a custom Docker image for your Kafka Connect worker, using cp-kafka-connect-base as the base image. The base image does not bundle all the connectors, which lets you add only the connectors you are interested in, as shown in the snippet below.
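For example, assuming the confluent-hub tooling bundled with the Confluent base image (the image tag below is a placeholder you would pin for your environment), the first lines of the Dockerfile might be:

FROM confluentinc/cp-kafka-connect-base:5.3.1
# Pull in only the connector we need; connectors are not bundled in the base image
RUN confluent-hub install --no-prompt confluentinc/kafka-connect-s3:latest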

In the Kafka broker, create Kafka topics for storing the worker's config, offsets, and status. These topics will be used by your Kafka Connect worker. Set up ACLs on these topics to allow Read/Write/Describe from hosts whose certificate CN is, for example, xyz.com, as sketched below.
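A rough sketch of those commands is shown here; the topic names, partition counts, replication factors, principal, and client properties file are assumptions, and the exact ACL syntax depends on how your broker's authorizer is configured:

# Internal topics used by the Connect worker (all three should be compacted)
kafka-topics --bootstrap-server broker.xyz.com:9093 --command-config client-ssl.properties \
  --create --topic connect-configs --partitions 1 --replication-factor 3 --config cleanup.policy=compact
kafka-topics --bootstrap-server broker.xyz.com:9093 --command-config client-ssl.properties \
  --create --topic connect-offsets --partitions 25 --replication-factor 3 --config cleanup.policy=compact
kafka-topics --bootstrap-server broker.xyz.com:9093 --command-config client-ssl.properties \
  --create --topic connect-status --partitions 5 --replication-factor 3 --config cleanup.policy=compact

# Allow the worker's certificate principal to use these topics
kafka-acls --bootstrap-server broker.xyz.com:9093 --command-config client-ssl.properties \
  --add --allow-principal "User:CN=xyz.com" \
  --operation Read --operation Write --operation Describe \
  --topic connect-configs --topic connect-offsets --topic connect-status

Depending on your broker setup, the worker may need additional ACLs as well, for example on its consumer group.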

In the Dockerfile, specify the SSL settings that will be used by the Connect worker, the Connect consumer (used by the S3 Sink Connector), and the admin client. These settings are key to SSL communication with the Kafka broker.

The Dockerfile would look like the sketch below.
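This is a minimal sketch continuing from the FROM/RUN lines shown earlier. It assumes the Confluent image's convention of mapping CONNECT_-prefixed environment variables to worker properties; the broker hostname, group ID, converters, and topic names are placeholders:

# Worker basics (placeholders)
ENV CONNECT_BOOTSTRAP_SERVERS="broker.xyz.com:9093" \
    CONNECT_GROUP_ID="s3-sink-connect-cluster" \
    CONNECT_REST_PORT="8083" \
    CONNECT_CONFIG_STORAGE_TOPIC="connect-configs" \
    CONNECT_OFFSET_STORAGE_TOPIC="connect-offsets" \
    CONNECT_STATUS_STORAGE_TOPIC="connect-status" \
    CONNECT_KEY_CONVERTER="org.apache.kafka.connect.json.JsonConverter" \
    CONNECT_VALUE_CONVERTER="org.apache.kafka.connect.json.JsonConverter"

# SSL for the worker itself
ENV CONNECT_SECURITY_PROTOCOL="SSL" \
    CONNECT_SSL_TRUSTSTORE_LOCATION="/mnt/truststore/truststore.jks" \
    CONNECT_SSL_KEYSTORE_LOCATION="/mnt/keystore/keystore.jks"

# SSL for the consumer used by the S3 Sink Connector
ENV CONNECT_CONSUMER_SECURITY_PROTOCOL="SSL" \
    CONNECT_CONSUMER_SSL_TRUSTSTORE_LOCATION="/mnt/truststore/truststore.jks" \
    CONNECT_CONSUMER_SSL_KEYSTORE_LOCATION="/mnt/keystore/keystore.jks"

# SSL for the admin client (unprefixed variables, matching the password variables listed further down)
ENV SECURITY_PROTOCOL="SSL" \
    SSL_TRUSTSTORE_LOCATION="/mnt/truststore/truststore.jks" \
    SSL_KEYSTORE_LOCATION="/mnt/keystore/keystore.jks"

# Keystore/truststore passwords and AWS credentials are injected as secrets at runtime, not baked into the image.

The keystore and truststore locations match the mount paths used in the next step; the passwords for them are intentionally left out of the image and supplied via the environment variables listed further down.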

Create secrets containing the keystore and truststore referenced above. Use volume mounts to make them available at /mnt/keystore/ and /mnt/truststore/, as shown in the Dockerfile.
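On OCP, for example, that could look like the commands below; the secret names, file names, and the kafka-connect deployment name are assumptions:

oc create secret generic kafka-keystore --from-file=keystore.jks=./keystore.jks
oc create secret generic kafka-truststore --from-file=truststore.jks=./truststore.jks

oc set volume deployment/kafka-connect --add --name=keystore \
  --type=secret --secret-name=kafka-keystore --mount-path=/mnt/keystore
oc set volume deployment/kafka-connect --add --name=truststore \
  --type=secret --secret-name=kafka-truststore --mount-path=/mnt/truststore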

Also make sure the keystore and truststore passwords, as well as the AWS credentials the connector needs to write to S3, are set up as secrets and made available as environment variables (see the example after the list below):

"AWS_ACCESS_KEY_ID": "xxx",
"AWS_SECRET_ACCESS_KEY": "xxx",
"CONNECT_SSL_KEYSTORE_PASSWORD": "xxx",
"CONNECT_SSL_KEY_PASSWORD": "xxx",
"CONNECT_CONSUMER_SSL_KEYSTORE_PASSWORD": "xxx",
"CONNECT_CONSUMER_SSL_KEY_PASSWORD": "xxx",
"SSL_KEYSTORE_PASSWORD": "xxx",
"SSL_KEY_PASSWORD": "xxx"
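One way to do that on OCP (again, the secret and deployment names are placeholders) is to put the values into a secret and expose it to the container's environment:

oc create secret generic connect-credentials \
  --from-literal=AWS_ACCESS_KEY_ID=xxx \
  --from-literal=AWS_SECRET_ACCESS_KEY=xxx \
  --from-literal=CONNECT_SSL_KEYSTORE_PASSWORD=xxx \
  --from-literal=CONNECT_SSL_KEY_PASSWORD=xxx \
  --from-literal=CONNECT_CONSUMER_SSL_KEYSTORE_PASSWORD=xxx \
  --from-literal=CONNECT_CONSUMER_SSL_KEY_PASSWORD=xxx \
  --from-literal=SSL_KEYSTORE_PASSWORD=xxx \
  --from-literal=SSL_KEY_PASSWORD=xxx

# Inject every key in the secret as an environment variable
oc set env deployment/kafka-connect --from=secret/connect-credentials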

Running the above Docker image, with the certificates mounted and the secret environment variables set, will launch the Kafka Connect worker. To launch the S3 Sink Connector job on the worker, the easiest way is to connect to the running Docker instance and run the command below.

curl -X POST -H 'Content-Type: application/json' --data '{
  "name": "my-s3-sink",
  "config": {
    "connector.class": "io.confluent.connect.s3.S3SinkConnector",
    "tasks.max": "1",
    "topics": "mysource.topic",
    "s3.region": "us-east-1",
    "s3.bucket.name": "com.xyz.mydestination.s3.bucket",
    "s3.part.size": "5242880",
    "flush.size": "1",
    "storage.class": "io.confluent.connect.s3.storage.S3Storage",
    "format.class": "io.confluent.connect.s3.format.json.JsonFormat",
    "partitioner.class": "io.confluent.connect.storage.partitioner.DefaultPartitioner",
    "schema.compatibility": "NONE"
  }
}' http://localhost:8083/connectors

The S3 Sink Connector job launched above will also survive pod restarts, because its configuration is saved by the Kafka Connect worker in the config topic on the Kafka broker.
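To confirm the job is still registered and running, for example after a pod restart, the Connect REST API can be queried; the connector name here matches the one used when it was created:

curl http://localhost:8083/connectors
curl http://localhost:8083/connectors/my-s3-sink/status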

Note: You may need more than 2 GB of memory to run the Kafka Connect worker. Otherwise, you risk running into the dreaded OOMKilled error (exit code 137).
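One way to manage this (the values below are only illustrative) is to cap the JVM heap via the KAFKA_HEAP_OPTS environment variable and give the pod a matching memory request and limit:

oc set env deployment/kafka-connect KAFKA_HEAP_OPTS="-Xms512M -Xmx2G"
oc set resources deployment/kafka-connect --requests=memory=2Gi --limits=memory=3Gi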
