Performance tuning — Apache Kafka

Running a self-managed Apache Kafka cluster can be a daunting task once you move beyond a basic hello-world setup and start moving serious volumes of data.

Here are some Kafka cluster optimizations gathered from across the internet.

Not everything is possible

You can’t get everything. Kafka performance tuning means balancing two opposing goals, latency and throughput: how quickly each message must be delivered and consumed versus how much data the cluster can move overall. For example, larger producer batches raise throughput but make each message wait longer before it is sent.

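Trade-offs like this apply across distributed systems, including DynamoDB and Elasticsearch. A related idea is the CAP theorem, which balances consistency, availability, and partition tolerance: when a network partition occurs, a system must give up either consistency or availability. Latency versus throughput is a separate trade-off from CAP, but the lesson is the same: decide which property matters most for your workload and tune for it.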

Partitions

Partitions determine consumer parallelism. Within a consumer group, each partition is read by at most one consumer, so adding partitions lets you add consumers and raise throughput. Choose the partition count from your consumer's throughput and the consumption rate you need. For instance, if one consumer can handle 1,000 events per second (EPS) and you need 5,000 EPS, use at least 5 partitions.
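The sizing rule above fits in a few lines of Python. This is a minimal sketch that assumes consumer throughput is the bottleneck and scales linearly with the number of consumers. `partitions_needed` is a hypothetical helper, not a Kafka API. Because rounding down would leave you short of the target, the division rounds up.

```python
import math


def partitions_needed(target_eps: float, per_consumer_eps: float) -> int:
    """Smallest partition count that lets one consumer group reach target_eps.

    Each partition is read by at most one consumer in a group, so the
    partition count caps how many consumers can work in parallel.
    """
    if target_eps <= 0 or per_consumer_eps <= 0:
        raise ValueError("rates must be positive")
    # Round up: 4.2 consumers' worth of work still needs 5 consumers.
    return math.ceil(target_eps / per_consumer_eps)


print(partitions_needed(5_000, 1_000))  # 5, the example from the text
print(partitions_needed(5_000, 1_200))  # 5 (4.17 rounded up)
```

In practice, leave headroom above this minimum. With keyed messages, adding partitions later changes which partition a given key maps to.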

Replications

Optimal Kafka replication settings depend on your use case, infrastructure, and performance requirements, but the following general recommendations are a good starting point:

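Replication Factor: Set a replication factor that ensures fault tolerance. A replication factor of 3 is common. Combined with min.insync.replicas=2, it lets a partition lose one broker and still accept writes without losing acknowledged data. The broker-wide default for new topics is default.replication.factor. To set it for a single topic, pass --replication-factor to kafka-topics.sh.

default.replication.factor=3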

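Min In-Sync Replicas: Set `min.insync.replicas` to a value greater than 1. When a producer uses acks=all, the leader acknowledges a write only after all in-sync replicas have stored it. It rejects writes if fewer than `min.insync.replicas` replicas are in sync. With acks=0 or acks=1, this setting has no effect.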

min.insync.replicas=2
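You can sanity-check a replication factor against `min.insync.replicas` by counting how many broker failures a partition can absorb. The sketch below assumes three things: producers use `acks=all`, unclean leader election is disabled, and every replica starts in the ISR. `failure_tolerance` is a hypothetical helper.

```python
def failure_tolerance(replication_factor: int, min_insync_replicas: int) -> dict:
    """Broker failures one partition can absorb, assuming acks=all producers,
    unclean leader election disabled, and all replicas initially in sync."""
    if not 1 <= min_insync_replicas <= replication_factor:
        raise ValueError("need 1 <= min.insync.replicas <= replication factor")
    return {
        # Writes are rejected once the ISR shrinks below min.insync.replicas.
        "writable_through": replication_factor - min_insync_replicas,
        # An acknowledged write is stored on at least min.insync.replicas
        # replicas, so it is guaranteed to survive losing all but one of them.
        "acked_data_survives": min_insync_replicas - 1,
    }


print(failure_tolerance(3, 2))  # {'writable_through': 1, 'acked_data_survives': 1}
print(failure_tolerance(2, 2))  # {'writable_through': 0, 'acked_data_survives': 1}
```

The second case is why the replication factor should exceed the minimum. With a factor of 2 and `min.insync.replicas=2`, losing any single broker blocks `acks=all` writes to every partition that had a replica on it.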

Unclean Leader Election: Disable unclean leader election to prevent a replica with out-of-date data from becoming a leader. This helps maintain data consistency.

unclean.leader.election.enable=false

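Replica Lag Time: `replica.lag.time.max.ms` controls how long a follower may go without catching up to the leader before it is removed from the in-sync replica (ISR) set. Removing slow followers keeps them from stalling acks=all writes. Setting the value too low makes replicas drop in and out of the ISR during brief hiccups.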

replica.lag.time.max.ms=30000
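To make the effect of `replica.lag.time.max.ms` concrete, here is a simplified model of how a leader decides which followers stay in the ISR. The decision depends on when each follower last caught up with the leader's log end. This is a sketch, not Kafka's actual code: it omits the leader itself, which is always in the ISR. `in_sync_followers` and the timestamps are hypothetical.

```python
def in_sync_followers(last_caught_up_ms: dict, now_ms: int,
                      lag_time_max_ms: int = 30_000) -> set:
    """Simplified model of the leader's ISR check for its followers.

    A follower counts as in sync if it fully caught up with the leader's
    log end at some point within the last lag_time_max_ms milliseconds.
    """
    return {
        broker
        for broker, caught_up_at in last_caught_up_ms.items()
        if now_ms - caught_up_at <= lag_time_max_ms
    }


# Hypothetical "last caught up" timestamps, in milliseconds.
followers = {"broker-2": 95_000, "broker-3": 60_000}
print(sorted(in_sync_followers(followers, now_ms=100_000)))  # ['broker-2']
print(sorted(in_sync_followers(followers, now_ms=100_000,
                               lag_time_max_ms=45_000)))  # ['broker-2', 'broker-3']
```

Raising the value keeps broker-3 in the ISR through a longer stall, such as a GC pause. Lowering it evicts slow followers sooner, so acks=all writes stop waiting on them.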

JVM Settings

Heap Settings: Set the initial and maximum heap size to the same value to prevent the JVM from resizing the heap dynamically.

-Xms8G -Xmx8G

Garbage Collection: Use the G1 garbage collector, which balances short pause times with good throughput on large heaps. It has been the default collector since JDK 9, and Kafka's start scripts enable it by default. Add the following flag:

-XX:+UseG1GC

Thread Stack Size: Adjust the thread stack size if necessary:

-Xss512k

JVM Performance Options: Optimize for server-class machines. On 64-bit JVMs the server VM is already the default, so this flag is effectively a no-op:

 -server

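Set the JVM to aggressive optimization. Note that this flag was deprecated in JDK 11 and removed in later releases, so leave it out on modern JVMs: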

 -XX:+AggressiveOpts

Use tiered compilation for improved startup. It is already enabled by default since JDK 8:

 -XX:+TieredCompilation

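File Descriptors: Brokers keep files open for every log segment and its indexes, plus one per client connection. Raise the operating system's open-file limit for the user running Kafka; the Apache Kafka documentation suggests at least 100000 as a starting point. This is an OS setting, not a JVM flag. In fact, -XX:-MaxFDLimit does the opposite of what is intended: it turns off the JVM's default behavior of raising its soft file-descriptor limit to the hard limit. Set the limit in the shell that starts the broker, or with LimitNOFILE in a systemd unit:

ulimit -n 100000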
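To confirm the limit took effect, read it from the same environment the broker starts in. This sketch uses Python's Unix-only `resource` module. The 100,000 threshold is an assumption matching the starting point in the Apache Kafka operations documentation, and `fd_limit_ok` is a hypothetical helper. It reports the limit of the Python process itself. That matches the broker's limit only if both are launched the same way (same user, shell, systemd unit, or container).

```python
import resource  # Unix-only standard-library module

# Assumption: the starting point suggested in the Apache Kafka operations docs.
MIN_BROKER_FDS = 100_000


def fd_limit_ok(soft_limit: int, minimum: int = MIN_BROKER_FDS) -> bool:
    # RLIM_INFINITY means "unlimited", which always satisfies the minimum.
    return soft_limit == resource.RLIM_INFINITY or soft_limit >= minimum


soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"open-file limit: soft={soft} hard={hard} ok={fd_limit_ok(soft)}")
```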

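JMX (Java Management Extensions): Enable JMX for monitoring Kafka with tools like JConsole. Disabling authentication and SSL, as below, is only safe on a trusted network. With Kafka's start scripts, setting the JMX_PORT environment variable adds the matching -Dcom.sun.management.jmxremote.port flag:

-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false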
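JVM flags are easy to mistype or to carry over from older JDKs. A small lint pass can catch a mismatched -Xms/-Xmx pair and flags that current JVMs no longer support. `lint_jvm_opts` is a hypothetical helper. The removed-flag table lists only the two flags relevant to this article and is not exhaustive.

```python
import re

# Flags relevant to this article that current JDKs no longer support (not exhaustive).
REMOVED_FLAGS = {
    "-XX:+AggressiveOpts": "deprecated in JDK 11 and removed in later releases",
    "-XX:+PrintGCDateStamps": "removed in JDK 9; use unified logging (-Xlog:gc*) instead",
}

# HotSpot size suffixes are binary multiples.
UNITS = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def parse_size(value: str) -> int:
    match = re.fullmatch(r"(\d+)([kKmMgG]?)", value)
    if match is None:
        raise ValueError(f"unrecognised size {value!r}")
    number, unit = match.groups()
    return int(number) * UNITS[unit.lower()]


def lint_jvm_opts(opts: str) -> list:
    flags = opts.split()
    problems = []
    xms = [parse_size(f[len("-Xms"):]) for f in flags if f.startswith("-Xms")]
    xmx = [parse_size(f[len("-Xmx"):]) for f in flags if f.startswith("-Xmx")]
    # When a flag is repeated, the JVM uses the last occurrence.
    if xms and xmx and xms[-1] != xmx[-1]:
        problems.append("-Xms and -Xmx differ, so the heap can resize at runtime")
    problems.extend(f"{f}: {REMOVED_FLAGS[f]}" for f in flags if f in REMOVED_FLAGS)
    return problems


print(lint_jvm_opts("-Xms8G -Xmx8G -XX:+UseG1GC -Xss512k"))  # []
print(lint_jvm_opts("-Xms4G -Xmx8G -XX:+AggressiveOpts"))
```

A fused string such as `-Xms8G-Xmx8G` is a single token, so `parse_size` rejects it with a `ValueError` instead of silently misreading it.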

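An easier way to pass these settings into a Kafka container is through environment variables. Kafka's start scripts read heap flags from KAFKA_HEAP_OPTS and other JVM flags from KAFKA_JVM_PERFORMANCE_OPTS. Setting KAFKA_JVM_PERFORMANCE_OPTS replaces Kafka's default performance flags rather than adding to them. The GC logging flag below assumes JDK 9 or later, where -Xlog:gc* replaces the JDK 8 flags -Xloggc, -XX:+PrintGCDetails, and -XX:+PrintGCDateStamps. Newer JVMs refuse to start with -XX:+PrintGCDateStamps.

docker run \
  -e KAFKA_HEAP_OPTS="-Xms8G -Xmx8G" \
  -e KAFKA_JVM_PERFORMANCE_OPTS="-XX:+UseG1GC -Xss512k -Xlog:gc*:file=/var/log/gc.log:time" \
  other_kafka_options \
  confluentinc/cp-kafka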

Broker Settings

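# Broker ID
broker.id=1
# Listeners (the SSL listener also needs ssl.keystore.* settings, not shown)
listeners=PLAINTEXT://:9092,SSL://:9093
# Log Directory
log.dirs=/path/to/kafka/data
# Auto-Create Topics
auto.create.topics.enable=false
# Num Partitions
num.partitions=3
# Default Replication Factor (must exceed min.insync.replicas to tolerate a broker failure)
default.replication.factor=3
# Unclean Leader Election
unclean.leader.election.enable=false
# Min In-Sync Replicas
min.insync.replicas=2
# Default Log Segment Size (1 GiB)
log.segment.bytes=1073741824
# Offsets Topic Replication Factor
offsets.topic.replication.factor=3
# Socket Buffer Sizes
socket.send.buffer.bytes=102400
socket.receive.buffer.bytes=102400
# Fetch Min Bytes: fetch.min.bytes=1 is a consumer setting (and its default), so it belongs in the consumer config, not here
# Group Initial Rebalance Delay (0 suits development; the 3000 ms default is better for production)
group.initial.rebalance.delay.ms=0
# Advertised Listeners (advertise every listener that clients connect to)
advertised.listeners=PLAINTEXT://your.broker.address:9092,SSL://your.broker.address:9093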
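Two mistakes in a broker file are easy to miss: a default replication factor that does not exceed `min.insync.replicas`, and a listener that is never advertised to clients. The sketch below checks both with a deliberately minimal `.properties` parser that ignores line continuations and escapes. `parse_properties` and `check_broker` are hypothetical helpers. The fallback value of 1 is Kafka's default for both replication settings. When `advertised.listeners` is unset it falls back to `listeners`, so the listener check only runs when it is set. KRaft controller listeners, which are not advertised, are excluded.

```python
def parse_properties(text: str) -> dict:
    """Minimal .properties reader for key=value lines and '#'/'!' comments.

    Line continuations and escape sequences are not handled (a simplification).
    """
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def listener_names(value: str) -> set:
    # "PLAINTEXT://:9092,SSL://:9093" -> {"PLAINTEXT", "SSL"}
    return {e.strip().split("://", 1)[0] for e in value.split(",") if e.strip()}


def check_broker(props: dict) -> list:
    problems = []
    # Both settings default to 1 in Kafka when absent.
    rf = int(props.get("default.replication.factor", "1"))
    min_isr = int(props.get("min.insync.replicas", "1"))
    if rf <= min_isr:
        problems.append(f"default.replication.factor={rf} leaves no room for a "
                        f"broker failure with min.insync.replicas={min_isr}")
    if "listeners" in props and "advertised.listeners" in props:
        controller = {n.strip() for n in
                      props.get("controller.listener.names", "").split(",") if n.strip()}
        missing = (listener_names(props["listeners"])
                   - listener_names(props["advertised.listeners"]) - controller)
        if missing:
            problems.append(f"listeners not advertised to clients: {sorted(missing)}")
    return problems


sample = """
listeners=PLAINTEXT://:9092,SSL://:9093
advertised.listeners=PLAINTEXT://your.broker.address:9092
default.replication.factor=2
min.insync.replicas=2
"""
for problem in check_broker(parse_properties(sample)):
    print(problem)
```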

Producer Settings

# Bootstrap Servers (the SSL listener port, to match security.protocol=SSL)
bootstrap.servers=kafka-broker1:9093,kafka-broker2:9093
# Acknowledge Mode (idempotence and transactions require acks=all)
acks=all
# Retries
retries=3
# Batch Size
batch.size=16384
# Linger Time
linger.ms=1
# Compression Type
compression.type=snappy
# Max Request Size
max.request.size=1048576
# Buffer Memory
buffer.memory=33554432
# Timeouts (delivery.timeout.ms must be >= linger.ms + request.timeout.ms)
request.timeout.ms=30000
delivery.timeout.ms=45000
# Idempotence
enable.idempotence=true
# Max In-Flight Requests Per Connection (at most 5 with idempotence)
max.in.flight.requests.per.connection=5
# Retry Backoff
retry.backoff.ms=100
# Transaction Settings (only if you use transactions; must be unique per producer instance)
transactional.id=my-transactional-id
# Security Settings
security.protocol=SSL
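Several producer settings constrain each other, and the Java client rejects some combinations at startup. This sketch checks a few of those rules. Idempotence (or a transactional.id) needs `acks=all`, at most 5 in-flight requests, and non-zero retries. Separately, `delivery.timeout.ms` must be at least `linger.ms + request.timeout.ms`. The sketch also reports duplicate keys, because with `java.util.Properties` the last occurrence silently wins. `parse_with_duplicates`, `check_producer`, and the defaults table are hypothetical. The defaults are assumed to be Kafka 3.x client defaults, so check them against your client version.

```python
# Assumption: Kafka 3.x producer defaults, used when a key is absent.
DEFAULTS = {
    "acks": "all",
    "retries": "2147483647",
    "max.in.flight.requests.per.connection": "5",
    "linger.ms": "0",
    "request.timeout.ms": "30000",
    "delivery.timeout.ms": "120000",
}


def parse_with_duplicates(text: str) -> tuple:
    props, duplicates = {}, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        if key in props:
            duplicates.append(key)
        props[key] = value.strip()  # later lines overwrite earlier ones
    return props, duplicates


def setting(props: dict, key: str) -> str:
    return props.get(key, DEFAULTS[key])


def check_producer(props: dict) -> list:
    problems = []
    idempotent = (props.get("enable.idempotence", "").lower() == "true"
                  or "transactional.id" in props)
    if idempotent:
        if setting(props, "acks") not in ("all", "-1"):
            problems.append("idempotence requires acks=all")
        if int(setting(props, "max.in.flight.requests.per.connection")) > 5:
            problems.append("idempotence allows at most 5 in-flight requests")
        if int(setting(props, "retries")) == 0:
            problems.append("idempotence requires retries > 0")
    floor = int(setting(props, "linger.ms")) + int(setting(props, "request.timeout.ms"))
    if int(setting(props, "delivery.timeout.ms")) < floor:
        problems.append("delivery.timeout.ms must be >= linger.ms + request.timeout.ms")
    return problems


original = """
acks=1
linger.ms=1
request.timeout.ms=30000
delivery.timeout.ms=45000
enable.idempotence=true
delivery.timeout.ms=30000
"""
props, dups = parse_with_duplicates(original)
print(dups)  # ['delivery.timeout.ms']
print(check_producer(props))  # two problems: acks and delivery.timeout.ms
```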
