Anatomy of an Incident: Events on Kafka topic out of order

Recently at work a team rewrote their Kafka producer and then noticed out-of-order events appearing in their topic. I didn’t know how this was possible, given my limited knowledge of Kafka, so I drew this sketch based on their explanation, which helped me understand what was happening…

Here’s a written breakdown:

Overview of how Kafka partitions data

The Producer’s Partitioner decides which partition of the topic each message is stored in. It can decide this based on the message’s key, use a partition chosen explicitly by the producer, or simply assign partitions round-robin.

Diagram of Kafka Producer's messages being partitioned by Account ID
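
As a rough sketch of the key-based case: the partitioner hashes the key and takes the result modulo the number of partitions, so the same Account ID always lands on the same partition. The function name `partition_for` and the choice of CRC32 are assumptions for illustration only. Real clients use their own hash functions (the Java client, for instance, uses murmur2).

```python
import zlib


def partition_for(key: bytes, num_partitions: int) -> int:
    """Hypothetical key-based partitioner: hash the key, then wrap it
    into the range of available partitions."""
    return zlib.crc32(key) % num_partitions


# Three messages, two of them for the same account.
keys = [b"account-1", b"account-2", b"account-1"]
partitions = [partition_for(k, 3) for k in keys]

# Both "account-1" messages get the same partition number.
print(partitions[0] == partitions[2])  # True
```

The only property that matters here is determinism: as long as every producer uses the same function and the same partition count, one account's events stay in one partition.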

Kafka guarantees the order of messages within these partitions, but not across partitions.

When the service was rewritten, a Partitioner from another language was used. Both the old and the new Partitioner chose a partition based on the message’s key, but the new one mapped the same key to a different partition.

Diagram of new Kafka Producer partitioning messages with same Account ID differently
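
To illustrate how two key-based partitioners can disagree, here is a minimal sketch with two stand-in hash functions. `old_partitioner`, `new_partitioner`, the partition count and the hashes themselves (CRC32 and Adler-32) are all assumptions. The incident's actual partitioners are not named in this post.

```python
import zlib

NUM_PARTITIONS = 3  # assumed partition count for the example


def old_partitioner(key: bytes) -> int:
    # Stand-in for the original producer's partitioner.
    return zlib.crc32(key) % NUM_PARTITIONS


def new_partitioner(key: bytes) -> int:
    # Stand-in for the rewritten producer's partitioner:
    # still key-based, but a different hash function.
    return zlib.adler32(key) % NUM_PARTITIONS


account_ids = [f"account-{n}".encode() for n in range(100)]

# Accounts whose events would now be written to a different partition.
moved = [k for k in account_ids if old_partitioner(k) != new_partitioner(k)]
print(f"{len(moved)} of {len(account_ids)} accounts map to a different partition")
```

Each function is perfectly consistent on its own. The problem only appears when events written by one sit in the same topic as events written by the other.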

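This means the events for a given account are now split across two partitions: the older ones where the old Partitioner put them, and the newer ones where the new Partitioner puts them. Because Kafka does not guarantee order across partitions, a consumer that re-consumes the topic can see that account’s events out of order.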
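
A toy replay makes the effect concrete. This is a simplified model, not real consumer behaviour: the partition numbers are made up, and the `reconsume` helper drains one partition at a time, whereas a real consumer interleaves fetches across partitions in no guaranteed order.

```python
from collections import defaultdict

# Hypothetical outcome of the two partitioners for one account key.
OLD_PARTITION = 1  # where the old producer placed this account's events
NEW_PARTITION = 0  # where the rewritten producer places them

topic = defaultdict(list)  # partition number -> ordered log of events

# Events 1-3 were written before the rewrite, events 4-6 after it.
for seq in (1, 2, 3):
    topic[OLD_PARTITION].append(("account-7", seq))
for seq in (4, 5, 6):
    topic[NEW_PARTITION].append(("account-7", seq))


def reconsume(topic):
    """Replay the topic from the start, one partition at a time.
    Order within a partition is preserved; order across partitions is not."""
    replayed = []
    for partition in sorted(topic):
        replayed.extend(topic[partition])
    return replayed


replayed_seqs = [seq for _, seq in reconsume(topic)]
print(replayed_seqs)  # [4, 5, 6, 1, 2, 3]
```

Each partition is still internally ordered, yet the account's newest events are replayed before its oldest ones.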