What is Apache Kafka?
Apache Kafka is a messaging platform used by more than a third of Fortune 500 companies. Uber uses it to calculate surge pricing, Netflix uses it for recommendations, and LinkedIn uses it to keep track of the content its millions of members post on various topics. Many other companies rely on it as well.
In the past, we used databases to store everything. The focus was on things and their state. As the number of servers grew, so did the interconnections between them. For example, if we had 8 servers communicating with 4 other servers, the math is straightforward: we had 8 * 4 = 32 integrations. As more servers were added, these integrations became cumbersome to manage. Hence the move from databases to logs, or simply, events.
These events are stored in what are called message queues. Kafka is a messaging mediator that sits between a producer and a consumer. Producers write messages to the queue and, when Kafka is configured appropriately, can rely on those messages not getting lost. Consumers, on the other hand, can read the messages in several ways, for example from the beginning of the log, from the latest message, or from a position they saved earlier.
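To make the integration arithmetic above concrete, here is a small sketch in Python. The function names are made up for illustration, and the brokered case rests on a simplifying assumption: every server needs only one integration, namely with the broker layer, instead of one per partner.

```python
def point_to_point_links(senders: int, receivers: int) -> int:
    """Every sender integrates directly with every receiver."""
    return senders * receivers


def brokered_links(senders: int, receivers: int) -> int:
    """Assumption: each server integrates once, with the broker layer only."""
    return senders + receivers


direct = point_to_point_links(8, 4)
via_broker = brokered_links(8, 4)
print(f"direct: {direct}, via broker: {via_broker}")  # direct: 32, via broker: 12
```

The direct count grows with the product of the two groups, while the brokered count grows with their sum, which is why a mediator becomes attractive as servers keep getting added.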
Kafka decouples data pipelines. It is a distributed publish/subscribe messaging broker. When talking about Kafka, these are the most commonly used terms:
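Producer: The application from which a new message originates.
Consumer: The application that reads these messages.
Partition: An ordered, append-only slice of a topic. A topic is split into one or more partitions, which are spread across brokers.
Broker: A Kafka server, typically one per machine, that stores partitions and serves producers and consumers.
Topic: The name under which messages are published.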
ZooKeeper: Software used to coordinate the Kafka cluster.
ZooKeeper began as part of the Hadoop project and is now a separate Apache project. In a Kafka environment it helps with controller election, cluster membership and topic configuration. Newer Kafka releases can also run without ZooKeeper, using the built-in KRaft mode.
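The terms above are easier to picture with a toy model. The sketch below is plain Python with no Kafka involved: the classes `Topic`, `Producer` and `Consumer` are hypothetical stand-ins, not the API of any Kafka client library, and the CRC32 hash is only a placeholder for whatever partitioner a real client uses. It shows a topic split into partitions, a producer that routes messages by key, and a consumer that remembers how far it has read.

```python
from __future__ import annotations

import zlib
from collections import defaultdict


class Topic:
    """Toy stand-in for a topic: a named set of append-only partitions."""

    def __init__(self, name: str, num_partitions: int) -> None:
        self.name = name
        self.partitions: list[list[tuple[str, str]]] = [
            [] for _ in range(num_partitions)
        ]

    def partition_for(self, key: str) -> int:
        # CRC32 is a placeholder hash chosen for this sketch only.
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)


class Producer:
    def send(self, topic: Topic, key: str, value: str) -> tuple[int, int]:
        """Append a message and return its (partition, offset)."""
        partition = topic.partition_for(key)
        topic.partitions[partition].append((key, value))
        return partition, len(topic.partitions[partition]) - 1


class Consumer:
    """Keeps its own read position (offset) for every partition."""

    def __init__(self) -> None:
        self.offsets: dict[tuple[str, int], int] = defaultdict(int)

    def poll(self, topic: Topic) -> list[tuple[str, str]]:
        """Return all messages not yet seen by this consumer."""
        records = []
        for partition, log in enumerate(topic.partitions):
            start = self.offsets[(topic.name, partition)]
            records.extend(log[start:])
            self.offsets[(topic.name, partition)] = len(log)
        return records


rides = Topic("ride-requests", num_partitions=3)
producer = Producer()
producer.send(rides, "rider-1", "pickup requested")
producer.send(rides, "rider-2", "pickup requested")
producer.send(rides, "rider-1", "pickup cancelled")

consumer = Consumer()
first = consumer.poll(rides)   # the three messages sent so far
second = consumer.poll(rides)  # nothing new, so an empty list
```

Because the partition is derived from the key, both "rider-1" messages land in the same partition and keep their order. Reading does not remove anything from the log; the consumer simply advances its own offset, so a second consumer could read the same messages independently.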
Kafka can be deployed in several ways, but the most widely used setup is a multi-node, multi-broker cluster.
The following are the features of Kafka:
- High Throughput
- Minimal Data Loss
- Stream Processing
Kafka is open-source software from the Apache Software Foundation. A company called Confluent, founded by Kafka’s original creators, is one of its main contributors. Kafka itself is free to use; Confluent’s enterprise platform and its cloud version are commercial offerings.
Some interesting products that Confluent provides are:
- Kafka Connectors (AWS Lambda, HDFS-3 Source, Salesforce, etc.)
- MQTT and REST Proxy (For IoT and Web Development)
- KSQL, now called ksqlDB (Write SQL queries over data flowing through Kafka)
Since LinkedIn open-sourced it in 2011, Kafka has come a long way toward becoming the log platform of choice for many.