Summary
Apache Kafka is an open-source distributed event streaming platform originally developed at LinkedIn. It enables applications to publish, subscribe to, store, and process streams of records in real time.
What is Apache Kafka?
Kafka organizes data into topics, which are partitioned and replicated across a cluster of brokers for durability and scalability. Producers write records to topics while consumers read from them independently, allowing multiple applications to process the same stream at their own pace.
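The routing of a keyed record to a partition can be sketched as a hash of the key modulo the partition count. (Kafka's default partitioner actually uses a murmur2 hash of the key bytes; the `md5`-based `partition_for` below is a dependency-free stand-in to illustrate the idea, not Kafka's real implementation.)

```python
import hashlib

def partition_for(key: bytes, num_partitions: int) -> int:
    """Map a record key to a partition, as Kafka's default partitioner
    does for keyed records. Kafka uses murmur2; md5 here is a stand-in
    that keeps the sketch self-contained."""
    digest = hashlib.md5(key).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

# Records with the same key always land in the same partition,
# which is what gives Kafka its per-key ordering guarantee.
p1 = partition_for(b"user-42", 6)
p2 = partition_for(b"user-42", 6)
assert p1 == p2
assert 0 <= p1 < 6
```

Because the mapping is deterministic, all events for a given key (say, one user ID) stay in order relative to each other, while different keys spread across partitions for parallelism.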
Because Kafka persists records on disk with configurable retention, it can serve both as a messaging system and as a long-term event log. This dual role makes it suitable for event sourcing, audit trails, and replaying historical data.
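The log-plus-offsets model behind replay can be illustrated with a toy in-memory partition. The `PartitionLog` class below is a simplified sketch (no retention, replication, or persistence): each appended record gets a monotonically increasing offset, and a consumer replays history by reading from an earlier offset.

```python
class PartitionLog:
    """Toy model of a single Kafka partition: an append-only log in
    which every record receives the next sequential offset."""

    def __init__(self):
        self._records = []

    def append(self, record) -> int:
        """Append a record and return the offset it was assigned."""
        self._records.append(record)
        return len(self._records) - 1

    def read_from(self, offset: int):
        """Return all records at or after the given offset. Consumers
        track their own offsets, so replay is just re-reading from
        an older position; the broker keeps no per-consumer state."""
        return self._records[offset:]

log = PartitionLog()
for event in ["created", "updated", "deleted"]:
    log.append(event)

assert log.read_from(0) == ["created", "updated", "deleted"]  # full replay
assert log.read_from(2) == ["deleted"]                        # new events only
```

This is why multiple consumers can process the same stream independently: each one holds nothing more than an integer offset into the shared log.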
Kafka is commonly used in microservices architectures to decouple services, buffer spikes in traffic, and ensure reliable data delivery between components.
Why is Apache Kafka relevant?
- High throughput: Handles millions of events per second with low latency
- Durability: Replicated storage prevents data loss during broker failures
- Decoupling: Producers and consumers are independent, simplifying system evolution
- Ecosystem: Kafka Streams and Kafka Connect extend it for processing and integration
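To give a flavor of the processing layer, the canonical Kafka Streams example is a streaming word count: split each record into words, group by word, and maintain a running count. The Python function below is an in-memory analogue of that topology, not Kafka Streams itself; in real Kafka Streams the counts live in a fault-tolerant state store backed by a changelog topic, whereas here a plain `Counter` stands in.

```python
from collections import Counter

def word_count(stream):
    """In-memory sketch of the classic Kafka Streams word-count
    topology: flat-map lines into words, group by word, count."""
    counts = Counter()
    for line in stream:
        counts.update(line.lower().split())
    return dict(counts)

result = word_count(["hello kafka", "hello streams"])
assert result == {"hello": 2, "kafka": 1, "streams": 1}
```

In a real deployment the input would be a Kafka topic and the output another topic, so downstream services could consume the counts without knowing anything about the producer.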