Apache Kafka® is a distributed streaming platform. What exactly does that mean?
We think of a streaming platform as having three key capabilities:
- It lets you publish and subscribe to streams of records. In this respect, it is similar to a message queue or enterprise messaging system.
- It lets you store streams of records in a fault-tolerant way.
- It lets you process streams of records as they occur.
What is Kafka good for?
It gets used for two broad classes of application:
- Building real-time streaming data pipelines that reliably get data between systems or applications
- Building real-time streaming applications that transform or react to the streams of data
To understand how Kafka does these things, let’s dive in and explore Kafka’s capabilities from the bottom up.
First a few concepts:
- Kafka is run as a cluster on one or more servers.
- The Kafka cluster stores streams of records in categories called topics.
- Each record consists of a key, a value, and a timestamp.
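The record structure above can be sketched in plain Java with no Kafka dependency. The `Record` class below is a hypothetical stand-in used only to illustrate the three fields; the real client classes are `org.apache.kafka.clients.producer.ProducerRecord` and `org.apache.kafka.clients.consumer.ConsumerRecord`:

```java
import java.time.Instant;

// Minimal illustration of a Kafka record: a key, a value, and a timestamp.
// (Hypothetical stand-in class, not part of the Kafka client API.)
public class Record {
    private final String key;
    private final String value;
    private final long timestamp; // epoch milliseconds, as Kafka uses

    public Record(String key, String value, long timestamp) {
        this.key = key;
        this.value = value;
        this.timestamp = timestamp;
    }

    public String key() { return key; }
    public String value() { return value; }
    public long timestamp() { return timestamp; }

    public static void main(String[] args) {
        Record r = new Record("user-42", "page-view", Instant.now().toEpochMilli());
        System.out.println(r.key() + " -> " + r.value());
    }
}
```

The key is optional in practice (it may be null), but when present it determines which partition of a topic the record is written to.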
Kafka has four core APIs:
- The Producer API allows an application to publish a stream of records to one or more Kafka topics.
- The Consumer API allows an application to subscribe to one or more topics and process the stream of records produced to them.
- The Streams API allows an application to act as a stream processor, consuming an input stream from one or more topics and producing an output stream to one or more output topics, effectively transforming the input streams to output streams.
- The Connector API allows building and running reusable producers or consumers that connect Kafka topics to existing applications or data systems. For example, a connector to a relational database might capture every change to a table.
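The Producer and Consumer APIs can be sketched with the official Java client (the `org.apache.kafka:kafka-clients` artifact). This is a minimal sketch, not a production configuration: it assumes a broker reachable at `localhost:9092` and a topic named `my-topic`, both of which are placeholders you would replace with your own:

```java
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ProducerConsumerSketch {
    public static void main(String[] args) {
        // Producer API: publish a record to a topic.
        Properties producerProps = new Properties();
        producerProps.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        producerProps.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        producerProps.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps)) {
            producer.send(new ProducerRecord<>("my-topic", "key", "hello"));
        }

        // Consumer API: subscribe to the topic and process records.
        Properties consumerProps = new Properties();
        consumerProps.put("bootstrap.servers", "localhost:9092");
        consumerProps.put("group.id", "sketch-group"); // consumers with the same group.id share the work
        consumerProps.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        consumerProps.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps)) {
            consumer.subscribe(Collections.singletonList("my-topic"));
            ConsumerRecords<String, String> records = consumer.poll(1000);
            for (ConsumerRecord<String, String> record : records) {
                System.out.println(record.key() + " -> " + record.value());
            }
        }
    }
}
```

A real application would poll in a loop rather than once, and would tune serializers, acks, and offset management to its needs; the point here is only the shape of the two APIs.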
In Kafka, the communication between the clients and the servers is done with a simple, high-performance, language-agnostic TCP protocol. This protocol is versioned and maintains backward compatibility with older versions. We provide a Java client for Kafka, but clients are available in many languages.
- [KAFKA-5140] – Flaky ResetIntegrationTest
- [KAFKA-5967] – Ineffective check of negative value in CompositeReadOnlyKeyValueStore#approximateNumEntries()
- [KAFKA-5970] – Deadlock due to locking of DelayedProduce and group
- [KAFKA-5986] – Streams State Restoration never completes when logging is disabled
- [KAFKA-6003] – Replication Fetcher thread for a partition with no data fails to start
- [KAFKA-6026] – KafkaFuture timeout fails to fire if a narrow race condition is hit
- [KAFKA-6030] – Integer overflow in log cleaner cleanable ratio computation
- [KAFKA-6042] – Kafka Request Handler deadlocks and brings down the cluster.
- [KAFKA-6087] – Scanning plugin.path needs to support relative symlinks
- [KAFKA-6116] – Major performance issue due to excessive logging during leader election
- [KAFKA-6119] – Silent Data Loss in Kafka011 Transactional Producer
- [KAFKA-6131] – Transaction markers are sometimes discarded if txns complete concurrently
- [KAFKA-6134] – High memory usage on controller during partition reassignment
- [KAFKA-6179] – RecordQueue.clear() does not clear MinTimestampTracker’s maintained list
- [KAFKA-6190] – GlobalKTable never finishes restoring when consuming transactional messages
- [KAFKA-5725] – Additional failure testing for streams with bouncing brokers