Learn how consumers interact with Kafka to read messages.
- [Instructor] When it comes to consumers, we're focusing on the apps that are reading data from our Kafka cluster, and they're actually a little more complicated than the producers. First off, consumers are organized into consumer groups, and within a group, all the partitions of the topic being read are divided among the consumers. So, if there are 30 partitions and 30 consumers in the group, each consumer will have its own partition assigned to it. Now, as consumers come and go from the consumer group, the consumer group needs to be rebalanced.
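As a quick illustration, you can inspect how partitions are assigned within a group using the kafka-consumer-groups tool that ships with Kafka. This is just a sketch; the broker address and group name here are placeholder assumptions, not taken from the course:

    # Show which consumer in the group owns each partition, plus its offsets.
    # Assumes a broker on localhost:9092 and an active group named my-group.
    kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
      --describe --group my-group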
Now, this is handled by Zookeeper, and there is a newer version of Kafka which uses a broker coordinator, but just know that there is essentially another part of the system monitoring these consumer groups, making sure they don't become overly bloated and that partitions are assigned appropriately. As for configuration, when you set up the consumer group, you need to add the consumer group ID, which is essentially the name or identifier for the consumer group. There's also the session timeout, with a default value of 30 seconds; you can increase this to avoid rebalancing too often and creating too much overhead.
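Here's a minimal sketch of setting those two options on the console consumer. The configs group.id and session.timeout.ms are real consumer settings; the topic and group names are placeholders:

    # Join (or create) a consumer group with a specific ID and session timeout.
    kafka-console-consumer.sh --bootstrap-server localhost:9092 \
      --topic my-topic \
      --consumer-property group.id=my-group \
      --consumer-property session.timeout.ms=30000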
The other thing is the heartbeat, and the heartbeat is set up to let Zookeeper or the broker coordinator know that the consumer is still there. If it stops, the group rebalances, because the coordinator knows the consumer is no longer there and needs to reassign its partitions so that everything stays in harmony on your cluster. Now, this heartbeat can be adjusted as well, using the heartbeat interval configuration, and again, it helps with throughput and reduces overhead that might slow your cluster down, and we don't want that.
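The consumer config behind this is heartbeat.interval.ms. As a sketch, using the same placeholder names as above, you might tune it like this; a common rule of thumb from the Kafka docs is to keep it no higher than about a third of session.timeout.ms:

    # Send a heartbeat every 3 seconds (well under the 30-second session timeout).
    kafka-console-consumer.sh --bootstrap-server localhost:9092 \
      --topic my-topic \
      --consumer-property heartbeat.interval.ms=3000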
Remember, we want super low latency here while maximizing the throughput of our system. And one of the key things about throughput is the autocommit. The autocommit speaks to the offset of the consumer, and remember, each consumer will be reading from a different partition, so they'll have different offsets, and the offset is the position in the messages they've read. So if a topic has five thousand messages, the consumers may not all have read all five thousand at any given time, and they'll be at different positions. With autocommit, every five seconds the consumer logs essentially where it's at. That way, if the consumer were to go down and then come back online, even though a number of messages may have arrived in between, it knows exactly where it needs to pick up from, so it doesn't lose any changes and nothing gets mixed up.
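The relevant consumer configs here are enable.auto.commit and auto.commit.interval.ms; the five-second interval mentioned above matches the default of 5000 milliseconds. A sketch with the same placeholder names:

    # Commit consumed offsets automatically every 5 seconds.
    kafka-console-consumer.sh --bootstrap-server localhost:9092 \
      --topic my-topic \
      --consumer-property enable.auto.commit=true \
      --consumer-property auto.commit.interval.ms=5000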
So, if we take a look at a diagram here to explain this, we have a couple of different servers, and the partitions of a topic are spread across those servers. Those partitions are serving two different consumer groups, Consumer Group A and Consumer Group B. Now, you can see that the partitions are evenly distributed: Consumer Group A only has two consumers, so the four partitions across the brokers are divided between them, two partitions per consumer, and each consumer knows exactly which partitions it reads from. In Consumer Group B, we have four consumers and four partitions, so each one only gets assigned one, and each consumer is only reading from one of the partitions. This way, if there is some sort of issue and a partition needs to switch to another consumer, the group knows where it left off. So the rebalancing when a consumer goes offline, leaves the group, or comes back into the group doesn't matter so much, because it has that offset.
It knows, it remembers exactly where it left off. If we take a look at this example (and again, we'll run this in our terminal window here in a minute), we can run the console consumer, which comes with Kafka. We point it at the localhost address where Kafka is running, we tell it the topic we want to read from, and it will print those messages out. So, if you watched the previous clip, where we set up the producer example, we entered three different messages, and here's the example where we're reading those, telling it where to read them from and what the offset is.
In this case, we're passing in from beginning. And again, we'll run this here in a minute so you can see it in action.
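Here's a minimal sketch of that command, assuming a broker on localhost:9092 and a topic named my-topic (the actual topic name from the producer clip may differ):

    # Read the topic from the very first offset and print each message.
    kafka-console-consumer.sh --bootstrap-server localhost:9092 \
      --topic my-topic \
      --from-beginning
    # On older, Zookeeper-based versions of Kafka, the console consumer took
    # --zookeeper localhost:2181 instead of --bootstrap-server.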