Learn how topics and brokers work inside of Kafka.
- [Narrator] So what is a topic anyway? I like to think of a topic as a category of something or a feed of changes. We talked about customer information being a category, which would be a great topic. Another could be sales orders or customer visits to the website. All of these are great use cases for topics. Essentially, each one is a separate log in Kafka, and these topics, as I mentioned in the architecture, are partitioned across multiple brokers, which allows them to be resilient.
If one broker goes down or one hard disk fails, the topic can still be read. In fact, when that happens, which we'll test coming up here in a minute, the broker that was previously handling only the reads will now start handling the writes. This way operations can continue without any gap in functionality or capacity. If we take a look at the Anatomy of a Topic, there are three partitions here. There is partition zero, one, and two, and they start with the index of zero.
So that's important to know as we start to set things up and actually work with the data. The way they work is that when a write happens, it does so first on the lead partition; this is partition zero. After the data gets written to partition zero, it's also written to the other partitions so we have replicated copies. Now a consumer could look at any of these partitions, and depending on their offset, which is the number of changes that they've processed or handled since their last read from the topic, they may be redirected to a different partition.
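The write path and the consumer offset described here can be sketched in a few lines of plain Python. This is a toy model under stated assumptions, not Kafka client code: the `Replica` and `Partition` classes are hypothetical names invented for illustration, and the replicated copies are modeled as follower replicas of a single partition, each held by a different broker.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One copy of a partition's log held by a single broker (hypothetical model)."""
    broker_id: int
    log: list = field(default_factory=list)


@dataclass
class Partition:
    """A partition with one lead replica and any number of follower replicas."""
    leader: Replica
    followers: list

    def write(self, record):
        # The write lands on the lead copy first...
        self.leader.log.append(record)
        offset = len(self.leader.log) - 1
        # ...and is then copied to every follower so replicated copies exist.
        for follower in self.followers:
            follower.log.append(record)
        return offset

    def read(self, offset):
        # Return every record from the given offset onward.
        return self.leader.log[offset:]


partition = Partition(
    leader=Replica(broker_id=1),
    followers=[Replica(broker_id=2), Replica(broker_id=3)],
)
for event in ["customer-created", "customer-updated", "customer-visited"]:
    partition.write(event)

# The consumer keeps its own offset: the position of the next record to read.
consumer_offset = 0
batch = partition.read(consumer_offset)
consumer_offset += len(batch)
```

After the loop, all three copies hold the same three records, and the consumer's offset has advanced past everything it has read, so its next read returns only newer records.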
There are different ways of strategizing about this so that you're balancing the load across your cluster effectively. We'll dive into that a little bit more coming up, but the anatomy is basically that you have a topic, which is a category of the thing that you want to keep track of, separated by partitions across brokers to give you that resilient, sustainable system. So what is a broker? Well, a broker is a logical separation of partitions of topics; brokers are just containers or buckets.
They handle many topics and they give you that resiliency. Let's take a look at an example. Here we have three brokers and we have lead topics, and these topics here are of partition zero. You can see that those topics also exist on the other brokers, meaning they're evenly distributed across our cluster. Those are in green because they're replicated copies; they're not the lead copies. Now, this is our cluster, and let's say broker three goes down.
On broker three we have topics one, two, and three, but only one of them is really the lead. So, if I fast forward and see what would happen, the lead for topic three has now been moved over to broker one. That way broker one is a little bit overloaded because it's handling writes for both topic one and topic three. And now broker two has adjusted its partition as well. So, when one of these goes down, the system continues to operate and it just automatically shuffles where the operations are occurring.
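The failover in this example can be approximated with a small simulation. This is a simplified sketch, not how Kafka itself picks a new lead: the `elect_leaders` function and the replica ordering are assumptions made for illustration, and the rule used here (promote the first surviving copy in the list) ignores details such as whether a copy is fully caught up.

```python
def elect_leaders(replicas, leaders, live_brokers):
    """Return a new topic -> lead broker mapping after some brokers fail.

    replicas:     topic -> ordered list of broker ids that hold a copy
    leaders:      topic -> broker id currently acting as the lead
    live_brokers: set of broker ids that are still up
    """
    updated = {}
    for topic, current in leaders.items():
        if current in live_brokers:
            updated[topic] = current  # the lead is still up, nothing to do
            continue
        # Promote the first surviving copy in the list (simplified rule).
        survivors = [b for b in replicas[topic] if b in live_brokers]
        updated[topic] = survivors[0] if survivors else None
    return updated


# Three brokers, three topics, each broker leading one topic (illustrative layout).
replicas = {
    "topic-1": [1, 2, 3],
    "topic-2": [2, 3, 1],
    "topic-3": [3, 1, 2],
}
leaders = {"topic-1": 1, "topic-2": 2, "topic-3": 3}

# Broker three goes down; only brokers one and two remain.
after_failure = elect_leaders(replicas, leaders, live_brokers={1, 2})
print(after_failure)  # {'topic-1': 1, 'topic-2': 2, 'topic-3': 1}
```

With this layout, broker one ends up leading both topic one and topic three, which matches the slightly overloaded broker described in the example.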
- Understanding the Kafka log
- Creating topics
- Partitioning topics across brokers
- Installing and testing Kafka locally
- Sending and receiving messages
- Setting up a multibroker cluster
- Testing fault tolerance