Learn about the high-level architecture of Kafka.
- [Instructor] Let's take a look at the architecture of Kafka. At a high level, we've seen this graphic before: we have producers, the data providers as I've called them in the past, the things that make changes in our application and generate entries in our Kafka cluster. Those entries are then read by consumers, the apps that use the data. Now, a consumer may do something with the data and then become a producer, writing back to the Kafka cluster. So depending on the operation, the reader and the writer can be the same application; they can have multiple roles.
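To make those two roles concrete, here's a minimal sketch of both in Java using the kafka-clients library. The broker address (localhost:9092), the topic name ("customers"), and the consumer group id are illustrative assumptions, not anything from the course.

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ProducerConsumerSketch {
    public static void main(String[] args) {
        // Producer side: write one change event into the cluster.
        Properties producerProps = new Properties();
        producerProps.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        producerProps.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        producerProps.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps)) {
            producer.send(new ProducerRecord<>("customers", "customer-42", "{\"email\":\"new@example.com\"}"));
        }

        // Consumer side: an app that reads those change events back out.
        Properties consumerProps = new Properties();
        consumerProps.put("bootstrap.servers", "localhost:9092");
        consumerProps.put("group.id", "customer-app"); // assumed consumer group
        consumerProps.put("auto.offset.reset", "earliest");
        consumerProps.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        consumerProps.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps)) {
            consumer.subscribe(List.of("customers"));
            for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofSeconds(5))) {
                // A consumer that transformed this record and sent the result
                // to another topic would be acting as a producer too.
                System.out.printf("%s -> %s%n", record.key(), record.value());
            }
        }
    }
}
```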
We also have connectors, which let us integrate with things like relational databases, monitor the changes in them automatically, and pull those changes into our Kafka cluster. So, again, we have a single source of truth, an actual log that holds every change to a specific entity. And we have stream processors, which let us do stream processing: handling the data on the fly as it comes in and responding to changes if necessary. But let's take a look inside the Kafka cluster itself, because there are some things going on in there that you'll want to understand before you dive too deep into using Kafka.
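For a feel of what a stream processor looks like in code, here's a small Kafka Streams sketch that reacts to records as they arrive. The topic names ("sales-orders", "large-order-alerts"), the application id, and the filter rule are hypothetical placeholders for illustration.

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class OrderAlerts {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "order-alerts");      // assumed app id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        // Handle each sales order on the fly: keep the ones flagged as large
        // and publish them to an alerts topic for downstream consumers.
        KStream<String, String> orders = builder.stream("sales-orders");
        orders.filter((orderId, payload) -> payload.contains("\"large\":true")) // placeholder predicate
              .to("large-order-alerts");

        new KafkaStreams(builder.build(), props).start();
    }
}
```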
First, again, we have our producers sending in the data and our consumers pulling it out. Between those two, inside the cluster, we have something called brokers. A broker is a logical separation of work, and each broker can run on a different machine, which distributes the load and gives you multiple copies of your data, if you configure it that way. Each broker handles one or more Kafka topics, and a Kafka topic is basically that category of changes I've been talking about.
So in the customer information scenario, maybe you have a customer topic that holds all the changes to customer records, or one for sales orders, or likes on a post, or views of a video, whatever the case may be for your business. Those entities, the things that are changing, the things you want to keep track of, are what your topics are. They're really the meat of what Kafka does and how it works. Now, each of these topics is set up inside a broker.
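As a concrete sketch, here's how you might create one topic per entity with the Java AdminClient. The topic names mirror the examples above, and the single-partition, single-replica settings are just sandbox defaults, not recommendations.

```java
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateEntityTopics {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address

        try (AdminClient admin = AdminClient.create(props)) {
            // One topic per entity we want a change log for.
            // 1 partition, replication factor 1: fine for a single-broker sandbox.
            admin.createTopics(List.of(
                    new NewTopic("customers", 1, (short) 1),
                    new NewTopic("sales-orders", 1, (short) 1),
                    new NewTopic("video-views", 1, (short) 1)
            )).all().get(); // block until the cluster confirms
        }
    }
}
```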
Now, if you have multiple brokers, as you probably should when you're setting up Kafka, you can split each topic across those brokers into partitions. This is how you get the resiliency of the system: if one broker goes down, or one hard disk fails, or something like that, you still have other copies of the data, so you never really lose it. The thing about partitions is that each one has a leader, which is where all the writes occur. So when new data comes in, it's written to the leader of that partition first.
And then that data is replicated out to the followers, the replicas of that partition, on other brokers. So one broker handles the reads and writes for a given partition, while the leaders of the other partitions live on other brokers. That balances the load out, so we're not overloading any one broker or any one topic, and we can have a really efficient, fast processing system. Now, not all topics are configured the same; they don't all have to have the same number of partitions. So depending on how you configure it, and there are lots of different ways to approach this, you'll have one to many copies of that topic living across one to many brokers.
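To see partitioning and replication together, here's a sketch that creates a topic with three partitions, each kept on three brokers, and then asks the cluster where each partition's leader and follower replicas live. It assumes a three-broker cluster reachable at the given address and kafka-clients 3.1 or newer (for allTopicNames()).

```java
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.clients.admin.TopicDescription;

public class InspectPartitions {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumes a 3-broker cluster behind this address

        try (AdminClient admin = AdminClient.create(props)) {
            // 3 partitions spread across the brokers, each stored on 3 brokers,
            // so losing one broker or one disk doesn't lose the data.
            admin.createTopics(List.of(new NewTopic("sales-orders", 3, (short) 3))).all().get();

            // Ask the cluster which broker leads each partition (handling its
            // reads and writes) and where the follower replicas live.
            TopicDescription desc = admin.describeTopics(List.of("sales-orders"))
                    .allTopicNames().get().get("sales-orders");
            desc.partitions().forEach(p ->
                    System.out.printf("partition %d: leader=%s replicas=%s%n",
                            p.partition(), p.leader().id(), p.replicas()));
        }
    }
}
```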