Learn what producers are and how they interact with Kafka.
- [Narrator] When it comes to producers, these are, again, the things that generate the data and send it to your cluster. Now when they publish this data, they use something called a partitioner. The partitioner's job is to figure out which partition of the topic each message should go to, and the producer then sends it to whichever broker is the current leader for that partition. Now after the data has been sent, the producer needs to know whether or not the message was received, and whether or not the data has been written successfully.
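To make the partitioner idea concrete, here is a minimal sketch in Python. It is only an illustration: the `choose_partition` and `route` functions and the `leaders` table are hypothetical names, the broker addresses are made up, and CRC32 stands in for the hash that the real Java client uses (murmur2).

```python
import zlib
from typing import Tuple


def choose_partition(key: bytes, num_partitions: int) -> int:
    """Map a message key to a partition index in [0, num_partitions)."""
    # Stand-in hash for illustration; Kafka's Java client uses murmur2 here.
    return zlib.crc32(key) % num_partitions


# Hypothetical cluster metadata: partition index -> address of the leader broker.
leaders = {0: "broker-1:9092", 1: "broker-2:9092", 2: "broker-3:9092"}


def route(key: bytes) -> Tuple[int, str]:
    """Return the partition for a key and the broker that leads it."""
    partition = choose_partition(key, len(leaders))
    return partition, leaders[partition]


partition, broker = route(b"user-42")
print(partition, broker)
```

Because the partition is derived from the key, every message with the same key lands on the same partition, which is what keeps related messages together.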
You see, what typically happens is that the response is sent back only after the data has been replicated. That way, if any issues arise while the new data is being written, it will already have been replicated, and you won't lose anything. Now, the real challenge here is to focus on throughput, because the more throughput you can get, and there are lots of ways to tune that, which we'll look at, the more data you can handle. So the larger the system can scale.
Otherwise you need to consider adding more nodes, and other things, to scale your cluster beyond just the initial setup that we do. So for the configuration, durability is a key thing to think about, and you set it through how the leader partition acknowledges a write. You can tell it to respond immediately that the message was received, or you can tell it to wait until the message has been completely replicated. So depending on what you want to do, and how important throughput is, you may want to adjust this setting. Another setting is the ordering and retries.
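As a rough sketch, the durability choice described here corresponds to the producer's `acks` setting. The helper below is hypothetical, as are the level names `"none"`, `"leader"`, and `"replicated"`; the broker address is an assumed local setup. Only the `acks` and `bootstrap.servers` property names and the values `0`, `1`, and `all` come from Kafka itself.

```python
def producer_config(durability: str) -> dict:
    """Build producer properties for a (hypothetical) named durability level."""
    acks_by_level = {
        "none": "0",          # do not wait for any acknowledgement
        "leader": "1",        # respond once the leader has written the record
        "replicated": "all",  # respond once all in-sync replicas have it
    }
    return {
        "bootstrap.servers": "localhost:9092",  # assumed local broker address
        "acks": acks_by_level[durability],
    }


print(producer_config("replicated"))
```

The trade-off is the one described above: `acks=all` is the safest and the slowest, while lower settings give up some durability for throughput.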
And these are here to help you with ordering, because if there's an issue with one write, other data may be right on its heels and may end up being written before that write is retried. So you need to set up the number of retries that you're going to have, along with how many requests can be in flight at once, which will essentially pause or prevent the other writes from happening before this one does. So that when everything does get written, it's all in chronological order. Now, as you work with Kafka more, some of the tuning parameters that you're going to want to look at are the batching and compression.
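The ordering problem can be seen in a toy simulation. This is not Kafka's actual protocol, only a simplified model with a hypothetical `simulate` function: batches are sent in windows of `max_in_flight`, and a batch that fails once is retried at the front of the next window. In the real producer, the related settings are `retries` and `max.in.flight.requests.per.connection`.

```python
def simulate(batches, fails_once, max_in_flight):
    """Return the order in which batches end up in the log (toy model)."""
    log = []
    pending = list(batches)
    already_failed = set()
    while pending:
        # Send up to max_in_flight batches at the same time.
        window, pending = pending[:max_in_flight], pending[max_in_flight:]
        retry = []
        for batch in window:
            if batch in fails_once and batch not in already_failed:
                already_failed.add(batch)  # first attempt fails
                retry.append(batch)
            else:
                log.append(batch)          # write succeeds
        # Failed batches are retried before anything that has not been sent yet.
        pending = retry + pending
    return log


print(simulate([1, 2, 3], fails_once={1}, max_in_flight=2))  # [2, 1, 3]
print(simulate([1, 2, 3], fails_once={1}, max_in_flight=1))  # [1, 2, 3]
```

With two batches in flight, batch 2 is written while batch 1 is still waiting for its retry, so the log is out of order. Limiting the producer to one in-flight batch holds everything else back until the retry succeeds.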
And these have to do with throughput, because remember, throughput is the name of the game here, and what we want to do is maximize that so that our system can handle, you know, potentially gigabytes or even terabytes of data in real time. So the smaller you can make the messages, and the better you size the batches for the speed at which the messages are coming in, the more you can optimize your throughput. It's also important for the stability of the system to make sure that you are setting up queuing limits.
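A small experiment shows why batching and compression go together. This sketch uses Python's `zlib` as a stand-in for the producer's compression codec, and the sample messages are made up. In the real producer, the related settings are `batch.size`, `linger.ms`, and `compression.type`.

```python
import zlib

# Made-up sample messages with a lot of shared structure.
messages = [
    ('{"user_id": %d, "event": "click"}' % i).encode("utf-8")
    for i in range(100)
]

# Compress every message on its own, then compress them as one batch.
individually = sum(len(zlib.compress(m)) for m in messages)
batched = len(zlib.compress(b"\n".join(messages)))

print("compressed one by one:", individually, "bytes")
print("compressed as a batch:", batched, "bytes")
```

Compressing the batch as a whole lets the codec exploit the repetition across messages, so far fewer bytes go over the network than when each small message is compressed separately.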
And there is a buffer.memory config item, which sets the total memory that is available to the Java client for collecting unsent messages. This is important, again, because if you were to overrun this buffer, the producer blocks and then starts failing sends, and you could have some serious issues, data loss, and all kinds of bad things could happen. So, if we take a look at the Producer API, it's pretty simple, and we'll walk through this example in detail in just a little bit. But the basic idea is to call the Kafka console producer, and you can do this from Terminal on a Mac or Linux, or PowerShell in Windows.
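As an aside on the buffer.memory limit mentioned above, it can be pictured as a bounded queue of unsent records. The `BoundedBuffer` class below is a hypothetical sketch, not the client's implementation: it counts records rather than bytes, and its timeout plays the role that `max.block.ms` plays in the real producer.

```python
import queue


class BoundedBuffer:
    """Toy model of a producer-side buffer with a hard capacity."""

    def __init__(self, max_records: int, max_block_seconds: float):
        self._records = queue.Queue(maxsize=max_records)
        self._max_block_seconds = max_block_seconds

    def append(self, record: bytes) -> bool:
        """Queue a record; return False if the buffer stays full too long."""
        try:
            self._records.put(record, timeout=self._max_block_seconds)
            return True
        except queue.Full:
            return False

    def drain(self) -> list:
        """Remove and return everything currently waiting to be sent."""
        drained = []
        while True:
            try:
                drained.append(self._records.get_nowait())
            except queue.Empty:
                return drained


buffer = BoundedBuffer(max_records=2, max_block_seconds=0.01)
print(buffer.append(b"a"), buffer.append(b"b"), buffer.append(b"c"))
```

The third append is rejected because nothing has drained the buffer. The caller has to decide what to do with that record, which is where data loss can creep in.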
And for this, what you need to do is point it at your brokers by giving it the broker list, and then you specify the topic. In the example we'll run through, what we'll do is use the test topic that we create. And when you run this, you get the ability to type messages directly into your terminal window, and in doing so, you're sending them to that Kafka topic. This is a really simple way of testing, and we'll take a look at that in just a minute here.
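The console producer call described here might be assembled like this. The script path and the `localhost:9092` address are assumptions about a local install, and `console_producer_command` is a hypothetical helper; `--broker-list` and `--topic` are the tool's own options, and newer Kafka releases accept `--bootstrap-server` in place of `--broker-list`.

```python
import shlex


def console_producer_command(broker_list: str, topic: str) -> list:
    """Build the argument list for the Kafka console producer."""
    return [
        "bin/kafka-console-producer.sh",  # assumed path inside the Kafka install
        "--broker-list", broker_list,
        "--topic", topic,
    ]


command = console_producer_command("localhost:9092", "test")
print(shlex.join(command))
# bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
```

Typing the printed command into a terminal from the Kafka install directory (or passing `command` to `subprocess.run`) starts a prompt where each line you enter is sent to the topic as a message.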
- Understanding the Kafka log
- Creating topics
- Partitioning topics across brokers
- Installing and testing Kafka locally
- Sending and receiving messages
- Setting up a multibroker cluster
- Testing fault tolerance