Learn what auditing options are available in Kafka.
- [Narrator] Auditing is another thing you're going to want to pay close attention to when running Kafka in a production environment. And when we talk about auditing, what we mean is verifying that all the messages are being handled properly: that they're being delivered, that they're being processed, that everything stays in sync, and that you're not losing messages or data. There are three types of auditing that I'll talk about here, but there are plenty more options you can explore. The first one is what's called Kafka Audit.
This is essentially a roll-your-own auditing methodology. The way it works is that you have multiple data centers, you use MirrorMaker to mirror the messages flowing between them, and then you use a third data center to compare the results and verify that what's happening across those two data centers actually matches up. That approach can get really detailed, and you can go far down the rabbit hole with it, so let's look at the second option.
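To make that comparison step concrete, here's a minimal sketch in Java of the counting side of such a roll-your-own audit. The cluster addresses and the topic name are hypothetical, and a real audit would compare counts per time window rather than one grand total; this just shows the basic idea of counting a topic's messages on two clusters and checking that the totals match.

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class AuditCounter {

    // Count the messages currently on one cluster's copy of a topic.
    static long countMessages(String bootstrapServers, String topic) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "audit-counter");
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
        // Don't commit offsets, so each audit run recounts from the beginning.
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        long count = 0;
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of(topic));
            // Poll until the topic goes quiet; a production audit would track
            // end offsets instead of relying on a few empty polls.
            int quietPolls = 0;
            while (quietPolls < 5) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                if (records.isEmpty()) {
                    quietPolls++;
                } else {
                    quietPolls = 0;
                    count += records.count();
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Hypothetical addresses for the two mirrored clusters.
        long primary = countMessages("dc1-kafka:9092", "orders");
        long mirror  = countMessages("dc2-kafka:9092", "orders");
        System.out.printf("dc1=%d dc2=%d %s%n", primary, mirror,
                primary == mirror ? "IN SYNC" : "MISMATCH");
    }
}
```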
So there are other options out there, and one of them is from Uber, because Uber uses Kafka extensively across a lot of their applications and their data. They have an interesting approach to making this work in the enterprise: within a data center, applications use a proxy client, a proxy Kafka interface, which sends messages on to a regional Kafka cluster.
This is where the Chaperone service comes in: it audits everything, tracking when messages were timestamped and the order in which they actually occurred. From there, messages go to a different data center, where they're aggregated with other regional Kafka message streams. Again, Chaperone monitors what's happening there, paying close attention to the timestamps of when things occur. Because remember, one of the toughest parts of the streaming business is keeping all the messages in order, so that the operations that happen based on those messages don't fall out of sync.
From the aggregate Kafka cluster, messages then go to the Chaperone collector, which logs to a database, and on top of that sits a web service that can do things like show you the health of the cluster and how the auditing and monitoring are going. Uber open-sourced this, and you can look it up: it's called Uber Chaperone. It's a really interesting architecture and design, along with some tools that may help you audit your Kafka cluster. The third one to mention is from a company called Confluent, which offers its own distribution of Kafka: the core is open source, but it comes with some commercial components, some added services that you can buy.
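To give a flavor of what Chaperone-style auditing involves, here's a minimal sketch, not Uber's actual code, of bucketing messages into ten-minute windows by their record timestamps and counting each bucket. The broker address and topic name are hypothetical; the idea is that if every tier of the pipeline (proxy, regional, aggregate) emits the same per-window counts, the pipeline is in sync.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class TimestampBucketAudit {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "regional-kafka:9092"); // hypothetical
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "timestamp-audit");
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        long windowMs = Duration.ofMinutes(10).toMillis();
        // Bucket start time -> message count, kept sorted by time.
        Map<Long, Long> buckets = new TreeMap<>();

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("orders")); // hypothetical topic
            for (int i = 0; i < 30; i++) { // bounded run, just for the example
                for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofSeconds(1))) {
                    // Use the record's own timestamp, not arrival time, so the
                    // same counts can be compared at each tier of the pipeline.
                    long bucket = record.timestamp() - (record.timestamp() % windowMs);
                    buckets.merge(bucket, 1L, Long::sum);
                }
            }
        }

        // Comparing these per-window counts across tiers is the essence of
        // the Chaperone approach to detecting lost or delayed messages.
        buckets.forEach((start, count) ->
                System.out.printf("%s -> %d messages%n", Instant.ofEpochMilli(start), count));
    }
}
```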
They have something called Control Center, which is essentially a series of charts, graphs, and dashboards that let you monitor the health, durability, and overall performance of your Kafka cluster. This is at confluent.io, and you can check it out there. Now, this company was founded by Jay Kreps, who originally created Kafka at LinkedIn, so they definitely know what they're talking about, and they're definitely advancing Kafka beyond the Apache project itself.
So it's definitely something to consider and look at, especially if you're an enterprise that needs things like training and support when you're thinking about going down this route.