Learn why Kafka is an efficient platform.
- [Instructor] Let's look at the efficiency of Kafka, what it means for your organization, and the implications of actually adopting it. First, consider what it takes to sync all the different parts of a modern, high-tech organization, one with things like search, monitoring, and a data warehouse. You'll have different platforms like Espresso, Voldemort, and Oracle, or many others, and each one of these will handle a different part of your business. You'll want to track operations with a monitoring dashboard, perform security auditing, and provide recommendations to your users.
All of these things require data to be synced, and in a typical architecture that means a tangle of point-to-point connections between systems. Different systems have different APIs, and you might have different messaging protocols. Yet all of these systems really are necessary for most modern companies. With Kafka, the idea is that you simplify all of that. No longer do you have to decide where the source of truth lives between certain apps. The source of truth lives within Kafka, in the log.
A simpler way to picture it is to erase all of those lines between the different systems and point each one at a single big box representing your Kafka cluster. All of the data being written to Kafka, and everything anyone else reads, comes and goes through this same central point. Any time data is read and processed again, those changes are written back as well, so you have a single source of truth for that entity inside of Kafka.
Now this creates an incredibly efficient environment where you have a central repository for all the different entities and streams of data in your enterprise. No longer do you have to make API calls to third-party systems and wait for the response in order to do your job. You can just write directly to Kafka, and Kafka is built to perform that write operation very quickly, so you can continue on with whatever your next task may be. This also provides great historical tracking: if you want to know who made what change and when that change was made, all of that is logged inside of your Kafka cluster instead of in varying formats across different apps and systems.
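To make the central-log idea concrete, here is a minimal conceptual sketch in plain Python. This is not the real Kafka client API; the `Log` class is a stand-in that illustrates how fast appends and independent readers share one source of truth.

```python
from dataclasses import dataclass, field

@dataclass
class Log:
    """A minimal append-only log: one central source of truth."""
    records: list = field(default_factory=list)

    def append(self, record):
        # Writes are cheap: just append to the end of the log.
        self.records.append(record)
        return len(self.records) - 1  # offset of the new record

    def read_from(self, offset):
        # Any consumer can read from any offset, independently.
        return self.records[offset:]

log = Log()
log.append({"user": "alice", "action": "login"})
log.append({"user": "bob", "action": "purchase"})

# Two independent consumers read from the same central log;
# neither needs to call the other's API.
dashboard_view = log.read_from(0)   # sees both records
auditor_view = log.read_from(1)     # picks up from offset 1
```

Because every record is kept with its offset, the same log also gives you the historical "who changed what, and when" trail described above.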
The log is also a great way to bootstrap new applications. Say you have a new app coming up that needs all the customer information, let's say, along with sales history and maybe demographic information. Those pieces of the customer profile can live in all different systems. In a typical company, they would be managed in different systems and databases, with different APIs and different interfaces. In the Kafka world, that new app can get all of that data directly from Kafka.
Now, if that data spans multiple topics, or entities, as they're called, you can pull the latest copies from those topics directly, or you can take the latest snapshot and then apply whatever changes came after it to get you up to speed. This is really beautiful because you no longer have to know where those different elements live. You don't have to know that Salesforce manages one part of your customer profile and the website manages another. Not having to scour many systems to find that information can save you a lot of time and help you speed things up.
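The snapshot-plus-changes approach can be sketched in a few lines. This is a conceptual illustration, not Kafka's actual API: the snapshot and the change records are hypothetical stand-ins for a compacted topic and the records logged after the snapshot was taken.

```python
def latest_state(snapshot, changes):
    """Rebuild the current profile from a snapshot plus the
    changes logged since the snapshot was taken."""
    state = dict(snapshot)
    for key, value in changes:
        state[key] = value  # later records overwrite earlier ones
    return state

# Hypothetical customer profile: a snapshot, then two later changes.
snapshot = {"name": "Alice", "region": "EMEA"}
changes = [("region", "APAC"), ("tier", "gold")]
profile = latest_state(snapshot, changes)
# profile == {"name": "Alice", "region": "APAC", "tier": "gold"}
```

Replaying from a snapshot rather than from offset zero is just an optimization; replaying the whole log produces the same final state.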
In organizations that haven't adopted something like Kafka, this can be a real problem that slows down your product development cycle. But what are the impacts? Is this a magic bullet? Can you just flip on the Kafka switch and everything starts working? Of course not. You'll need to do some re-tooling. And you're going to have to change how your applications work. Some of them you won't be able to change very easily, such as Salesforce. Some of these third party systems don't have an easy way for them to work with Kafka, so you may have to build a middle layer here, a service that interacts between your Kafka cluster and whatever that API is for that external system.
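That middle layer between an external system and your cluster might look something like the following sketch. All the names here are hypothetical, and the external API and the Kafka producer are stubbed out as plain callables so the shape of the bridge is clear without a live broker or CRM.

```python
import json

class CrmBridge:
    """Hypothetical middle layer: pulls records from a third-party
    API (e.g. a CRM) and forwards each one into a Kafka topic.
    Both sides are injected as callables so they can be stubbed."""

    def __init__(self, fetch_records, produce):
        self.fetch_records = fetch_records   # callable returning records
        self.produce = produce               # callable: (topic, bytes) -> None

    def sync(self, topic="crm.contacts"):
        count = 0
        for record in self.fetch_records():
            # Serialize and hand the record to the producer.
            self.produce(topic, json.dumps(record).encode("utf-8"))
            count += 1
        return count

# Stub wiring for illustration: one fake CRM record, captured in a list.
sent = []
bridge = CrmBridge(
    fetch_records=lambda: [{"id": 1, "name": "Alice"}],
    produce=lambda topic, value: sent.append((topic, value)),
)
synced = bridge.sync()
```

In a real deployment, the `produce` callable would be a Kafka producer client and `fetch_records` would page through the vendor's API; the bridge itself stays the same thin adapter.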
Now, any time you introduce a new platform, there's going to need to be a real investment in training for everybody, which means setting aside time so people can learn it and implement it. Otherwise, it'll be a new toy that helps out a few teams while no one else really adopts it. So when you look at using Kafka, think of it as an enterprise-class solution. It'll solve a lot of problems, but it won't be the easiest thing to set up, since you have to change how your architecture works today.
But in the end, what you're going to have is a lot more capacity to move quickly. In the short term it may feel like you're slowing down a little, but as one of my old coaches would tell me, sometimes you have to slow down to speed up. And lastly, any time you transition your platform, there's going to be maintenance, new people you'll have to hire, and things you may have to buy. So don't assume that just because Kafka is open source and the licensing is free, it will be free to implement.
You'll probably have to buy new hardware, and if you're a larger enterprise, take on a maintenance contract and a support contract.