Learn how to set up Eclipse, MySQL, and Kafka for the course exercises.
- [Instructor] In the last video we showed you how to download your exercise files and extract them to the desktop. In this video we are going to show you how to set up your environment. We'll first set up Eclipse with the Java project. To do that, go to File, Import, and in the list of project types choose Maven, Existing Maven Projects. Hit Next, and for the root directory browse to the place where you extracted all the files, which is Desktop, Exercise Files, and spark-big-data-engineering.
Say OK, and hit Finish. Maven might spend some time here if it needs to download all the dependencies and doesn't already have them in the local repository. In this case it already has them, so it compiled fairly quickly and gave us some errors. You see a bunch of errors here; a lot of times this happens when your setup is not correct, so let us go and fix the setup. The first thing to look at is the Java Build Path, which is pointing to J2SE-1.5.
I'm going to remove this, then add a library, which is a JRE System Library. Hit Next, use the workspace default, which is 1.8, and hit Finish. I also want to check the compiler settings. Once I hit Apply, it is going to compile once again, but you also see that the compiler compliance level under Java Compiler is 1.5. I need to fix that as well, set it to 1.8, hit Apply, and it rebuilds the project.
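A more durable alternative to clicking through the Eclipse dialogs is to pin the Java level in the project's pom.xml, so that a later "Maven, Update Project" doesn't reset it to the 1.5 default. This is a sketch; it assumes your pom does not already declare these properties:

```xml
<!-- In pom.xml: pin the compiler level so Eclipse's Maven integration
     keeps Java 1.8 instead of falling back to J2SE-1.5. -->
<properties>
  <maven.compiler.source>1.8</maven.compiler.source>
  <maven.compiler.target>1.8</maven.compiler.target>
</properties>
```

After editing the pom, right-click the project and run Maven, Update Project to apply the change.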
And now all the errors that you had have gone away. This project, as you can see, is a straightforward Maven project. Under src/main/java you have a bunch of applications; these are all your example files, which I will walk you through when we get to those specific videos. Next we're going to set up MySQL for our exercises. I actually have a Cloudera CDH VM here in which I'm going to be running all the exercises.
It already has MySQL installed, so I'm going to simply log in and create the users, databases, and tables that I need. First I log in to MySQL as root. Then I create a user called cloudera, identified by the password cloudera, grant it privileges so it can connect from all machines, and then do a flush privileges and exit. So the user is now created; that is cool.
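The user-creation steps just described look roughly like this from the VM's terminal. This is a sketch; the password and the `'%'` host pattern (any host) follow the course defaults, so adjust them for your own setup:

```shell
# Create the 'cloudera' MySQL user the exercises will use.
# Run as root; you'll be prompted for the MySQL root password.
mysql -u root -p <<'SQL'
CREATE USER 'cloudera'@'%' IDENTIFIED BY 'cloudera';
-- Allow this user to connect from any host and touch any database
GRANT ALL PRIVILEGES ON *.* TO 'cloudera'@'%';
FLUSH PRIVILEGES;
SQL
```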
Now I'm going to log in as that user: mysql -u cloudera -pcloudera, and I'm logged in. What I have next is a bunch of commands for creating tables. There is a database called jdbctest which we'll use to demonstrate Kafka Connect's JDBC connector, so I'm going to create that database first, and then create a test table inside it.
That is created, and I'm going to insert some values into that table. If I do a SELECT * FROM jdbc_source, all it has is a test ID and a test timestamp, nothing more than that. Then I proceed to create the exec_reports database, which we're going to be using for our use case. Very similarly, I create the database, create a table, and insert some data into it.
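The jdbctest setup sketched above would look something like this. The exact column definitions are assumptions for illustration; the course's SQL scripts in the exercise files are the source of truth:

```shell
# Sketch: create the jdbctest database and a source table for Kafka Connect JDBC.
# Column names/types are illustrative assumptions.
mysql -u cloudera -pcloudera <<'SQL'
CREATE DATABASE jdbctest;
USE jdbctest;
CREATE TABLE jdbc_source (
  test_id        INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  test_timestamp TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
INSERT INTO jdbc_source (test_id) VALUES (1), (2);
SELECT * FROM jdbc_source;
SQL
```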
And that is done. Then there is a database called us_sales, again used for our use case. There is a table called garment I'm going to create there, and then insert some records into it. And finally there is eu_sales, or europe_sales; in there I'm going to create a table called book_sales and insert some records. That's all, so if I do SHOW DATABASES, you should see all of them created along with the regular databases that come with CDH5: us_sales, eu_sales, exec_reports, and jdbctest.
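Put together, the use-case databases follow the same pattern. Again a sketch: the table columns here are illustrative assumptions, and the actual schemas and data come from the exercise files:

```shell
# Sketch: create the use-case databases and tables.
# Column layouts are assumptions; use the course's SQL scripts for the real schemas.
mysql -u cloudera -pcloudera <<'SQL'
CREATE DATABASE exec_reports;
CREATE DATABASE us_sales;
CREATE DATABASE eu_sales;
USE us_sales;
CREATE TABLE garment (id INT PRIMARY KEY, item VARCHAR(100), amount DECIMAL(10,2));
USE eu_sales;
CREATE TABLE book_sales (id INT PRIMARY KEY, title VARCHAR(100), amount DECIMAL(10,2));
SHOW DATABASES;
SQL
```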
Now, after setting up MySQL, we're going to set up Kafka. When you download the CDH5 VM it does not typically have Kafka installed. If you already have Kafka, well and good; otherwise I'm going to show you how to set up Kafka in CDH5. You start off by doing a sudo yum clean all, and then you install Kafka itself, sudo yum install kafka, and that should set up Kafka for you.
When you install Kafka you also get Kafka Connect along with it, so you don't need a separate installation for Kafka Connect. Then you install the Kafka server, again with the same kind of command, sudo yum install kafka-server, and finally, once everything is set up, you start the Kafka server: sudo service kafka-server start. It's now running, so these are the commands you need to set up and run Kafka and the Kafka server.
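The Kafka setup steps above, collected in one place. Package names here assume Cloudera's yum packages for Kafka on CDH5 (kafka and kafka-server); if your repository names differ, adjust accordingly:

```shell
# Install and start Kafka on the CDH5 VM (assumes Cloudera's Kafka yum packages).
sudo yum clean all                  # refresh yum metadata first
sudo yum install -y kafka           # broker libraries + client tools; Kafka Connect ships with this
sudo yum install -y kafka-server    # service scripts for running a broker
sudo service kafka-server start     # start the broker
```

You can confirm the broker is up with sudo service kafka-server status before moving on to the exercises.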