Learn how to set up the VM and load the exercise files.
- [Instructor] For this course, we're going to need to start by setting up our local environment, and we're going to do so by going to cloudera.com, looking for the Downloads link, and clicking on QuickStart VMs. Now these VMs, these virtual machines, have everything we need to get going. So I'll choose the version, and right now, I'm looking at CDH 5.8, and the platform I want is for VirtualBox. VirtualBox is free software from Oracle that we'll use to run the VM. I click Get It Now, enter my info, or sign in if you have a Cloudera account, click Continue, agree to the license, and then your download will begin.
Now that I have my download, I have it sitting here in my Downloads folder, and I'm going to unzip this. I'm just going to double-click it, and if you're on a different operating system than me, it'll look and feel a little different, but you just need to extract that archive. Once I have extracted that, I have my OVF file. This is the file that I'll use for VirtualBox. I have VirtualBox already installed, so I'm going to run over to that and say File, Import Appliance, and point it at the file that I just extracted.
I'll click Continue, and I'll leave the settings as is for now, and Import. After the import is completed, what I need to do is set up my networking settings. The way I like to do this is to go to Preferences in VirtualBox, and for the Network, what I want to have is a host-only network. I'll go over to Host-only Networks, and I want to add a network here.
By default, it creates one called vboxnet0, and I'll click on the little edit icon so I can set this up. Now, what I want to do is create a DHCP server, which means that it'll automatically assign a new IP when I fire it up. I'll click on Enable Server, and I'll enter the information here so I can get started. I like to use 192.168.56.100, that's for the server itself. For the server mask, I'll do 255.255.255.0.
Now the lower bound, the lower end of the range of IPs we'll assign, is 192.168.56.101. So that would be the address of the first machine that gets assigned. Then for the upper address, I'll copy and paste that, and just change the very end to 254. Now we've got a host-only network that we can use if we want to connect into our virtual machine from our local machine here. I'll click on Settings for the virtual machine, go over to Network, and for Adapter One, I'm going to make this the host-only adapter, so that way, this is the network that's being used.
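If you want to double-check those DHCP values before you rely on them, here is a minimal sketch in Python using the standard ipaddress module. The function name check_dhcp_range is my own, not something from VirtualBox. It only confirms that the server address and the pool sit in the same subnet, that the bounds are in order, and that the server address isn't inside the pool.

```python
import ipaddress


def check_dhcp_range(server, netmask, lower, upper):
    """Return (is_valid, pool_size) for a host-only DHCP configuration.

    Hypothetical helper for illustration; it does not talk to VirtualBox.
    """
    # strict=False lets us pass a host address and still get its network.
    network = ipaddress.ip_network(f"{server}/{netmask}", strict=False)
    server_ip = ipaddress.ip_address(server)
    lower_ip = ipaddress.ip_address(lower)
    upper_ip = ipaddress.ip_address(upper)

    in_subnet = all(ip in network for ip in (server_ip, lower_ip, upper_ip))
    ordered = lower_ip <= upper_ip
    server_outside_pool = not (lower_ip <= server_ip <= upper_ip)
    pool_size = int(upper_ip) - int(lower_ip) + 1

    return (in_subnet and ordered and server_outside_pool), pool_size


# The values used in this walkthrough.
print(check_dhcp_range(
    "192.168.56.100", "255.255.255.0", "192.168.56.101", "192.168.56.254"
))
```

With the values from this walkthrough, the pool runs from .101 through .254, so there is room for 154 leases.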
I'm also going to add a second adapter here, which is the NAT adapter, which allows the machine itself to reach out to the Internet, in case you want to download any files or access anything external. I'll hit OK, and it's time to start my VM. Once the VM comes online, I'll dismiss these couple of messages here, which are just saying that it's going to automatically capture the keyboard and mouse when I click inside of it. I'm going to make this full screen so we can see it a little bit better, and this is where I get that message again about capturing the keyboard and mouse.
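If you'd rather script the two adapters than click through the Settings dialog, a rough sketch looks like this. It only builds and prints the VBoxManage modifyvm commands; it doesn't run them. The VM name "cloudera-quickstart-vm" is a placeholder I made up, so substitute whatever name shows up in your VirtualBox list, and keep in mind that modifyvm expects the VM to be powered off.

```python
def adapter_commands(vm_name, hostonly_if="vboxnet0"):
    """Build VBoxManage commands for a host-only adapter 1 and a NAT adapter 2.

    vm_name is whatever name the imported appliance has in VirtualBox;
    the value used below is a placeholder, not the real appliance name.
    """
    return [
        # Adapter 1: host-only, attached to the network created earlier.
        ["VBoxManage", "modifyvm", vm_name,
         "--nic1", "hostonly", "--hostonlyadapter1", hostonly_if],
        # Adapter 2: NAT, so the guest can reach the Internet.
        ["VBoxManage", "modifyvm", vm_name, "--nic2", "nat"],
    ]


# Print the commands so they can be reviewed before running them by hand.
for cmd in adapter_commands("cloudera-quickstart-vm"):
    print(" ".join(cmd))
```

Printing instead of executing is deliberate here: you get to review each command against your own setup before anything changes.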
Now in order to access the exercise files, which we'll use throughout this course, we need to set up a shared drive, a connection between our local machine, where we've downloaded those exercise files, and this machine here. So what I need to do is go up to the terminal, and in the terminal, I need to edit a file that will give my user permission to read the shared folder. I'm going to type "sudo gedit /etc/group". At the very bottom of this file, there will be a line for "vboxsf," and at the end of that line, I need to type in my username on the VM, so that's "cloudera." I'll click Save, and close.
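To make that edit concrete, here is a small sketch of what it does to the vboxsf line. Each line in /etc/group has four colon-separated fields: group name, password placeholder, group ID, and a comma-separated member list. The helper below works on a string rather than on the real file, and the group IDs in the sample are invented for illustration.

```python
def add_user_to_group(group_text, group, user):
    """Return group_text with user appended to the member list of group.

    Works on the text of an /etc/group style file; it never touches disk.
    """
    lines = []
    for line in group_text.splitlines():
        fields = line.split(":")
        # Expected layout: name:password:GID:member1,member2
        if len(fields) == 4 and fields[0] == group:
            members = [m for m in fields[3].split(",") if m]
            if user not in members:
                members.append(user)
            fields[3] = ",".join(members)
            line = ":".join(fields)
        lines.append(line)
    return "\n".join(lines)


# Sample content only; the numeric IDs here are made up.
sample = "cloudera:x:501:\nvboxsf:x:474:"
print(add_user_to_group(sample, "vboxsf", "cloudera"))
```

The helper leaves the line alone if the user is already listed, so applying it twice gives the same result as applying it once.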
I'll close the terminal window. Now I need to go up to my Settings, under Devices, for Shared Folders on the VM, and I need to add a folder here. So I'm going to click plus. The Folder Path is going to point at my exercise files, which I've downloaded to the Desktop. Click Open, and the Folder Name here is fine, Exercise Files. Now I'm going to check Auto-mount and Make Permanent, and hit OK.
Now I need to actually restart the VM, and then I can show you exactly where those files are going to live. So I'll click on System and Shut Down, and Restart. And with the VM back online, I'll go ahead and close the little warning here. I'm going to look at the file browser, so I'll go over to Applications, System Tools, and File Browser, and here you can see that I have everything downloaded from my Desktop that's now accessible inside of this VM, so we can get going.
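If you'd like to confirm the share from a script instead of the file browser, here is a hedged sketch. It assumes the usual VirtualBox Guest Additions behavior on Linux guests, where an auto-mounted share appears under /media with an sf_ prefix in front of the folder name. The share name "Exercise_Files" below is an assumption; use the exact Folder Name you entered in the Shared Folders dialog.

```python
import os
from pathlib import PurePosixPath


def automount_path(share_name, mount_root="/media", prefix="sf_"):
    """Return the path where an auto-mounted share is expected in the guest.

    The /media root and sf_ prefix are the Guest Additions defaults as I
    understand them; adjust both if your guest is configured differently.
    """
    return PurePosixPath(mount_root) / (prefix + share_name)


def list_share(share_name):
    """List the share's contents, or return [] if it can't be read."""
    path = str(automount_path(share_name))
    try:
        return sorted(os.listdir(path))
    except (FileNotFoundError, NotADirectoryError, PermissionError):
        # Missing mount, or the user is not yet in the vboxsf group.
        return []


# "Exercise_Files" is a placeholder share name.
print(automount_path("Exercise_Files"))
print(list_share("Exercise_Files"))
```

An empty list usually means either the VM hasn't been restarted since the share was added, or the user still isn't a member of vboxsf.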
- Working with files
- Organizing files in HDFS
- Connecting to Hadoop
- Exploring Hive through Beeline
- Accessing Hive from Python
- Creating aggregates in Hive
- Selecting partitions in Hive
- Complex data structures in Hive
- Mapping data in Hive
- Creating flat tables for Impala
- Deconstructing Impala queries