From the course: Data Science Foundations: Data Engineering

Verifying addresses

- [Instructor] All right, now let's take a look at verifying addresses. First, I want to create a zip code lookup table that we'll use as a reference point. Then we'll find invalid city and state combinations. And lastly, we'll replace any city and state combinations in our table based on whatever our zip code lookup table has. Here in my virtual environment, I have 3_3.sql loaded from my exercise files. It calls for us to create a few things on the command line, so I'm going to click on my terminal window here and type these commands in, starting with hadoop fs -mkdir /data/zipcode. This will be where we load our zip code database. Then we need to browse over to where we have that .csv file, which is under media/sf_Exercise_Files, in the data folder. You can see we have zip_code_database.csv there, and we're going to put that into Hadoop using hadoop fs -put zip_code_database.csv /data/zipcode. All right, with our data in Hadoop now, we can go and create our table. Need to…
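The command-line steps described so far can be collected into a short script. This is only a sketch under a few assumptions that do not come from the transcript: the shared folder is assumed to be mounted at /media/sf_Exercise_Files (with a leading slash), the file is uploaded by its full path rather than after changing directory, -mkdir is given -p so that re-running does not fail, and a final -ls is added to confirm the upload. The run helper and the DRY_RUN variable are hypothetical conveniences that print each command instead of executing it, so the script can be inspected on a machine without Hadoop.

```shell
#!/usr/bin/env bash
# Sketch of the HDFS loading steps from the transcript; not the course's own script.
set -euo pipefail

# Assumption: the exercise files are mounted here (the transcript omits the leading slash).
DATA_DIR="${DATA_DIR:-/media/sf_Exercise_Files/data}"
HDFS_DIR="/data/zipcode"          # HDFS target directory named in the transcript
CSV="zip_code_database.csv"       # file name given in the transcript

# Hypothetical helper: with DRY_RUN=1 (the default) commands are printed, not run.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

load_zipcodes() {
  # Create the HDFS directory; -p (an addition) creates parents and tolerates reruns.
  run hadoop fs -mkdir -p "$HDFS_DIR"
  # Copy the CSV from the local file system into HDFS.
  run hadoop fs -put "$DATA_DIR/$CSV" "$HDFS_DIR"
  # List the directory to confirm the file arrived (an addition).
  run hadoop fs -ls "$HDFS_DIR"
}

load_zipcodes
```

Running the script with DRY_RUN=0 on the course's virtual machine would execute the commands for real. Left at its default, it only prints them.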
