Join Joseph Lowery for an in-depth discussion in this video, Loading Data, part of Google Cloud Storage and Data Essential Training.
- In this lesson, we'll create a data set to be used by BigQuery. For our demo, I've compiled a list of the most popular baby names as recorded by the US government from 2003 to 2013. Let's start by looking at this CSV file in Excel. You can find it in the Chapter 5, 05_02, Start folder; it's called baby_names.csv. As you can see, it's a pretty simple file. There are four columns: Name, Gender, Frequency, and Year.
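As a quick sanity check on that four-column layout, here's a minimal sketch using Python's standard csv module. The sample rows below are illustrative placeholders, not values from the actual baby_names.csv file.

```python
import csv
import io

# A few made-up rows in the same four-column layout the video describes:
# Name, Gender, Frequency, Year (header row included).
sample = """Name,Gender,Frequency,Year
Emma,F,20446,2003
Jacob,M,29630,2003
Sophia,F,21223,2013
"""

reader = csv.DictReader(io.StringIO(sample))
rows = list(reader)
print(reader.fieldnames)  # ['Name', 'Gender', 'Frequency', 'Year']
print(len(rows))          # 3 data rows (header excluded)
```

The real file has 368,840 data rows, but the structure is exactly this simple.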
And if I hit Cmd+End to go to the end of the file, you can see that there are 368,840 rows. Not exactly massive, but pretty darn big. Okay, let's switch to the Google Cloud Console; I'm here on my project page. Now, go down to Big Data, expand that, and then choose BigQuery. As of this recording, the BigQuery interface is separate from the standard Google Cloud UI, so a new browser tab or window will open. Here you can take BigQuery for a spin using the sample data that you see over here on the left, and also try out the BigQuery browser tool.
We're going to explore that in a later lesson. For now, I just wanna start by creating a new data set and importing our own data. So, on the BigQuery page, click the down arrow next to the project name and choose Create New Data Set. First, we'll give the data set an ID; use only numbers, letters, and underscores for this. So, I'm gonna enter baby_names_lynda and then click OK.
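That naming rule, only numbers, letters, and underscores, is easy to check programmatically. Here's a small Python sketch; the helper function is my own invention for illustration, not part of any BigQuery API.

```python
import re

# BigQuery dataset and table IDs in this workflow may contain only
# letters, numbers, and underscores.
ID_PATTERN = re.compile(r'^[A-Za-z0-9_]+$')

def is_valid_id(candidate: str) -> bool:
    """Check a candidate dataset or table ID against the allowed characters."""
    return bool(ID_PATTERN.match(candidate))

print(is_valid_id("baby_names_lynda"))  # True
print(is_valid_id("baby-names"))        # False: hyphens are not allowed
print(is_valid_id("bn_2003_2013"))      # True
```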
And you can now see the data set appear under the project name. If you hover over it, you'll see a + symbol to the right, as well as a down triangle. Click + to bring up the Create and Import dialog box. Okay, now we'll enter a table ID. Again, only numbers, letters, and underscores. So, let me put in bn_2003_2013, bn for baby names, and then click Next.
Next up, we choose the source of the data. Ours is a CSV file, which happens to be the default. We also have the option of importing a JSON file or using an App Engine datastore backup. So, let me click Choose File, and then I'll navigate to my file, which is found in the Exercise Files, Chapter 5, 05_02, Start folder, and there it is: baby_names.csv. All right, click Next. Now it's time to define the schema.
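If you ever go the JSON route instead, BigQuery expects newline-delimited JSON: one JSON object per line. Here's a rough Python sketch of converting CSV rows to that layout, again using made-up sample rows rather than the real file.

```python
import csv
import io
import json

# Illustrative CSV input in the same layout as baby_names.csv.
csv_text = """Name,Gender,Frequency,Year
Emma,F,20446,2003
Jacob,M,29630,2003
"""

lines = []
for row in csv.DictReader(io.StringIO(csv_text)):
    # Cast the numeric columns so they serialize as integers, not strings.
    row["Frequency"] = int(row["Frequency"])
    row["Year"] = int(row["Year"])
    lines.append(json.dumps(row))

# Newline-delimited JSON: one object per line, no enclosing array.
ndjson = "\n".join(lines)
print(ndjson)
```

For this lesson, though, we'll stick with CSV and let BigQuery parse it directly.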
Now, BigQuery needs to know the name of each column and its data type. The format is the column name, a colon, then the data type, with a comma before the next name-and-type pair. So, my first column name is Name, then I put in a colon and the data type, which is String, and I'll separate the next pair with a comma. Next up is Gender, which is also a string, and let me just verify that by looking at the Excel sheet.
I still need to do Frequency and Year. So, a comma to separate the pairs, then Frequency, which is an integer, and finally Year, which is also an integer. Okay, once that's done, click Next, and finally we have some advanced options, including the field delimiter, which in our case is a comma, so we can accept the default there. Now, I do have a header row, so I'm gonna go ahead and change the 0 to 1 in that field.
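The schema string we just typed, colon-separated name and type pairs joined by commas, can be pulled apart and checked with a few lines of Python. This is just an illustrative sketch for eyeballing the schema, not anything BigQuery itself runs.

```python
# The schema string entered in the Create and Import dialog.
schema = "Name:STRING,Gender:STRING,Frequency:INTEGER,Year:INTEGER"

# Split into (column name, data type) pairs.
pairs = [tuple(field.split(":")) for field in schema.split(",")]

for name, col_type in pairs:
    print(f"{name} -> {col_type}")  # e.g. "Name -> STRING"
```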
Number of errors to allow: we don't want any errors, so I'm gonna leave that at 0. All right, I'm ready to hit Submit. Now, you'll see the table name appear under the data set, and if I hover over it, I see that its status is Loading. In just a moment or two, that should change; once Loading disappears, the import should have succeeded. To verify, just go ahead and click on the table, and it looks good. It picked up our schema.
Let's check the Details tab here, and yes, there's the rest of the information. It's nine and a half megabytes, so a pretty hefty file. There's also a preview of the data. Everything looks in order, cool. Now we're ready to analyze our data, which we'll tackle in the next lesson.