JSON has become a de facto standard for web data transfer. If you're doing data science with Python, you'll definitely need to know how to work with it. Learn how to ingest and explore JSON data using Python in this video.
- [Instructor] We're going to start here by looking at how to work with JSON data. Now, JSON data is extremely common on the web. It is sort of the default for how data gets exchanged in web applications, so in your data science workflow, you're going to run into it at some point or another. Now what I've done here is I've launched Jupyter Notebook, and I have the exercise files from this course loaded, so you can download those and open them and you'll see exactly what I'm seeing here. We also have a data folder, which I showed you just a second ago, that has all the files that we're going to be using here.
The first thing I want to do is just open up "01_01 IPython Notebook". I'm running this in Python 2, so if you only have Python 3, you're going to want to go back and install this version as well, otherwise you'll hit some errors, specifically around the print statement. So the first thing to do in working with JSON in Python here is to load the json library, or module. So I'm just going to run this command here, and you can see it's successfully executed because the cell now shows the number one next to it. Now in cell number two, what I want to do is open my file, so I'm going to use a with statement, type "open", and then give it the path.
And you can see here that what we're looking at is monthly sales data by category. We're going to load this as JSON data and then create a new dictionary out of it using the json.load function. Now that we have a dictionary d that has all of our data loaded, let's see what it looks like. Let's just do a pretty print here. You can see that at the top level we have contents, and then down below we have category, monthly sales, and region, and there are three levels here, so we're going to dig in and I'm going to show you how to explore all three levels of this file.
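As a rough sketch of the loading step just described: the notebook's actual code and data file are not shown in this transcript, so the file name sales.json, the key name monthlySales, and every value below are hypothetical stand-ins. The sketch writes a small sample file first so that it runs on its own, and it uses the print-as-function style of Python 3 rather than the Python 2 used in the video.

```python
import json
import os
import tempfile
from pprint import pprint

# Hypothetical sample shaped like the data described in the video:
# one top-level key ("contents") holding records with a category,
# a region, and a list of monthly sales. All values are made up.
SAMPLE = {
    "contents": [
        {
            "category": "Furniture",
            "region": "West",
            "monthlySales": [
                {"month": 20130101, "sales": 38},
                {"month": 20130201, "sales": 35},
            ],
        }
    ]
}

# Write the sample to a temporary file so there is something to open.
path = os.path.join(tempfile.mkdtemp(), "sales.json")  # hypothetical name
with open(path, "w") as f:
    json.dump(SAMPLE, f)

# The step from the video: open the file and load it into a dictionary.
with open(path) as f:
    d = json.load(f)

# Pretty-print the whole structure to see its shape.
pprint(d)
```

With a real file, only the second with block and the pprint call would be needed.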
First, if we take a look just at the top level, you can see we only have one key. There's only one value there, and this is something I like to do once I get a dictionary loaded, especially if it's a big file or it's not totally obvious what kind of data I'm going to be working with. From there, I want to print the keys at the second level, so I need a for loop here. I'm going to say for a in d, indexed with the key from that top level, contents, and inside of there, just a simple print statement.
This is where, if you have Python 3, you're going to hit your first error, because print is a function there and needs parentheses. Once I run that, you can see that at the second level what we have is category, monthly sales, and region, so some pretty useful data points for us to then do our data science work. If we want to dig in a little bit deeper, we can print the keys at the third level. In order to print the keys at the third level, we need to have a nested for loop. So on top, we have the for a in d contents, which gives us a new object to parse, and inside it we're going to say for b in a, indexed with the monthly sales key.
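The top-level and second-level key listings described so far might look something like the sketch below. It assumes that contents holds a list of records, and the key name monthlySales and all values are hypothetical. The print calls use parentheses, which is the form Python 3 requires.

```python
import json

# Hypothetical sample with the shape described in the video.
d = json.loads("""
{
  "contents": [
    {"category": "Furniture", "region": "West",
     "monthlySales": [{"month": 20130101, "sales": 38},
                      {"month": 20130201, "sales": 35}]},
    {"category": "Technology", "region": "East",
     "monthlySales": [{"month": 20130101, "sales": 54}]}
  ]
}
""")

# Top level: a single key.
top_level = list(d.keys())
print(top_level)

# Second level: loop over the records under "contents" and list each
# record's keys (sorted so the output order is predictable).
second_level = []
for a in d["contents"]:
    keys = sorted(a.keys())
    second_level.append(keys)
    print(keys)
```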
So we're looking at that second level for the monthly sales portion of our dictionary. And again, just a print of the keys. And inside of that, you can see we have sales and month. So these are the two attributes inside of our monthly sales. Now if we want to see the key and the value, which would be helpful if we want to see the month and then the sales amount for that month, we need to do it a little bit differently. Here we have for, but instead of having just one variable we have two, key and value, and we're going to say d.items(), so these are all the items in our dictionary.
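A minimal sketch of the nested loop for the third-level keys, again assuming a list of records under contents and a hypothetical monthlySales key with made-up values:

```python
import json

# Hypothetical sample with the shape described in the video.
d = json.loads("""
{
  "contents": [
    {"category": "Furniture", "region": "West",
     "monthlySales": [{"month": 20130101, "sales": 38},
                      {"month": 20130201, "sales": 35}]},
    {"category": "Technology", "region": "East",
     "monthlySales": [{"month": 20130101, "sales": 54}]}
  ]
}
""")

# Outer loop walks the records; inner loop walks each record's
# monthly sales entries and lists the keys found there.
third_level = []
for a in d["contents"]:
    for b in a["monthlySales"]:
        keys = sorted(b.keys())
        third_level.append(keys)
        print(keys)
```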
Then, our print statement becomes a little bit more complicated. We have a key, and then we add in a colon and a space, and we pass in the value using the pretty print function. I run that, and you can see that it becomes a bit clearer and easier to understand. In fact, the formatting gets a lot nicer. The contents aren't surrounded by all these extra quotation marks and other things, and it just becomes easier to read. Now, if we want to do this at the second level, we can run basically the same thing, but again we have a nested for loop.
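One way to write the key-and-value print at the top level is sketched below. The transcript does not show the exact print line, so using pformat from the pprint module to format the value is an assumption, as are the sample key names and values.

```python
import json
from pprint import pformat

# Hypothetical sample with the shape described in the video.
d = json.loads("""
{
  "contents": [
    {"category": "Furniture", "region": "West",
     "monthlySales": [{"month": 20130101, "sales": 38},
                      {"month": 20130201, "sales": 35}]}
  ]
}
""")

# Unpack each (key, value) pair and print the key, a colon and a
# space, then the pretty-formatted value.
lines = []
for key, value in d.items():
    line = key + ": " + pformat(value)
    lines.append(line)
    print(line)
```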
So on the outside you have for a in d contents, just like we did before. Then you have this print of the key and value, just like we did a second ago. Here it becomes a bit messy, but you can see that the data is becoming easier to read. We're stripping out all of the JSON formatting, and we're getting towards something a user could actually understand. Lastly, let's do this at the very base level. Here we have our nested for loop just like before. We start out with contents, then we parse monthly sales, and then we have our key-value pairs at the very end.
When I run this, you can see that we have just the two values, sales and month, something that we could use to graph, or to do forecasting, or even just print this out in a table format for users to consume.
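The second-level and base-level key-and-value loops could be sketched as follows. As before, the monthlySales key name and all values are hypothetical, and pformat is an assumed choice for formatting. The final list of (month, sales) pairs is an extra illustration of the table-ready form mentioned above, not something shown in the video.

```python
import json
from pprint import pformat

# Hypothetical sample with the shape described in the video.
d = json.loads("""
{
  "contents": [
    {"category": "Furniture", "region": "West",
     "monthlySales": [{"month": 20130101, "sales": 38},
                      {"month": 20130201, "sales": 35}]},
    {"category": "Technology", "region": "East",
     "monthlySales": [{"month": 20130101, "sales": 54}]}
  ]
}
""")

# Second level: key and value for every field of every record.
for a in d["contents"]:
    for key, value in a.items():
        print(key + ": " + pformat(value))

# Base level: key and value for every monthly sales entry.
for a in d["contents"]:
    for b in a["monthlySales"]:
        for key, value in b.items():
            print(key + ": " + str(value))

# Flatten to (month, sales) pairs, ready for a table or a chart.
rows = [(b["month"], b["sales"])
        for a in d["contents"]
        for b in a["monthlySales"]]
print(rows)
```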