Learn how to perform a basic exploratory analysis in Python.
- [Instructor] In the last video, we created a heat map with R. Let's do something similar with Python. I have my Jupyter environment open and ready to go. If you need help on how to open up the application, please refer to the video from earlier on in this course. Now, keep in mind that these hashtags denote a comment, so what we have here in our exercise file are cells with comments that speak to exactly what we're going to be creating and what we're going to be doing in this overall process. The process of bringing in additional packages is something to mention, too.
With both R and Python, there are times when you'll need functionality beyond what's available out of the box, and a lot of the time you'll get it from packages. So we're going to go ahead and bring in the Pandas package, which is a popular package for Python, by typing out import pandas as pd. And I'm going to go ahead and press Shift and Return. Now, let me point that out. Shift and Return.
If I just pressed Return, that would add a new line in the cell for more code. But what I've done here is I've said okay, we want to bring in the Pandas package, I'm ready to execute that command, and then I hit Shift and Return. And that runs that cell. Okay, so just something to be aware of. Now, something else I'm going to do real quickly too, and you'll want to do this yourself often if you want to create visualizations within the notebook: I'm going to type in %matplotlib inline, and this is a command that tells the notebook to display the data visualizations inline, right here on the screen in the notebook.
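As a rough sketch of what those first two cells amount to: the import is plain Python, while the matplotlib line is a Jupyter (IPython) magic command that starts with a percent sign, so it is shown here only as a comment. The variable name pandas_version is just an illustration and is not part of the exercise file.

```python
import pandas as pd  # "pd" is just a short alias for the package

# In a Jupyter cell, the plotting setup would be this IPython magic
# (not plain Python, so it is left as a comment here):
# %matplotlib inline

# Illustrative check that the import worked; not part of the exercise file.
pandas_version = pd.__version__
print("pandas version:", pandas_version)
```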
Now, I'm getting this message from Python saying it's going to take a moment to set some of the functionality up for matplotlib. That's okay, let's keep moving. We're going to connect to our data. And the way we do that is somewhat similar to what we saw when we were in R. So I'm going to create this variable and I'm going to call it my exploratory data. And I'm going to assign to it the result of the function pd.read_csv, passing in the path to that data.
So here, today, Desktop/Exercise Files. Now let me go grab the name for that specifically, and I'm going to open up the exercise files, so we're looking at 02_03. And I'll paste that in, and then let's grab the name of our data, which is exploratory-py.csv. And I'll paste that in. And then Shift Return. Now, similar to how we did in R, we want to visualize what we have now.
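Here is a minimal sketch of that loading step. The full path is not spelled out in the video, so this reads a few made-up rows from an in-memory string instead of exploratory-py.csv. The column names keyword, impressions, and cpa are assumptions based on how the data is used later on, and the values are invented for illustration.

```python
import io

import pandas as pd

# Made-up rows standing in for exploratory-py.csv (assumed column names,
# invented values). With the real file you would pass its path instead.
sample_csv = io.StringIO(
    "keyword,impressions,cpa\n"
    "keyword_a,4905,2.10\n"
    "keyword_b,52000,7.50\n"
    "keyword_c,118000,16.00\n"
)

my_exploratory_data = pd.read_csv(sample_csv)

# In a notebook, putting the variable name alone in a cell displays the table;
# print does the same job in a plain script.
print(my_exploratory_data)
```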
So I could real quickly just grab this variable name, drop it right into this cell, hit Shift Return, and that's going to show us all of the data. So we can quickly see what we have to work with, and there are times where, as we're working through a process, you'll want to install some other package. Now I'm going to go ahead and add a line in, and this is a good time to point out one nuance with Python in Jupyter: sometimes you'll see these cell numbers fall out of sequence.
And that's okay, it's just the way the Jupyter environment works: the number next to each cell reflects the order in which the cells were run. So don't be too alarmed if you see some numbers out of sequence. Now I know I'm going to need the Seaborn package to do some of my data modeling here, so I'm going to type in this command, which is pip install seaborn. And what this is going to do is go out, grab the Seaborn package, and set it up so that we can use it here. So again, Shift and Enter.
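If you'd like to check whether a package is already there before installing it, something along these lines would work. The helper name is_installed is hypothetical, and in a notebook cell the install command is commonly written with a leading percent sign or exclamation mark (for example %pip install seaborn), which is an assumption about your setup rather than something shown in the video.

```python
import importlib.util


def is_installed(package_name: str) -> bool:
    """Return True if Python can find a top-level package with this name."""
    return importlib.util.find_spec(package_name) is not None


# Hypothetical helper in use: only suggest the install if seaborn is missing.
if is_installed("seaborn"):
    print("seaborn is already available")
else:
    print("seaborn is missing; run the pip install command in a notebook cell")
```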
And we can see that that set us up right there. Now I'm going to go ahead and enter in the dependency to bring that particular package in, so import seaborn as sns. And let me point out here, in terms of this as sns, you remember we talked about variables in the last video when we talked about R and creating these variables. We're basically doing the same thing here. This just means we can refer to Seaborn functionality as sns simply, as opposed to having to type out seaborn every time.
So we're going to import that package. And let's visualize our data and see what we're working with here. So I'm going to type in sns, again I'm referencing the Seaborn functionality. I'm going to do a KDE plot on this data, so that's sns.kdeplot. I am going to just come up here and grab my variable name for my data, I'm going to drop that in, and you can recall that with R, to look at a single column of the data, we use the dollar sign operator.
Well, with Python we'll just use a dot. So we'll do .cpa, so we can see that column of the data here, and then hit Shift and Return. So that gives us a nice visualization of the shape of our data overall, so we can begin to get a sense of what we're working with. Again, we're in exploratory mode here, we're just trying to get a sense of what we have to work with. Now what I'd like to do is find out where most of the impressions and where most of the spend for those impressions is occurring.
I would like to leverage the data that we have to tell our client that piece of information. So again, we need to do a bit of a pivot table, or transform our data a little bit, so that we can create that specific visual. Now, there are times where we might want to see some additional detail in our data. So we have a distribution plot we can look at here, and we can get that from our Seaborn package. So I'm going to type in sns.distplot, and I'm going to copy in our variable name for our data, and look at the cpa column like we did before, and Shift Return.
Now this time around we can get a bit more clarity on where our spend is occurring. In the first visualization, it might have been hard to tell how much was between $10 and $15, and now we can see that there's really not any spend there. So again, just exploring right now. Now we want to pivot this data, because what I really want to do is find out where the most impressions and where the most spend is right now. Now what I'm going to do here is I'm going to set up a new variable called MyETLData.
I'm going to assign to it myExploratoryData with a pivot command on it, and I'm going to pivot this data by keyword, by impressions, and by CPA, and I'll Shift Enter. And if we wanted to take a quick look at that pivot, all we would have to do is, well, I'm just going to copy and paste that variable here just to make it easy, and then Shift Enter. And that gives us a sense of how we pivoted that data.
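The pivot step might look roughly like this. The exact arguments aren't visible in the transcript, so treating keyword as the index, impressions as the columns, and cpa as the values is an assumption, as are the snake_case variable names and the invented rows.

```python
import pandas as pd

# Invented rows with assumed column names, for illustration only.
my_exploratory_data = pd.DataFrame(
    {
        "keyword": ["keyword_a", "keyword_b", "keyword_c"],
        "impressions": [4905, 52000, 118000],
        "cpa": [2.1, 7.5, 16.0],
    }
)

# One row per keyword, one column per impressions value, cpa in the cells.
# Combinations that never occur in the data are left as NaN.
my_etl_data = my_exploratory_data.pivot(
    index="keyword", columns="impressions", values="cpa"
)
print(my_etl_data)
```

Note that pivot raises an error if the same keyword and impressions pair appears more than once; pivot_table with an aggregation function is the usual alternative in that case.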
So what we're now looking at are specific keywords, and we're looking at the overall cost and the number of impressions. So this is going to allow for us to create a heat map that has some very specific insights as a part of it. So let's visualize that data, and I'm going to do so again with our Seaborn package sns.heatmap, is the command, and then I'm going to again leverage this specific variable name and enter that. Now we have a data visualization that helps us to see really a few different things here.
So we can get a quick sense as to the overall cost, so we know that the darker the color is, the more that particular keyword costs. And the lighter it is, the less it costs. So we can see luggage bags for US travel is costing us the least, and we can also get a clear sense, too, on what the overall activity is or the impressions are. And that runs along this continuum. So anywhere from 4,905 impressions to 118,000, roughly, is what's in our data.
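Putting the pivot and the heat map together, a sketch could look like the following. The rows are invented, the pivot arguments are assumptions, and the "Blues" color map is chosen here only so that darker cells mean higher cost, matching the description above; the video does not say which color map was used.

```python
import matplotlib

matplotlib.use("Agg")  # draw off-screen so the sketch also runs outside a notebook

import pandas as pd
import seaborn as sns

# Invented rows with assumed column names, for illustration only.
my_exploratory_data = pd.DataFrame(
    {
        "keyword": ["keyword_a", "keyword_b", "keyword_c"],
        "impressions": [4905, 52000, 118000],
        "cpa": [2.1, 7.5, 16.0],
    }
)

my_etl_data = my_exploratory_data.pivot(
    index="keyword", columns="impressions", values="cpa"
)

# Keywords run down the side, impressions along the bottom, and the cell
# color encodes cpa. "Blues" is an assumed color map: darker means higher.
ax = sns.heatmap(my_etl_data, cmap="Blues")
```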
And everything is right here, in this one place, and it's easily sharable and editable. And with that, let's power down Python so we can move on to our first analysis using Tableau.
In this course, discover how to gain valuable insights from large data sets using specific languages and tools. Follow Chris DallaVilla as he walks through how to use R, Python, and Tableau to perform data modeling and assess performance. As Chris dives into these concepts, he shares specific case studies that come directly from his own work with clients. Plus, he shares three essential—and practical—best practices for data-driven marketing that you can use to bolster your organization's marketing performance.
- Installing R, Python, and Tableau
- Navigating the UI for R, Python, and Tableau
- Using R, Python, and Tableau
- Exploratory analysis
- Performing regression analysis
- Performing a cluster analysis
- Performing a conjoint assessment
- Stakeholder alignment