Learn how to perform a basic exploratory analysis in R.
- [Instructor] Let's think about our case study one more time. Remember the things that we're trying to do. We're trying to determine the general shape of our data. We're trying to determine the patterns that exist within our data and we're trying to create a compelling visual that can really help to tell a story. In this video, I'll show you how to create a heat map, which is a great way to visualize a lot of data to see which of your marketing campaigns are working well, and which are lagging behind. So let's get started. Let's first open up RStudio.
Now one of the first things that we want to do is go ahead and open up one of our exercise files. So I'm going to go to the desktop, open up our exercise files and I'm going to grab this exploratory r file. I'm just going to drop that right on top of our icon. And now we have this file that has been opened up for us. What you're looking at here are a few different comments within an R file that are going to help to provide the framework for what we're getting ready to work through.
So this hashtag denotes a comment within an R file. A good first step is to connect to our data. So let's type in some code. I'm going to create a variable. A variable is a container for some information. And I'm going to call this particular variable that is going to house our data myExploratoryData. I'm then going to use this operator, which is an R convention. So it's a less than sign or the open caret and then a quick dash and ultimately, it just looks like a pointer or an arrow, if you will.
Now I'm going to type in the function read.csv and I'm going to point this at our data file. I'll just go ahead and copy that over. Desktop, Exercise Files and then the name of that particular csv. I'm just going to copy that, which is exploratory-r.csv. And now that I've typed this in, I'm going to click the Run button.
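As a rough sketch of that first step: the real exercise file isn't shown here, so this version writes a tiny stand-in CSV to a temporary folder and reads that instead. The file contents, the Keyword column name and the csvPath variable are assumptions made for illustration; only read.csv, the arrow operator and the myExploratoryData name come from the walkthrough.

```r
# Hypothetical stand-in for the exercise file. The keywords and numbers
# below are made up; the real exploratory-r.csv will differ.
csvPath <- file.path(tempdir(), "exploratory-r.csv")
writeLines(c(
  "Keyword,CPA,CVR,CTR",
  "low price air,9.80,0.012,0.031",
  "low price tickets airline,0.95,0.048,0.067",
  "cheap flights,4.10,0.025,0.044"
), csvPath)

# The <- operator assigns the data frame returned by read.csv()
# to the variable myExploratoryData
myExploratoryData <- read.csv(csvPath)

# Quick check of what was read: column names and types
str(myExploratoryData)
```

In your own session you would point read.csv at wherever your copy of the exercise file lives rather than at a temporary file.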
So, I've selected this line of code and I've now clicked Run. And you can see here in our Environment window that we've now connected to that data. So if you'd like a visual confirmation that we've connected to that data, all you've got to do is click on myExploratoryData and you can see that brings up a view of our data itself. It's nice to get that visual confirmation that we've connected to the data we expected to connect to. I'm going to close out of that and come back to our code though. Now keep in mind, at this stage we're in exploratory land.
We're trying to find the overall shape of the data, the meaning in the data, and a lot of what can be useful is just viewing that data from some different perspectives. Now, one of the things I can do is I can run a head command that looks something like this. I type in head and then I grab the variable name for our data and I drop that in here. And if I run this, that essentially helps us to see the first six rows of our dataset. So we can take a quick snapshot of our data, spend a few minutes looking at what we have there, and it's just one way to get a quick look at the data that we have in this file.
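Here's a small sketch of that head command. The data frame is a made-up stand-in with eight rows (the keyword labels and values are assumptions, not the exercise data), just so there's something to take the first six rows of.

```r
# Hypothetical stand-in data with eight rows
myExploratoryData <- data.frame(
  Keyword = paste("keyword", 1:8),
  CPA = c(0.95, 2.40, 4.10, 5.20, 6.30, 7.70, 8.40, 9.80),
  CVR = c(0.048, 0.040, 0.025, 0.022, 0.018, 0.015, 0.013, 0.012),
  CTR = c(0.067, 0.058, 0.044, 0.039, 0.029, 0.033, 0.030, 0.031)
)

# head() returns the first six rows by default
firstRows <- head(myExploratoryData)
print(firstRows)

# The n argument changes how many rows come back
head(myExploratoryData, n = 3)
```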
It can also be very helpful to visualize the data. And so one of the ways we can do that is with a histogram. The way we get a histogram, I'm going to type in H I S T and then I'm going to paste in our variable name for our data, and then we want to look at a specific subset, a specific column. We want to look at the shape of all of our data within one column. So, again if I come over and I double-click on myExploratoryData, I can see my different column names are CPA, CVR, CTR.
So those stand for cost per acquisition, conversion rate, click through rate, those are some examples there. I'll go ahead and close out of this and the way I'm going to look at a subset of data is I'm going to use the dollar sign and then right here, we can see all the sub-data that's available to us. I'll just click CPA, run that. And now we can see a visualization of that data, specifically our cost per acquisition. We can see the range. So it runs from somewhere around $0 to somewhere around $10 and that helps us to really just have a visual there of that data.
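A minimal sketch of that histogram step, assuming a made-up CPA column that runs between roughly $0 and $10 like the one described. The values, the title and the axis label are assumptions; hist and the dollar sign are the parts taken from the walkthrough.

```r
# Hypothetical stand-in data; only the CPA column matters here
myExploratoryData <- data.frame(
  Keyword = paste("keyword", 1:10),
  CPA = c(0.95, 1.80, 2.40, 3.10, 4.10, 4.60, 5.20, 6.30, 7.70, 9.80)
)

# The $ operator pulls a single column out of the data frame,
# and hist() draws the distribution of that column
cpaHistogram <- hist(
  myExploratoryData$CPA,
  main = "Cost per acquisition",
  xlab = "CPA (dollars)"
)

# The same range can be read off numerically
range(myExploratoryData$CPA)
```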
Now, ultimately what we're working to establish here is a data visualization that we can read. And one of the things that we'll want to do is we'll want to shift the names of the dimensions for the data, that thing that we are measuring, right. So come back and look at our data and you can see that what we're measuring is the performance against these keywords. So it will be nice if we could go ahead and align these names with these numbers. So when we see our visual, we're actually able to correspond the data visualization to these names, versus just a number.
So, I'm going to do a little bit of a transformation process on our data here. And the way I'm going to do that is I'm going to type row.names and then I'm going to paste in our variable name for our data and then I'm going to assign the sub-data keyword to that row name. And I'm going to go ahead and run this. And then real quickly, we can take a look at this too. And we can see now instead of numbers, we actually have the names themselves.
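That row name assignment might look something like this sketch. It assumes the keyword column is literally named Keyword, which isn't confirmed in the walkthrough, and the data values are made up.

```r
# Hypothetical stand-in data; the Keyword column name is an assumption
myExploratoryData <- data.frame(
  Keyword = c("low price air", "low price tickets airline", "cheap flights"),
  CPA = c(9.80, 0.95, 4.10),
  CVR = c(0.012, 0.048, 0.025),
  CTR = c(0.031, 0.067, 0.044)
)

# Replace the default row numbers with the keyword text.
# Row names must be unique, so this only works if no keyword repeats.
row.names(myExploratoryData) <- myExploratoryData$Keyword

# The left-hand labels are now keywords instead of 1, 2, 3
head(myExploratoryData)
```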
And that's going to help once we get our visualization loaded. So if we wanted to review that transformation, I just did it by double-clicking on the data itself, but again we can go back to this idea of our head command and we can run this again and you'll now see that this looks a little bit different than it looked earlier, because instead of just these numbers, we now actually have labels for each of those. Now, one of the last transformations on our data here, to get us to our heat map, is we need to evolve our data into a matrix, versus a dataframe.
And let me introduce that idea of a dataframe real quickly. All the data we bring in to these variables in R, R calls a dataframe. There are times where we want to evolve that sort of data type into something else to be able to work with it. And this is one of those instances. So I'm going to type in myDataMatrix, because we need a matrix of data specifically for a heat map, and then I'm going to type in the command data.matrix and I'm going to paste in again our variable and I'm going to run this and you can now see that in addition to having our dataframe here, we now have a data matrix too.
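As a sketch of that conversion, again on made-up stand-in data with an assumed Keyword column:

```r
# Hypothetical stand-in data with keywords as row names
myExploratoryData <- data.frame(
  Keyword = c("low price air", "low price tickets airline", "cheap flights"),
  CPA = c(9.80, 0.95, 4.10),
  CVR = c(0.012, 0.048, 0.025),
  CTR = c(0.031, 0.067, 0.044)
)
row.names(myExploratoryData) <- myExploratoryData$Keyword

# Convert the data frame into a numeric matrix
myDataMatrix <- data.matrix(myExploratoryData)

# The original is a data frame, the new object is a matrix
class(myExploratoryData)
class(myDataMatrix)

# Note: in current versions of R a text column such as Keyword is
# converted to integer codes, since a matrix can only hold one type
print(myDataMatrix)
```

Because the text column ends up as integer codes that don't mean much, one option is to leave it out before converting, since the keywords are already kept as row names.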
So at this point, we're only one step away from generating our heat map. So what we're going to do is we're going to type in the command heatmap. We're then going to assign that data matrix that we just created as the input to the heat map, so myDataMatrix, and I'm going to do a couple of things just to ensure that the visualization is as readable as possible. We won't go into detail about these specific commands at this stage of the game, but just rest assured the reason that we do this is just to clean up the readability of what the heat map is going to provide for us.
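The exact readability settings aren't spelled out here, so the following is only one plausible version of the call. The Rowv, Colv, scale, col and margins values are assumptions, as are the stand-in data and the choice to drop the text column before converting. The reversed heat.colors palette is picked so that larger values lean toward red.

```r
# Hypothetical stand-in data with keywords as row names
myExploratoryData <- data.frame(
  Keyword = c("low price air", "low price tickets airline",
              "cheap flights", "airline deals"),
  CPA = c(9.80, 0.95, 4.10, 6.30),
  CVR = c(0.012, 0.048, 0.025, 0.018),
  CTR = c(0.031, 0.067, 0.044, 0.029)
)
row.names(myExploratoryData) <- myExploratoryData$Keyword

# Assumption: keep only the numeric metrics for the heat map
myDataMatrix <- data.matrix(myExploratoryData[, c("CPA", "CVR", "CTR")])

myHeatmap <- heatmap(
  myDataMatrix,
  Rowv = NA,                    # keep rows in their original order
  Colv = NA,                    # keep columns in their original order
  scale = "column",             # scale each metric within its own column
  col = rev(heat.colors(256)),  # pale for low values, red for high values
  margins = c(5, 10)            # leave room for the keyword labels
)
```

Scaling by column matters in a sketch like this because CPA is in dollars while CVR and CTR are rates, so without it the CPA column would dominate the colors.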
And now if I run this, you can see that we've generated our heat map. And so what we can do with our heat map is get a quick sense as to the performance of each of these different columns. So in other words, we can look at CPA and we can see that down on the far bottom of that column we have a stark red color, so we have a low price air keyword that has a very high cost per action. And then we might look up a little bit further on that same column and we identify a color that's white and we have a low price tickets airline keyword.
And so we can see just at a glance that that particular keyword's cost per action is going to be very low. So one of the great things about a visualization like this is that it's packed with insights. If we sit with this information and we begin to pick out the areas where we feel like we should be able to get a lower cost per action or get a higher click through rate, if we can identify what those areas can look like from an additional assessment perspective, we can identify some potential action items just by doing this exploratory analysis.
So there you have it. In just a few lines of code we have developed a heat map in R. We can use this exploratory visual device to identify some trailheads for deeper analysis. Or better yet, for actionable insights. Congratulations, you did a great job. Next we'll move on to a similar exercise in Python. So for now, let's go ahead and close out of RStudio.
In this course, discover how to gain valuable insights from large data sets using specific languages and tools. Follow Chris DallaVilla as he walks through how to use R, Python, and Tableau to perform data modeling and assess performance. As Chris dives into these concepts, he shares specific case studies that come directly from his own work with clients. Plus, he shares three essential—and practical—best practices for data-driven marketing that you can use to bolster your organization's marketing performance.
- Installing R, Python, and Tableau
- Navigating the UI for R, Python, and Tableau
- Using R, Python, and Tableau
- Exploratory analysis
- Performing regression analysis
- Performing a cluster analysis
- Performing a conjoint assessment
- Stakeholder alignment