Learn how to perform analysis for prediction using R and how to interpret the results.
- [Instructor] So we have RStudio open and I have already connected to the data. For reference, we are using the exercise files in the directory 04_02. Now let's go ahead and run that line so we can bring our data in. I want to open up our data for a moment, so I'm just going to double-click. I want to point out this sales classification column. What I know about this column, just to reveal what we're working with here, is that our data has been encoded.
So in other words, during the data preparation process, this data was categorized into three buckets, A, B, and C: A is the highest performer, B is in the middle, and C is the lowest performer. So let's go ahead and close out of our data. Let me see our code again. I want to see how many of our client's stores are operating at these different levels, so I'm going to bring up a synopsis of that data using the table command. I'll type table, then pass in the variable for our data frame, my prediction data.
And we're going to look at one specific column of that data, the sales classification. I'll run that, and we can now see in the console output that we have 85 stores operating in the A group, 138 operating in the B group, and 77 operating in the C group. Now, it's going to be nice to have a handy reference to our column names because we're going to use those in a moment.
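As a rough illustration of the table step just described, here is a minimal sketch. The data frame below is a tiny hypothetical stand-in, not the course's exercise file, and both the variable name `myPredictionData` and the column name `sales.classification` are assumptions based on the narration.

```r
# Hypothetical stand-in for the course data (the real exercise file is not reproduced here).
# `myPredictionData` and `sales.classification` are assumed names.
myPredictionData <- data.frame(
  sales.classification = factor(c("A", "B", "C", "B", "A", "B"))
)

# Count how many stores fall into each sales class.
classCounts <- table(myPredictionData$sales.classification)
print(classCounts)
```

On the real data, the same call would produce the per-class store counts read out in the console.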
So, let's come back up to our code window. I'm going to type the command names and then again feed it the variable name for our data frame and run that, so now we have a quick reference for what our different columns are called. Before we can use them though, we're going to need to load our decision tree algorithm. I'll do that like this; there are a couple of steps.
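The names call mentioned above might look like the following sketch. The data frame is again a small hypothetical stand-in; only the three predictor names spoken in the video are used, and `sales.classification` is an assumed column name.

```r
# Hypothetical stand-in data frame; values are invented for illustration only.
myPredictionData <- data.frame(
  sales.classification = factor(c("A", "B", "C")),
  capita = c(52000, 41000, 30000),
  drive.by.traffic = c(1200, 800, 450),
  complimentary.establishments = c(3, 6, 9)
)

# List the column names so they can be copied into the model formula later.
columnNames <- names(myPredictionData)
print(columnNames)
```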
The first is, we need to install the package. So, I'm going to use install.packages, and the name of our package for the algorithm is tree. I'll run that line and that'll take just a moment. You can see in the console that it is downloading that specific package. Now that we have the package there, we want to load it with the library command. So we'll do it this way: library, and we'll pass it the tree package that we just downloaded, and we'll run that.
So, now that we have our algorithm installed, I can feed it some data. That code is already provided in your exercise files. Let's walk through it a little bit. First, we've established a variable name of myDecisionTree. Next, we assign it the result of the tree function, which takes the output we are predicting as its first input; in this case, that's sales classification. That's followed by the predictors, and those are some of the column names that we looked at earlier that you see right here in the console.
So, if we look at the predictors in this specific configuration, you can see we have capita + drive.by.traffic + complimentary.establishments and a few others as well. And ultimately, we're telling the function to look for those columns in my prediction data. So let's run that and then plot my tree so we can see what the algorithm found. Okay, so we just ran the algorithm, but we can't see the output yet. We're going to do that with our plot command and pass in the value of myDecisionTree.
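To make the install, load, and fit steps concrete, here is a hedged sketch rather than the course's actual exercise code. The data is synthetic and generated from a made-up rule, `sales.classification` is an assumed column name, and only the three predictors named in the narration are included (the video uses several more). The CRAN mirror URL is one common choice, not something the video specifies.

```r
# Install the tree package only if it is missing, then load it.
if (!requireNamespace("tree", quietly = TRUE)) {
  install.packages("tree", repos = "https://cloud.r-project.org")
}
library(tree)

# Synthetic stand-in data: 300 hypothetical stores with three of the predictors.
set.seed(42)
n <- 300
myPredictionData <- data.frame(
  capita = runif(n, 20000, 80000),
  drive.by.traffic = runif(n, 100, 2000),
  complimentary.establishments = sample(0:15, n, replace = TRUE)
)

# Made-up labeling rule, used only so the tree has some structure to find.
score <- (myPredictionData$complimentary.establishments < 8) +
  (myPredictionData$drive.by.traffic > 1000)
myPredictionData$sales.classification <- factor(c("C", "B", "A")[score + 1])

# Outcome on the left of ~, predictors on the right, data frame via `data`.
myDecisionTree <- tree(
  sales.classification ~ capita + drive.by.traffic + complimentary.establishments,
  data = myPredictionData
)
print(summary(myDecisionTree))
```

Because the outcome column is a factor, `tree` fits a classification tree; a numeric outcome would give a regression tree instead.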
So let me scroll down and we'll put this right under the comment for plotting our tree. We'll insert the command plot and pass it myDecisionTree, then run that line. So, our tree algorithm generated what is called a dendrogram; you can see that in the plots view on the bottom right. But it's not terribly helpful, because we need to label it. So we'll do that with our text command and pass it the value of myDecisionTree.
So we'll do that right here: the text command and again myDecisionTree, and this will assign our labels to that dendrogram. And that's still not terribly helpful. It's information overload, so what we can do is what is called pruning our tree. Basically this tells the algorithm, hey, just identify the biggest predictors and show me how those predictors influence the outcome. So, let's prune our tree. We'll establish a variable called myPrunedTree and then we'll use the function prune.tree and pass it myDecisionTree.
So, what does that look like? We'll first establish that variable myPrunedTree, then assign it the result of the prune.tree function, passing in myDecisionTree. In this case, I know from my exploratory analysis that three is the ideal number of branches for this tree, so we'll prune our tree to that. The way I do that is with the best argument, and I'm going to put the number three there.
So let's run this. And again, what we've just done is run the algorithm; to see the output, we have to plot our myPrunedTree. That looks like it might provide a little bit more insight, so let's go ahead and label it. So, I just typed in the command text and then passed it myPrunedTree as a variable to label that dendrogram.
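The plot, label, and prune sequence walked through above can be sketched as follows. This is an illustration under assumptions, not the exercise file: the data is synthetic with a made-up labeling rule, `sales.classification` is an assumed column name, and `countLeaves` is a hypothetical helper added here only to compare tree sizes.

```r
library(tree)  # assumes the tree package is already installed

# Synthetic stand-in data with a made-up rule (not the course data).
set.seed(42)
n <- 300
myPredictionData <- data.frame(
  capita = runif(n, 20000, 80000),
  drive.by.traffic = runif(n, 100, 2000),
  complimentary.establishments = sample(0:15, n, replace = TRUE)
)
score <- (myPredictionData$complimentary.establishments < 8) +
  (myPredictionData$drive.by.traffic > 1000)
myPredictionData$sales.classification <- factor(c("C", "B", "A")[score + 1])

myDecisionTree <- tree(
  sales.classification ~ capita + drive.by.traffic + complimentary.establishments,
  data = myPredictionData
)

# Draw the full tree, then add split and class labels to the open plot.
plot(myDecisionTree)
text(myDecisionTree)

# Prune: `best` requests a subtree of that size, counted in terminal nodes.
myPrunedTree <- prune.tree(myDecisionTree, best = 3)
plot(myPrunedTree)
text(myPrunedTree)

# Hypothetical helper: count terminal nodes to confirm pruning did not grow the tree.
countLeaves <- function(fit) sum(fit$frame$var == "<leaf>")
print(c(full = countLeaves(myDecisionTree), pruned = countLeaves(myPrunedTree)))
```

Note that `text` only works after a `plot` call, since it draws on the currently open plot.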
So we can see here that out of the nine different variables that we fed our algorithm, it narrowed things down to three. Now, this tells us that in order to achieve an A-level sales classification, primarily we need to ensure that var three, which is one of those predictors in our data, is less than 3.5, that the complimentary establishments are fewer than 7.5, and that the unemployment rate is under this stated amount.
In this course, discover how to gain valuable insights from large data sets using specific languages and tools. Follow Chris DallaVilla as he walks through how to use R, Python, and Tableau to perform data modeling and assess performance. As Chris dives into these concepts, he shares specific case studies that come directly from his own work with clients. Plus, he shares three essential—and practical—best practices for data-driven marketing that you can use to bolster your organization's marketing performance.
- Installing R, Python, and Tableau
- Navigating the UI for R, Python, and Tableau
- Using R, Python, and Tableau
- Exploratory analysis
- Performing regression analysis
- Performing a cluster analysis
- Performing a conjoint assessment
- Stakeholder alignment