Learn how to perform a cluster analysis using R and how to interpret the results.
- [Instructor] Our customer has millions of email records, and has been contextualizing that data with customer preference and customer activity. So we want to establish a set of segments and use that information to create campaigns that are personalized to different groups. We're going to do that with a cluster analysis in R. So we have our R environment up and let's go ahead and connect to our data. So we have that line in there already. Let's just select that line and click Run. And let's have a look at what we're working with.
So I'll type in the head command and then I'm going to pass that our variable name. So we'll do that right here. And then our variable name, myClusterData. And run that so we can see the printout of that in our console here. So we can see that we have five columns, or what are known as five vectors, to work with and I want to point out that these email addresses that we have in the data have been encrypted. So these are not actual email addresses of customers that can be used for any purpose other than our exercise files.
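The preview step can be sketched as follows. The exercise file itself is not shown in the transcript, so this builds a small synthetic stand-in; the column names (`email`, `behavior`, `brandPreference`, `cta`, `age`) and all values are assumptions made up for illustration, and only the variable name `myClusterData` and the `head` call come from the walkthrough.

```r
# Synthetic stand-in for the exercise data (assumed column names and values).
set.seed(42)
myClusterData <- data.frame(
  email           = sprintf("user%03d@example.invalid", 1:60),  # placeholder for encrypted addresses
  behavior        = sample(1:10, 60, replace = TRUE),           # ordinal behavior scale
  brandPreference = sample(1:4, 60, replace = TRUE),            # coded brand preference
  cta             = sample(1:3, 60, replace = TRUE),            # coded call to action
  age             = sample(18:70, 60, replace = TRUE),          # customer age
  stringsAsFactors = FALSE
)

# Print the first six rows to the console.
head(myClusterData)

# Optional extra check (not in the video): show each column's type.
str(myClusterData)
```

Running `str` alongside `head` is just one way to confirm which columns are numeric before clustering.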
So in addition to our encrypted email addresses, we have what's known as an ordinal scale for some part of our customer behavior. This could be how many times a customer purchased or potentially how many times a customer tweeted on Twitter. It could be a number of different things. An ordinal scale is essentially a ranked range of values, and your metadata would clarify what each value means. Similarly, we have brand preference, which indicates which particular brand in our client's product portfolio this customer shows the most affinity towards.
And we have a data point for CTA. Now this is the specific call to action that this customer responded to. And then we also have some demographic information for customer age. So that's an overview of what we're working with. So what we want to do now is standardize our data. This is a best practice for this sort of machine learning algorithm. What it does is it transforms the data so that each variable carries equal weight: by default, each column is centered on a mean of zero and rescaled to a standard deviation of one. We're going to apply the scale function to accomplish this. So I'm going to declare a variable name and assign that variable name the result of the scale function.
So that's going to look something like this. Variable name, let's call it myClusterDataStandardized. And then we're going to assign that the value of the scale function, which we're going to feed the data frame, myClusterData. And we're going to remove the first column of that data, because our first column is not numeric and our analysis needs to look specifically at numeric data.
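A minimal sketch of the standardization step, again on synthetic data with assumed column names. The exact indexing used in the video is not shown in the transcript; `myClusterData[-1]` is one common way to drop the first column of a data frame. With its defaults, `scale` replaces each value x with (x - column mean) / column standard deviation.

```r
# Synthetic stand-in for the exercise data (assumed column names and values).
set.seed(42)
myClusterData <- data.frame(
  email           = sprintf("user%03d@example.invalid", 1:60),
  behavior        = sample(1:10, 60, replace = TRUE),
  brandPreference = sample(1:4, 60, replace = TRUE),
  cta             = sample(1:3, 60, replace = TRUE),
  age             = sample(18:70, 60, replace = TRUE),
  stringsAsFactors = FALSE
)

# Drop the first (non-numeric) column, then center and rescale every remaining column.
myClusterDataStandardized <- scale(myClusterData[-1])

# After scaling, every column should have mean ~0 and standard deviation 1.
round(colMeans(myClusterDataStandardized), 10)
apply(myClusterDataStandardized, 2, sd)
```

Without this step, a wide-ranging column such as age would dominate the distance calculations that the clustering relies on.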
So I'm going to run this line. So what we have just done is we declared a variable named myClusterDataStandardized and assigned it the result of running the scale function on our data frame. And we dropped the first column because our cluster analysis is only looking for numerical data. So we're going to use the popular kmeans clustering algorithm to now do the heavy lifting and create our groups here. What this algorithm does is it establishes a mean value for each of the groups. We tell it how many groups to create, each data point is assigned to the group whose mean it is closest to, and that mean value becomes what is known as a centroid.
A centroid is an average that the data will be grouped around. So we're going to declare a variable called ourGroups and run the function kmeans and feed it our standardized data. So that looks like this. So again we're going to call this ourGroups for the variable name, assign it the result of the kmeans algorithm, and I'm just going to copy and paste our standardized data. And let's tell it to split our data into three groups.
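The clustering call might look like the sketch below, run on synthetic data with assumed column names. The `set.seed` call is an addition that is not mentioned in the video: `kmeans` picks its starting centroids at random, so fixing the seed makes the grouping repeatable.

```r
# Synthetic stand-in for the exercise data (assumed column names and values).
set.seed(42)
myClusterData <- data.frame(
  email           = sprintf("user%03d@example.invalid", 1:60),
  behavior        = sample(1:10, 60, replace = TRUE),
  brandPreference = sample(1:4, 60, replace = TRUE),
  cta             = sample(1:3, 60, replace = TRUE),
  age             = sample(18:70, 60, replace = TRUE),
  stringsAsFactors = FALSE
)
myClusterDataStandardized <- scale(myClusterData[-1])

# Fix the random starting centroids so the result is reproducible (assumption, not in the video).
set.seed(123)

# Ask kmeans for three groups.
ourGroups <- kmeans(myClusterDataStandardized, centers = 3)

# One row per centroid: the mean of each standardized variable within that group.
ourGroups$centers

# The group (1, 2, or 3) assigned to each of the first few customers.
head(ourGroups$cluster)
```

Because the result depends on the random start, some analysts also pass `nstart` (for example `nstart = 25`) so that `kmeans` tries several starts and keeps the best one; that option is not used in the walkthrough.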
Run this. Next we'll need to activate some additional functionality for R to visualize our clusters. So let's load in our cluster library. Run that. And now we'll run the function to visualize our clusters. And so that's going to look something like this. We're going to run the command clusplot. We're also going to pass in our standardized data that we created up above.
And we're going to feed in the output of the kmeans algorithm that we stored in ourGroups, so type in ourGroups. And then we're going to select its cluster component. And I'll run that. So what we've done here is call the function with our standardized data and the output from the kmeans algorithm, which provides us with a visual. Now if we want to see the size of each group, we can type in something like this. We can say ourGroups,
which again is the output of our kmeans algorithm, and then size as the component, and I can run this. So you can see how many data points fall into each of the groups that we just created. So what you can do is assign these three customer cohorts to a segment and create an email campaign specific to each group, with targeted content.
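The visualization and group-size steps can be sketched as below, on synthetic data with assumed column names. The `color` and `shade` arguments and the final `segment` column are assumptions added for illustration; the transcript only confirms that `clusplot` receives the standardized data and the kmeans output, and that `size` reports the group counts.

```r
library(cluster)  # provides clusplot

# Synthetic stand-in for the exercise data (assumed column names and values).
set.seed(42)
myClusterData <- data.frame(
  email           = sprintf("user%03d@example.invalid", 1:60),
  behavior        = sample(1:10, 60, replace = TRUE),
  brandPreference = sample(1:4, 60, replace = TRUE),
  cta             = sample(1:3, 60, replace = TRUE),
  age             = sample(18:70, 60, replace = TRUE),
  stringsAsFactors = FALSE
)
myClusterDataStandardized <- scale(myClusterData[-1])
set.seed(123)
ourGroups <- kmeans(myClusterDataStandardized, centers = 3)

# Plot the customers on the first two principal components, one ellipse per group.
clusplot(myClusterDataStandardized, ourGroups$cluster, color = TRUE, shade = TRUE)

# How many customers landed in each of the three groups.
ourGroups$size

# Hypothetical follow-up: attach each customer's group to the original rows
# so a campaign list can be pulled per segment.
myClusterData$segment <- ourGroups$cluster
table(myClusterData$segment)
```

From there, a line such as `subset(myClusterData, segment == 1)$email` would list the addresses for one cohort.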
In this course, discover how to gain valuable insights from large data sets using specific languages and tools. Follow Chris DallaVilla as he walks through how to use R, Python, and Tableau to perform data modeling and assess performance. As Chris dives into these concepts, he shares specific case studies that come directly from his own work with clients. Plus, he shares three essential—and practical—best practices for data-driven marketing that you can use to bolster your organization's marketing performance.
- Installing R, Python, and Tableau
- Navigating the UI for R, Python, and Tableau
- Using R, Python, and Tableau
- Exploratory analysis
- Performing regression analysis
- Performing a cluster analysis
- Performing a conjoint assessment
- Stakeholder alignment