Learn how to perform a conjoint assessment using Python and how to interpret the results.
- [Instructor] One of the most challenging aspects of running an analysis like the one we're discussing is the design of the survey at the outset. Now, like we saw in the last video, our different combinations of attributes and levels created the potential for 486 possible profiles. I don't know too many customers who would rank that many possibilities, or even as many as, say, 40. Now, let's go ahead and load in our packages. So first cell, Shift Enter, and I'm using our exercise files for our case study data, so let's go ahead and connect to our data set.
And let's do a quick snapshot of what we're working with here, so we'll just type in the variable that we just assigned to our data frame, myConjointData, and I'll run that. And we can see what we're working with here. Now this may seem like a small data set, but in all reality, there are over 400 consumer responses here, because I aggregated those response rates during my ETL process to prepare the data. Our rank column shows how each of our 11 combinations, in this case, scored.
So in other words, this survey study narrowed our 486 potential combinations down to just 11. Our column names are a little bit cryptic, so we're going to do a little bit of data munging here to clarify what those are. And I have my metadata file, so I can add in names that are more descriptive here, so we've done that right here. And basically what we did is we declared a dictionary, which is Python's hash table, with our descriptive names. And next we need to apply those names, so I will do that by taking our data frame, myConjointData, and running the rename command, and we're going to pass it the names we just declared.
And we're going to set this inplace argument, which in essence just says hey, modify the dataframe that we already have established rather than returning a copy. Then we're going to just run a quick confirmation that this is working the way that we intended, so I'll just print out the first row, so myConjointData.head, asking for just the first row. So I'm going to go ahead and run that, and so that looks good. We have a statement here that assigns each of those columns, with the exception of rank, to a variable X, which will hold our independent variables, our X values, in just a moment.
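The renaming and column-selection steps described here can be sketched roughly as follows. This is only an illustration: the toy values, the cryptic column names (A1, B1, C3), and the descriptive labels are assumptions, not the contents of the exercise files.

```python
import pandas as pd

# Toy stand-in for the aggregated survey data; names and values are invented.
myConjointData = pd.DataFrame({
    "Rank": [1, 2, 3],
    "A1": [1, 0, 0],
    "B1": [0, 1, 0],
    "C3": [0, 0, 1],
})

# Dictionary mapping each cryptic column name to a descriptive one (hypothetical labels).
names = {"A1": "Ux1", "B1": "PhotoF1", "C3": "SpecialSauce3"}

# inplace=True modifies the existing DataFrame instead of returning a renamed copy.
myConjointData.rename(columns=names, inplace=True)

# Quick confirmation: print only the first row.
print(myConjointData.head(1))

# Every column except the rank becomes a predictor in X.
X = myConjointData[["Ux1", "PhotoF1", "SpecialSauce3"]]
```

Without inplace=True, rename returns a new DataFrame, so the result would need to be assigned back to myConjointData to have the same effect.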
So again, we have a variable named X, we've assigned it our dataframe, and we've now gone ahead and specifically declared which columns of our data we want to belong to X. Now we want to add a constant to this data to give our algorithm an intercept, a zero-based reference point, or a benchmark, in other words. So I do that this way. I'm going to assign to X the result of this function from sm, which we loaded with our packages, and that adds a constant specifically to the dataframe that we defined above as X.
And then we're going to do the same for the Y and assign our rank, at this point, to the Y. So we're going to do y = myConjointData.rank. And now I'm going to generate a linear regression model, which really brings us full circle for the course, and we'll fit those values, and so ultimately this is going to produce a multiple regression. So in other words, when we first looked at regression earlier in the course, we plotted one independent variable, but now we're going to use many, and I'll do that this way.
So I'm going to first assign a variable, and we'll call it myLinearRegressionForConjoint, long variable name, but that should do the trick. And then, again, we're going to call this sm function from our package above, ordinary least squares, which you can recall from earlier on in the course, when we first looked at regression, and we're going to pass in the Y and the X values, and now we're going to chain our fit command onto that.
So all of this should be a little bit of a refresher from those earlier videos, and lastly, we want to go ahead and run the summary of that so we can see the output from our regression. Again, I'm going to type in myLinearRegressionForConjoint.summary, and now we're going to go ahead and run this full block of code. So we received a lot of output. The first output was an error message, so let's read that. This says that this specific function is looking for a sample size, N, equal to or greater than 20.
Again, what we know at this stage of the game, we're using N as representative of 12, that's how many data points we have, but I know this is aggregate data, so we're just going to wave our hands at that statement and just move on, then. But what we'll focus on for analysis is our coefficients. This is one way we can go about establishing the relative utility, like we saw in the visual from our last video. The higher the coefficient, the higher the relative utility. So of our three different attributes and our seven different levels, if we do a rank order, just by looking at our coef column, right here, that special sauce number three, so this venerable secret sauce for our social media startup, ranks highest, so we can see that at a 3.67.
And the Ux1 ranks next in line at a 3.05. And it looks like next up is our photo feature one, or PhotoF1. So what I'd like to do is to summarize my findings here in a quick visual. So we need to normalize this data so that we can create a pie chart. We've got a quick formula loaded in here, and we're just going to go ahead and fill in those values, so I'm just going to assign the respective coefficient values that we just identified.
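Reading the ranking by eye works for a handful of levels, but the same ordering can be produced in code. A minimal sketch, using the three coefficient values quoted in the video and hypothetical labels for them:

```python
import pandas as pd

# The three coefficients quoted in the video; the label for the special
# sauce level is a hypothetical column name.
coefficients = pd.Series({"PhotoF1": 2.72, "SpecialSauce3": 3.67, "Ux1": 3.05})

# Higher coefficient means higher relative utility, so sort descending.
ranked = coefficients.sort_values(ascending=False)

print(ranked.index.tolist())
```

When a statsmodels model is fit on a DataFrame, its params attribute is a labeled Series like this one, so the same sort_values call can be applied to it directly.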
So that was 3.67, 3.05, and 2.72. And let's go ahead and run that. And that gives us our values there. And then I'm not going to go into much detail for this last block of code, but essentially, it takes our input to create a pie chart. So we have assigned the different labels, the sizes we just got back from the normalization of the data, we're also assigning some color and some layout parameters, and then plotting our graph with a little plotting magic, so let's run that.
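The exact normalization formula is not shown in the transcript. One common choice, assumed here, is to divide each coefficient by the sum of all three so the shares add up to 100 percent. The labels are hypothetical.

```python
# Coefficient values quoted in the video; labels are hypothetical.
coefficients = {"SpecialSauce3": 3.67, "Ux1": 3.05, "PhotoF1": 2.72}

# Assumed formula: each coefficient as a percentage of the total.
total = sum(coefficients.values())
sizes = {name: round(100 * value / total, 1) for name, value in coefficients.items()}

print(sizes)
```

Under that assumption the shares come out to roughly 38.9, 32.3, and 28.8 percent.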
And then we run that and now we have a visual that could represent the next breakthrough for social media.
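For reference, a pie chart along these lines can be drawn with matplotlib. This is a sketch, not the code from the exercise files: the labels, colors, layout parameters, and output filename are all assumptions, and the sizes are the three quoted coefficients expressed as rounded percentage shares of their sum.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt

# Hypothetical labels and colors; sizes are percentage shares of 3.67, 3.05, 2.72.
labels = ["SpecialSauce3", "Ux1", "PhotoF1"]
sizes = [38.9, 32.3, 28.8]
colors = ["gold", "lightskyblue", "lightcoral"]

fig, ax = plt.subplots()
wedges, texts, autotexts = ax.pie(
    sizes,
    labels=labels,
    colors=colors,
    autopct="%1.1f%%",  # print each share on its wedge
    startangle=90,
)
ax.axis("equal")  # keep the pie circular

# Hypothetical output filename.
fig.savefig("conjoint_pie.png")
```

In a notebook, the Agg line and savefig call can be dropped in favor of plt.show().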
In this course, discover how to gain valuable insights from large data sets using specific languages and tools. Follow Chris DallaVilla as he walks through how to use R, Python, and Tableau to perform data modeling and assess performance. As Chris dives into these concepts, he shares specific case studies that come directly from his own work with clients. Plus, he shares three essential—and practical—best practices for data-driven marketing that you can use to bolster your organization's marketing performance.
- Installing R, Python, and Tableau
- Navigating the UI for R, Python, and Tableau
- Using R, Python, and Tableau
- Exploratory analysis
- Performing regression analysis
- Performing a cluster analysis
- Performing a conjoint assessment
- Stakeholder alignment