From the course: Tableau and R for Analytics Projects

Add an R outlier detection model to a Tableau viz

From the course: Tableau and R for Analytics Projects

Start my 1-month free trial

Add an R outlier detection model to a Tableau viz

- [Instructor] Once your Tableau workbook is set up to display data that might contain outliers, you can add a calculated field that calls the sign2 function from the mvoutlier package to identify any outlier values in your data set. My sample file on Tableau is 04_04_Outliers, and that is a Tableau Packaged Workbook. You can find it in the chapter four folder of your exercise files collection. So we have the workbook setup in R, but what we need to do is to make sure that R service is running and also that we have installed the mvoutlier package that we need. So, I will switch over to R and in a brand new R Console type in the commands that I need. I'll start by ensuring that I have installed mvoutlier and so I'll type install.packages("mvoutlier") and press Enter. Then, I will identify the CRAN mirror site that I want to use. I'll use USA Oregon and click OK and R will make sure that the data is downloaded. Now, if this is the first time you've downloaded mvoutlier using a install.packages, this will be a much longer download and it is almost certainly still going on. So, if you need to, go ahead and pause the video and wait for everything to get installed and check that on your system. All right, I'm going to go ahead and continue. I will start by adding mvoutlier to the active library. So, I'll type library("mvoutlier") M-V-O-U-T-L-I-E-R. Yup, everything is good, Enter and it's been activated. Now, we need to make sure that our service is running. So, I'll type library("Rserve") capital R-S-E-R-V-E that's in double quotes and parenthesis. All right, there it is and then to run it I'll type Rserve() and then an empty set of parenthesis, left and right, nothing in between, Enter and Rserve.exe is now running. That means I can connect to R via Rserve from Tableau. Now, make sure that connection is in placed now. So, I'll press Alt + Tab to move back to Tableau and I'll go to the Help menu, point to Settings and Performance and then click Manage External Service Connection. So, I have Rserve, port 6311 should appear automatically and that the server name localhost is something that I typed in before. So, that looks good, I'll click Test Connection, successful, I'll click OK and OK again. And now, we can get to work creating our calculated field. To do that, I will go up to the Analysis menu and almost all the way down in the menu I'll click Create Calculated Field. In the name box, I'll type Outliers and then in the definition box, we're going to type a fairly lengthy IF and statement that uses the sign2 function from the mvoutlier CRAN package to identify a value is either an outlier or not. If it is an outlier, it will assign it the value of zero. If it isn't, it will assign it a real value, a decimal from zero to one, but not including zero. All right, so with that in mind, let's go ahead and start creating our calculated field. This is an IF statement, so we'll type IF and we're expecting a real number, so SCRIPT_REAL and then I'll press Return to get myself some space to work. We need to make sure that mvoutlier is in the active library. So, in "library(mvoutlier); then I'll space in a bit, sign2 which is the name of the function we want to use. We used it in a previous movie. Then cbind, C-B-I-N-D in which we'll combine multiple columns together into a single data frame, then left parenthesis and then .arg1 which is a short for argument 1. This will take the name of the measure that we pass into it from a Tableau field. All right, another right parenthesis, I actually need to do two here. There is one at the end that will get to in a moment and that's what that extra one is to the right. Now, we need to identify the column of the output that will contain the indicator of outlier or not and that is after dollar sign, wfinal01. And this is something that you need look up in the documentation, but you have it here. Then, a double quote and a comma and I'll do Enter again and space in. Now, we need to identify the value that is going to be used to identify whether value is an outlier or not and that is the sum of the value in the field, profit percentage. So, that's the second field there and it's in square brackets, everything looks good. Then I can finally go to the right of that last right parenthesis and type = 0. All right, so that's the test. If it is, THEN we assign it the value of outlier just the text value outlier. ELSE we assign it the value of normal and you can see that the text values are contained within quotes. We're at the end of our IF and statement, so I will type END and the calculation is valid, good. All right, I'll go ahead and click OK. And now, we can identify outliers by dragging Outliers from the Measures area to the Color icon on the Marks card and it will take R in Tableau a moment to do the calculations and we can see that we have a number of outliers. We have a bunch that are looks like those were negative profit percentages, so those were loses that's not surprising. And it also looks like we have a very nice bunch between about 15, 14.5% and say 26% or so. We also have one outlier that was below the main group and we have one that was above. So, what this tells us is that most of our sales are in fact, in our expected range within plus or minus three standard deviations of our mean profit percentage. We have a number of losses that we can look into and we had one sale that was slightly above our expected range. So, that's not necessarily a bad thing, but we also want to make sure that we aren't charging some customers too much and losing money on other sales.

Contents