Join Barton Poulson for an in-depth discussion in this video, Computing new variables, part of Learning R.
In the last movie, we looked at ways that you could use R to recode or transform individual variables to make them more suitable for your analyses. In this movie, we're going to look at ways that you can combine multiple variables into new composites, and how those procedures can work for your purposes. In this example, I'm actually going to be creating my variables in R. What I'm going to do down here with line 6 is I'm going to create a new variable called n1, which stands for normal number one, and I'm going to use the function rnorm, which means random normal.
So, it's going to be drawing values from the normal distribution, the bell curve, at random, and I'm going to get a million values. It's going to take about this long. Now I have a million random values. Let's get a histogram of those. There you see it's pretty much a perfect bell curve. It's symmetrical, it's unimodal, it's smooth; it's great. Then I'm going to do the procedure again and create another variable called n2. That's also a normal distribution; a million values drawn at random. You can see in the Workspace I've got that one, and I'm going to get its histogram as well.
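The steps so far can be sketched roughly as follows. This is a minimal sketch, not the course's exercise file: the `set.seed()` call and its value are assumptions added only so the draws are reproducible, and the exact shape of each histogram will vary slightly from run to run.

```r
# Assumed seed, added only so this sketch gives the same draws each time
set.seed(42)

# One million random draws from the standard normal (mean 0, sd 1)
n1 <- rnorm(1000000)
hist(n1)   # should look like a symmetrical, unimodal bell curve

# Same procedure again for a second, independent variable
n2 <- rnorm(1000000)
hist(n2)
```

With no other arguments, `rnorm()` uses a mean of 0 and a standard deviation of 1, which is the "unit normal" referred to later in the movie.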
It's essentially identical. Again, it's a normal distribution, it's unimodal, and it's got the bell curve shape. Now what I'm going to do is I'm going to create a composite variable. This is the point here. I'm going to do it by simply adding each value from these different vectors. Now, this is the beautiful thing about R is that it's made for vectors, and so all I have to do is say that my new variable, which I'm calling n.add, in line 14, it gets n1 + n2, and R knows it's to take the first item in n1, and add it to the first item in n2, then go to the second item in n1, add it to the second item in n2.
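To see the item-by-item behavior on something small enough to check by eye, here is a short sketch. The vectors `a` and `b` and the name `a.add` are made up for illustration and are not part of the movie.

```r
# Two short, made-up vectors of the same length
a <- c(1, 2, 3)
b <- c(10, 20, 30)

# Addition is element-wise: first with first, second with second, and so on
a.add <- a + b
print(a.add)   # 11 22 33
```

One caution: if the two vectors have different lengths, R recycles the shorter one rather than stopping, so it is worth checking `length()` before combining variables this way.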
So, I'm going to run that line 14, and you see I have a new thing in the Workspace. I'll get a histogram for that one. That's also a bell curve. The range is a little bit larger, because I'm adding instead of just averaging. Then I'm going to do one more thing; instead of adding them, I'm actually going to multiply them. So, I'm going to have n.mult, with n for normal. And again, because we have these vector-based mathematics, I'll just say n1 * n2. First item in n1 multiplied times the first item in n2, and so on. I'll create that one.
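Both composites can be sketched as below, assuming `n1` and `n2` are created as before (the seed is again an assumption for reproducibility). The standard deviations are included as a rough check on the spread: for two independent unit normals, the sum has a variance of 2, and the product has a variance of 1.

```r
set.seed(1)              # assumed seed, for reproducibility only
n1 <- rnorm(1000000)
n2 <- rnorm(1000000)

# Composite 1: element-wise sum
n.add <- n1 + n2
hist(n.add)
sd(n.add)                # close to sqrt(2), about 1.41, so wider than n1 or n2

# Composite 2: element-wise product
n.mult <- n1 * n2
hist(n.mult)
sd(n.mult)               # close to 1, but the shape is no longer a bell curve
range(n.mult)            # extremes reach much further out than for n1 or n2
```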
It shows up in the Workspace, and I get the histogram. It's going to look a little different this time. You see it's really high in the center, it drops down, and it goes all the way down to -10, and up to 10. The reason for that is, when you multiply values from two independent unit normal distributions, you don't get another normal distribution; you get what's called a normal product distribution. (It's sometimes confused with the Cauchy distribution, which is what you get when you divide one unit normal by another.) It's an unusual distribution, with a sharp peak at zero and much heavier tails than the normal, so it has a large number of outliers, and that's what I've got here. Now, the one statistic where this distribution is most distinctive is kurtosis, which has to do with how peaked or pinched the distribution is, and is affected a lot by the presence of outliers.
In order to get kurtosis easily, I'm going to install the package psych. It installs it. In line 23, it loads it. From there, I can calculate the kurtosis for each of my four distributions. Now, for the normal distributions, I expect it to be close to zero, because this is excess kurtosis, measured relative to the normal. So, kurtosis for n1 is essentially 0, and also for n2, it's very close to 0. I'd expect it to be close to zero for the addition one, but for the multiplied one, I expect it to be a larger value.
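The movie uses the psych package for this step. As a sketch that runs without installing anything, the same comparison can be approximated in base R. The helper `excess_kurtosis` is a hypothetical name introduced here, and its simple moment-based formula is an assumption that may differ slightly from the estimator psych uses.

```r
# Hypothetical helper: moment-based excess kurtosis (0 for a normal distribution)
excess_kurtosis <- function(x) {
  d <- x - mean(x)
  mean(d^4) / mean(d^2)^2 - 3
}

set.seed(1)              # assumed seed, for reproducibility only
n1 <- rnorm(1000000)
n2 <- rnorm(1000000)
n.add  <- n1 + n2
n.mult <- n1 * n2

# One value per variable; the first three should sit near 0,
# while the product should come out far larger
k <- sapply(list(n1 = n1, n2 = n2, n.add = n.add, n.mult = n.mult),
            excess_kurtosis)
print(round(k, 2))
```

For the product of two independent unit normals, the theoretical excess kurtosis is 6, so a sample value in that neighborhood is what this sketch should produce.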
In fact, that's nearly six. So, you can see the other one is very close to zero, and that the major difference in the fourth one where I multiplied is in the level of kurtosis. Anyhow, the idea here is that I've been able to take variables that I created here, and then combine them in different ways to create new variables. So, I have these ways of manipulating the data to get these composites, and that's something that you do, for instance, when you're creating an average score based on a survey of many different questions. R makes these vector-based operations very, very easy.
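The survey case mentioned above might look something like this sketch. The data frame `survey`, its column names, and its values are all made up for illustration.

```r
# Made-up responses from three people to three survey questions
survey <- data.frame(
  q1 = c(4, 2, 5),
  q2 = c(3, 2, 4),
  q3 = c(5, 1, 4)
)

# Composite score: each person's average across the three questions
survey$avg <- rowMeans(survey[, c("q1", "q2", "q3")])

# The same thing written out with vector arithmetic
survey$avg2 <- (survey$q1 + survey$q2 + survey$q3) / 3

print(survey)
```

`rowMeans()` is convenient when there are many questions; the written-out version shows that it is the same element-wise arithmetic used for n.add.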
The operations used in this movie are just two options out of an essentially infinite variety for combining your individual variables into new composite variables for your analyses. R makes it very easy to find methods for your own work that can get your data into exactly the shape that you need. So, the speed, flexibility, and power of R are especially helpful as you manipulate data, and get ready for your own analyses.
The course continues with examples on how to create charts and plots, check statistical assumptions and the reliability of your data, look for data outliers, and use other data analysis tools. Finally, learn how to get charts and tables out of R and share your results with presentations and web pages.
- What is R?
- Installing R
- Creating bar charts for categorical variables
- Building histograms
- Calculating frequencies and descriptives
- Computing new variables
- Creating scatterplots
- Comparing means