In the previous movie, we looked at how you can select cases to drill down and analyze your data by specific groups. In this video, we're going to look at how you can go from one group to another and compare the results in your own analyses. We're going to start by loading a data set that comes with R, called iris. This is a well-known data set in the statistical world. It consists of measurements on several different species of irises. I'm going to load the data into the workspace. We have 150 observations of five variables.
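As a minimal sketch of the loading step described above (the narration does not show the exact commands, so the calls here are an assumption about how it could be done):

```r
# Load the built-in iris data set into the workspace
data(iris)

# Inspect its structure: number of observations, variables, and their types
str(iris)

# Rows and columns, as a quick check on the size of the data
dim(iris)
```

`str()` lists the four numeric measurement columns and the `Species` factor.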
And then let's actually take a look at what they're like. We see here, that we've got measurements on sepal length and width, petal length and width. And then, the species of the iris. Let's take a look at the mean petal width. Now, this is across all of the observations of all of the species. It's 1.199. Now, let's split the data file and repeat this analysis for each species using the aggregate command.
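A short sketch of this step, assuming the built-in `iris` data frame; the variable name `overall_mean` is only illustrative:

```r
# Show the first few rows: sepal and petal measurements plus the species
head(iris)

# Mean petal width across all observations, ignoring species
overall_mean <- mean(iris$Petal.Width)
round(overall_mean, 3)
```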
So we're going to use aggregate right here. Then I'm telling it that I want to look at the petal width. That's my outcome variable. That's the one that I want the mean of. And then the tilde here means sort of "as a function of." In this case it's going to be a function of the species variable in the iris data set. And then this last thing here at the end is where I tell it exactly how I want to aggregate; FUN stands for function here, and we're using the mean. So I want the mean of petal width for each of the species groups.
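The call just described might look like the sketch below. Passing `data = iris` is an assumption made here for readability; the narration may instead have referred to the columns as `iris$Petal.Width` and `iris$Species`. The name `by_species` is illustrative.

```r
# Mean petal width as a function of species:
# outcome on the left of the tilde, grouping variable on the right
by_species <- aggregate(Petal.Width ~ Species, data = iris, FUN = mean)
by_species
```

The result is a data frame with one row per species and one column holding the group mean.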
And then we'll run that one. And now you see that while the overall mean is 1.199, the means differ dramatically from one species to another, going from 0.246 up to 2.026. So, there's a substantial difference between the species, even though they're all irises. Now, let's say I want to look at several variables at once. What I can do here is use the function cbind. That's for column bind. It's a way of combining the columns from several objects.
I'm still using aggregate. Except now I'm using cbind to indicate that there are two outcome variables. I'm going to use petal width and petal length. And they are both done as a function of, that's the tilde, iris species, and the function that I want to see is the mean. So I'm going to run that one and now you see that I've got two variables. Now, it's labeling them as V1 and V2, variable one and variable two. So I need to remember the order in which I asked for them.
The first one is petal width. That's what I've got right here. The second one is petal length. And here I look at the three species and you see, for instance, that the virginica is huge compared to the setosa. And that's also a good way of being able to distinguish the species by measurements only, which is why this particular data set shows up a lot in classification problems. Anyhow, that's the very basic principle: we use aggregate to split the data up and see the results for all of the groups at once. And you can do it for one variable at a time or by using cbind to get multiple variables at once.
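The two-variable version can be sketched as follows. The first form is an assumption about what produces the V1 and V2 labels mentioned in the narration: when the columns are passed as `iris$...` expressions, `cbind` has no column names to carry over. The second form, using `data = iris`, is an alternative that keeps the original column names. The names `unnamed` and `both` are illustrative.

```r
# Columns referenced with iris$...: the outcome columns come back as V1 and V2
unnamed <- aggregate(cbind(iris$Petal.Width, iris$Petal.Length) ~ iris$Species,
                     FUN = mean)
unnamed

# Columns referenced by name with data = iris: the outcome columns keep their names
both <- aggregate(cbind(Petal.Width, Petal.Length) ~ Species,
                  data = iris, FUN = mean)
both
```

With the second form there is no need to remember the order in which the variables were listed.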
Released 9/26/2013
- Installing R on your computer
- Using the built-in datasets
- Importing data
- Creating bar and pie charts for categorical variables
- Creating histograms and box plots for quantitative variables
- Calculating frequencies and descriptives
- Transforming variables
- Coding missing data
- Analyzing by subgroups
- Creating charts for associations
- Calculating correlations
- Creating charts and statistics for three or more variables
- Creating crosstabs for categorical variables
Skill Level Intermediate
Q: The R files within Chapters 01 to 10 don't appear to have any code in them. Where is the final code for each file?
A: Look in the "final" folder for each video. These folders contain the final R code written by the author.
Video: Analyzing by subgroup