# Creating scatterplots by group

## Video: Creating scatterplots by group

In the last pair of movies, we looked at variations on the bar chart that let you use two categorical variables to predict scores on a third categorical variable or on a scale variable. In this movie, we'll change the balance a little by looking at a chart for times when you have two scaled variables and one categorical variable. This calls for a simple variation on the scatter plot that we covered in the section on bivariate graphs. The only big difference is that we'll be adding group markers for the categorical variable. In this example, I'll use the Google search information from Searches.sav.

To get this, I need to go over to Graphs, to Chart Builder. From there, I go down to the Gallery at the bottom left and I go to Scatter/Dot. Now I want to use the second one on the top, which is called a Grouped Scatter, and I drag that out to the canvas. From there, I need three variables: my predictor variable that's scaled, my predictor variable that's a category, and my outcome variable that's scaled. For this example, I am going to use interest in the NBA as a search term. So I am going to come over here and get NBA as a Google search term. I am going to drag that over to the Y-axis.

Then I am going to use two predictors. The first is the median age of people who live in the state. That's a scaled variable, so I am going to put it on the X-axis. It also makes sense to me that interest in the NBA would be related to whether a state has an NBA team. So I am going to get Has NBA, which is a 0/1 indicator variable, and drag that over to Set color. Finally, in a scatter plot, you can sometimes find unusual points and you want to see which cases they are.

So I am going to come down to the Groups/Point ID tab. There I am going to click on Point ID label at the bottom. Back on the canvas, that adds a box for the point label variable, and I am going to use the state code. So I'll just drag that over and now I am ready to go. I press OK and I get a slightly complicated chart because of all the data labels. I am going to edit those out for a moment, but because I've used a label variable, I'll be able to bring some of them back if I want. So I'll double-click on the chart.
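Outside of SPSS, the same four roles (X-axis, Y-axis, Set color, and point label) can be sketched in a few lines of Python. This is only an illustration of the idea, not what Chart Builder does internally. The rows, the state codes, and the `group_points` helper are all made up for the example; none of the values come from Searches.sav.

```python
from collections import defaultdict

# Hypothetical rows: (state_code, median_age, nba_interest, has_nba).
# These values are invented for illustration; they are NOT from Searches.sav.
rows = [
    ("AA", 36.0, 0.2, 0),
    ("BB", 38.5, -0.4, 0),
    ("CC", 35.0, 1.1, 1),
    ("DD", 37.2, 0.6, 1),
    ("EE", 29.0, 1.8, 1),
]


def group_points(rows):
    """Split rows into one series per category value (the 'Set color' role)."""
    groups = defaultdict(list)
    for label, x, y, category in rows:
        # x plays the X-axis role, y the Y-axis role, label the point ID role.
        groups[category].append({"label": label, "x": x, "y": y})
    return dict(groups)


series = group_points(rows)
for category, points in sorted(series.items()):
    labels = ", ".join(p["label"] for p in points)
    print(f"has_nba={category}: {len(points)} points ({labels})")
```

Each series would then be drawn with its own marker color, which is what dragging the indicator variable to Set color does in the chart. Keeping the label with every point is what makes it possible to hide the labels now and bring individual ones back later.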

I can just select the labels and hit Delete for right now. So what I have is a bunch of blue circles and a bunch of green circles. The blue circles are for states that do not have NBA teams. The green circles are for states that do. To make these a little bit easier to see, I am going to modify them and make them solid. I'll just click on one. It looks like I'd better click again to get just the green ones, then click on Fill and make that the same shade of green, and that actually has the effect of making all of them solid.

Now I can add regression lines. Up on the toolbar here, the second option is called Add Fit Line at Subgroups, which adds a regression line separately for each group. I can click on that and I get two lines: one in green for the states that do have NBA teams and one in blue for the states that don't. I also see that we have an outlier, so I am going to come over to the left of this bar to the little target icon. That's the Data Label Mode.
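To make "a regression line separately for each group" concrete, here is a minimal sketch in Python that fits an ordinary least-squares line to each group on its own. It assumes a plain linear fit; the `fit_line` helper and the data points are hypothetical and are not taken from Searches.sav.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for one group of points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical (median_age, nba_interest) pairs, keyed by the 0/1 indicator.
# Invented for illustration only.
groups = {
    0: [(30.0, 1.0), (35.0, 0.5), (40.0, 0.0)],
    1: [(30.0, 2.0), (35.0, 1.0), (40.0, 0.0)],
}

# Fit each group independently, so each one gets its own slope and intercept.
fits = {}
for category, points in groups.items():
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    fits[category] = fit_line(xs, ys)

for category, (slope, intercept) in sorted(fits.items()):
    print(f"has_nba={category}: slope={slope:.3f}, intercept={intercept:.3f}")
```

Because each group is fitted on its own, the two lines are free to differ in both slope and intercept, and that difference is how the categorical variable shows up in the chart.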

I can click on that, and because I said earlier that I was going to use the state abbreviations as data labels, I can come right down here, click on this outlier, and see that it's Utah. Now, that's something. The Utah Jazz seem to elicit unusual levels of fan support. Also, people in Utah tend to be rather young on average. I am going to close this chart because I am done editing it, and now I can see that median age and whether a state has an NBA team are both associated with the state's level of interest in the NBA as a search term.

Just as we saw with bivariate graphs, scatter plots are a great way to show the relationship between two scaled variables. Then, by simply changing the markers, you can add a third, categorical variable, and you can even see how that new variable changes the relationship between the other two.


