SPSS Statistics Essential Training

Combining or excluding outliers


with Barton Poulson

  1. 2m 58s
    1. Welcome
      1m 5s
    2. Using the exercise files
      40s
    3. Using a different version of the software
      1m 13s
  2. 19m 0s
    1. Taking a first look at the interface
      11m 49s
    2. Reading data from a spreadsheet
      7m 11s
  3. 21m 54s
    1. Creating bar charts for categorical variables
      7m 18s
    2. Creating pie charts for categorical variables
      2m 54s
    3. Creating histograms for quantitative variables
      5m 45s
    4. Creating box plots for quantitative variables
      5m 57s
  4. 33m 10s
    1. Recoding variables
      5m 33s
    2. Recoding with visual binning
      5m 33s
    3. Recoding by ranking cases
      5m 26s
    4. Computing new variables
      5m 37s
    5. Combining or excluding outliers
      5m 21s
    6. Transforming outliers
      5m 40s
  5. 28m 12s
    1. Selecting cases
      6m 44s
    2. Using the Split File command
      5m 12s
    3. Merging files
      5m 33s
    4. Using the Multiple Response command
      10m 43s
  6. 22m 14s
    1. Calculating frequencies
      8m 43s
    2. Calculating descriptives
      5m 31s
    3. Using the Explore command
      8m 0s
  7. 16m 3s
    1. Calculating inferential statistics for a single proportion
      6m 6s
    2. Calculating inferential statistics for a single mean
      5m 39s
    3. Calculating inferential statistics for a single categorical variable
      4m 18s
  8. 30m 43s
    1. Creating clustered bar charts
      7m 10s
    2. Creating scatterplots
      5m 8s
    3. Creating time series
      3m 24s
    4. Creating simple bar charts of group means
      4m 17s
    5. Creating population pyramids
      3m 0s
    6. Creating simple boxplots for groups
      3m 3s
    7. Creating side-by-side boxplots
      4m 41s
  9. 45m 28s
    1. Calculating correlations
      8m 17s
    2. Computing a bivariate regression
      6m 27s
    3. Creating crosstabs for categorical variables
      6m 34s
    4. Comparing means with the Means procedure
      6m 33s
    5. Comparing means with the t-test
      6m 4s
    6. Comparing means with a one-way ANOVA
      6m 30s
    7. Comparing paired means
      5m 3s
  10. 24m 30s
    1. Creating clustered bar charts for frequencies
      6m 34s
    2. Creating clustered bar charts for means
      3m 45s
    3. Creating scatterplots by group
      4m 13s
    4. Creating 3-D scatterplots
      4m 25s
    5. Creating scatterplot matrices
      5m 33s
  11. 30m 57s
    1. Using Automatic Linear Models
      11m 52s
    2. Calculating multiple regression
      9m 3s
    3. Comparing means with a two-factor ANOVA
      10m 2s
  12. 29m 29s
    1. Formatting descriptive statistics
      6m 1s
    2. Formatting correlations
      7m 49s
    3. Formatting regression
      10m 19s
    4. Exporting charts and tables
      5m 20s
  13. 51s
    1. What's next
      51s

SPSS Statistics Essential Training
5h 5m Beginner Aug 17, 2011

In this course, author Barton Poulson takes a practical, visual, and non-mathematical approach to the basics of statistical concepts and data analysis in SPSS, the statistical package for business, government, research, and academic organizations. From importing spreadsheets to creating regression models to exporting presentation graphics, this course covers all the basics, with an emphasis on clarity, interpretation, communicability, and application.

Topics include:
  • Importing and entering data
  • Creating descriptive charts
  • Modifying and selecting cases
  • Calculating descriptive and inferential statistics
  • Modeling associations with correlations, contingency tables, and multiple regression
  • Formatting and exporting tables and charts
Subjects:
Business Data Analysis
Software:
SPSS
Author:
Barton Poulson

Combining or excluding outliers

When you start looking at your data, one of the problems you might have to deal with is outliers. These are extreme scores, like somebody who is 7 feet tall or somebody who has 26 children, or unusual categories, like being Nepali or a Latin Poetry major. Now sometimes these unusual scores or categories are inherently interesting, like with world records or gifted and talented programs in schools. In other situations, however, they can wreak havoc with statistical procedures that are designed to look at general patterns or overall trends.

In the latter case, where you may be more interested in common scores than in uncommon scores, you have a few choices for how to deal responsibly with the outliers. Now the first question is how to define outliers. We've already looked at one way of getting a graphical definition of outliers on a scale variable, and that's with a box plot. I am going to come up to Graphs, to Chart Builder, to Boxplot. I will drag in the 1D Boxplot, and let's look at Market Capitalization.

Also, because we have convenient stock symbols over here, I am going to ask for a Point ID so I know who the outliers are. I will just drag that over here and press OK, and what we see is that the variable for Market Capitalization is extraordinarily skewed; in fact, this is often called pathologically skewed. We have Apple here with over $300 billion in market capitalization, Microsoft, Oracle, and Google, and it just goes down. And we have this huge number of companies that are stuck at a tiny level of market capitalization, relatively speaking.
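The box plot's idea of an outlier can be sketched in a few lines of Python. This is a minimal illustration, assuming the common convention of flagging points that lie more than 1.5 times the interquartile range beyond the quartiles. SPSS computes its box plot hinges slightly differently, so results near the boundary may not match exactly. The function name `boxplot_outliers` and the sample values are hypothetical, not the course data.

```python
import statistics

def boxplot_outliers(values, k=1.5):
    """Return the values lying more than k * IQR beyond the quartiles."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

# Hypothetical market capitalizations in millions of dollars (not the course data).
market_caps = [12, 18, 25, 31, 40, 47, 55, 63, 80, 300_000]
print(boxplot_outliers(market_caps))  # the single huge value is flagged
```

With one value that dwarfs the rest, the fences stay close to the bulk of the data, so only the extreme case is flagged.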

In fact, we have no idea what the median or the mean is because those other scores all get squished together so much. There are 2,800 companies in the NASDAQ listing, but these extreme outliers squish all the others so much that it is not possible to really see what's going on. So we know that we have outliers here on a scale variable. Now on a categorical variable, like for instance ethnicity, one possible definition of a categorical outlier is any group that makes up, for instance, less than 10% of the overall sample.

In that situation you have the choice of combining them with other categories and creating a sort of Other category, with the caveat that it will be a very heterogeneous group. That, or you simply don't analyze by that variable in the future. But let's talk about what to do with a scale variable. Now if you don't have very many outliers, or if they're not very far away, you can leave them in. You can take them as legitimate values and proceed with that understanding, as long as you communicate it adequately to others.
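Combining small groups can be sketched as follows. This is only an illustration of the idea in Python, not an SPSS procedure: it assumes the 10% threshold mentioned above as an example, and the function name `combine_rare_categories`, the label "Other", and the sample majors are all hypothetical.

```python
from collections import Counter

def combine_rare_categories(labels, min_share=0.10, other_label="Other"):
    """Recode every category holding less than min_share of the cases as other_label."""
    counts = Counter(labels)
    total = len(labels)
    rare = {cat for cat, n in counts.items() if n / total < min_share}
    return [other_label if label in rare else label for label in labels]

# Hypothetical sample: two majors each make up only 5% of the 20 cases.
majors = ["Business"] * 12 + ["Psychology"] * 6 + ["Latin Poetry", "Astronomy"]
recoded = combine_rare_categories(majors)
print(Counter(recoded))
```

The two rare majors end up together in one mixed group, which is exactly the heterogeneity to keep in mind when interpreting an Other category.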

On the other hand, another choice is to exclude them. Now I don't necessarily mean delete them permanently from the data set, but you can create a selector. We've done this before. I should just mention right here, this is $100 billion, and we still have a huge number of companies right there. I am going to select a much smaller number. I am going to go to $100 million capitalization. So I am going to go to Data, to Select Cases. Select Cases if your market capitalization is less than 100 million and press Continue.

Now I have the option of just filtering them out, which creates a new variable that temporarily excludes them, or deleting them permanently, and I don't want to do that. I am just going to filter them out right now. So I am going to press OK, and it tells me that it has done that selection. And in fact, if I go back to the data set I will see that some cases, for instance Apple, have been selected out. There is a variable here at the end now. It's a filter variable, and if I click on the value labels, I can see which cases are selected or not selected. And now I am going to go back, and I am going to do my box plot all over again.
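The difference between filtering and deleting can be sketched in Python. This is a rough analogy to the filter variable, not how SPSS implements it: every case stays in the data and a 0/1 flag records whether it is currently selected. The names `add_filter_flag` and `selected_cases`, the stock symbols, and the values are hypothetical. Only the $100 million cutoff comes from the video.

```python
CUTOFF = 100_000_000  # $100 million, the cutoff chosen in the video

def add_filter_flag(cases, cutoff=CUTOFF):
    """Return copies of the cases with a 0/1 'selected' flag; no case is removed."""
    return [{**case, "selected": int(case["market_cap"] < cutoff)} for case in cases]

def selected_cases(flagged):
    """Return only the cases whose flag is 1, leaving the flagged list untouched."""
    return [case for case in flagged if case["selected"] == 1]

# Hypothetical companies (symbols and values are made up for illustration).
companies = [
    {"symbol": "BIGA", "market_cap": 300_000_000_000},
    {"symbol": "MIDB", "market_cap": 2_500_000_000},
    {"symbol": "SMLC", "market_cap": 60_000_000},
    {"symbol": "SMLD", "market_cap": 20_000_000},
]

flagged = add_filter_flag(companies)
print(len(flagged), "cases kept,", len(selected_cases(flagged)), "currently selected")
```

Because the flag is just another variable, the exclusion can be undone at any time by ignoring it, which is why filtering is the safer choice compared with permanent deletion.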

All I have to do is press OK, but this time I don't have any outliers. In fact, this is a pretty normal-looking box plot. I can see that of the NASDAQ companies still selected, the ones under $100 million, the median level of market capitalization is around $40 million. The first quartile is around $20 million, so the lowest 25% have $20 million or less, whereas the third quartile is about $60 million, so the highest 25% have about $60 million or more. There are of course hundreds of outliers above these that have been filtered out, but these give a nice picture of what you might call the small capitalization market.
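To make the quartile reading concrete, here is a small Python sketch. The values are hypothetical and were chosen only so that the quartiles land on the approximate figures quoted above; they are not the course data. The helper name `quartiles` is an assumption, and SPSS may use a slightly different quartile method.

```python
import statistics

def quartiles(values):
    """Return (first quartile, median, third quartile) of the values."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q1, median, q3

# Hypothetical small-cap market capitalizations in millions of dollars.
small_caps = [5, 12, 20, 20, 33, 40, 41, 58, 62, 75, 98]

q1, median, q3 = quartiles(small_caps)
print(f"Q1 = {q1}, median = {median}, Q3 = {q3}")

# Roughly a quarter of the cases sit at or below Q1, and roughly a quarter above Q3.
at_or_below_q1 = [v for v in small_caps if v <= q1]
above_q3 = [v for v in small_caps if v > q3]
print(len(at_or_below_q1), "cases at or below Q1;", len(above_q3), "cases above Q3")
```

The box in the box plot spans the first to the third quartile, so it covers the middle half of the selected companies.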

Anyhow, the ability to either combine groups or to temporarily exclude outliers is one good way of dealing with them, as long as you can justify your choices. Again, that gets back to a general statistical principle that you can do whatever you feel is most appropriate and that serves your purposes in telling an analytical narrative. You're telling a story about your data, and if temporarily excluding cases or combining them with other groups serves your purposes best, then go ahead and do that, as long as you can justify your decision to others.

Now, in the next video I will look at another way that does not exclude the cases. It leaves them all in, but changes them by doing what's called a transformation, to let you use all of your data and see if you can still find a way of telling a coherent narrative that way.
