From the course: SAS® 9.4 Cert Prep: Part 03 Exploring and Validating Data

Identifying and removing duplicates

- [Instructor] By adding options to the PROC SORT statement, you can identify and remove duplicates in your data. The NODUPRECS option removes adjacent rows that are entirely duplicated. In other words, it removes rows that are next to each other in the data where the values for every column match. When you're using this option, it's a good idea to use the keyword _ALL_ in the BY statement instead of a column name. This sorts the data by all columns so that entirely duplicated rows are next to each other, and then the NODUPRECS option can do its job. The table listed in the OUT= option has the duplicates removed. It's also helpful for validation to specify the DUPOUT= option and generate a table of the duplicate rows that were removed by NODUPRECS. Here's an example of using the NODUPRECS option in PROC SORT. You can see that the input table, class_test3, has two rows for Barbara that contain identical information. The output…
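As a rough sketch of the pattern described above, the step below sorts by _ALL_ so that fully duplicated rows become adjacent, lets NODUPRECS drop the repeats, and uses DUPOUT= to keep the removed rows for validation. The input table work.class_demo and its values are hypothetical, built with a DATA step for illustration only. They are not the course's class_test3 table. The output names work.class_nodups and work.class_dups are likewise illustrative.

```sas
/* Hypothetical input table (not class_test3): one row for Barbara */
/* is repeated exactly, value for value, in every column.          */
data work.class_demo;
   input Name $ Sex $ Age Height Weight;
   datalines;
Alfred M 14 69.0 112.5
Barbara F 13 65.3 98.0
Alice F 13 56.5 84.0
Barbara F 13 65.3 98.0
;
run;

/* BY _ALL_ sorts on every column, so the two identical Barbara rows */
/* end up next to each other. NODUPRECS then drops each row that     */
/* matches the previous row in every column. OUT= receives the       */
/* de-duplicated table; DUPOUT= receives the rows that were removed. */
proc sort data=work.class_demo
          out=work.class_nodups
          noduprecs
          dupout=work.class_dups;
   by _all_;
run;

/* Review the removed duplicates as a validation check. */
proc print data=work.class_dups;
run;
```

With this sample data, work.class_nodups should contain three rows and work.class_dups should contain the single extra Barbara row. Sorting by _ALL_ rather than by Name alone matters here: NODUPRECS only compares a row with the one directly before it, so duplicates that are not adjacent after sorting would survive.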
