Join Jordan Bakerman for an in-depth discussion in this video Creating datasets manually in the DATA step, part of SAS Programming for R Users, Part 1.
- [Narrator] In this section we want to go ahead and create a dataset by hand. So for example, we want to create a dataset with four variables: first name, last name, age, and height, which have character and numeric values. We're going to use a data step to create a new SAS dataset. And going forward in this course, I generally have a "duplicate the R script" step: what do you do in R, and how do we do that in SAS? That'll be the coming section. For example, maybe I'll create four vectors: first name, last name, age, and height.
Then I'll go ahead and combine them to create a dataframe. So basically I want to go ahead and create that five-by-four dataframe as a SAS dataset. So recall, data steps are used to read in data or alter existing datasets. We start with the data statement and specify a new dataset name. And then we use the input statement to specify the variables to be in the dataset. And if a variable is character-valued, we need to specify a dollar sign after the variable name.
And then we specify a datalines statement. It is a statement, so we end it with a semicolon. And then we start writing our data one observation per line, with the values laid out in columns. Column one is variable A, column two is variable B, and so on. And after we enter all the data, we're going to use a semicolon and then we're going to use a run statement to finish up our data step. Now by default, SAS only gives you eight bytes for a single variable. Numeric values are stored in floating point notation, storing up to 17 significant digits in eight bytes.
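The steps described so far can be sketched as one short data step. This is a minimal illustration, not the course's own script: the dataset name `work.people`, the variable names, and all five rows of values are made up for the example, and the names are kept to eight characters or fewer so they fit the default character length.

```sas
/* Hypothetical dataset name and made-up values, for illustration only */
data work.people;
    /* The $ after first_name and last_name marks them as character variables; */
    /* age and height are numeric */
    input first_name $ last_name $ age height;
    datalines;
Ann Lee 34 165
Bob Ray 41 180
Cy Diaz 29 172
Dee Kim 52 158
Ed Park 38 177
;
run;

/* Print the new five-by-four dataset to check it */
proc print data=work.people;
run;
```

Each line after `datalines;` is one observation, and the values are separated by blanks in the same order as the variables in the input statement.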
In a character variable, each character takes one byte, so by default a character variable can hold a maximum of eight characters. If your data values are longer than eight characters, for example names, or shorter than eight characters, for example gender or state code, then you can use a length statement to specify the length for the variable. So in the length statement, I'll say variable A, then a dollar sign since it's a character variable, and then I specify a number. How many characters do you want to be able to hold in that single variable? In general you just need an upper bound, so for example you don't have to go into the dataset and find the longest value; maybe you just want to go up to a hundred characters. But just keep in mind, SAS is actually going to reserve all that space for every observation.
So don't specify a number of bytes that's extremely large, because of course you don't want to reserve unnecessary space.
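A length statement along these lines might look like the following sketch. The dataset name `work.people_long`, the variable names, the chosen lengths (20 and 2), and the data values are all assumptions made for illustration. The length statement is placed before the input statement so that the lengths are set before SAS first encounters the variables.

```sas
/* Hypothetical names, lengths, and values, for illustration only */
data work.people_long;
    /* Allow up to 20 characters for names and only 2 for a state code */
    length first_name $ 20 last_name $ 20 state $ 2;
    input first_name $ last_name $ state $;
    datalines;
Alexandria Montgomery NC
Bartholomew Fitzgerald TX
;
run;

/* proc contents reports the stored length of each variable */
proc contents data=work.people_long;
run;
```

Without the length statement, the names above would be cut off at eight characters, and the state code would still occupy eight bytes per observation.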
Note: You can visit the SAS site to obtain a copy of the software, and use the company's online data sets to do the course exercises.
- SAS University Edition
- Working in SAS Studio
- Using tasks and snippets in SAS Studio
- Determining power using simulation
- SAS procedure syntax
- Creating datasets manually in the DATA step
- Importing raw data files
- Creating new variables and conditional processing
- Match-merging data sets