From the course: SAS Programming for R Users, Part 2

The DO loop

- [Instructor] In this section, I want to show you how to simulate observations from random distributions like the Normal, Chi-Square, Gamma, and Weibull distributions, and save those observations in a new SAS data table. We want to be able to set a seed so we can duplicate our results. We want to create random data sets, just as we would with the r functions in R, such as rnorm, rbinom, and so on. We also want to be able to add variables to an existing data frame. Maybe we also want to use the rep function to create a classification variable. And finally, we want to use the other probability functions like dnorm, pnorm, and qnorm.

First we need to use the DO loop. We can create a sequence, maybe one to 10 or two to 20 by two, or we can go in the reverse order. Maybe we want to create repetitive values. For example, maybe I want to add a column of ones to my data table, which could represent an intercept if I'm simulating a linear regression model. We want to be able to create groups, for example, a classification variable if I'm simulating ANOVA data. In particular, we're going to focus on generating random numbers and creating a new SAS data set. And of course, this will all be done inside of a DATA step.

So here I say that the DO loop is equivalent to the seq function in R. For example, I start with my DO statement and specify an index variable, in this case i. I set it equal to a starting value of one, use the keyword TO, and give it a stopping value, in this case five. So I'm going from i=1 to 5, and it acts as a sequence. Always end your DO loop with the END statement. In a few other examples, I add the BY increment option, so I'm going from i=2 to 10 by 2, or in the reverse direction, i=10 to 2 by -2. So the DO loop behaves like seq in R, and you can also think of it as a for loop. To create a new SAS data set, of course, we'll use the DATA step, and here I'm going to use the DO loop to help me create my data set.
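As a rough illustration of the sequence behavior just described, here is a small Python sketch (an analogy, not SAS code) that lists the values an iterative `DO i = start TO stop BY increment` loop steps through. The function name `sas_do_range` is hypothetical. The point it shows is that the stop value is inclusive, as with R's seq, unlike Python's own `range`.

```python
def sas_do_range(start, stop, by=1):
    """Return the index values of a SAS-style loop: DO i = start TO stop BY by.

    Like R's seq() and unlike Python's range(), the stop value is inclusive.
    """
    if by == 0:
        # Guard added in this sketch so a zero increment cannot loop forever.
        raise ValueError("BY increment must be nonzero")
    values = []
    i = start
    # A positive increment counts up to stop; a negative one counts down.
    while (i <= stop) if by > 0 else (i >= stop):
        values.append(i)
        i += by
    return values


print(sas_do_range(1, 5))       # [1, 2, 3, 4, 5]   do i = 1 to 5;        like seq(1, 5)
print(sas_do_range(2, 10, 2))   # [2, 4, 6, 8, 10]  do i = 2 to 10 by 2;  like seq(2, 10, by = 2)
print(sas_do_range(10, 2, -2))  # [10, 8, 6, 4, 2]  do i = 10 to 2 by -2; like seq(10, 2, by = -2)
```

The reverse sequence only works because the increment is negative; with a positive BY value, a loop from 10 to 2 would produce no values at all.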
I'm going from i=2 to 10 with an increment of two. In order to write every iteration's values to my data set, I need to use the OUTPUT statement. Otherwise, SAS writes only one observation, at the end of the DATA step, after the loop has finished. So you need to be explicit and tell SAS to write all iteration values to your data set. And again, remember to close the DO loop with your END statement. Inside the loop I've created a new variable x, which I set equal to my index variable i plus one. I'm also creating a new variable rep, which equals one in every iteration. So in my output I have my index variable i, which goes from 2 to 10, x, which goes from 3 to 11, and rep, which is just one and would most likely represent an intercept in a linear model.

If you don't want to keep your index variable in your data set, you have two options. You can specify the KEEP= option in the DATA statement; here I'm saying keep only the variables x and rep. Or you can tell SAS to drop the index variable i. Either way, I would have a data set with only two variables, x and rep.

A nested DO loop is similar to the rep function: it allows us to repeat values. It's also similar to a nested for loop. So here I have a nested DO loop. I'm going from i=1 to 2, and immediately inside it I have another DO loop, j=1 to 2. And of course, remember your OUTPUT statement to write all your values to the data table. You'll notice that i starts with a value of one while we iterate through j, one and two. Then i moves to a value of two, and we iterate again through j, one and two. So it's exactly the same as a nested for loop in R.

There's an alternative way to accomplish the nested DO loop: you can use multiple DO loops in sequential DATA steps. In my first DATA step here, I'm creating the data set doloop.
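The OUTPUT, KEEP/DROP, and nested-loop behavior described so far can be modeled in a short Python sketch, treating a data set as a list of row dictionaries. This is an analogy, not SAS code; the functions `doloop_step` and `nested_doloop` and their parameters are hypothetical. One detail the sketch makes visible: without OUTPUT, the single observation written has i equal to 12, because a SAS DO loop increments the index past the stop value before it exits.

```python
def doloop_step(output=True, keep=None, drop=()):
    """Model of:  data doloop; do i = 2 to 10 by 2; x = i + 1; rep = 1; output; end; run;

    output=False models the same step without the OUTPUT statement.
    keep/drop model KEEP= / DROP= (which variables are written out).
    """
    rows = []
    x = rep = None
    i = 2
    while i <= 10:
        x = i + 1
        rep = 1
        if output:
            # An explicit OUTPUT writes one observation per iteration.
            rows.append({"i": i, "x": x, "rep": rep})
        i += 2
    if not output:
        # No OUTPUT: only the implicit output at the end of the DATA step runs,
        # writing one observation. By then the index has stepped past 10 to 12.
        rows.append({"i": i, "x": x, "rep": rep})

    def select(row):
        # Keep only the listed variables, or drop the listed ones.
        if keep is not None:
            return {k: v for k, v in row.items() if k in keep}
        return {k: v for k, v in row.items() if k not in drop}

    return [select(row) for row in rows]


def nested_doloop():
    """Model of the loops:  do i = 1 to 2; do j = 1 to 2; output; end; end;"""
    rows = []
    for i in range(1, 3):      # outer loop: i = 1, 2
        for j in range(1, 3):  # inner loop runs completely for each i
            rows.append({"i": i, "j": j})
    return rows


print(doloop_step(keep=("x", "rep")))  # x runs 3, 5, ..., 11 and rep is always 1
print(doloop_step(output=False))       # [{'i': 12, 'x': 11, 'rep': 1}]
print(nested_doloop())
```

In the nested result, the i column is 1, 1, 2, 2 and the j column is 1, 2, 1, 2, the same values R gives for `rep(1:2, each = 2)` and `rep(1:2, times = 2)`.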
And here I'm going from i=1 to 2, and I'm writing both values to the data table. If I then apply another DO loop to an existing SAS data set, in this case the doloop data set, the DATA step iterates through every observation in that data set, here the rows with i equal to one and two, and runs the new loop for each one. So I'd get the same data set as I did with the nested DO loop.

Why is this important? Well, perhaps you want to add a sequence to an existing data set. In that case, you don't want to use another DO loop on that existing data set. For example, if you have a data set with 1,000 observations and you want to add a sequence from one to 1,000, you do not want to use a DO loop. Why? It will create a data set with 1,000 times 1,000, or one million, observations. So how can we add a sequence to an existing SAS data set? This will be important when we plot data, so we can give the plots an x-axis value.

To add a sequence, we'll use what's called a SUM statement. A SUM statement variable is automatically initialized to zero, and its value is retained from one iteration of the DATA step to the next. Here I'm naming my SUM statement variable seq, which becomes the variable name in my data set, and giving it an increment of one, so seq + 1. When the DATA step starts, seq is initialized to zero. On the first iteration, its value becomes zero plus one, and I use the OUTPUT statement so the value is written to my data table. On the following iterations, seq becomes two, three, and so on. So use a SUM statement to add a sequence to an existing SAS data set.
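To contrast the two approaches for adding a sequence to an existing data set, here is a Python sketch (again an analogy, not SAS code) that treats a data set as a list of row dictionaries. The function names and the data set names `old` and `new` in the docstrings are hypothetical. The first function shows why a DO loop multiplies the row count; the second mimics the SUM statement's zero start and retained value.

```python
def do_loop_per_observation(dataset, n):
    """Model of:  data new; set old; do seq = 1 to n; output; end; run;

    The DO loop runs once per input observation, so the result has
    len(dataset) * n rows instead of len(dataset).
    """
    return [dict(row, seq=s) for row in dataset for s in range(1, n + 1)]


def add_sum_sequence(dataset, increment=1):
    """Model of:  data new; set old; seq + 1; output; run;

    A SUM statement variable starts at zero and is retained from one
    DATA step iteration to the next, so it counts 1, 2, 3, ...
    """
    seq = 0  # initialized to zero once, before the first iteration
    result = []
    for row in dataset:   # one DATA step iteration per observation
        seq += increment  # the SUM statement: seq + 1
        result.append(dict(row, seq=seq))
    return result


old = [{"y": v} for v in (10, 20, 30)]
print(len(do_loop_per_observation(old, len(old))))  # 9 rows, not 3
print(add_sum_sequence(old))  # [{'y': 10, 'seq': 1}, {'y': 20, 'seq': 2}, {'y': 30, 'seq': 3}]
```

With 1,000 input rows, the first function would return 1,000 × 1,000 = 1,000,000 rows, while the SUM-statement version keeps the original 1,000 rows and simply adds the seq column.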
