From the course: Learning the R Tidyverse

Sample data and cross-validation with dplyr

- [Instructor] dplyr is a member of the tidyverse ecosystem, designed for manipulating data with a variety of verbs. Some of those verbs are dedicated to sampling or resampling data sets. Sampling is important where operating on the entire data set would be expensive, either computationally or in terms of time. It is also very important where you need to do cross-validation to check that your model has predictive power over held-out observations in your data set. Now, the sample verbs are the part of dplyr that allows us to do robust sampling of data. Let's look at some examples of how to do that in RStudio. So, inside of our project, we have a file called data-processing.R. Let's open that up and see what we have here. At the top of the script file, we have library(tidyverse), which loads the tidyverse, so let's run this code with Command+Enter. I'm going to decrease the size of the file explorer, because I don't need to see it right…
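
The transcript cuts off before any sampling code appears, so the following is only a minimal sketch of the kind of sampling described above, not the contents of data-processing.R. It assumes the built-in mtcars data set, a fixed seed, a 75% training fraction, and a helper column named id; all of these are illustrative choices rather than anything taken from the course.

```r
library(dplyr)

# Assumption: a fixed seed so the random draws are reproducible
set.seed(42)

# Assumption: mtcars (32 rows) stands in for the course's data;
# the id column is a hypothetical helper used to tell rows apart
cars <- mtcars %>% mutate(id = row_number())

# Draw a fixed number of rows at random
ten_rows <- sample_n(cars, 10)

# Draw a fraction of the rows at random to act as a training set
train <- sample_frac(cars, 0.75)

# The hold-out set is every row that was not drawn for training
test <- anti_join(cars, train, by = "id")
```

Fitting a model on train and checking its predictions on test is the simplest form of the validation the instructor mentions. In dplyr 1.0.0 and later, slice_sample(n = ...) and slice_sample(prop = ...) supersede sample_n() and sample_frac(), which still work.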
