From the course: Learning the R Tidyverse
Sample data and cross-validation with dplyr - R Tutorial
- [Instructor] dplyr is a member of the tidyverse ecosystem designed for manipulating data with a variety of different verbs, and some of those verbs are dedicated to sampling or resampling data sets. Sampling is important where operating on the entire data set would be expensive, either computationally or in terms of time. Sampling is also important for cross-validation, where you hold out part of your data to check that your model has predictive power on observations it was not trained on. Now, dplyr's sample verbs allow us to do robust sampling of data; let's look at some examples of how to do that in RStudio. So, inside of our project, we have a file called data-processing.R; let's open that up and see what we have here. At the top of the script file, we have library(tidyverse), which loads the tidyverse, so let's run this code with Command+Enter. I'm going to decrease the size of the file explorer, because I don't need to see it right…
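The transcript cuts off before any sampling code appears, so what follows is only a minimal sketch, not the instructor's data-processing.R script. It assumes the built-in `mtcars` data set and the model `mpg ~ wt` as stand-ins, and loads just dplyr rather than the whole tidyverse. It shows dplyr's original sampling verbs `sample_n()` and `sample_frac()`, their dplyr 1.0.0 replacement `slice_sample()`, a hold-out split built with `anti_join()`, and a simple k-fold cross-validation loop. The fold-assignment approach and the RMSE scoring are illustrative choices, not something the course confirms.

```r
library(dplyr)

set.seed(42)  # make the random draws reproducible

# mtcars ships with base R (assumed stand-in data); add an id so rows can be matched later
cars_df <- mtcars %>% mutate(id = row_number())

# --- Sampling rows --------------------------------------------------------
five_rows <- cars_df %>% sample_n(5)          # a fixed number of rows
quarter_a <- cars_df %>% sample_frac(0.25)    # a fraction of the rows

# dplyr >= 1.0.0 supersedes the two verbs above with slice_sample()
quarter_b <- cars_df %>% slice_sample(prop = 0.25)

# --- Hold-out split for validation ----------------------------------------
train <- cars_df %>% slice_sample(prop = 0.8)
test  <- cars_df %>% anti_join(train, by = "id")  # every row not drawn into train

# --- k-fold cross-validation ----------------------------------------------
k <- 5
folds <- cars_df %>%
  # spread fold labels 1..k evenly over the rows, then shuffle them
  mutate(fold = sample(rep(seq_len(k), length.out = n())))

# For each fold: fit on the other k - 1 folds, score RMSE on the held-out fold
cv_rmse <- sapply(seq_len(k), function(i) {
  fit_data <- folds %>% filter(fold != i)
  held_out <- folds %>% filter(fold == i)
  fit <- lm(mpg ~ wt, data = fit_data)  # hypothetical example model
  sqrt(mean((held_out$mpg - predict(fit, newdata = held_out))^2))
})

mean(cv_rmse)  # average out-of-fold error across the k folds
```

Because every row lands in exactly one fold, each observation is used once for scoring and k - 1 times for fitting, which is what lets the averaged error estimate how the model does on data it has not seen.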
Contents
- Sample data and cross-validation with dplyr (4m 8s)
- Categorizing data with group_by (7m 7s)
- Count members of subgroups within groups with n() (3m 51s)
- Cumulative sums and more: cumsum, cumall, and cumany (11m 52s)
- Create group summaries (7m 26s)
- Remember to ungroup before moving on (7m 44s)