In this video, Mark Niemann-Ross introduces a sample data set used throughout the course. Learn about importing and cleaning tab-delimited data.
- [Instructor] Throughout this course, I'll be using the Union of Concerned Scientists satellite database as an example of high-volume data. As high-volume data goes, it's actually fairly small, but for the purposes of experimentation in this course, it's a good size. I've placed it at the root level of the exercise files folder, right next to an R file called sampledataset.R. I'm going to open sampledataset.R in RStudio, and let's take a look at it.
When you use sampledataset.R, it's important that you first set your working directory to the folder that contains the exercise files you're using. In this case, let's change to chapter one. I click once on chapter one, then go to the More button and choose Set As Working Directory. Now my working directory is chapter one. To run the code that imports the database, I select line six and click Run.
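Choosing Set As Working Directory in the Files pane has the same effect as calling `setwd()` yourself. The minimal sketch below does that step in code. The `chapter1` folder is a hypothetical stand-in created under `tempdir()` so the example runs anywhere; substitute the path to your own exercise files.

```r
# Sketch: set the working directory in code instead of through the Files pane.
# "chapter1" is a hypothetical folder created under tempdir() for illustration;
# replace it with the path to your exercise files.
exercise_dir <- file.path(tempdir(), "chapter1")
dir.create(exercise_dir, showWarnings = FALSE)

# setwd() changes the working directory and invisibly returns the previous one.
previous_wd <- setwd(exercise_dir)

# Relative paths, such as a data file name passed to read.delim(),
# are now resolved against this folder.
current_wd <- getwd()
print(current_wd)

# Restore the original working directory when finished.
setwd(previous_wd)
```

Keeping the value returned by `setwd()` makes it easy to switch back afterward, which is handy when moving between chapter folders.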
You'll notice that the Global Environment now has a data frame called UCS_Satellite database with 1,886 observations. If the import fails for any reason, use the Import Dataset button at the top of the Environment tab to import the data from the exercise files. You'll want to clean up some of these columns, and I've provided some lines of code that do this for you. Lines 11 through 21 all clean up this database.
To run those, you just click the Run button. You can ignore the errors. UCS_Satellite database is now clean and ready to use with your example files.
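The course's own import and cleanup code lives in sampledataset.R and isn't reproduced here. As a rough illustration of importing and cleaning tab-delimited data in base R, the sketch below writes a tiny made-up file, reads it with `read.delim()`, and tidies two columns. The file contents, column names, and cleanup steps are hypothetical, not the actual UCS database or the course script.

```r
# Minimal sketch: import a tab-delimited file and tidy a few columns.
# The file contents and column names are hypothetical stand-ins,
# not the UCS satellite database or the code in sampledataset.R.

# Write a tiny tab-delimited sample so the example is self-contained.
sample_path <- tempfile(fileext = ".txt")
writeLines(c(
  "Name\tCountry\tLaunch_Mass_kg",
  "SatA\tUSA \t1,200",
  "SatB\tJapan\t850",
  "SatC\tIndia\t"
), sample_path)

# read.delim() defaults to sep = "\t" and header = TRUE.
sats <- read.delim(sample_path, stringsAsFactors = FALSE)

# Trim stray leading/trailing whitespace from a text column.
sats$Country <- trimws(sats$Country)

# Strip thousands separators, then convert to numeric;
# the blank cell becomes NA.
sats$Launch_Mass_kg <- as.numeric(gsub(",", "", sats$Launch_Mass_kg))

str(sats)
```

Passing `stringsAsFactors = FALSE` keeps text columns as character vectors (the default since R 4.0.0), which makes string cleanup with `gsub()` and `trimws()` straightforward.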