Projects provide a useful tool for splitting your work into distinct, self-contained units. They also significantly simplify working with file paths in R.
- [Instructor] Projects are a powerful tool in RStudio for developing reproducible code, whether for individual data analyses, data-driven reports, or even developing your own packages. Projects also get you ready, right from the beginning, for others to collaborate with you, for instance, on GitHub. But the main advantage of projects is that they make your life easier when importing data. If you've used R before, you'll be familiar with the concept of a working directory. If not, then a working directory is simply the place where R is currently looking for files.
If you're not using projects, then you'll likely have seen something like `setwd("long-path/to/data-folder")` at the top of your script files, with your data files living in that folder. But if you send your code to others, they'll need to edit that path to match their own computer's user name and whatever crazy long folder structure they have, as well. Also, you'll probably forget to send them the data file along with the script file in the first place.
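In actual code, that pattern looks like the minimal sketch below. The path is the transcript's placeholder, not a real folder, so the call is wrapped in `try()` to keep the snippet runnable:

```r
# getwd() reports where R is currently looking for files; setwd() changes it.
cat("Current working directory:", getwd(), "\n")

# The fragile pattern projects let you avoid -- a hard-coded path that only
# exists on one machine ("long-path/to/data-folder" is a placeholder):
try(setwd("long-path/to/data-folder"))
```

On someone else's computer, that `setwd()` call simply fails, which is exactly why scripts built around absolute paths don't travel well.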
Projects completely negate the need to do this, because they make the whole concept of working with files and their paths easier. You'll be actively discouraged from the terrible practice of using absolute file paths. Reproducibility is a hot topic in research: how can we reassure others that the results of our analyses, or our conclusions about research data, are accurate without providing the necessary code and an explanation of our methodology? R is an excellent toolkit for ensuring reproducible research, thanks to its open-source foundations.
Anybody can go and look at the source code for base R, as well as the packages from CRAN that you rely on. If you're working in a highly regulated industry like pharmaceuticals or banking, you can even get yourself a validated version of R. Organizing your own research into projects minimizes the work others need to do to reproduce your results. Here we see an exceedingly clean R project. The folder is called controversial-results, and it contains both the raw telemetry behind our research, in a folder called data-raw, and a tidied form of the data, in a folder called data.
Then the process for data wrangling is kept within the data-processing.R file, and finally there's an R Markdown file which is used to generate The Controversial Report.pdf, which we communicate to others. Anyone can reproduce this analysis by simply obtaining the controversial-results folder, opening up the controversial-results.Rproj file in RStudio, and then running the data-processing.R file.
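That workflow can be sketched as a runnable example. The telemetry file names and the trivial wrangling step below are assumptions for illustration, and `tempdir()` stands in for wherever the controversial-results folder actually lives:

```r
# Build a stand-in for the controversial-results project folder:
proj <- file.path(tempdir(), "controversial-results")
dir.create(file.path(proj, "data-raw"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(proj, "data"), showWarnings = FALSE)

# Fake raw telemetry (with a missing reading) as data-raw/telemetry.csv:
write.csv(data.frame(sensor = c("a", "b", "c"), reading = c(1.2, NA, 3.4)),
          file.path(proj, "data-raw", "telemetry.csv"), row.names = FALSE)

# What data-processing.R might do. Opening the .Rproj file makes proj the
# working directory automatically, so every path below is relative:
old_wd <- setwd(proj)
raw  <- read.csv("data-raw/telemetry.csv")
tidy <- raw[!is.na(raw$reading), ]   # trivial stand-in for real wrangling
write.csv(tidy, "data/telemetry-tidy.csv", row.names = FALSE)
setwd(old_wd)

nrow(tidy)  # 2 rows survive the tidying step
```

Because the script only ever uses paths relative to the project root, it runs unchanged on any machine that has a copy of the folder.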
Finally, projects make collaboration much easier, because they keep all your files together and are perfectly designed for version control systems like Git. All of the tidyverse R libraries are developed on GitHub and available as projects. Here's the repository for haven, with a great readme telling me how the package should be used and what it's for. If I want to grab the whole package, all I need to do is click the green button, select Download ZIP, and navigate to my downloads folder; inside the haven-master folder, I'll find a .Rproj file.
And if I open that up, it opens in RStudio, and I'm ready to begin my own modifications of the haven package if I want to. Projects negate the need for setting working directories, because everything becomes a file path relative to the .Rproj-containing folder. Projects improve both reproducibility and collaboration. If you want others to work with you, the best thing you can do is set up an RStudio project and host it on a version control system like GitHub.
This course introduces the core concepts of the tidyverse as compared to traditional base R. It is aimed at novice users and those unfamiliar with the pipe (%>%) operator. After covering these R basics, instructor Martin Hadley progresses to importing and filtering data from Excel, CSV, and SPSS files, and summarizing and tabulating data in the tidyverse. He then shows how to identify whether data is too wide or too long, convert it if necessary, and use nonstandard evaluation. By the end of the course, you should be able to integrate the tidyverse into your R workflow and leverage a variety of new tools for importing, filtering, visualizing, and modeling research and statistical data.
- Understanding the pipe (%>%) operator
- Importing .xlsx and .csv files
- Filtering and summarizing data sets
- Using tidyr to convert wide and long data sets
- Non-standard evaluation and programming with the tidyverse
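To give a flavor of the first objective, here is a deliberately minimal stand-in for the pipe. The real %>% comes from the magrittr package and is considerably more flexible (argument placeholders, multi-argument calls), but the core idea is the same: pass the left-hand value to the right-hand function.

```r
# Minimal sketch of the pipe, assuming nothing beyond base R:
`%>%` <- function(lhs, rhs) rhs(lhs)

# Nested calls now read left to right instead of inside out:
piped <- c(1, 4, 9) %>% sqrt %>% sum   # same as sum(sqrt(c(1, 4, 9)))
piped  # 6
```

Reading `c(1, 4, 9) %>% sqrt %>% sum` as "take the vector, then square-root it, then sum it" is exactly the mental model the course builds on.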