The tidyverse provides a consistent workflow for working with data. Learning the principles of the tidyverse makes it easier to use htmlwidgets, which allow you to build interactive visualizations. It offers a fast, efficient, and well-documented workflow for common data wrangling, modeling, and visualization tasks.
- [Instructor] So why should we use the tidyverse ecosystem packages instead of working exclusively in base R? Well, let's think about why people choose to use packages in the first place. It's a fact of programming and scripting that building everything yourself from scratch is time-consuming and, more than likely, hugely error-prone. This is why R users depend on packages. Using packages makes it easier to start working on a new project in R.
Packages can make collaborating with others on R projects easier, as you can be assured everyone is using the same code base. But there's a huge sanity warning that comes with using packages. The single guarantee that CRAN gives you is that the package will install on your machine. It fundamentally does not guarantee that the package generates correct results, that the package will be updated in the future, or that the package will work well with other packages.
As an example, let's consider an imaginary package called sportsball. The package is available on CRAN and advertises itself as a tool for predicting the results of a season of sportsball events. Unbeknownst to you, the package suffers from the following problems. The package developer is biased toward teams sponsored by their favorite sports drink, giving them unfair odds, which means the output of the package is untrustworthy. Team sponsorships are hard-coded into the package, and the developer is already quite bored of updating the package, so over time the package will generate not only biased results but inconsistently biased results.
The developer has also decided to output the results in a nonstandard format, which means they can't easily be consumed by base R or any other package for that matter. This is definitely a slightly ridiculous example of how bad things can get, but hopefully it encourages you to look into packages a little bit before relying on them. So what makes the tidyverse different? Well, the core of the tidyverse is developed by developers at RStudio.
RStudio makes the IDE we will be using to write our code in this course. It's a company with an extremely good reputation, including for R package development. RStudio's own internal tools depend on components of the tidyverse, which helps to reassure us of the long-term viability of the tidyverse ecosystem. The tidyverse is also developed openly on GitHub, meaning users can track continuing development and, if necessary, fork packages in the future if RStudio itself stops updating them.
Industry users of R often depend on so-called validated R installations. These are collections of R packages which have been validated to produce expected results. This is particularly important in pharmaceutical or banking applications of R. You might not currently work in such an industry, but it's important to know that future employers or contractors might not always allow you to depend willy-nilly on the hundreds of packages from CRAN.
Mango Solutions is a very popular provider of validated R installations, and they include the tidyverse within their ValidR installation.
This course introduces the core concepts of the tidyverse as compared to traditional base R. It focuses on the novice user and those unfamiliar with the pipe (%>%) operator. After covering these R basics, instructor Martin Hadley progresses to importing and filtering data from Excel, CSV, and SPSS files, and summarizing and tabulating data in the tidyverse. Then learn how to identify whether data is too wide or too long, how to convert it if necessary, and how to use non-standard evaluation. By the end of the course, you should be able to integrate the tidyverse into your R workflow and leverage a variety of new tools for importing, filtering, visualizing, and modeling research and statistical data.
- Understanding the pipe (%>%) operator
- Importing .xlsx and .csv files
- Filtering and summarizing data sets
- Using tidyr to convert wide and long data sets
- Non-standard evaluation and programming with the tidyverse