Join Martin John Hadley for an in-depth discussion in this video, Strengths of the tidyverse, part of Learning the R Tidyverse.
- [Instructor] Okay, but what does the tidyverse provide us as end users, or data scientists if we're being quite bold? Well, using the tidyverse leads to advantages in the following main areas: data importation, data wrangling, and data visualization. Let's look at each of those in turn. Readr completely blows away the base R tools for importing rectangular data files like CSV and TSV files. It's not only significantly faster than base R, but it's more intelligent.
For instance, it automatically converts dates to dates, times to times, and columns that should be numbers into numbers. And finally, it never ever imports columns of strings as factors. If you're already a base R user, chances are you've spent hours of frustration because of this issue. This course includes a section on importing data with readr, as it's the general workhorse of most R users' data import toolkit.
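As a minimal sketch of the type guessing described above, the snippet below writes a tiny CSV to a temporary file and reads it back with `read_csv()`. The file contents and column names are made up for illustration and are not part of the course's exercise files.

```r
library(readr)

# Hypothetical data written to a temporary file so the example is self-contained
tmp <- tempfile(fileext = ".csv")
writeLines(
  c("name,joined,score",
    "Ada,2020-01-15,9.5",
    "Linus,2021-06-30,7"),
  tmp
)

# read_csv() guesses a type for each column from its contents
people <- read_csv(tmp)

# First class of each column: strings stay character, ISO dates become Date
sapply(people, function(col) class(col)[1])
```

Note that the `name` column comes back as a character vector, not a factor.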
The readxl library makes importing from Excel files ridiculously simple, allowing worksheets, individual cells, or even cell ranges to be targeted for import easily. It also has the best package logo, with Clippy from many moons ago inside of Microsoft Office. The tidyverse also aims to fit into existing workflows. So if that includes working with SAS, SPSS, or Stata, haven, part of the tidyverse, is the package for you.
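A short sketch of targeting a worksheet and a cell range with readxl follows. It uses the example workbook that ships with the package, not a file from this course, and the variable name `iris_corner` is just an illustrative choice.

```r
library(readxl)

# readxl bundles example workbooks; readxl_example() returns the path to one
path <- readxl_example("datasets.xlsx")

# List the worksheets available in the workbook
excel_sheets(path)

# Import only cells A1:C6 from the "iris" sheet (header row plus five data rows)
iris_corner <- read_excel(path, sheet = "iris", range = "A1:C6")
iris_corner
```

For SAS, SPSS, and Stata files, haven offers `read_sas()`, `read_sav()`, and `read_dta()`, which follow the same pattern of taking a file path and returning a data frame.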
Now these three packages significantly decrease the time needed to massage data files into R, and help solve a number of common frustrations with base R packages. But do note that readr is the only package of the three that is part of the core tidyverse. You need to separately load readxl and haven to access those libraries. The tidyverse utilizes the pipe operator, this percent, greater-than, percent thing (%>%), to provide a logical framework for chaining together common data wrangling tasks.
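To make the chaining idea concrete, here is a small comparison sketch using the built-in `mtcars` dataset (chosen for convenience, not taken from the course). The same three steps are written once as nested function calls and once with the pipe.

```r
library(dplyr)

# Nested calls: the steps have to be read from the inside out
nested <- summarise(
  group_by(filter(mtcars, mpg > 20), cyl),
  mean_hp = mean(hp)
)

# Piped: each step reads top to bottom, in the order it is applied
piped <- mtcars %>%
  filter(mpg > 20) %>%
  group_by(cyl) %>%
  summarise(mean_hp = mean(hp))

# Both forms produce the same result
all.equal(nested, piped)
```

The pipe passes the result on its left as the first argument of the function on its right, which is why the two versions are equivalent.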
The pipe makes code much faster to write and easier to read. There's a chapter in this course dedicated to the pipe operator, as it is mysterious magic to many R users. The tidyverse is designed around a concept of tidy data. The tidyr library is designed for reshaping and transforming your imported data into a structure ready to manipulate, model, and visualize with the tidyverse. Dplyr is the library for subsetting, filtering, summarizing, and generally wrangling your data.
Dplyr also provides a number of tools for doing database-like operations, such as joins, for working on relational datasets. These packages used together form a core data processing component of the tidyverse. The operations you'll perform with these are both significantly faster and simpler to construct than simply using base R. Ggplot2 provides a complete, consistent, and incredibly powerful grammar of graphics, allowing impressive static visualizations to be built with minimal effort.
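The reshaping and wrangling roles of tidyr and dplyr can be sketched with a made-up table. The data, column names, and the choice of `pivot_longer()` (available in tidyr 1.0 and later) are assumptions for illustration only.

```r
library(dplyr)
library(tidyr)

# Hypothetical "wide" table: one column per year
wide <- tibble(
  country = c("A", "B"),
  `2000`  = c(10, 20),
  `2010`  = c(15, 30)
)

# tidyr: reshape to a tidy "long" layout with one row per country and year
long <- wide %>%
  pivot_longer(cols = -country, names_to = "year", values_to = "population") %>%
  mutate(year = as.integer(year))

# dplyr: filter rows, then summarize per group
latest <- long %>% filter(year == 2010)

growth <- long %>%
  group_by(country) %>%
  summarise(change = max(population) - min(population))
```

Once the data is in the long layout, filtering and grouped summaries become one-line steps in a pipeline.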
There's an exercise folder included for this video with the code to make two sample ggplot2 graphics. I'm going to quickly show what's possible using all the components of the tidyverse together. So inside of my exercise files, I have a file called example_ggplot2. At the top, on lines one and two, I load the libraries that I need, gapminder and tidyverse. And then on lines four through ten, I have a little bit of code which uses dplyr on lines five and six
to group my data by continent and year, and then summarise to calculate the continent population. And then on lines seven through ten, I use ggplot2 to generate my chart. So if I select the code and run it, then over here in the viewer on the right, I have quite a beautiful looking static visualization generated with ggplot2. This course is very much focused on using the tidyverse for data wrangling. You'll need to consult other videos in our catalog for the details of ggplot2.
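The exercise file itself is not reproduced in this transcript, so the following is only a rough sketch of what a script matching the description might look like. The chart type, aesthetics, and labels are guesses, and `continent_pop` is an illustrative name.

```r
library(gapminder)
library(tidyverse)

# Group by continent and year, then total the population per group.
# as.numeric() avoids integer overflow when summing large populations.
continent_pop <- gapminder %>%
  group_by(continent, year) %>%
  summarise(pop = sum(as.numeric(pop)), .groups = "drop")

# Assumed chart: one line per continent over time
p <- ggplot(continent_pop, aes(x = year, y = pop, colour = continent)) +
  geom_line() +
  labs(x = "Year", y = "Population")

p
```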
But it's important to know that ggplot2 provides a powerful, consistent grammar for creating static charts. If you're interested in building interactive charts for the web with R, you'll need to learn about htmlwidgets. There's a dedicated course in our library available for this. But the most commonly used htmlwidget libraries lean heavily on the tidyverse for preparing data for visualization. Plus, all the good htmlwidgets are designed to use the pipe operator.
So this is a great course for you to get primed and ready for creating interactive charts with R. Let's have a look at what we could build using htmlwidgets in R. So in my script, on lines one through four, we load the necessary libraries: the gapminder, tidyverse, and highcharter libraries. On lines five through seven, we use the dplyr library to wrangle our data. And then on lines eight through 14, we build our chart with the highcharter library.
Let's select all this code, from line 14 up to line one, and run it and have a look at what we get. So instead of a static chart, what we have now is an interactive chart that I can move my cursor through and get information for specific points in the dataset. I can also actually remove series if I'm interested in doing that. So, ggplot2 is part of the tidyverse and allows us to build static charts in a consistent manner.
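The interactive script is also not shown in the transcript. As a hedged sketch only, an interactive counterpart could be built with highcharter's `hchart()` and `hcaes()` along these lines. The wrangling step and the line chart type are assumptions, not the instructor's actual code.

```r
library(gapminder)
library(tidyverse)
library(highcharter)

# Assumed wrangling step: total population per continent and year
continent_pop <- gapminder %>%
  group_by(continent, year) %>%
  summarise(pop = sum(as.numeric(pop)), .groups = "drop")

# hchart() builds an htmlwidget; hcaes() maps columns to chart aesthetics
chart <- hchart(
  continent_pop,
  "line",
  hcaes(x = year, y = pop, group = continent)
)

chart
```

In an interactive session the resulting widget renders in the viewer, with tooltips on hover and a clickable legend for toggling series.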
So by using the tidyverse, you are preparing yourself in the end to be able to build very powerful tools for others to explore and understand your data.
This course introduces the core concepts of the tidyverse as compared to the traditional base R. It focuses on the novice user and those unfamiliar with the pipe (%>%) operator. After covering these R basics, instructor Martin Hadley progresses to importing and filtering data from Excel, CSV, and SPSS files, and summarizing and tabulating data in the tidyverse. Then learn how to identify if data is too wide or long and convert it if necessary, and conduct nonstandard evaluation. By the end of the course, you should be able to integrate the tidyverse into your R workflow and leverage a variety of new tools for importing, filtering, visualizing, and modeling research and statistical data.
- Understanding the pipe (%>%) operator
- Importing .xlsx and .csv files
- Filtering and summarizing data sets
- Using tidyr to convert wide and long data sets
- Non-standard evaluation and programming with the tidyverse