In this video, Mark Niemann-Ross compares R packages designed to work with Excel data. Learn the strengths and weaknesses of each R package for Excel.
- [Instructor] You may have opinions about Excel, but regardless of how you feel about it as a tool, a huge amount of data is stored as Excel files. R provides packages to import and export Excel spreadsheets, maybe too many. There are at least eight packages that work with Excel files. Each of them have different advantages, and some of them rely on external libraries. Some packages are designed for other tasks, but include Excel import and export capabilities for convenience.
There are several important packages we should talk about, either because you'll hear about them from other sources, or because they provide unique tools. These packages are named in similar ways and might be confusing to keep apart. I've included a chart to summarize these differences, and you can find it in the exercise files. When you select a package, you'll want to consider a few things, external dependencies, simplicity, and special requirements. To start with, let's discuss external dependencies.
These are packages that use libraries from other programming tools to support Excel import and export. These dependencies may not matter if you've already installed the language they depend on. For example, Java is already installed on just about every computer in use today, but you may find these dependencies annoying if you need to update the libraries of if the version of the library is incompatible with the package you're hoping to use. Because there is such a wide range of packages available, I recommend you stay away from packages with dependencies unless you have a specific reason for using them.
The second consideration is simplicity, how easy is the package to use? Most packages include at least one command to import or export Excel files without any fuss, but in particular rio, a package, has been designed to simplify the syntax while hiding the necessary code. For example, with the rio package, import is a generic function that looks at the file extension to determine the necessary import library.
Granted, rio depends on several other packages to provide this functionality, but if your goal is simple import or export, you're not going to care what magic is used. In addition, rio has been designed to select the fastest import library available for a particular file format, and it imports more than just Excel files. The downside is that rio imports the entire file, including some data you may not want.
This can result in some tricky cleanup work on the resulting data frame. The third consideration is any special requirements you may have for the data. If you just need the third line of the second page of an Excel workbook, you're best off just importing that data. This can be a big deal if you're faced with a directory of a thousand Excel spreadsheets. In later videos, I'll show how to select individual columns, rows, and cells of a spreadsheet.
In summary, there are multiple choices for Excel import and export. Know which ones are available, and choose the best for your needs.
- Name the three types of big data.
- List three considerations used to determine the appropriate R package for Excel.
- Determine the best package used to import entire Excel workbooks.
- Explain how to import standard text files using base R and tidyverse.
- Define the purpose of the foreign language package for R.
- Recognize restrictions when working on SAS files in the foreign language package.
- Identify the problems involved with extracting data from a PDF in R.