Bringing data in from other sources requires that the data be transformed into the format expected by each database tool. In this video, instructors are challenged to create a database and populate it using a public dataset.
- Alright, it's time for our first challenge. Lao Tzu said it best, the journey of a thousand miles begins with one step. The first step in our journey is creating a database. Databases store a combination of specifically formatted information and open-ended information. Different database tools store that information in unique ways. Flat files use a table. SQL databases use a relational model. And NoSQL databases use a key value model.
Regardless of how they store that data, many tools are optimized so that if the database knows that a certain piece of information will always be in a specific format. For example, a date, an amount of money or a geographical location. It can treat that information more efficiently. An important part of creating a database is understanding the kinds of data that you need to store. Once your database is set up and knows what information to expect, it's time to add that information in.
Databases are populated in several different ways. They can be populated by hand. For example, consumer polls are conducted face-to-face or over the telephone and the answers are entered into a database by the pollster. Or computers can collect data and automatically store that information in a database. For example, millions of weather stations feed their location, local temperatures, wind speeds and more into a central database. Another way is to import data that already exists.
Terabytes of research data are available online. This information can be downloaded and imported into a database for further discovery. In this chapter, I've asked the instructor to demonstrate the simplest of tasks, create a database and import a data set. I've asked them to import the California Population Projection by county, age and ethnicity data set for 2010 to 2060. Available from the state of California's open data portal.
Our instructor's challenge is to take the data set and convert it for use with their database tool of choice. This may seem like a simple task, but it demonstrates a valuable lesson. The data set is almost 250-megabytes in size and the format of the data and the downloaded file may not match up with the way their particular database management system stores that data. Let's see how they accomplish this task.
- Strengths and weaknesses of SQLite
- Creating a database
- Joining data sets
- Calculations with SQLite and Python
- Searching a database
- Subqueries and queries in SQLite
- CRUD operations in SQLite with R