Bringing data in from other sources requires that the data be transformed into the format expected by each database tool. In this video, instructors are challenged to create a database and populate it using a public dataset.
- Alright it's time for our first challenge. Lao-tzu said it best: the journey of a thousand miles begins with one step. The first step in our journey is creating a database. Databases store a combination of specifically formatted information and open-ended information. Different database tools store that information in unique ways. Flat files use a table, SQL databases use a relational model and NoSQL databases use a key value model.
Regardless of how they store that data many tools are optimized so that if the database knows that a certain piece of information will always be in a specific format. For example, a date, an amount of money or a geographical location it can treat that information more efficiently. An important part of creating a database is understanding the kinds of data that you need to store. Once your database is setup and knows what information to expect it's time to add that information in.
Databases are populated in several different ways. They can be populated by hand, for example consumer polls are conducted face to face or over the telephone. And the answers are entered into a database by the pollster. Or computers can collect data and automatically store that information in a database. For example, millions of weather stations feed their location, local temperatures, wind speeds and more into a central database. Another way is to import data that already exists.
Terabytes of research data are available online. This information can be downloaded and imported into a database for further discovery. In this chapter, I've asked the instructor to demonstrate the simplest of tasks, create a database and import a dataset. I've asked them to import the California Population Projection by county, age and ethnicity dataset for 2010 to 2060, available from the state of California's open data portal.
Our instructor's challenge is to take the dataset and convert it for use with their database tool of choice. This may seem like a simple task, but it demonstrates a valuable lesson. The dataset is almost 250 megabytes in size and the format of the data in the downloaded file may not match up with the way their particular database management system stores that data. Let's see how they accomplished this task.
- Creating a database in Excel
- Joining two data sets
- Searching a database in Excel
- Importing and formatting text in an Excel database
- Creating a PivotTable from two data sources
- Calculating averages and percentages