The first of the three Python programs shows how to import a dataset into a SQLite database. It demonstrates how to set up Python to use SQLite, then shows a simple SQL query being passed to SQLite to import a dataset.
- [Instructor] The first step in completing this database clinic challenge is to import the accidents dataset. This is a 19 megabyte file in CSV format. You'll find instructions for obtaining these data sets in the beginning videos of this course, or consult the handout in the exercise files. In particular you're going to want the 2015 road safety all 2015 data, as well as the lookup table for variables.
For the Python scripts to work correctly, you'll need to copy the data files into the same folder as the Python scripts. To do this, open your exercise files, open up chapter three, and you'll find several Python scripts as well as an R script. Open up the folder for the road safety data, copy all three files into the chapter three folder. Also, copy the Excel file for the lookup values.
When you've copied all of these files next to the Python scripts, you'll be ready to execute the scripts themselves. I'm going to use the idle Python development environment. It's extremely simple but it's already included with your Python installation. To find it, open up a terminal and then type in idle. If you're on a Windows machine, you'll have to launch idle from the start menu. Python idle environment will open up a window underneath the terminal window.
This is what it looks like. To open up a script, make sure the idle window is active, it'll say Python, and go to file, open, navigate to your exercise files, go to chapter 3, and open up one under bar, import accidents dot PY. Now you'll see the Python script that we're going to run. What I'm going to do is run the code, show you the results, and then we'll come back and examine how the code did what it accomplished.
So let's go ahead and run this. In Python idle I'll go up to the run command and I'll go run module. When the cursor returns, the program will be finished. We can go into the exercise files, chapter three, and you'll now see that there is a file called UK road data dot SQLite. Using DB browser for SQLite we can take a quick look at that file.
Click on open database, navigate to your exercise files, navigate to chapter three, and then open the UK road data dot SQLite file. You'll see that we have accidents 2015 as a data table, containing all of the information that was in the original CSV file. Let's take a look and find out how the Python script actually did that. This is the Python script for import accidents.
I'm going to walk through some of the commands. Many of these commands will be repeated in future scripts, and I will not talk about them in that time. To start there are several comments, the first one being that this is a Python 2.7 script. I've chosen Python 2.7 because it's more commonly available. The first important line is titled import SQLite three, and what that does is provides Python with a library that we can use to manipulate SQLite, it allows us to import into an SQLite file, create a file, manipulate things.
And then I use import CSV, which imports a comma separated value handling library. I've decided to write all of the data out into UK road data dot SQLite and so I've assigned that to the variable SQLite file. This line you'll see repeated many many times in all of these Python scripts. It opens up a connection to that SQLite file. You'll notice SQLite three connect and then it uses the SQLite file variable, which contains UK road data dot SQLite.
My cursor equals connection cursor, simply gives us a pointer into that file. Now here's the interesting part. We'll scroll down, and this is the script that we use to actually create the SQLite file. There are better ways to do this and we'll discuss this in future scripts, but for right now this is the simple way to understand what's going on. The first thing that I do is create a variable, and in that variable I'm going to assign an SQL command. SQL for create table.
We're going to create a table called accidents 2015. In that table we're going to create a number of columns. This is the list of the names of columns and you can see there's the end of the SQL and that's all that it does, is just create the database and all of the columns. My cursor dot execute then executes that command. It creates an empty table. To populate this table, we'll use CSV dot reader and we open the accidents CSV file in read only mode, with a delimiter of comma.
We set row ID count equal to zero so that we can remove the first line of the CSV file, which you'll remember contains the column headers. This for loop then does all of the work of importing data for us. For each line in reader, and reader remember is a pointer to the CSV reader which is going to contain all the CSV data that we're importing. We put each line into a variable called row, then if the row ID count is something other than zero, we'll say the row equals and we do some manipulation to strip out certain parts, certain white spaces and clean it up.
I'll use the SQLite command, my cursor dot execute, to execute an SQL phrase called insert into accidents. And this tells SQLite how to take the data from row and put it in to the SQLite data file that we've created. I increment the row ID counter by one for each loop in the for statement. Finally when I'm done, I'll issue the MySQLite command called connect dot commit, and that actually saves all of the changes to the SQLite file.
So again, let's take a look at that file and see how it actually came out. I've switched over to DB browser for SQLite to take a look at the contents of the file that we created. Notice that we have a table called accidents 2015, containing all of the columns of data. When we browse the data, we can see that all of the data, from the CSV file has been imported into this SQLite file. This completes the first task for this database challenge.
- Strengths and weaknesses of SQLite
- Creating a database
- Joining data sets
- Calculations with SQLite and Python
- Searching a database
- Subqueries and queries in SQLite
- CRUD operations in SQLite with R