From the course: PHP Tips, Tricks, and Techniques

Export spreadsheet data to a multi-table database

From the course: PHP Tips, Tricks, and Techniques

Export spreadsheet data to a multi-table database

- [Narrator] Hi, I'm David Powers, and welcome to this week's edition of PHP Tips, Tricks, and Techniques designed to help you become a smarter, more productive PHP developer. This time I'm gonna show you how to export the data from a spreadsheet to a multi-table database. This is how it's done. Begin by exporting the spreadsheet data as comma separated values. Microsoft Excel and OpenOffice Calc do this automatically when you save a spreadsheet as a CSV file. Decide which columns store frequently repeated values that need to be moved to separate tables and structure the database tables accordingly. Then use separate SQL queries to populate each table with the data looping through the CSV file one line at a time. An efficient way to do this is with a generator as explained in the previous edition of PHP Tips, Tricks, and Techniques. As you do so, keep track of the primary keys of the repeated values that are stored in separate tables. You can do this with indexed arrays. When you create each new record in the child table insert the primary keys of the current repeated values as foreign keys. Now if that sounds rather abstract it should become clear later. This spreadsheet contains the records of nearly 10,000 students that I want to export to a database. There are two columns that contain frequently repeated values that need to be moved out to separate tables. The first one is this one, state, and then also program. This will make the database far more efficient. Now although grad year also contains lots of repeated values there's no advantage in storing them in a separate table because they're just numbers. Each student in the spreadsheet has an ID so we can use that as the primary key for each record in the students table. And to simplify the code I'm going to ignore these two columns phone and GPA. This is what the students table will look like. The states will be stored in a separate table using auto increment to generate each state's primary key which will be stored in the students table as a foreign key. Similarly, each program will be stored in a separate table and referred to in the students table as a foreign key. I'm assuming you're familiar with using PHP to interact with the database. If you're unsure, take a look at my course on accessing databases with PDO and MySQLi. Now let's get to the PHP code. To save time I've already created the script in this file, process underscore CSV dot PHP, which you can download in the exercise files for this video. The download files also contain this SQL file with the database structure and the spreadsheet data both in Excel and in CSV format. What's not included is the file that creates a PDO connection to the database. You'll need to provide that yourself. So let's get back to the PHP script. On line three, I'm including the database connection and on line five I've stored the location of the CSV file that contains the data I want to import. Lines eight through to 17 define a PHP generator that will allow us to access the data in the CSV file as an associative array, one line at a time. I described how this works in detail in the previous edition of PHP Tips, Tricks, and Techniques, so I won't go over it again. What's new this time is the code to populate the database so let's take a look at that. First we initialize three PDO prepared statements. The first one inserts a value into the states table using a PDO named parameter. The second one inserts a value into the program's table again using a name parameter and then the third statement populates the students table. There are several advantages of using PDO prepared statements with named parameters. First you don't have to worry about wrapping values in quotes and it's also more efficient because MySQL or MariaDB needs to validate the SQL only once rather than each time the query is executed and that's a big saving considering the spreadsheet has 9,999 records. Also, named parameters make the code much easier to read and debug. Next I've initialized two arrays, states and programs, these will keep track of the values inserted in the states and programs tables. There are also two counters initialized at zero, these will keep track of the primary keys of each value inserted into those tables. Then the for each loop that begins on line 32 processes each line of the CSV file using the PHP generator defined at the top of the script. Each time the loop runs the generator yields the next line of the CSV file as an associative array using the header of each column as the array key. The conditional statement on line 34 checks the state's array to see if it contains the current value of the state column in the CSV file. If it doesn't, the code on lines 35 to 38 is executed. The first time the loop runs the state's array is empty so the current value of the state column is assigned to the state's array using the ST counter with the pre-increment operator. So the first time it runs, it's one, in other words, we're creating an indexed array that starts at one rather than at zero. Then we execute the first prepared statement to insert the current value of state into the state's table. And the value of the counter is assigned to a variable called state ID. Let's just pause for a moment to reflect what's going on here. The current value of state is assigned to the states array at the index of the counter incremented by one. The first value is California and the array index is one. California is inserted into the state's table and its primary key is also incremented to one. By assigning the array index to states ID we're in effect storing the primary key of the current state. The next value inserted is Texas both the array index and the primary key are now two. But if we encounter Texas or California again we don't want to insert them into the database again all we need is the array index of that state which we insert into the students table as a foreign key. So if the state is already registered the else block on lines 39 to 42 uses the built in array search function to get the index of the current value of state from the states array and assigns it to state ID. There's another if else block on lines 44 to 52 this follows exactly the same process with the programs inserting values into the programs table and assigning the current primary key to program ID. And finally we insert the values into the students table. Most of the values come from the current line of the CSV file with the values inserted into the state and program columns are state ID and program ID. In other words, the primary keys extracted by the two previous conditional statements now being stored as foreign keys. So it's now the moment for truth let's run this script by loading the page into a browser. There a lot of records so it might take a few seconds but it says it's now done, and I think that's pretty fast for 9,999 records, so let's take a look into phpMyAdmin. We've got three tables, the first one is programs, there are just five values in there. States it tells us up top here we've got 51 values in there, the extra state is Washington, D.C., which is being counted as a separate state and then in students, we've got 9,999 records. And if I just collapse that we can see that we've got the state ID and the program ID in there as the foreign keys so let's see that at work with a little bit of SQL. I'm gonna delete that and put in a new query, I'm gonna select student ID, last name, first name, email, grad year, state, and program from students joining with the foreign keys on states and programs, so let's just run that, and there are all the values. To recap, this time we looked at inserting data from a spreadsheet into a multi-table database. It involves saving the spreadsheet as a CSV file then identifying the columns that store frequently repeated values and creating separate tables for them. Process the CSV file one line at a time in a loop using separate SQL queries to populate each table. Use arrays to keep track of the primary key of each repeated value that's inserted into the database. And insert the primary keys as foreign keys for each record in the table that contains the other non-repeating values. That's it for this edition of PHP Tips, Tricks, and Techniques. I hope you can join me next time.

Contents