Easy-to-follow video tutorials help you learn software, creative, and business skills.Become a member
We've downloaded copies of the full year data for 2001 to 2010. This is 2001.txt. Line one is a series of headings and the data is stored in columns below. But if we scroll across to the right, we can see that the headings and the columns aren't aligned. So we'll need to inspect the data in here a little bit more closely before we can extract the data that we need. If you click inside line one between any of the headings and press the right arrow key you move one space at a time, but if you click between two columns of data and press the right arrow key, you jump right across.
So that means we're dealing with tab separated data. Lets create a script to start extracting the data so we need a new PHP file. And we call process_years.php. And I'll see that it's ended up in my RAW data folder, it needs to be in the top level folder, so I'll move it to problem one and then we can start work on the script. First thing we need is variable for the year, we're going to start with 2001 and then we need to open the file.
We use the Fopen function for that that returns a reference to the file. First target to F open is the file location it's in the raw data folder and the file name is the year followed by the .txt. And the second argument to Fopen is the mode, we want to read it. So the second argument is String with r inside it. We can get the first line using the Fgets function which gets one line at a time.
Takes the reference to the file, and then the next line we're dealing with tab separate values. So we can use the Fgets CFV function which deals with character separated values including tabs. Install the result as line. First argument is the reference to the file. The second argument is the maximum number of characters in the line, we'll say 500. The third argument is the delimiter or what is separating the values, that's tab.
So we need, double quotes backslash t. Then we can close the file. And Fgets CSV returns an array of values so we can inspect that array using the print r function. So what we're looking at is line. So if we save that and then view it in a browser. There is the array that is being returned. The important thing to notice is that the first element is both the date and the time as a single element. And then all the other elements are separate.
And we're going to need to separate the date from the time. So we need to count the number of characters there it's four plus two plus two. And then the underscore so we've got a total length of ten characters. So let's go back to the editing program. And if we look in 2001.txt they can time that's the first element zero, wind speed is one of the values we want that's at index one. Air temperature is also a value that you want.
That's at index four and if we scroll across, barometric pressure is at index six. So, we need to create variables for those indexes. So, we'll have wind, we'll just call it 'W' That was index one. Air was index four. And barometric pressure was index six. The code online eight gets only one line at the moment.
So we need to put that into a while loop. So while and then we need to have two parenthesis there. And we're going to put another pair of parenthesis at the end there. And then not equals false. So there is our condition for the while loop. So what will happen here is that we will get one line at a time. But when we get to the end of the data, Fget CSV will return false, and so the while loop will come to an end.
First thing we need to do inside the while loop is to get the date, which we can use using the substring function. So, date equals and then substring, and what we're looking at is the first element in line. So that's line zero, and then the start is right from the beginning, we count from zero, so it's zero. We counted ten characters in the date, so the length is ten.
And we can now use this to create a multi dimensional array. So lets initialize that array outside of the while loop. We'll call it data set and we will initialize it as an empty array. And then we can start building the multi dimensional array in the while loop. So our data set will have the date, then the sub-array will be air temp, Underscore there and then a pair of empty brackets at the end.
That will be the next element that will go in there. And we assign to that, the value of line with the value of air. The index of air is stored as A. And I've discovered that some figures have a leading space. So we'll pass each of the figures to ltrim to remove the leading spaces. So Ltrim, and then it is line and then its index is stored as A. So we need to copy that line twice, and then we'll change that from air temp to bar press and the index is B.
And then this one will be wind speed and that index will be W. So, we can now test this script by changing the value past the print r to data set. And if we save that and then go back to the browser and reload it. Take a little while and there is all the data as this massive multi-dimensional array. And if we right click and view source we can see the array more easily. There it is there is the date, air temperature, all the values for air temperature.
And if we scroll down, we will see all the different values. It's a very long array. So if we go back to the editing program, this very short script sorts out the filtering of a single year. But there are ten years to process. And before inserting the data into the database, we need to make sure that the format of each file is the same. And the bad news is that the format suddenly changes in 2007. So let's open 2007.txt. As before, the first line consists of text headers, but there are fewer of them, and they're in a different order.
Fortunately, the columns of data are still separated by tabs, and the date and time are still treated as a single column. So all we need to do is find out the index of air temperature, barometric pressure, and wind speed. The index for air temperature is one. Barometric pressure is two. And then wind speed is along here, it's the last one, it's index number seven. So we need to go to our processing script.
And we need to also the values of W, A, B, which are our indexes. So we use the conditional operator, or the turnery operator. So, year is less than 2007. If that is true, we will use one. If it's not true, we'll use seven. And it's the same way here. Just add that, year less than 2007.
We'll use four alternatively we'll use one. I've messed up my question mark there. Let's just quickly add those in. So this works rather like an if/else statement. If year is less than 2007 the value of W will be one. If that's not less than 2007 then it will change to seven. So to process all ten years we can change this first line here, line two, into a for loop. So, four and then we begin with year is 2001, year is less than or equal to 2010 and then increment the year.
And we don't need the printer at the bottom, so we'll get rid of that line. And replace it with our closing brace. Now let's just tidy that up. So this script will extract all the data that we want from the ten years of files. The next thing we need to do is to insert it into the database, and we'll tackle that next.
Get unlimited access to all courses for just $25/month.Become a member
61 Video lessons · 92812 Viewers
82 Video lessons · 104221 Viewers
71 Video lessons · 75910 Viewers
56 Video lessons · 107268 Viewers
Access exercise files from a button right under the course name.
Search within course videos and transcripts, and jump right to the results.
Remove icons showing you already watched videos if you want to start over.
Make the video wide, narrow, full-screen, or pop the player out of the page into its own window.
Click on text in the transcript to jump to that spot in the video. As the video plays, the relevant spot in the transcript will be highlighted.
Your file was successfully uploaded.