Join Barron Stone for an in-depth discussion in this video Solution overview, part of Code Clinic: Python (2014).
In this video, I'll give you a high-level overview of my solution to the Code Clinic challenge of retrieving and analyzing weather statistics from Lake Pend Oreille. I'll describe the approach I took to complete the challenge, and then, in the following videos, I'll walk you step by step through the code so you can see exactly how I did it. The solution I've created uses only packages and modules that are included with the standard Python distribution. In these videos, I'll show you how I used the urllib package to access the Navy's website to retrieve historical data for Lake Pend Oreille.
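To give a flavor of the urllib portion, here is a minimal sketch of downloading one day's readings as text. The URL pattern and sensor names here are assumptions for illustration; the actual file layout on the Navy's server may differ.

```python
from urllib.request import urlopen

# Hypothetical URL layout -- the real paths on lpo.dt.navy.mil may differ.
BASE_URL = "http://lpo.dt.navy.mil/data/DM/{year}/{date}/{sensor}"

def build_url(year, date, sensor):
    """Build the URL for one day's file for one weather sensor."""
    return BASE_URL.format(year=year, date=date, sensor=sensor)

def fetch_raw_data(year, date, sensor):
    """Download one day's readings for one sensor as plain text."""
    with urlopen(build_url(year, date, sensor)) as response:
        return response.read().decode("utf-8")
```

Separating URL construction from the network call keeps the download step easy to test without touching the web.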
I used the sqlite3 module to create a database that stores a local copy of the retrieved data. I used the new statistics module, which was introduced as a standard library module in Python 3.4, to calculate the mean and median values for a range of dates. And finally, I used the tkinter package to build a customized graphical user interface for the application. From the user's perspective, it's a fairly simple program that prompts them to enter a start and end date, and then displays the mean and median values across that range of dates for the wind speed, air temperature, and barometric pressure.
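The statistics calculations themselves are one-liners with the statistics module. The readings below are made-up illustration values, not actual Lake Pend Oreille data:

```python
from statistics import mean, median

# Made-up wind speed readings, just to show the calls
wind_speeds = [5.2, 7.8, 3.1, 6.4, 4.9]

print(mean(wind_speeds))    # arithmetic average of the readings
print(median(wind_speeds))  # middle value of the sorted readings
```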
While this may look simple at first, there's actually quite a bit going on in the background to provide the user with the result as fast as possible. The main bottleneck for this problem is the time it takes to retrieve the weather data from the Internet. The data is stored in text files on the Navy's website, with individual files for every day, for every type of gathered weather statistic. This means if I need to get wind speed, air temperature and barometric pressure for a single day, I have to download three separate files.
To get the data for two days is six files, three days is nine, and so on. It quickly adds up to a lot of time-consuming web access calls. There are a couple of different approaches that I could have taken to retrieve the weather data for a user’s request. The simplest would be to access the Navy’s website to download all of the necessary data every time the user submits a request. I call this the on-demand solution. As I'm sure you can imagine, this would take a very long time to complete if the user requested even a moderately large range of dates.
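The scaling is simply days times measurement types, which is what makes large date ranges so slow under the on-demand approach:

```python
# One file per day per measurement type
sensors = ["wind speed", "air temperature", "barometric pressure"]

days = 7                              # a one-week request
files_needed = days * len(sensors)    # 21 separate downloads
print(files_needed)
```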
On the far other side of the spectrum, I could have created a local database to cache all of the existing weather data for Lake Pend Oreille on the user's machine. The database could be configured to update itself at regular intervals, so that it's always current, and when the user submits a request, the program can just access the fast local cache of data, rather than having to take the time to download it from the internet. One downside of this approach is that the database will require a rather long initial setup period, because it will need to download and store all of the historical weather data before the user can even submit their first request.
I did some testing with this method, and it took a little over 35 minutes to download and fully populate the local database, which ended up being just under 50 megabytes in size. That was longer than I wanted to wait to use the program, so I decided against this approach. For my solution to the problem, I decided to take an on-demand with caching approach that lies somewhere in the middle. My solution code is divided up into three Python modules. The first module, called lpoApp, is the main module for the program. It creates the graphical user interface for the user to input their start and end dates.
When the user submits a request, it passes those dates to the second module, called lpoDB, to retrieve the weather data for that date range. As the name suggests, the lpoDB module creates and manages a local database to store all of the weather data that's been downloaded. It contains the logic to determine whether a specific date has been previously requested, and is therefore cached in the database, or whether it's the first time that date has been requested. First-time requests will need to be downloaded from the Internet, in which case it uses the third module, called lpoWeb, to access the Navy's website to download the data.
That data is returned to the lpoDB module, where it's cached locally in the database, and then all of the data for the requested date range is sent up to the lpoApp module. There, the mean and median values are calculated for the different weather measurements, and the results are displayed on the GUI. This on-demand with caching approach only downloads data as needed to fulfill user requests, and it caches the data locally so it can quickly field future requests. In the following videos, I'll walk you through each of these modules individually so you can see how I built the GUI, managed the database, and accessed the Web, as well as learn a few Python tips and tricks along the way.
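The heart of the on-demand-with-caching idea can be sketched in a few lines of sqlite3. This is not the course's actual lpoDB code; the table schema and the `download` callback are assumptions for illustration:

```python
import sqlite3

def get_day(db, date, sensor, download):
    """Return readings for (date, sensor), downloading only on a cache miss.

    Hypothetical schema: weather(date TEXT, sensor TEXT, value REAL).
    `download` is a callable that fetches one day's values from the web.
    """
    rows = db.execute(
        "SELECT value FROM weather WHERE date = ? AND sensor = ?",
        (date, sensor)).fetchall()
    if rows:
        # Cache hit: serve from the local database, no web access needed
        return [value for (value,) in rows]
    # Cache miss: download, then store for future requests
    values = download(date, sensor)
    db.executemany(
        "INSERT INTO weather VALUES (?, ?, ?)",
        [(date, sensor, v) for v in values])
    db.commit()
    return values
```

The first request for a date pays the download cost; every later request for that date is served from the local cache.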
Barron introduces challenges and provides an overview of his solutions in Python. Challenges include topics such as statistical analysis, searching directories for images, and accessing peripheral devices.
Visit other courses in the series to see how to solve the exact same challenges in languages like C#, C++, Java, PHP, and Ruby.
Skill Level Intermediate
Q: Why can't I access the Lake Pend Oreille site (http://lpo.dt.navy.mil)?
A: The Lake Pend Oreille site is not accessible in some geographical areas. We have contacted the owner of the server to try to resolve this issue.
Q: I am unable to access the Lake Pend Oreille data from outside the U.S.
A: A static copy of this data is provided here for lynda.com members outside of the U.S.