Join Mark Niemann-Ross for an in-depth discussion in this video, Introducing Lake Pend Oreille, part of Code Clinic: R.
(computer beeping and booting up) - Hello, and welcome to Code Clinic. My name is Mark Niemann-Ross, and I'm the content manager for the developer segment at Lynda.com. Code Clinic is a course where I introduce a problem to a collection of Lynda.com authors. In turn, those authors use their computer programming language of choice to produce a unique solution. You can learn several things from Code Clinic: different approaches to solving a problem, the pros and cons of different languages, and some tips and tricks to incorporate into your own coding practices.
This time, I'm introducing a problem in statistical analysis, and to some extent, handling big data. It's common to use a computer to manipulate and summarize large amounts of information, providing important insights on how to improve or handle a situation. In this problem, we'll use weather data collected by the U.S. Navy from Lake Pend Oreille in northern Idaho. Lake Pend Oreille is the fifth-deepest freshwater lake in the United States. It's so deep, in fact, that the United States Navy uses it to test submarines.
As part of that testing, the U.S. Navy compiles an exhaustive list of weather statistics: wind speed, air temperature, barometric pressure. You can browse this data by pointing your web browser at http://lpo.dt.navy.mil. You'll find several weather summaries, a webcam, and the raw data they collect, archived to standard text files. In this challenge, each author uses their favorite language to calculate the mean and median of the wind speed, air temperature, and barometric pressure recorded at the Deep Moor station for a given range of dates.
Let's briefly review statistics. Let's start with a set of numbers. How about 14 readings for wind gust at the Deep Moor weather station? You can see this data at lpo.dt.navy.mil. The first column is the day the wind gust was recorded. The second column is the time it was recorded, and the third column is the wind gust in miles per hour. The "mean" is also known as the average. To calculate the mean of a set of numbers, simply add the values in the set, then divide by the number of values.
In this example, we add 14, plus 14, plus 11, plus 11, plus 11, plus 11, plus 11, plus three, plus seven, plus seven, plus seven, plus seven, plus four, plus eight, and then divide the sum, 126, by 14, the count of numbers in the set. In this case, the mean is equal to nine. The "median" is the value in the middle of the samples once they're sorted. Think of the median as the median strip of a road: it always marks the center of the road. To calculate the median, first sort the numbers from lowest to highest.
If there is an odd number of values, then just take the middle number. If there's an even number of values, then calculate the mean of the central two numbers. In this case, there is an even number of values, so we sort, then take the average of the middle two values, eight and 11: the median is 9.5. So, there's our first challenge: pull statistics from a data set available online. Perhaps you wanna pause and create a solution of your own.
How would you solve the problem? In the next video, I'll show you how I answered the challenge.
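The arithmetic from the review above can be checked with a few lines of code. This is only a minimal sketch, written in Python rather than the R used in the course, and it works on the 14 wind-gust readings exactly as they were read out; the names `gusts`, `manual_mean`, and `manual_median` are made up for illustration and are not part of any author's solution.

```python
from statistics import mean, median

# The 14 wind-gust readings (mph) from the walkthrough, in the order given.
gusts = [14, 14, 11, 11, 11, 11, 11, 3, 7, 7, 7, 7, 4, 8]


def manual_mean(values):
    """Add the values, then divide by how many there are."""
    return sum(values) / len(values)


def manual_median(values):
    """Sort, then take the middle value (odd count) or the
    mean of the two central values (even count)."""
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


print(manual_mean(gusts))    # 9.0
print(manual_median(gusts))  # 9.5

# The standard library's statistics module gives the same answers.
print(mean(gusts) == manual_mean(gusts))      # True
print(median(gusts) == manual_median(gusts))  # True
```

The hand-written functions follow the spoken steps one for one; in practice you would probably just call the built-in `mean` and `median`.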
Mark introduces challenges and then provides an overview of his solutions in R. Challenges include topics such as statistical analysis, searching directories for images, and accessing peripheral devices.
Skill Level Intermediate
Q: RStudio tells me that it can't find files I expect to be available. Where can I find them?
A: Use the setwd() function to set the working directory to the folder that contains the files you're working with.
Q: I am unable to access the Lake Pend Oreille data from outside the U.S.
A: A static copy of this data is provided here for lynda.com members outside of the U.S.