You have a collection of data. You would like to quickly identify the median of the data set.
- [Instructor] The median of a data set is the middle value of the sorted values. Now, if there isn't a single middle value, as happens when there's an even number of elements in the list, we take the average of the two values closest to the sorted middle. In this video, we're going to discuss the algorithm for computing the median of a data set, and we're going to take the traditional approach of sorting the values first and then selecting the values we need in order to compute the median. We're going to test how the median function should behave under different circumstances, and then we're going to compute the median of our 2015 away team runs using our prototyped function.
When we last left our Haskell notebook, we were discussing the mean and standard deviation of runs, and we found that the one standard deviation range was 1.03 to 7.27. Now what I would like to do at the very beginning of our notebook is to add yet another import, and we're going to import Data.List. This is where we find the sort function, and I'm going to, as usual, restart and rerun all so that everything is properly loaded for our notebook.
Great, I'm going to create a couple of quick lists just to demonstrate the sort function. So here we have odd list, which is three, four, one, two, five. Great, now let's do even list, and even list is going to be six, five, four, three, two, one. We can use the sort function to sort these lists. Pretty straightforward, and it's found in the Data.List library.
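As a minimal sketch of the step just described, assuming the two lists are bound to the names oddList and evenList (the exact notebook names are not shown in the narration, so these are guesses):

```haskell
import Data.List (sort)

-- Assumed names for the two demonstration lists from the narration.
oddList :: [Int]
oddList = [3, 4, 1, 2, 5]

evenList :: [Int]
evenList = [6, 5, 4, 3, 2, 1]

main :: IO ()
main = do
  print (sort oddList)   -- [1,2,3,4,5]
  print (sort evenList)  -- [1,2,3,4,5,6]
```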
If I wish to find the middle value of a list, I need to find the length of the list and then divide by two. So I can use length odd list div two, and it produces two. Good, now I can sort that odd list and pull out the element at index two, and I get three. And three is the median of our odd list. And for an odd list, that's all you have to do. You should notice that whenever we pass an even list, we get the index position that appears after the median, so if I say length even list div two, I get three.
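The index arithmetic above can be sketched as follows. The helper name middleIndex and the list names are assumptions made for illustration; indexing with (!!) is zero-based, which is why index two holds the third sorted value.

```haskell
import Data.List (sort)

oddList :: [Int]
oddList = [3, 4, 1, 2, 5]

evenList :: [Int]
evenList = [6, 5, 4, 3, 2, 1]

-- Zero-based index of the middle position (hypothetical helper name).
middleIndex :: [a] -> Int
middleIndex xs = length xs `div` 2

main :: IO ()
main = do
  print (middleIndex oddList)                    -- 2
  print (sort oddList !! middleIndex oddList)    -- 3, the median of the odd list
  print (middleIndex evenList)                   -- 3
  print (sort evenList !! middleIndex evenList)  -- 4, one position past the middle
```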
And the value at index position three in our sorted even list is four, which is not the median. So we need to take the two values that are closest to the middle, which in this case are the value at index three and the value at the index position before that, which is two, and then add those together and divide by two. So the formula comes out to being sort even list at index three plus sort even list at index two (I believe my parentheses were off there), and then I divide by two.
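A small sketch of that formula, with the parentheses placed so the sum is taken before the division. The list is declared as Double here so that (/) type-checks; that type choice and the name evenList are assumptions, not something the narration states.

```haskell
import Data.List (sort)

-- Declared as Double (an assumption) so that (/) can be applied directly.
evenList :: [Double]
evenList = [6, 5, 4, 3, 2, 1]

-- Average of the two sorted values nearest the middle of the six-element list.
evenMedian :: Double
evenMedian = (sort evenList !! 3 + sort evenList !! 2) / 2

main :: IO ()
main = print evenMedian  -- 3.5
```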
Then I see that our median is 3.5, which is the true median of our even list. There are algorithms for finding the median that do not require a full sort of the values. For example, you can use the quickselect algorithm to find the median value without sorting the whole list. But for our purposes, we're going to stay with the traditional sort-the-values-first approach. We're going to prototype a median function utilizing the approach that we've outlined here. We're going to go over a few quick examples of what should happen whenever median is called.
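For reference only, here is a rough sketch of the quickselect idea mentioned above. It is not the approach used in this video, and the function name quickSelect is hypothetical. It picks the first element as the pivot, keeps only the side of the partition that contains the requested position, and typically does less work than a full sort, although a poor pivot can still make it slow.

```haskell
-- Quickselect sketch: the k-th smallest element (zero-based) without a full sort.
-- Returns Nothing when k is out of range. The name is hypothetical.
quickSelect :: Ord a => Int -> [a] -> Maybe a
quickSelect _ [] = Nothing
quickSelect k (pivot : rest)
  | k < 0         = Nothing
  | k < nSmaller  = quickSelect k smaller                  -- answer lies left of the pivot
  | k == nSmaller = Just pivot                             -- the pivot is the answer
  | otherwise     = quickSelect (k - nSmaller - 1) larger  -- answer lies right of the pivot
  where
    smaller  = filter (< pivot) rest
    larger   = filter (>= pivot) rest
    nSmaller = length smaller

main :: IO ()
main = do
  print (quickSelect 2 [3, 4, 1, 2, 5 :: Int])     -- Just 3
  print (quickSelect 3 [6, 5, 4, 3, 2, 1 :: Int])  -- Just 4
```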
So here is our median prototype function. Notice that we are bounding our inputs based on the type class Real, and we are packaging, once again, a Double inside of a Maybe. We're using Double because there's the possibility that, even though we have a full list of integers, we still need to return a Double because we have an even number of integers. If we ask for the median of no items, then we return Nothing. Other than that, if we have an odd list, then we will return the middle value.
Otherwise, we are going to return the middle even. And here we have all of the different circumstances outlined. So what should happen whenever we ask for the median of an empty list? We get Nothing. Likewise, if we get the median of odd list, we should get back three. Notice it's been converted to a Double. And if we do the median of even list, we get 3.5. And to outline again, we have our middle value, which is just the value at the middle index.
And we have the before middle value, which is the value at the middle index minus one. And the middle even is simply those two values added together and divided by two. And that's all there really is to it. We're using the odd function in order to look for an odd number of elements; otherwise, we're going to use the even approach. So, using sort, we built a function for finding the median of a list. So this was a long function and we described it in detail.
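The notebook cell itself is not reproduced in this transcript, so the following is only a reconstruction from the narration, not the course's actual code. It assumes the helper names middleValue, beforeMiddleValue, and middleEven match what is spoken, and it uses realToFrac to convert any Real input to Double.

```haskell
import Data.List (sort)

-- Reconstruction of the described prototype; details are assumptions.
median :: Real a => [a] -> Maybe Double
median [] = Nothing
median xs
  | odd n     = Just middleValue  -- odd count: the single middle value
  | otherwise = Just middleEven   -- even count: average of the two middle values
  where
    n                 = length xs
    sorted            = map realToFrac (sort xs) :: [Double]
    middleIndex       = n `div` 2
    middleValue       = sorted !! middleIndex
    beforeMiddleValue = sorted !! (middleIndex - 1)
    middleEven        = (middleValue + beforeMiddleValue) / 2

main :: IO ()
main = do
  print (median ([] :: [Int]))               -- Nothing
  print (median [3, 4, 1, 2, 5 :: Int])      -- Just 3.0
  print (median [6, 5, 4, 3, 2, 1 :: Int])   -- Just 3.5
```

Because Haskell evaluates the where bindings lazily, beforeMiddleValue is only computed in the even case, where the list has at least two elements and the index is valid.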
So, finally, we need to use the median function which we have prototyped in order to find the median of the away runs. And we find that the middle sorted value of away runs in the 2015 season is four. In our next video, we are going to discuss what is probably the simplest of the descriptive statistics to describe, and that is the mode, but it turns out to be one of the more difficult to compute.
Note: This course was created by Packt Publishing. We are pleased to host this training in our library.
- Data ranges, means, and medians
- Standard deviation
- SQLite3 command line
- Slices of data
- Regular expressions
- Atoms and modifiers
- Character classes
- Line plots of a single variable
- Plotting a moving average
- Feature scaling
- Scatter plots
- Normal distribution
- Kernel density estimation (KDE)