Released 12/8/2010
 Understanding statistical terms
 Creating a basic Excel table
 Auditing formulas
 Creating frequency distributions for qualitative data
 Calculating a running total
 Creating a histogram
 Using PivotTables
 Calculating mean, median, mode, and other numerical data
 Using probability distributions
 Population sampling
 Testing hypotheses
 Developing linear and multiple regression models
Skill Level Intermediate






When you analyze a data set, you can often gather a great deal of information about the data by finding its central values. In this movie, I'll show you how to calculate three different types of central values: the mean, the median, and the mode. The common word for the arithmetic mean of a data set is the average. It's the sum of the values divided by the number of values. So, for example, in this table here, I have seven days and then the sales totals for those days. If I want to find the average in cell B1, I can type "=AVERAGE(" and then the name of the table, which is DailySales. So I will start typing it and then press Tab to accept the Formula AutoComplete value.
Now I want the Sales column, so I'll type a left square bracket, start typing Sales, Sales is highlighted, Tab to accept it, right square bracket, and then a right parenthesis to close the formula. When I press Enter, the value appears: 3,825.9. Averages can be misleading if you have one or two disproportionately large values in a data set, such as days where customers made unusually big orders. So let's say, for example, that on March 1, instead of having $5,194 in sales, a customer had ordered enough to push the total up to $20,000.
When you press Enter, you see that the mean has gone up to almost $6,000, when all of the other values except for that one 20,000 are still in the $3,000 to $4,000 range for the most part. So always be careful when you're examining your data to make sure that you don't include any really large values, anything that doesn't seem to fit with the rest of the data pattern. Now if those values occur a couple of times and, in your estimation, it's not unusual to have days where there are huge orders, then certainly include them; that's part of the business knowledge that you have about your operation. But if it's just one instance, then don't worry about it; you should exclude it.
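The jump in the mean can be checked by hand using only the figures quoted above. The following is a minimal Python sketch, not part of the Excel workbook; the function name is hypothetical, and the starting mean of 3,825.9 is the rounded value shown in cell B1, so the result is approximate.

```python
def mean_after_replacement(old_mean, count, old_value, new_value):
    """Return the mean after one value in the data set is swapped for another."""
    # Swapping one value changes the total by (new_value - old_value),
    # so the mean changes by that difference divided by the count.
    return old_mean + (new_value - old_value) / count


# Figures quoted in the narration: seven days, a mean of 3,825.9 (rounded),
# and the March 1 total changed from 5,194 to 20,000.
new_mean = mean_after_replacement(3825.9, 7, 5194, 20000)
print(round(new_mean, 2))
```

A single order of $20,000 moves the average by more than $2,000, even though the other six days are unchanged.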
So I guess the question is how do you do that? Well, you do that by filtering the value out of the table. So, for example, if you want to create a filter, you could click the Sales column's Filter arrow, point to Number Filters, click Less Than, and enter, let's say, 10,000 as a reasonable upper bound. When we click OK, Excel filters the table, but you'll notice that the mean didn't change; the formula still looks at all the values. What we can do, however, is add a total row at the bottom of the table, and that will display the average of only the visible values.
To do that, you click any cell in the table, and then on the Design Contextual tab, check the Total Row box. When you do that, it displays a total, but if you click the cell in the Total Row, click the down arrow, and then click the AVERAGE function, Excel displays a value of 3,598, which is the average of only the visible values. If you remove the filter, by clicking the Filter arrow and then clicking Clear Filter from Sales, then Excel's AVERAGE function here in the Total Row gives the same answer as the formula in cell B1.
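The difference between averaging every row and averaging only the rows that pass the filter can be sketched outside Excel as well. This Python example is an illustration only: the function name and the seven sales figures are hypothetical placeholders, not the values in the DailySales table.

```python
from statistics import mean


def visible_average(values, upper_bound):
    """Average only the values that pass a 'Less Than' filter."""
    visible = [v for v in values if v < upper_bound]
    return mean(visible)


# Hypothetical daily sales with one unusually large day (not the course data).
daily_sales = [20000, 3100, 3900, 3400, 4100, 3600, 3500]

print(mean(daily_sales))                    # average of every row
print(visible_average(daily_sales, 10000))  # average of the filtered rows only
```

The first figure plays the role of the formula in cell B1, which ignores the filter; the second plays the role of the Total Row, which averages only what remains visible.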
Next I would like to talk about the median value. The median value of a data set is the value in the middle of the values if they were sorted in ascending order, that is, from lowest to highest. To discover the median in the data set, you use the MEDIAN function, so you type "=MEDIAN(" and then the name of the table, which is CustomerCount, and we are using the data in the Customers field, or column. Press Tab to accept the field name. Right square bracket. Right parenthesis.
Everything looks right. And we get the median value of 116. Now the values aren't sorted in the table by customers, so to make sure that 116 is the proper value, I will sort these values from smallest to largest. And you see that the value that appears in the middle is 116, which is the median. The median value is not affected by extremely large or extremely small values. So, for example, if I were to change the value 130 to 500, or even 5,000, and press Return, the median doesn't change because 116 is still in the middle.
The reason is that 5,000 is still in the same half, if you will, of the data set. So we have 49, 51, and 74 that are all less than 116, and 123, 126, and 5,000 that are all greater than 116. If I were to change 5,000 to 1 and press Return, it would move to a different half of the data set, and you'll notice that 74 is the new median, which we can verify by re-sorting the data from smallest to largest. And you see that 74 is now in the median position.
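The same experiment can be reproduced in a few lines of Python. This is a sketch under one assumption: the seven values below are inferred from the numbers read out in the narration, and their original order in the CustomerCount table is not known. The helper name is hypothetical.

```python
from statistics import median

# Seven customer counts inferred from the narration (order is an assumption).
customers = [49, 51, 74, 116, 123, 126, 130]


def median_with_change(values, old, new):
    """Return the median after replacing the value `old` with `new`."""
    changed = [new if v == old else v for v in values]
    return median(changed)


print(median(customers))                          # original median
print(median_with_change(customers, 130, 5000))   # large value stays in the upper half
print(median_with_change(customers, 130, 1))      # value moves to the lower half
```

Only the last change shifts the median, because it is the only one that moves a value from one side of the middle position to the other.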
Finally, I would like to talk about the mode. The mode of a data collection is the most common value. In Excel 2007, if multiple values are tied for the highest frequency, the program picks the lowest value as the mode. So, for example, in the data set here, if I want to find the most common value in the Customers column, I type "=MODE(" and then the name of the table, which is HourlyCustomers, and we're looking in the Customers column, using that data. Right square bracket. Right parenthesis to close. Return. And the MODE formula returns a value of 28.
And if we sort this data like we did last time, sorting from smallest to largest, then we'll see that indeed the number 28 occurs three separate times. The number 14 occurs twice. If I were to change one of the 28s to another value that doesn't occur (let's say 30, which doesn't appear anywhere else) and hit Return, then the mode sees that there are two occurrences of 28 and two occurrences of 14 and, using its own rules, it selects the lower value, 14, as the mode.
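The tie-breaking behavior described here can be sketched in Python. Two things are assumptions: the hourly counts below are hypothetical (only the three 28s and two 14s come from the narration), and the "pick the lowest tied value" rule is implemented as the narration states it. Python's own `statistics.mode` resolves ties differently, so the rule is written out explicitly.

```python
from collections import Counter


def lowest_mode(values):
    """Return the most frequent value; on a tie, return the lowest tied value."""
    counts = Counter(values)
    highest = max(counts.values())
    return min(v for v, c in counts.items() if c == highest)


# Hypothetical hourly customer counts: 28 appears three times, 14 twice.
hourly = [14, 14, 19, 22, 28, 28, 28, 31]
print(lowest_mode(hourly))

# Change one of the 28s to 30, a value that appears nowhere else.
changed = [14, 14, 19, 22, 28, 28, 30, 31]
print(lowest_mode(changed))
```

With the tie between 14 and 28, the function falls back to the lower of the two, matching the result described for the worksheet.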
Calculating modes is only useful for what's called "discrete data," where the values are whole numbers such as 10, 25, or 50. Continuous data, where the measurements could have decimal components, either won't have a most common value or there will be a duplication that occurs entirely by chance. Most of the statistical techniques you will use in this course rely heavily on the mean, or average, value of a data set. And unless your data set contains one or two extremely large or extremely small values that distort the average, it's the most useful value.
If you're interested in the value that would be in the middle of a range if you sorted the values in ascending or descending order, calculate the median. Calculating the mode, the most common value in a data set, could point out the most common answer in a survey and help you discover information about your customers and their opinions about your business.






Video: Calculating mean, median, and mode