Learn how Excel offers intelligent shortcuts and tips based on a selection of data, including conditional formats, visualizations, and calculated fields.
- [Instructor] All right, this next protip is a little bit of a teaser, kind of a sneak preview into some different ways that you can explore your data using something called the Analysis Toolpak. So just like Solver, the Analysis Toolpak is a built-in Excel plugin that's designed to support a range of more advanced and complex data science and statistical tools and methods, things like analysis of variance, ANOVA, covariance, regression, T-tests, histograms, and so on. Now, obviously I don't have time to dive in to each of those specific methods in turn. They each deserve a much deeper, much more comprehensive review, but what I will show you is the scope of options available within this toolpak, as well as a few that we can use right off the bat that are simpler but no less powerful. So in this demo we're going to be looking at a sample of about 5,000 Olympic athletes along with their ages, heights, and weights. And what we can do is drill into our data tab and look for the data analysis button. If you don't see this, it likely means you just need to activate that add-in. So go ahead to file, options, add-ins, and manage your Excel add-ins to activate this tool on your ribbon. And once you click through data analysis you'll see a scrolling list of all of your different options here, some of which are simple, some are quite advanced. Now, for the sake of demonstration we're going to be talking about two options here. First we're going to look at a correlation matrix between our three fields: age, height, and weight. And then from there we're going to generate some descriptive statistics to calculate things like the mean, mode, variance, range, the count, skewness, min and max values, and so on, without having to type a single formula. So that will be a great way to really describe and understand our data with the click of a button. So common-use cases here, just like I showed you, generating those descriptive stats without having to use formulas, and two, exploring or analyzing data using these more advanced data science or statistical methods, like analyzing variance, building predictive models, et cetera. So let's jump into Excel and take this Analysis Toolpak for a test drive. All right, so go ahead and open up your protip workbook, scroll in the table of contents to your purple analytics tips, and we're going to jump into the Analysis Toolpak preview demo here. And when we link out we'll see our list of Olympic athletes. Again, we've got a sample of about 5,000 athletes here. We're looking at ages, heights in centimeters, and weights in kilograms. So this'll be a nice data set to practice some of these Analysis Toolpak options. So first things first, head to your data tab. Off on the right, this is where you should see your activated plugins. If you don't see data analysis, go ahead to file, options, add-ins, and you're going to manage your Excel add-ins, press go. You should see Analysis Toolpak in that list. Go ahead and check the box, and press okay. That should pop up right here. And let's go ahead and see what we have available to us. So as we scroll through we'll see a number of different options here, analysis of variance, correlation, covariance, exponential smoothing, moving average, random numbers, T-tests, Z-tests, and a few others. So obviously kind of a range of level of difficulty and complexity here, some simple tools, some that are quite advanced. In this case we're going to stick to a few basic options that you can utilize right off the bat, starting with descriptive statistics. So let's go ahead and press okay. Here we'll see a pretty intuitive dialogue box. It says all right, what's your input range of values? Well, I'd like to explore or analyze all three of these columns. So let's select B through D. These are organized in columns, so that's good. We do have headers or labels in our first row. And where do we want to drop this output? Well, let's put the upper left corner right here in cell F1. And let's also include the Kth largest and smallest one value, basically a fancy way of saying the max and the min, and I think that should just about do it. We don't need to worry about confidence level for mean. At this point, 95% default is just fine. Let's go ahead and press okay, and look at this. It's going to drop in all of these summary stats right here in columns F through K, literally in a matter of seconds. Now, think about how long that would've taken to produce all of these values with cell formulas, these means, these medians, the standard deviation of variance, would've taken quite some time. And we're able to just produce all of those values plus quite a few more in a matter of seconds. So looking at our max and min, looks like we have an athlete who was 65 years old. He was the oldest. Our youngest athlete was only 12. In terms of height, tallest athlete, 216 centimeters. And for those of you who aren't on the metric system that's about seven feet, one inch. And we can confirm by simply filtering our source data here. Let's sort descending by height. And here we see Andreas Glyniadakis, who is a Greek national basketball player, which shouldn't come as too much of a surprise given how tall he is. And then looking at weights, looks like our largest athlete was 160 kilograms. Our smallest was only 35 kilograms. And same story here, let's sort those weights ascending. And you can see that it's Aiko Sugihara, who is a Japanese gymnast. And again, for those non-metric folks, 35 kilograms is a hair over 77 pounds. So those are our descriptive stats. Really, really helpful, but also quite simple to produce. Let's go ahead and do one more example here. Back into our data analysis options. This time I want to look at correlations or relationships between these three columns. So same input range here, B through D. Okay, they are columns again. We've got labels, headers in our first row. And this time let's drop our correlation output beneath our stats right here in F20. And it's as simple as that. Okay, and there we go. Let's go ahead and format these, so it's a bit more readable, as percentages. And check this out. We've got this correlation matrix that's produced here. You'll see 100% kind of on the diagonal because each field is 100% correlated with itself. But the ones that we really care about here are these two values as well as this one here. So I'm just going to make those bold. And basically what this tells us is that there's not much of a correlation or relationship between ages and heights, only about an 11% correlation, which might seem a little bit surprising at first until you realize that we're not looking at the entire population. We're not looking at heights all the way down to newborns or infants. These are all, well, for the most part, adult athletes. So once you've reached adulthood you're not really growing anymore, so it's not surprising that we see a relatively low correlation there. And that same story holds for age and weight for the same reasons. Now, what's interesting is that we do see a very clear, very strong correlation, 78%, between height and weight, which also makes sense because taller, larger athletes tend to weigh more. So there you go. That's a very quick way to use tools like correlation and descriptive stats to explore and learn more about your data set without having to use any fancy tools or formulas. And again, I won't be diving into the other kind of more complex tools within that Analysis Toolpak, but I'd encourage you to explore those options and try playing with those tools for yourself. So there you have it, quick primer, quick preview, of the Analysis Toolpak.