When you work with scientific or survey data, you might be interested in discovering how many values fall into certain categories. Learn how you can visually summarize that type of data by creating a histogram.
- [Man] When you work with scientific or survey data, you might be interested in discovering how many values fall into certain categories. For example, you might want to know how many values you have between one and five, between six and ten, from 11 to 15, and so on. You can visually summarize that type of data by creating a histogram. In this movie, I will show you how to do that in Mathematica 11. It's easier to demonstrate and explain histograms by creating one, so I'll start by defining a data set to work with.
I will call it bList, followed by an equal sign. It's a list of values, so I will begin the enumeration with a left curly bracket. Then, I'll type in the values separated by commas. So, 1 comma 14 comma 19 comma 3 comma 7 comma 22 comma 16 comma 14, and follow that with a right curly bracket. There are several things I'd like you to note about this data list. The minimum value is one. The maximum is 22.
And the only duplicate value, which has a count of two, is 14. So, I'll press shift, enter. Everything looks good and now I can create a histogram based on the data. The keyword, as you might expect, is Histogram. So, capital H, I-S-T-O-G-R-A-M followed by a left square bracket and then the name of the variable that contains my list. So, I'll type bList, followed by a right square bracket and shift enter. The resulting graph displays the number of values that fall into several bins.
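As a rough sketch, the steps described so far might look like this when typed into a notebook. The spelling `bList` is an assumption based on the spoken name "B list", and the `Min`, `Max`, and `Tally` calls are extra checks that are not part of the narration; they only confirm the facts noted about the list.

```mathematica
(* The variable name bList is an assumed spelling of the spoken "B list". *)
bList = {1, 14, 19, 3, 7, 22, 16, 14}

Min[bList]    (* smallest value: 1 *)
Max[bList]    (* largest value: 22 *)
Tally[bList]  (* pairs of {value, count}; 14 appears with a count of 2 *)

(* Draw the histogram and let Mathematica choose the bins automatically. *)
Histogram[bList]
```

Each expression is evaluated with shift enter, as in the demonstration.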
The first bin goes from zero to 10. You can see the values along the X, or horizontal, axis. On the Y axis, there's a count of the number of values in that bin. According to this histogram, there are three, and if we look back up at the original data, we'll see that we have values of one, three, and seven. So, that's correct. Between 10 and 20, we have the values of 14, 19, 16, and 14 again. So, we have four individual occurrences there, and there's only one value greater than 20, which is 22.
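The counts read off the chart can also be checked numerically. The sketch below uses `BinCounts`, which is not part of the narration, with bins of width 10 from 0 to 30 to mirror the bins in the chart. The name `bList` is again an assumed spelling.

```mathematica
(* Same data as in the demonstration; bList is an assumed spelling. *)
bList = {1, 14, 19, 3, 7, 22, 16, 14};

(* Count the values in bins of width 10: [0,10), [10,20), [20,30). *)
BinCounts[bList, {0, 30, 10}]   (* {3, 4, 1} *)
```

The three numbers correspond to the values 1, 3, 7 in the first bin, 14, 19, 16, 14 in the second, and 22 in the third.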
So, our summary is correct. Please note, though, that the bins are of size 10, which means that the axis maximum of 30 is much larger than the largest data value of 22. You can change how the histogram is binned by adding a second argument after the variable name. So, I will type Histogram again followed by a left square bracket, then bList, same as before. Then, a comma and the number 22, which is our maximum. Mathematica reads a single number in this position as a bin count rather than as an axis limit.
Type a right square bracket and shift enter, and we get the results. In this case, Mathematica has created smaller bins. So, we see the values for 14, which are here in the bin ending at 15. Then we have one value, which is 16. Then the next, which is 19, and so on. To give you an idea of how histograms work with larger sets of data, let me create a number of random values and show you how a histogram handles them.
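For reference, here is a hedged sketch of the two-argument call used on the small list, alongside two other forms of the bin specification. The last two calls and their numbers are illustrative assumptions, not part of the demonstration, and `bList` is an assumed spelling of the variable name.

```mathematica
bList = {1, 14, 19, 3, 7, 22, 16, 14};

(* The call from the demonstration: a single number is a bin-count specification. *)
Histogram[bList, 22]

(* A number in curly brackets sets the bin width instead (here, width 1). *)
Histogram[bList, {1}]

(* {min, max, width} fixes the range and the width explicitly. *)
Histogram[bList, {0, 25, 5}]
```

With values running from 1 to 22, asking for 22 bins gives bins about one unit wide, which matches the narrow bars seen in the result.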
So, I'll type Histogram, followed by a left square bracket, and let's say that I want to generate a set of 300 values from a normal distribution with a mean of zero and a standard deviation of three. To do that in Mathematica, I'll type RandomVariate, followed by a left square bracket. Now, I need to name the distribution that the values will come from. So, that's NormalDistribution.
Now, another left square bracket, and I need to enter the mean and the standard deviation, which are what define a normal distribution. So, that's zero comma three, followed by a right square bracket and a comma. And now the number of values that I want, so we'll do 300. I need to enter two more right square brackets to close out the argument lists of my nested functions. Then, shift enter, and we get the histogram. I'll scroll down a bit so we can take a look at it. What I see is that the counts for the bins close to zero, which is the mean, are much higher than those toward the tails of the distribution.
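Written out, the nested command dictated above would look roughly like this. The `SeedRandom` line and its seed value are assumptions added only so that the same sample can be regenerated; the demonstration does not use them, so the exact bar heights will differ from the ones on screen.

```mathematica
(* Optional and assumed: fix the seed so the sample is repeatable. *)
SeedRandom[1234];

(* 300 draws from a normal distribution with mean 0 and standard deviation 3. *)
Histogram[RandomVariate[NormalDistribution[0, 3], 300]]
```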
So, you can see that from zero to about negative two we have 75 observations, and from zero to plus two, or plus 1.67, we have 71 observations. If we added up the counts in all of the bins, they would total 300, which is the number of values that we generated in the beginning. As I said, histograms are a great way to summarize your data. If you're interested in the number of values that fall within certain ranges, then a histogram is a natural way to make that summary.
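The claim that the bin counts add up to 300 can be checked with a small sketch. It uses `HistogramList`, which is not part of the narration, and the names `data`, `edges`, and `counts` are chosen here for illustration.

```mathematica
(* Generate the same kind of sample as in the demonstration. *)
data = RandomVariate[NormalDistribution[0, 3], 300];

(* HistogramList returns the bin edges and the count in each bin. *)
{edges, counts} = HistogramList[data];

Total[counts]                   (* 300 *)
Total[counts] == Length[data]   (* True *)
```

The individual counts change from run to run because the sample is random, but every value lands in exactly one bin, so the total always equals the sample size.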
- Managing Mathematica notebooks
- Surveying basic Mathematica commands
- Manipulating lists
- Analyzing data using descriptive statistics
- Manipulating matrices
- Managing executable Mathematica scripts
- Visualizing and formatting data
- Creating interactive and animated visualizations