One of the most basic forms of statistical graphics, or data visualization, that you can put together is a dot plot. All it is is a single dimension: you put a dot where a person's or unit's score falls on that particular dimension. Now this is a very simple thing to do in Processing. All you need to do is draw ellipses and have either the X or the Y be driven by the data. On the other hand, I thought I would try to show you something slightly more interesting than that, and what I've done is I've created a dot plot using the Google data file that I used in the last chapter.
If you open up the folder for chapter 11, exercise one, what you'll find is a few things in there. First off, you find the actual Processing file itself. That's one that says Ex11_01. That's the file we are looking at right here. You will also find the Table, with a capital T. That is the table class that we will be using to help process the data, and you can see for instance, in the Processing sketch on the left, that we have two tabs. The second one is for the table. You also see the data folder at the top and if I double-click on that, we've got three files in there. Two of them we have seen before.
There is the stateData, and that's information I got from Google about different states and their relative interest in particular search terms. Now, I have the Excel spreadsheet in here because it also has all the original header data and some of the summary statistics that are helpful in getting things set up. On the other hand, Processing is much happier with the TSV, tab-separated values, spreadsheet. That's a text form of a spreadsheet. I use that, but it doesn't have any header information, and it's a little harder for me to work with, so I just keep the Excel file in there as well. The Excel spreadsheet is simply for my reference.
The third file you see in there is a font. I have created a font for use in this particular sketch, and that gets stored in the data folder. So let me show you what is in this particular sketch and how it works. First off I brought in a color palette, like I have done many times before. Also, I have created a font. I've called it the labelFont because I use it for the labels. Below that I have Table stateData: the class is Table, and the object that I am creating out of the class is called stateData, and that's what I'm going to be using to read information in.
Then I have an integer variable rowCount. That's where we are simply counting how many rows there are in the spreadsheet. I also have another variable, d, which is for the diameter of the ellipses, the dots that I am going to be drawing. Let me scroll down a little bit. Then in the setup block, I have a window that's 600 by 200 pixels. And then I read in the data. I have the stateData object, and I say it's a new object in the Table class, and it's going to read the stateData.tsv file.
I also count the rows in it. So rowCount is equal to stateData.getRowCount(). The getRowCount() is a method within that particular object. And then I do a println, just to check how many rows I have, and that shows up at the bottom. In fact, you can see it's down there right now, rowCount = 51. That's because I have 50 states and Washington D.C. And the last thing in the setup is I load the font into the font variable. So the font variable I called labelFont. The function is loadFont, and then I put in the name of the font that I created with the Processing tool.
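As a rough illustration of what reading the TSV and counting its rows involves, here is a minimal sketch in plain Java. MiniTable is a hypothetical stand-in written for this example; it is not the Table class from the exercise files, and only the method names getRowCount, getString, and getFloat are borrowed from the walkthrough. It takes the file contents as a string and assumes there are no blank lines between data rows.

```java
// Hypothetical stand-in for the sketch's Table class (an assumption, not the
// class shipped with the exercise). It splits tab-separated text into cells.
class MiniTable {
    private final String[][] cells;

    MiniTable(String tsvText) {
        // Split on any line break; String.split drops trailing empty lines.
        String[] lines = tsvText.split("\\R");
        cells = new String[lines.length][];
        for (int r = 0; r < lines.length; r++) {
            // The -1 limit keeps empty cells at the end of a row.
            cells[r] = lines[r].split("\t", -1);
        }
    }

    // Number of data rows (the walkthrough reports 51 for the real file).
    int getRowCount() {
        return cells.length;
    }

    // Text value at a zero-based row and column, e.g. the state name in column 0.
    String getString(int row, int col) {
        return cells[row][col];
    }

    // Numeric value at a zero-based row and column.
    float getFloat(int row, int col) {
        return Float.parseFloat(cells[row][col]);
    }
}
```

With the real file, the contents of stateData.tsv would be passed in, and getRowCount() would be expected to match the 51 printed in the console.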
Remember, if I go up to the Tools right here, to Create Font, that's how I created that font, by selecting Gill Sans Bold and then size 18. Next, in the draw block, we have got a few other things going on. I am going to scroll up a little bit. First, I specify the palette, and I put that in the draw because I want it to refresh every time through. Then I load the font. So, the text font that I will be using is from the variable labelFont. I make a stroke that will be used for some lines to mark points on the grid.
I also have a fill that will be used for some of the labels. There is some text here that I don't need. I could just comment that out, but I am just going to get rid of it. Then I have a for loop, and what this for loop does is draw some vertical reference lines, because I am going to actually have several dot plots going across the window, and these lines give me points where I can see what's going on, and I put some text labels on them. Maybe I will come back and explain those in a minute. Then when that loop is finished, I have smooth, which turns on the anti-aliasing. I have noStroke, which turns off the outlines around the dots.
Then I go through the data file. So what this is is I am opening up this stateData file, and I am going to go through one row at a time. That's what we have at the top. Start at row 0, which is the first line of data, and go one at a time till you get to the bottom. And I am pulling out a few variables. First off, I'm getting the state names. That's a string variable. And that I start at the top row, and that's in column 0. It's the very first column, so the state names are right down there. And that's why it's using getString, because that's a string variable.
Then what I'm doing is I'm getting data about relative interest in four different sports as Google search terms. I am getting NFL, football; NBA, basketball; and MLB, for baseball. For those ones we actually searched for the abbreviation. I also have Major League Soccer. We actually searched for that phrase, Major League Soccer, but in here I'm abbreviating it as MLS. And for each one of these, what I do is I have Processing go through, row by row, and pull out the values for that particular variable.
So you see, for instance, under NFL, it says float nfl, and that's because the float number here is the index number of relative interest in the Google search. If a state's relative interest is exactly on the average, they have 0. If they are above the average, they have a positive number. If they are below, they have a negative number. And so what this is going to do is it is going to go to the ninth column--actually, that's the index number. It's really the tenth column, because the index starts at 0. So it goes to the tenth column of the spreadsheet and then it gets the number there.
And then to make it fit in the window, I do a few different things. Because the range for these numbers goes from approximately -2 to positive 5, what I do is I add 2 to make everything a positive value, and that gets me from 0 to 7. Then I multiply it times 65, which approximately fills up the width of the window, and I add 100 to move it all over, to give me enough room to add some labels on the left side. So that's what that formula right there is. It's an attempt to take the natural range of the Google numbers and move it over and spread it out enough to fill up the entire window.
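The scaling described here is a shift, a stretch, and an offset, and it can be written as a one-line function. The following is a small sketch in plain Java; the class name DotPlotScale, the method name toX, and the constant names are made up for this example, while the numbers 2, 65, and 100 come from the walkthrough.

```java
// Maps a Google interest score (roughly -2 to 5) to an x position in a
// 600-pixel-wide window. Names are illustrative, not from the exercise file.
class DotPlotScale {
    static final float SHIFT = 2f;    // moves the range -2..5 up to 0..7
    static final float SCALE = 65f;   // pixels per unit of the score
    static final float MARGIN = 100f; // space on the left for the labels

    static float toX(float value) {
        return (value + SHIFT) * SCALE + MARGIN;
    }
}
```

Under these numbers a score of -2 lands at x = 100, the national average of 0 lands at x = 230, and a score of 5 lands at x = 555, which leaves a small margin inside the 600-pixel window.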
Next, I go to the palette and I pick a number out for the dots in the NFL, and then I have the ellipse, and that's what actually draws the dots. It uses the number for NFL, and that's the Google search terms. It's the actual outcome variable. And then I have the Y position. I am putting this 2/10ths of the way down. And then the size of the ellipse is d. That's 10 pixels tall and 10 pixels wide. And then what I do is I have a label that I put next to it. It's text, and I put it in quotes because I want it to actually say that word NFL. And that's going to be 60 pixels over, and it's going to come down just a little bit farther than the circles, because the circles are positioned by the midpoint, but the text gets positioned by its baseline, and so this lines the baseline up with the bottom of the circles.
Finally, I have a little if statement, and what this does is allow me to do a mouse-over, and if the mouse is over a particular dot, it will tell me what state that dot is. It gets muddled up if the states are close, but at least you can use it to see the extremely high and low numbers. And then I do a similar procedure for each of the other sports. I just go to a different column in the data. So instead of going, for instance, to the column with index 9, I go to index 10, and then I use a different color from the palette; instead of index 1, I use index 2. And then I bump the ellipses down another 2/10ths of the window.
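The walkthrough does not spell out the condition inside the if statement, so the following is only one plausible way to test for a mouse-over, written in plain Java. It assumes the mouse counts as over a dot when it is within one radius of the dot's center; the class and method names are invented for this example. Inside a sketch, Processing's built-in dist() function could do the same distance calculation.

```java
// One possible hover test (an assumption, not the exercise file's code):
// the mouse is "over" a dot when it lies within the dot's radius.
class HoverTest {
    static boolean isOver(float mouseX, float mouseY,
                          float dotX, float dotY, float diameter) {
        // Straight-line distance from the mouse to the dot's center.
        double distance = Math.hypot(mouseX - dotX, mouseY - dotY);
        return distance <= diameter / 2.0;
    }
}
```

When two states sit closer together than one diameter, a test like this is true for both at once, which matches the muddled labels mentioned above.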
And I just make these appropriate accommodations all the way through, then I do the exact same thing for MLB and the exact same thing for MLS, Major League Soccer. Now, I'm confident that there is a more compact way to do this, through arrays, but I think that for right now, for a pedagogical purpose, this is really nice because it spells every step out in detail and shows the repetition as I go through. I am just going to save this, and I am going to hit Run. That's Ctrl+R on the PC and Command+R on the Mac. And here is what we get.
I get a window, a light-yellow background, and you can see that I have reference lines going vertically across four different dot plots: NFL on the top, then NBA, then Major League Baseball, then Major League Soccer. The third line, vertical line from the left, is 0, and that means that's the national average. Any state that's on that has about as much relative interest in that as a search term as everybody else. So, for instance, we see that in Major League Baseball, there is one dot that's almost exactly on the zero. And in fact, what I can do is I can bring the mouse in, and I can hover over, and that's North Dakota. And then what I can do is I can look at the other states, and you see most of them don't go down really low. One state has less interest in basketball than the others. That's Montana.
They don't have a basketball team. On the other hand, I happen to know that the highest state-- well, that's Utah. That's where I am from. We actually do have a basketball team. But you do see some of the others. For instance, the state with the highest relative interest in NFL is South Dakota. They don't have a football team. But you can see the interest. Wisconsin, North Dakota, Maryland. And if I get in here, it just kind of gets jumbled up, and there are ways of dealing with that, but I'm not going to do those right now. I do point out this last one, Major League Soccer.
You see that we've got everybody piled really close right here. You can't even tell who is who. But you have stragglers. So there's Oregon. And by the way, if you know statistics, these are standard deviation units. This means that Oregon's relative interest in searching for Major League Soccer is one standard deviation above the national average, which is a fair amount. And then there is Washington. They're 1 1/2 standard deviations above the national average. And then way up here, the highest thing we have of any of these variables is Utah, which is over 3 and a half standard deviations above the mean, which, statistically, is an extraordinary thing.
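To give a feel for the more compact, array-based approach mentioned here, the sketch below works out the per-sport settings from a single index, in plain Java. The walkthrough only confirms the first two sports (column index 9 and palette index 1 for NFL, column index 10 and palette index 2 for NBA, each row 2/10ths of the window lower than the last); that MLB and MLS simply continue the same pattern is an assumption, and the class and method names are invented for this example.

```java
// Per-sport settings derived from a loop index instead of four copied blocks.
// The pattern for MLB and MLS is assumed to continue from NFL and NBA.
class SportRows {
    static final String[] LABELS = {"NFL", "NBA", "MLB", "MLS"};
    static final int FIRST_COLUMN = 9;  // column index of the NFL scores
    static final int WINDOW_HEIGHT = 200;

    // Column in the data table that holds sport i's scores.
    static int column(int i) {
        return FIRST_COLUMN + i;
    }

    // Palette entry used for sport i's dots.
    static int paletteIndex(int i) {
        return i + 1;
    }

    // Vertical position of sport i's row of dots: 2/10, 4/10, 6/10, 8/10.
    static float y(int i) {
        return WINDOW_HEIGHT * 0.2f * (i + 1);
    }
}
```

In a sketch, a for loop over LABELS could then read the value from column(i), pick the fill from paletteIndex(i), and draw each dot at y(i), replacing the four nearly identical blocks with one.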
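For readers who have not met standard deviation units before, the sketch below shows the usual calculation in plain Java: subtract the mean from each value, then divide by the standard deviation. How the scores in the data file were actually standardized is not stated, so the use of the population standard deviation here is an assumption, and the class and method names are invented for this example.

```java
// Converts raw values to standard deviation units (z-scores).
// Uses the population standard deviation, which is an assumption here.
class ZScore {
    static double[] standardize(double[] values) {
        // Mean of all values.
        double sum = 0;
        for (double v : values) {
            sum += v;
        }
        double mean = sum / values.length;

        // Population standard deviation: root of the mean squared deviation.
        double squares = 0;
        for (double v : values) {
            squares += (v - mean) * (v - mean);
        }
        double sd = Math.sqrt(squares / values.length);

        // Each result says how many standard deviations a value sits from the mean.
        double[] z = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            z[i] = (values[i] - mean) / sd;
        }
        return z;
    }
}
```

A result of 0 means a value is exactly at the average, positive results are above it, and negative results are below it, which is how the dots are read against the zero reference line.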
I will point out that Real Salt Lake, our Major League Soccer team, won the Major League Soccer cup a few years ago. But otherwise, I don't quite know how to explain that, but I'm just happy for it. And in this case, this is a dot plot; this is a simple data visualization. I made it a little more interesting by altering the colors, by throwing in several dot plots at once, providing some easy reference points, and by making it possible to do a hover, to get some detailed information about the extreme points, and that's the first kind of visualization that we want to cover.