Start communicating ideas and diagramming data in a more interactive way. In this course, author Barton Poulson shows how to read, map, and illustrate data with Processing, an open-source drawing and development environment. On top of a solid introduction to Processing itself, this course investigates methods for obtaining and preparing data, designing for data visualization, and building an interactive experience out of a design. When your visualization is complete, explore the options for sharing your work, whether uploading it to specialized websites, embedding the visualizations in your own web pages, or even creating a desktop or Android app for your work.
In our last movie, we looked at how to create dot plots to represent one-dimensional distributions, that is, one variable at a time, running from lowest to highest. In that example I actually showed several variables, but each one was a separate one-dimensional distribution. In this movie, I want to show you how to make a scatter plot, a very common way to show the two-dimensional distribution of two quantitative variables. I'm going to use a lot of the same information: we're using the same data set, which is based on Google's search trends on a state-by-state basis.
Because this code is rather lengthy, I'm just going to walk through it instead of typing it in front of you. Starting at the top, I've got to make that first line a comment, or I'm going to get an error message. So I'll save that. Next is the color palette that I brought in, and then I've saved some fonts. I created the fonts with Processing's font tool: if I go up to Tools in the menu bar and click on Create Font, this is where I created the font. The fonts available on your computer may be different.
We've got Adobe CS6 installed on this machine, so we probably have some fonts that most computers won't have. If you have any problems with the fonts, feel free to comment out the fonts I've used in this example, or replace them with ones you do have on your system. In the third line, I'm declaring the font variable; actually, it's an object. Beneath that, I'm declaring the data, which is also an object. Below that, I have a variable for the rowCount in the data set.
Then I have a little global variable, d, for the diameter of the circles I'll be using. In the setup block (I'll scroll up a little bit), I have a window that's 600 x 500. Normally I only make windows 200 pixels tall in this course, but a scatter plot needs enough vertical room to work well, so I've made it nearly square. Then I load the data into the stateData object by referring to stateData.tsv, a tab-separated values file.
Then I get the rowCount by calling one of the object's methods, getRowCount, and I print the rowCount to the Console just to double-check it. Then I load the font into labelFont and turn on anti-aliasing. Then we go down to draw. I've got a background based on the first index color in my palette array. Then I call the font, and I set the stroke and the fill for the shapes to a medium gray.
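The loading steps described above (reading stateData.tsv, then asking for the row count) can be imitated in plain Java to show what a tab-separated table amounts to. This is a minimal sketch, not Ben Fry's Table class: the class name TsvTable, its parsing rules (one row per non-empty line, no header row, cells split on tabs), and the sample rows are hypothetical. It only borrows the method names getRowCount, getString, and getFloat mentioned in this movie.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a tab-separated data table (NOT Ben Fry's Table class).
// Assumes: one row per non-empty line, no header row, cells separated by tabs.
class TsvTable {
    private final List<String[]> rows = new ArrayList<>();

    TsvTable(String tsvText) {
        // Split into lines, accepting both \n and \r\n line endings.
        for (String line : tsvText.split("\\r?\\n")) {
            if (!line.isEmpty()) {
                // The -1 limit keeps trailing empty cells instead of dropping them.
                rows.add(line.split("\t", -1));
            }
        }
    }

    // Number of data rows, analogous to the getRowCount call in setup().
    int getRowCount() {
        return rows.size();
    }

    // Text cell at (row, col); both indexes are zero-based.
    String getString(int row, int col) {
        return rows.get(row)[col];
    }

    // Numeric cell at (row, col), parsed as a float.
    float getFloat(int row, int col) {
        return Float.parseFloat(rows.get(row)[col]);
    }

    public static void main(String[] args) {
        // Placeholder rows, not the course's stateData.tsv.
        TsvTable stateData = new TsvTable("StateA\t1.5\nStateB\t-2.0\n");
        int rowCount = stateData.getRowCount();
        System.out.println(rowCount); // prints 2, the same kind of Console check as in the movie
    }
}
```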
And then I start placing the axes. Remember, when you're drawing in Processing, order matters: things that come later get drawn on top of things that come earlier. So you usually want to put the foundational elements first and everything else later. Right now I'm doing the lines and the labels for the X axis. I set textAlign(CENTER), draw a line across the bottom, and then this little for loop inserts the labels along the X axis.
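The drawing-order rule can be seen without Processing at all. This hypothetical Java sketch (the DrawOrder class and its character grid are invented for illustration) treats a 2D array of characters as the canvas: every draw call simply overwrites cells, so whatever is drawn last is what remains visible.

```java
import java.util.Arrays;

// Illustration only: a character grid stands in for the sketch window.
class DrawOrder {
    // Creates a w x h grid filled with the background character.
    static char[][] background(int w, int h, char bg) {
        char[][] grid = new char[h][w];
        for (char[] row : grid) {
            Arrays.fill(row, bg);
        }
        return grid;
    }

    // Draws a horizontal "axis" across row y, overwriting whatever was there.
    static void axis(char[][] grid, int y) {
        Arrays.fill(grid[y], '-');
    }

    // Draws a single "data point", overwriting whatever was there.
    static void dot(char[][] grid, int x, int y) {
        grid[y][x] = 'o';
    }

    // Turns the grid into printable text, one line per row.
    static String render(char[][] grid) {
        StringBuilder sb = new StringBuilder();
        for (char[] row : grid) {
            sb.append(row).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        char[][] grid = background(8, 3, '.');
        axis(grid, 2);   // foundation first
        dot(grid, 3, 2); // data last, so it stays visible on top of the axis
        System.out.print(render(grid));
    }
}
```

Swapping the axis and dot calls would hide the dot under the axis line, which is why the axes come before the data loop.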
You'll see the labels when we run the sketch. Then I call text() to put the word Videogames underneath, because what I'm looking at here is relative interest in Google searches for the term videogames. Below that, I do a similar thing for the Y axis. Let me scroll down a little. I've changed the alignment, because I want the labels to sit snug against the axis on the right. I draw the vertical line for the axis, and then another for loop places the tick marks and the numbers along the side.
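Both axis loops boil down to stepping through integer values and converting each one to a pixel position. A minimal Java sketch, assuming evenly spaced integer ticks and reusing the ranges quoted in this movie (-3 to 3 across x = 100 to 555 for Videogames, -3 to 4 from y = 400 up to y = 50 for Dance); the AxisTicks class and tickPixels method are hypothetical names.

```java
// Hypothetical helper: where each integer tick label goes along an axis.
class AxisTicks {
    // Pixel positions for the ticks minValue, minValue + 1, ..., maxValue.
    // pixelStart is where minValue lands and pixelEnd is where maxValue lands;
    // passing pixelEnd < pixelStart makes values grow upward on screen (a y axis).
    static float[] tickPixels(int minValue, int maxValue, float pixelStart, float pixelEnd) {
        int steps = maxValue - minValue;
        float[] pixels = new float[steps + 1];
        for (int i = 0; i <= steps; i++) {
            pixels[i] = pixelStart + (pixelEnd - pixelStart) * i / steps;
        }
        return pixels;
    }

    public static void main(String[] args) {
        float[] xs = tickPixels(-3, 3, 100, 555);
        for (int i = 0; i < xs.length; i++) {
            System.out.println("x label " + (-3 + i) + " at x = " + xs[i]);
        }
        float[] ys = tickPixels(-3, 4, 400, 50);
        for (int i = 0; i < ys.length; i++) {
            System.out.println("y label " + (-3 + i) + " at y = " + ys[i]);
        }
    }
}
```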
Then I place the word Dance beside the whole plot, because on the Y axis we're depicting each state's relative interest in the search term "dance" on Google. Beneath that, there's a for loop inside draw() that reads the data. It goes through the data set one row at a time, and first it loads the state name into a String variable called state. Then it loads the videogames data. It's declared as a float because it's a floating-point value, one with decimal places.
Then we have the name of the variable, videoGames, which gets its value from the stateData object through the object's getFloat method. The first argument is row, the loop variable from above, so it goes one row at a time; the second is the column. The videogames scores are at index 5, and because indexing starts at zero, that's actually the sixth column. And then I map it. You may recall that in the last example, when I did the dot plots, I did some rather Byzantine calculations to get things spaced out correctly.
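Here is the shape of that loop in plain Java, with a String[][] standing in for the stateData object. It's a sketch: the RowLoop class, the readColumn method, the placement of the state name in column 0, and the placeholder rows (not the course data) are all hypothetical. The only detail taken from the narration is that the videogames scores sit at zero-based column index 5.

```java
// Hypothetical sketch of reading one column row by row.
class RowLoop {
    // Zero-based column index for the videogames scores (index 5 = sixth column).
    static final int VIDEOGAMES_COL = 5;

    // Walks the rows one at a time, as the for loop in draw() does,
    // and parses one numeric column from each row.
    static float[] readColumn(String[][] rows, int col) {
        float[] values = new float[rows.length];
        for (int row = 0; row < rows.length; row++) {
            values[row] = Float.parseFloat(rows[row][col]);
        }
        return values;
    }

    public static void main(String[] args) {
        // Placeholder rows: assumed name in column 0, then five numeric columns (not real data).
        String[][] rows = {
            {"StateA", "0.1", "0.2", "0.3", "0.4", "1.5"},
            {"StateB", "0.0", "0.0", "0.0", "0.0", "-2.0"},
        };
        float[] videoGames = readColumn(rows, VIDEOGAMES_COL);
        for (int row = 0; row < rows.length; row++) {
            String state = rows[row][0];
            System.out.println(state + ": " + videoGames[row]);
        }
    }
}
```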
Those calculations were one way to do it. Another way, which can make things a lot simpler, is Processing's map() function, which converts a value from one range to another. Here I created a new variable called x, because I'm computing x coordinates. I told map() to take the videoGames variable, which has a naturally occurring range of about -3 to +3 in this data set, and rescale it to run from 100 to 555, because that range lines up with the labels along the bottom.
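For reference, map() is a plain linear rescaling. Writing the input range as a1 to b1 and the output range as a2 to b2, Processing's reference gives the formula below; plugging in the x-axis values from the narration shows that a score of 0, the national average, lands at x = 327.5, exactly halfway between 100 and 555.

$$
\operatorname{map}(v;\,a_1,b_1,a_2,b_2) = a_2 + (b_2 - a_2)\,\frac{v - a_1}{b_1 - a_1}
$$

$$
x(v) = 100 + (555 - 100)\,\frac{v - (-3)}{3 - (-3)} = 100 + 455\cdot\frac{v + 3}{6}, \qquad x(0) = 100 + 455\cdot\tfrac{1}{2} = 327.5
$$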
With map(), I don't have to work out that a score of 1.477 lands at pixel 417 or something; it does that automatically for me. It's even more helpful for the next one, dance, because the two axes run in different directions. In a scatter plot, you want low values at the bottom, with the numbers getting bigger as you go up. The problem is that in Processing's screen coordinates, y = 0 is at the top and the numbers get bigger as you go down. By using map(), I can flip that around without doing any math myself.
So I declare a new float variable called y and use map() to take the dance variable's naturally occurring values of -3 to +4 and map them onto a range that starts at 400 and goes to 50. Because the target range runs from a larger number to a smaller one, the order gets flipped, which is much easier than figuring out the adjustment by hand. After that, I turn off the stroke and set a fill color for the ellipses.
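The same rescaling written out in plain Java. The map method here re-implements the linear formula documented for Processing's map(), so the sketch runs outside Processing; ScatterMapping, toX, and toY are hypothetical names, and the ranges are the ones given in the narration. Note that the y conversion gets its flip purely from listing the output range as 400 then 50.

```java
// Hypothetical sketch of converting data scores to screen coordinates.
class ScatterMapping {
    // Linear rescaling, same formula as Processing's map().
    static float map(float value, float start1, float stop1, float start2, float stop2) {
        return start2 + (stop2 - start2) * ((value - start1) / (stop1 - start1));
    }

    // Horizontal: videogames score -3..3 -> pixels 100..555, left to right.
    static float toX(float videoGames) {
        return map(videoGames, -3, 3, 100, 555);
    }

    // Vertical: dance score -3..4 -> pixels 400..50. The output range runs
    // "backwards", so larger scores get smaller y values and sit higher on screen.
    static float toY(float dance) {
        return map(dance, -3, 4, 400, 50);
    }

    public static void main(String[] args) {
        System.out.println(toX(0));  // 327.5
        System.out.println(toY(-3)); // 400.0 (bottom of the plot)
        System.out.println(toY(4));  // 50.0 (top of the plot)
    }
}
```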
I also make the dots somewhat transparent: the 180 is the alpha value. Because I have an x variable, a y variable, and the d variable for the diameter up above, I just pass x, y, d, d to ellipse(). Then I have a small amount of code that enables rollovers, so you can see which state each data point represents. It gets a little crowded in the middle, but it works well for the extreme cases. For this, I used Processing's dist() function. The loop handles one dot at a time, and the test is whether the distance between the center of the dot and mouseX, mouseY is less than half the dot's diameter plus one. The diameter is 10, so half of it is 5, and I added 1 to give a little extra room, because a five-pixel radius can be hard to hit.
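A minimal Java version of that rollover test. The Rollover class and isOver method are hypothetical; dist is written with Math.hypot because Processing's dist() returns the ordinary Euclidean distance between two points. The 10-pixel diameter and the extra 1 pixel of slack come from the narration.

```java
// Hypothetical sketch of the rollover hit test.
class Rollover {
    // Dot diameter, matching the 10 pixels mentioned in the narration.
    static final float D = 10;

    // Euclidean distance between (x1, y1) and (x2, y2), the value Processing's dist() returns.
    static float dist(float x1, float y1, float x2, float y2) {
        return (float) Math.hypot(x2 - x1, y2 - y1);
    }

    // True when the mouse is closer to the dot's center than half the diameter plus 1 pixel.
    static boolean isOver(float dotX, float dotY, float mouseX, float mouseY) {
        return dist(dotX, dotY, mouseX, mouseY) < D / 2 + 1;
    }

    public static void main(String[] args) {
        System.out.println(isOver(200, 200, 205, 200)); // true: 5 px away, threshold is 6
        System.out.println(isOver(200, 200, 206, 200)); // false: exactly 6 px is not < 6
    }
}
```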
If the mouse is within that distance of a dot's center, the sketch brings up the state name and puts it just off to the side. You'll also see that I have a second tab. We've used this one before: it holds Ben Fry's Table class. It's rather lengthy, so again I'll just point out that you can copy it from here, but you can also find the Table class in Processing's built-in examples. Let me show you quickly. Go to File > Examples, open that up, and then go to Books.
That's the fourth one down. Go to Visualizing Data, the third one down, and then to usmap, the first one, labeled ch03-usmap. When you open it, the second tab contains the same Table class, so you can also get it from there; it ships with Processing itself. And we fully anticipate that in version 2.0 of Processing, this will be a native part of Processing that won't require any special installation.
So anyhow, back to where we were. We hit Run, and there's our scatter plot. We have the Videogames scores across the bottom. 0 indicates that a state's relative interest in that search term is at the national average; a positive number means it's above the national average, and a negative number means it's below. Similarly, for Dance, running up the side, zero is the national average and positive numbers are higher. And we have a few interesting cases. There's a bunch of cases right here in the middle.
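The narration treats 0 as the national average and describes gaps in standard deviations (as with Virginia and Utah). Assuming the scores are ordinary z-scores, which the movie implies but doesn't spell out, each one is (value - mean) / standard deviation. This hypothetical Java sketch uses the population standard deviation; whether the course data used the population or the sample version isn't stated. The input numbers are toy values, not the course data.

```java
import java.util.Arrays;

// Hypothetical sketch of standardizing raw values into z-scores.
class ZScores {
    // Converts raw values to z-scores: (value - mean) / standard deviation.
    // Uses the population standard deviation (divide by n); that choice is an assumption.
    static double[] zScores(double[] values) {
        double sum = 0;
        for (double v : values) {
            sum += v;
        }
        double mean = sum / values.length;

        double squaredDeviations = 0;
        for (double v : values) {
            squaredDeviations += (v - mean) * (v - mean);
        }
        double sd = Math.sqrt(squaredDeviations / values.length);

        double[] z = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            z[i] = (values[i] - mean) / sd;
        }
        return z;
    }

    public static void main(String[] args) {
        // Toy values, not the course data: mean 5, population SD 2.
        double[] z = zScores(new double[] {2, 4, 4, 4, 5, 5, 7, 9});
        System.out.println(Arrays.toString(z));
    }
}
```

A value equal to the mean gets a z-score of 0, mirroring how 0 marks the national average on both axes.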
In that middle cluster there's Montana, there's Missouri, there's Texas: nothing particularly special, since they're close to the national average on both terms. We do have one right here: Iowa is the highest of all in searching for Videogames, and it's at the national average for Dance. Down at the very bottom left is Virginia, which is three standard deviations below the mean on both. I don't have any explanation; if anybody watching is from Virginia, I'd love to hear your theories. And then way up at the very top is Utah, my home state.
You can see there's a huge difference between Utah and everybody else in searches for Dance; in fact, there are more than two standard deviations between Utah and the next-highest state. I have some theories about that: performing arts are very popular in Utah. But anyhow, it's an interesting result, and this is a very easy form of interaction. All it is is a rollover; you don't even have to click anything. And that's how you can create an interactive scatter plot in Processing.