Generating dot plots

From: Interactive Data Visualization with Processing

One of the most basic forms of statistical graphics, or data visualization, that you can put together is a dot plot. All it is is a single dimension: you put a dot where a person's or unit's score falls on that particular dimension. Now this is a very simple thing to do in Processing. All you need to do is draw ellipses and have either the X or the Y be driven by the data. On the other hand, I thought I would try to show you something slightly more interesting than that, and what I've done is I've created a dot plot using the Google data file that I used in the last chapter.

If you open up the folder for chapter 11, exercise one, you'll find a few things in there. First off, you'll find the actual Processing file itself. That's the one that says Ex11_01. That's the file we are looking at right here. You will also find Table, with a capital T. That is the Table class that we will be using to help process the data, and you can see, for instance, in the Processing sketch on the left, that we have two tabs. The second one is for the table. You also see the data folder at the top, and if I double-click on that, we've got three files in there. Two of them we have seen before.

There is the stateData, and that's information I got from Google about different states and their relative interest in particular search terms. Now, I have the Excel spreadsheet in here because it also has all the original header data and some of the summary statistics that are helpful in getting things set up. On the other hand, Processing is much happier with the TSV, tab-separated values, spreadsheet. That's a text form of a spreadsheet. I use that, but it doesn't have any header information, and it's a little harder for me to work with, so I just have it at the bottom in there. The Excel spreadsheet is simply for my reference.

The third file you see in there is a font. I have created a font for use in this particular sketch, and that gets stored in the data folder. So let me show you what is in this particular sketch and how it works. First off, I brought in a color palette, like I have done many times before. Also, I have created a font. I've called it labelFont because I use it for the labels. Below that I have Table stateData: the class is Table, and the object that I am creating out of the class is called stateData, and that's what I'm going to be using to read information in.

Then I have an integer variable, rowCount. That's where we are simply counting how many rows there are in the spreadsheet. I also have another variable, d, which is for the diameter of the ellipses, the dots, that I am going to be drawing. Let me scroll down a little bit. Then in the setup block, I have a window that's 600 by 200 pixels. And then I read in the data. I have the stateData object, and I say it's a new object of the Table class, and it's going to read the stateData.tsv file.

I also count the rows in it. So rowCount is equal to stateData.getRowCount. The .getRowCount is a method within that particular object. And then I do a print line, just to check how many lines I have at the bottom. In fact, you can see it's down there right now, rowCount = 51. That's because I have 50 states and Washington D.C. And the last thing in the setup is I load the font into the font variable. So the font variable I called labelFont. The function is loadFont, and then I put in the name of the font that I created with the Processing tool.

Remember, if I go up to Tools right here, to Create Font, that's how I created that font, by selecting Gill Sans Bold and then size 18. Next, in the draw block, we have got a few other things going on. I am going to scroll up a little bit. First, I specify the palette, and I put that in the draw because I want it to refresh every time through. Then I set the font. So, the text font that I will be using is from the variable labelFont. I make a stroke that will be used for some lines to mark points on the grid.

I also have a fill that will be used for some of the labels. There is some text here that I don't need; I could just comment that out, but I am just going to get rid of it. Then I have a for loop, and what this for loop does is draw some vertical reference lines, because I am actually going to have several dot plots going across the window, and this will draw points where I can see what's going on, and I put some text labels on them. Maybe I will come back and explain those in a minute. Then when that loop is finished, I have smooth, which turns on the anti-aliasing. I have noStroke, which turns off the outlines around the dots.

Then I go through the data file. So what this is is I am opening up this stateData file, and I am going to go through it one row at a time. That's what we have at the top: start at row 0, which is the first line of data, and go one at a time till you get to the bottom. And I am pulling out a few variables. First off, I'm getting the state names. That's a string variable. I start at the top row, and it's in column 0. It's the very first column, so the state names are right down there. And that's why it's using getString, because that's a string variable.
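As a rough illustration of the row-by-row loop described above, here is a minimal sketch in plain Java. The `TsvTable` class is a hypothetical stand-in for the course's Table class, whose source is not shown in the transcript. `getRowCount` and `getString` are named in the video; `getFloat`, the sample rows, and the two-column layout are assumptions made only for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the course's Table class (not the real one).
// It keeps a tab-separated file in memory as rows of text cells.
class TsvTable {
    private final List<String[]> rows = new ArrayList<>();

    // Parse TSV text: one row per line, cells separated by tabs, no header row.
    TsvTable(String tsvText) {
        for (String line : tsvText.split("\\R")) {
            if (!line.isEmpty()) {
                rows.add(line.split("\t", -1));
            }
        }
    }

    int getRowCount() {
        return rows.size();
    }

    // Read a cell as text, e.g. a state name.
    String getString(int row, int col) {
        return rows.get(row)[col];
    }

    // Read a cell as a number (assumed helper, not confirmed by the transcript).
    float getFloat(int row, int col) {
        return Float.parseFloat(rows.get(row)[col]);
    }

    public static void main(String[] args) {
        // Made-up sample rows, not the real Google data.
        String sample = "Alabama\t0.5\nAlaska\t-1.25\nArizona\t0.0\n";
        TsvTable stateData = new TsvTable(sample);
        int rowCount = stateData.getRowCount();
        for (int row = 0; row < rowCount; row++) {
            String state = stateData.getString(row, 0); // column 0: state name
            float score = stateData.getFloat(row, 1);   // column 1: made-up score
            System.out.println(state + " " + score);
        }
    }
}
```

The loop starts at row 0 and stops at `rowCount`, which mirrors the "start at the top and go one at a time till you get to the bottom" pattern in the sketch.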

Then what I'm doing is I'm getting data about relative interest in four different sports as Google search terms. I am getting NFL, football; NBA, basketball; MLB, for baseball (and for those we actually searched for the abbreviation). I also have Major League Soccer. We actually searched for that phrase, Major League Soccer, but in here I'm abbreviating it as MLS. And for each one of these, what I do is I have Processing go through, row by row, and pull out the values for that particular variable.

So you see, for instance, under NFL, it says float nfl, and that's because the float number here is the index number of relative interest in the Google search. If a state's relative interest is exactly on the average, they have 0. If they are above the average, they have a positive number. If they are below, they have a negative number. And so what this is going to do is go to the ninth column; actually, 9 is the index number. It's really the tenth column, because the index starts at 0. So it goes to the tenth column of the spreadsheet and then it gets the number there.

And then to make it fit in the window, I do a few different things. Because the range for these numbers goes from approximately -2 to positive 5, what I do is I add 2 to make everything a positive value, and that gets me from 0 to 7. Then I multiply it by 65, which approximately fills up the width of the window, and I add 100 to move it all over, to give me enough room to add some labels on the left side. So that's what that formula right there is. It's an attempt to take the natural range of the Google numbers and move it over and spread it out enough to fill up the entire window.
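The arithmetic above can be written out as a small function. This is a minimal sketch in plain Java rather than the author's actual sketch code; the class, method, and constant names are made up for illustration, while the numbers 2, 65, and 100 come from the transcript.

```java
// Illustrative only: names are hypothetical, constants are from the transcript.
class DotPlotScale {
    static final float SHIFT = 2f;    // lifts the lowest score (about -2) up to 0
    static final float SPREAD = 65f;  // pixels per unit of score
    static final float MARGIN = 100f; // room for the labels on the left side

    // Convert a relative-interest score into an x position in pixels.
    static float toX(float score) {
        return (score + SHIFT) * SPREAD + MARGIN;
    }

    public static void main(String[] args) {
        float[] scores = {-2f, 0f, 5f};
        for (float s : scores) {
            System.out.println(s + " -> " + toX(s));
        }
    }
}
```

With these constants a score of -2 lands at x = 100, a score of 0 at x = 230, and a score of 5 at x = 555, which stays inside the 600-pixel-wide window. Processing's built-in `map()` function could express the same rescaling, for example `map(score, -2, 5, 100, 555)`.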

Next, I go to the palette and I pick a number out for the dots in the NFL, and then I have the ellipse, and that's what actually draws the dots. It uses the number for NFL for the X position; that's the Google search score, the actual outcome variable. And then I have the Y position. I am putting this 2/10ths of the way down. And then the size of the ellipse is d. That's 10 pixels tall and 10 pixels wide. And then what I do is I have a label that I put next to it. It's text, and I put it in quotes because I want it to actually say the word NFL. And that's going to be 60 pixels over, and it's going to come down just a little bit farther than the circles, because the circles are positioned by the midpoint, but the text gets positioned by its baseline, and so this lines the baseline up with the bottom of the circles.

Finally, I have a little if statement, and what this does is allow me to do a mouse-over: if the mouse is over a particular dot, it will tell me what state that dot is. It gets muddled up if the states are close, but at least you can use it to see the extremely high and low numbers. And then I do a similar procedure for each of the other sports. I just go to a different column in the data. So instead of going, for instance, to the column with index 9, I go to index 10, and then I use a different color from the palette; instead of index 1, I use index 2. And then I bump the ellipses down another 2/10ths of the window.
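The transcript does not show the condition inside that if statement. One common approach, assumed here, is to compare the distance between the pointer and the center of a dot with the dot's radius. The following plain Java sketch uses made-up names and sample values throughout.

```java
// Hypothetical hover test; the course sketch may check the mouse differently.
class DotHover {
    // True when the pointer is inside a circle of diameter d centered at (cx, cy).
    static boolean isOver(float mouseX, float mouseY, float cx, float cy, float d) {
        float dx = mouseX - cx;
        float dy = mouseY - cy;
        float r = d / 2f;
        // Compare squared distances to avoid a square root.
        return dx * dx + dy * dy <= r * r;
    }

    // Name of the first dot under the pointer on one row of dots, or null if none.
    static String stateUnderMouse(float mouseX, float mouseY,
                                  String[] names, float[] xs, float y, float d) {
        for (int i = 0; i < names.length; i++) {
            if (isOver(mouseX, mouseY, xs[i], y, d)) {
                return names[i];
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String[] names = {"State A", "State B"}; // placeholder names
        float[] xs = {230f, 400f};               // placeholder x positions
        System.out.println(stateUnderMouse(231f, 41f, names, xs, 40f, 10f));
        System.out.println(stateUnderMouse(300f, 40f, names, xs, 40f, 10f));
    }
}
```

In a Processing sketch the same check could be written with the built-in `dist()` function and the `mouseX` and `mouseY` variables. Returning only the first match is one way to handle overlapping dots; when several states sit within a few pixels of each other, any simple hover test will have trouble telling them apart.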

I just make these appropriate accommodations all the way through, and then I do the exact same thing for MLB and the exact same thing for MLS, Major League Soccer. Now, I'm confident that there is a more compact way to do this, through arrays, but I think that for right now, for a pedagogical purpose, this is really nice because it spells every step out in detail and shows the repetition as I go through. I am just going to save this, and I am going to hit Run. That's Ctrl+R on the PC and Command+R on the Mac. And here is what we get.
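One possible shape for the more compact, array-based version mentioned above is sketched below in plain Java. The data index 9 and palette index 1 for NFL, the step to 10 and 2 for NBA, and the 2/10ths spacing come from the transcript; the assumption that MLB and MLS simply continue the same pattern, and all of the names, are guesses made for illustration.

```java
// Hypothetical array-driven layout for the four dot plots.
class SportRows {
    static final String[] LABELS = {"NFL", "NBA", "MLB", "MLS"};
    static final int FIRST_DATA_INDEX = 9; // NFL per the transcript; later ones assumed consecutive
    static final int FIRST_COLOR_INDEX = 1; // NFL palette index per the transcript

    // Index in the data file that holds the scores for sport i.
    static int dataIndex(int i) {
        return FIRST_DATA_INDEX + i;
    }

    // Index in the color palette used for sport i.
    static int paletteIndex(int i) {
        return FIRST_COLOR_INDEX + i;
    }

    // Vertical position of sport i: 2/10ths of the window height per plot.
    static int y(int i, int windowHeight) {
        return windowHeight * 2 * (i + 1) / 10;
    }

    public static void main(String[] args) {
        int windowHeight = 200; // window height from the transcript
        for (int i = 0; i < LABELS.length; i++) {
            System.out.println(LABELS[i]
                + ": data index " + dataIndex(i)
                + ", palette index " + paletteIndex(i)
                + ", y = " + y(i, windowHeight));
        }
    }
}
```

A single loop over `LABELS` would then replace the four near-identical blocks, at the cost of hiding the step-by-step repetition that the spelled-out version makes visible.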

I get a window with a light-yellow background, and you can see that I have reference lines going vertically across four different dot plots: NFL on the top, then NBA, then Major League Baseball, then Major League Soccer. The third vertical line from the left is 0, and that means that's the national average. Any state that's on that line has about as much relative interest in that search term as everybody else. So, for instance, we see that in Major League Baseball, there is one dot that's almost exactly on the zero. And in fact, what I can do is I can bring the mouse in, and I can hover over, and that's North Dakota. And then what I can do is I can look at the other states, and you see most of them don't go down really low. One state has less interest in basketball than the others. That's Montana.

They don't have a basketball team. On the other hand, I happen to know that the highest state, well, that's Utah. That's where I am from. We actually do have a basketball team. But you do see some of the others. For instance, the state with the highest relative interest in NFL is South Dakota. They don't have a football team. But you can see the interest. Wisconsin, North Dakota, Maryland. And if I get in here, it just kind of gets jumbled up, and there are ways of dealing with that, but I'm not going to do those right now. I do want to point out this last one, Major League Soccer.

You see that we've got everybody piled really close right here. You can't even tell who is who. But you have stragglers. So there's Oregon. And by the way, if you know statistics, these are standard deviation units. This means that Oregon's relative interest in searching for Major League Soccer is one standard deviation above the national average, which is a fair amount. And then there is Washington. They're 1 1/2 standard deviations above the national average. And then way up here, the highest thing we have of any of these variables is Utah, which is over 3 and a half standard deviations above the mean, which, statistically, is an extraordinary thing.
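For readers who have not met standard deviation units before, the sketch below shows one way such scores can be computed: subtract the mean from each value and divide by the standard deviation. The transcript does not say how the Google numbers were standardized, so the use of the population standard deviation, the names, and the sample values are all assumptions.

```java
// Hypothetical z-score helper; not taken from the course files.
class ZScores {
    // Convert raw values to standard deviation units (z-scores).
    // If every value is identical the standard deviation is 0 and the results are NaN.
    static double[] standardize(double[] values) {
        double sum = 0;
        for (double v : values) {
            sum += v;
        }
        double mean = sum / values.length;

        double squaredDeviations = 0;
        for (double v : values) {
            squaredDeviations += (v - mean) * (v - mean);
        }
        // Population standard deviation; a sample version would divide by n - 1.
        double sd = Math.sqrt(squaredDeviations / values.length);

        double[] z = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            z[i] = (values[i] - mean) / sd;
        }
        return z;
    }

    public static void main(String[] args) {
        double[] raw = {2, 4, 4, 4, 5, 5, 7, 9}; // made-up values: mean 5, SD 2
        for (double z : standardize(raw)) {
            System.out.println(z);
        }
    }
}
```

In this made-up data set the value 9 sits two standard deviations above the mean, so its score is 2.0, and a value equal to the mean scores 0, which matches how the zero reference line is read in the plot.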

I will point out that Real Salt Lake, our Major League Soccer team, won the Major League Soccer cup a few years ago. But otherwise, I don't quite know how to explain that, but I'm just happy for it. And in this case, this is a dot plot; this is a simple data visualization. I made it a little more interesting by altering the colors, by throwing in several dot plots at once, providing some easy reference points, and by making it possible to do a hover, to get some detailed information about the extreme points, and that's the first kind of visualization that we want to cover.

This video is part of

Interactive Data Visualization with Processing

72 video lessons · 11885 viewers

Barton Poulson
Author

  1. 3m 16s
    1. Welcome
      58s
    2. What you should know
      1m 22s
    3. Using the exercise files
      56s
  2. 11m 51s
    1. Overview of data visualization
      11m 51s
  3. 11m 53s
    1. Installing Processing
      3m 38s
    2. Overview of Processing
      4m 5s
    3. Exploring libraries
      4m 10s
  4. 1h 1m
    1. Basic setup
      7m 31s
    2. Drawing points
      4m 37s
    3. Drawing lines
      5m 6s
    4. Drawing ellipses and circles
      5m 24s
    5. Drawing arcs
      6m 54s
    6. Drawing rectangles and squares
      4m 58s
    7. Drawing quadrangles
      3m 25s
    8. Drawing triangles
      2m 55s
    9. Drawing polygons
      3m 37s
    10. Drawing simple curves
      4m 54s
    11. Drawing complex curves
      6m 46s
    12. Drawing Bézier curves
      5m 38s
  5. 54m 3s
    1. Introduction to variables
      10m 44s
    2. Understanding variable scope
      6m 53s
    3. Modifying variables
      9m 8s
    4. Creating arrays
      9m 53s
    5. Modifying arrays
      6m 37s
    6. Creating strings
      7m 3s
    7. Modifying strings
      3m 45s
  6. 1h 2m
    1. Incorporating randomness
      7m 59s
    2. Using Perlin noise
      4m 24s
    3. Shuffling with Java
      3m 30s
    4. Specifying line attributes
      8m 2s
    5. Changing placement modes
      5m 45s
    6. Understanding color attributes and functions
      4m 16s
    7. Exploring color spaces
      7m 44s
    8. Using color palettes
      7m 5s
    9. Transforming the grid
      8m 38s
    10. Exploring the attribute matrix
      5m 33s
  7. 52m 7s
    1. Building code blocks
      5m 57s
    2. Writing a while loop
      3m 52s
    3. Using for loops
      5m 35s
    4. Creating conditionals
      14m 50s
    5. Working with easing
      10m 51s
    6. Creating spirals
      11m 2s
  8. 18m 55s
    1. Mouse tracking
      3m 54s
    2. Hovering and clicking
      11m 16s
    3. Understanding keyboard interaction
      3m 45s
  9. 27m 32s
    1. Specifying fonts
      6m 43s
    2. Using images
      5m 51s
    3. Playing a video loop
      6m 20s
    4. Exporting video
      3m 47s
    5. Adding sound
      4m 51s
  10. 20m 49s
    1. Creating functions
      11m 48s
    2. Creating classes and objects
      9m 1s
  11. 31m 10s
    1. Using embedded data
      5m 26s
    2. Working with appended text data
      6m 4s
    3. Working with appended tabular data
      10m 26s
    4. Reading XML data
      9m 14s
  12. 48m 17s
    1. Generating dot plots
      11m 11s
    2. Building scatter plots
      10m 0s
    3. Making line plots
      9m 55s
    4. Creating bar charts
      9m 12s
    5. Checking out examples of maps, hierarchies, and networks
      7m 59s
  13. 20m 57s
    1. Introducing some principles of 2D design
      13m 44s
    2. Understanding color theory
      7m 13s
  14. 24m 46s
    1. Interacting with zooming, rotating, and sliding
      6m 26s
    2. Implementing slicing
      6m 47s
    3. Using rollovers
      5m 58s
    4. Introducing the GUI libraries
      5m 35s
  15. 10m 35s
    1. Sharing via OpenProcessing and other sites
      3m 19s
    2. Saving as a desktop application
      2m 42s
    3. Saving as JavaScript
      1m 47s
    4. Saving as an Android application
      2m 47s
  16. 2m 38s
    1. Where to go from here
      2m 38s
