Join Barton Poulson for an in-depth discussion in this video Overview of data visualization, part of Interactive Data Visualization with Processing.
This course has two major objectives. The first is to introduce you to Processing, but the second objective is to introduce you to the field of Data Visualization and show how you can use Processing to create customized visualizations from scratch. Now, in one sense visualization is nothing new. Data Visualization can be seen in some ways as nothing more than a more elaborate version of statistical graphing, like the pie charts and the line graphs that we're familiar with. On the other hand, there is so much more to it than the simple and often simple-minded pie charts and graphs that we see. It turns out that most kinds of statistical graphs have a long history and they all have one important thing in common. In the words of John Tukey, the spiritual father of the modern field of data visualization, "The greatest value of a picture is when it forces us to notice what we never expected to see."
You can find this happening in one of the first known graphics, which is this chart of planetary movements from the 10th century. It's pretty rough by modern standards, but even though it's over a thousand years old, it's pretty easy to tell that it's showing the positions of several celestial bodies over time, making it the earliest version of what can be called a Multiple Time Series Chart. It's also pretty clear that some of the bodies seem to move around a lot more than others, which can lead to some interesting theorizing about the nature of the solar system.
Moving forward 800 or 900 years, two people made particularly important contributions to statistical graphics. The first was William Playfair, who was a Scottish engineer and economist. In this graph from 1786, Playfair created what is considered by many to be the first bar chart, which is still available right in Excel.
In another graph of trade data, he also created a Time Series Line Chart, which was not terribly unlike the early planetary chart, but which was more clearly labeled and showed excellent use of color. Fifteen years later, in 1801, Playfair came out with another innovation that's still with us today, for good or evil: the pie chart. The pie slices are easiest to see in the second pie to the left and the fifth one to the right. I should mention that despite Playfair's innovation, there are a lot of good reasons for not making pie charts, and we'll talk about those in a later movie. While Playfair's graphs were significant for getting the visualization ball rolling, they existed primarily as an early form of annual report fodder; that is, they simply communicated information and tried to do so in a clear and attractive way.
It was the English nurse Florence Nightingale who was the first to use statistical graphics as compelling tools for persuasion and policy change. Her best-known chart was the 1858 Diagram of the Causes of Mortality in the Army in the East, which depicted causes of death among soldiers in the Crimean War in Turkey. This particular chart, which is a variation on Playfair's pie chart, is called a Polar Area Diagram or sometimes, in her honor, a Nightingale Rose diagram, although Nightingale herself called it a coxcomb. As a result of her presentation, Queen Victoria appointed a sanitary commission that came to Turkey and removed dead animals from the water, got rid of rotten floors and improved ventilation. As a result, the mortality rate there dropped from 52% to 20%, making this perhaps the graph that saved more lives than any other.
Playfair's and Nightingale's graphs also serve to illustrate one half of a potentially important distinction in visualization, which is the difference between what is called Information Visualization and Data Visualization.
Now, this is far from a hard and fast distinction, but at least it can help focus thinking about graphics. Essentially, Information Visualization refers to graphics that are created to communicate information that is already understood by at least some people. This was the case for both Playfair's and Nightingale's graphs, and it's true for most of the infographics trotted out in today's newspapers. Data Visualization, on the other hand, can be thought of as graphics that are designed to help researchers find the patterns in the first place.
One of the great historical examples of this kind of pattern searching comes from the 1854 cholera outbreak in the Soho district of London, which eventually claimed over 600 lives. The predominant theory of cholera at the time was that it was passed by "bad air." However, the physician John Snow charted each case of cholera and found an epicenter at the public water pump on Broad Street. This led to the removal of the pump handle, which may have contributed to the steep decline in cholera at the time. That is, some say it declined because they took the handle off, and others say it was already going down, but either way it was great detective work and an excellent low-tech solution to a serious problem.
But really, Data Visualization is not identical to these graphs from 150-200 or 1100 years ago. Despite the fact that you can still produce most of these graphs in any spreadsheet program, a lot has changed since then, and the nature of visualization has changed significantly. One important change is the ability to automate analyses and graphics with computers, which is something that none of these pioneers had. But perhaps the most important change is the scale of data available and how the modern deluge calls for new and different methods.
When a dataset has hundreds or thousands of variables and possibly millions or billions of cases, it's simply not possible to go through manually and do one variable, or one correlation, or one pie chart at a time. For example, here are a few large datasets that I recently came across. The National Institutes of Health has the 1000 Genomes Project, and they have made the data freely available to the public, but it is 200TB of data. Also, Caltech astronomers have been measuring the brightness histories of 200 million stars and celestial objects, resulting in over 20 billion independent measurements. And then a few years ago, Google gathered language statistics from live webpages, and the resulting dataset, while only 24GB, contained data on over one trillion words. And so, while we won't deal with anything like those in this course, these examples do highlight the extraordinary demands and opportunities created by modern data sources, as well as give you an idea of why methods developed for much smaller datasets may no longer be ideal.
With those points in mind, I want to show you some modern data visualizations that were created specifically in Processing. Now, I'm doing this not to show you what we're going to accomplish in this particular course, but as a form of inspiration and to give an idea of what's possible with Processing.
The first two I want to show you are by Aaron Koblin. This first was called Amsterdam SMS Messages, and it's a way of examining when and where text messages are sent. After that he created another one that's very well known, called Flight Patterns, which tracks the departure and arrival of flights in and out of the United States. And what's interesting is that even though he has provided no map of the United States, per se, you get a very clear feel for the outline, based simply on the departures and arrivals of flights. The next three that I want to show you are by one of the creators of Processing, Ben Fry.
This first one is genetics data, and what he's depicting is the genetic similarities between humans and other animals. The next one is a depiction of the changes in Charles Darwin's book On the Origin of Species. It's actually an interactive graphic that shows you its complete text, and if you go to the website, you can click on it and see exactly what phrases and words were added or subtracted with each edition. The third one from Ben Fry is another well-known one called Zipdecode. It's an interactive graphic that depicts the center point of every ZIP code in the United States, and all you have to do is begin typing numbers and it shows you where those ZIP codes show up. So you see on the bottom left here, I typed 9, which highlights states on the West Coast: California, Oregon and Washington. This is a neat way of doing a quick interactive graphic, and it also gives you an idea of population density. You can see, for instance, that the East Coast is much more densely populated. It's practically solid around the New York area, while most of Nevada is very sparsely populated.
The next one I want to show you is by Brendan Dawes, and this is a section of a larger piece called Cinema Redux. What he's done here is take a constant stream of still frames from several movies and simply arrange them in order, essentially re-creating each movie in very small images. Now, if you go and see the entire collection, you can see very clear differences from one director to another. You see, for instance, here that Alfred Hitchcock had a much lighter palette and a much warmer palette than did William Friedkin in The French Connection, which was cooler and darker. You can also see that Vertigo was longer.
This next one is by Casey Reas, the other major creator of Processing, and is called Aura. It's a depiction of the same dataset from several different perspectives and gives you an idea of how you can approach a single collection and get a diversity of representations out of it.
After that we have two by Jer Thorp, who is the Data Artist in Residence at the New York Times. This first one is based on Twitter and it's called Just Landed. It simply collects public tweets that included the expression "just landed," along with where the travelers took off and where they ended up, and it connects them that way. After that, Jer Thorp was commissioned to produce a graphic of every cover of Popular Science magazine over its history, and he was able to do that by grouping the covers both by decade and by topic. And you can see the changes in topics; for instance, phylogenetics and celluloid came up early on. Then we had the V8 in full color in the 40s and 50s, and then we have the microcomputer, the modem, which I see has dropped off in recent years, and the MP3, which has been one of the more recent topics.
After that is the Max Planck Research Network, created by Moritz Stefaner and Christopher Warnow, which simply shows the interconnections of a series of researchers. Next, Neil Banas has created something called the NPZvisualizer, which is for biological research. This is also an interactive one; if you go to the original website, you can adjust parameters and explore the effects of each one. This is the Feltron Report, created by Nick Felton, who keeps track of essentially every event that happens in his life and produces a personal report each year. He's also created software from this that you can use at his website, feltron.com.
The next one is called Similar Diversity, by Philipp Steinweber and Andreas Koller, which takes the text of several major religious books, identifies the major characters, and shows where information about them appears and, more significantly, the interconnections between those characters. This is actually a much larger graphic, sometimes shown at wall size, which makes it easier to explore the very small details up front.
The next several are by Reza Ali. The first two are simply library data about the Dewey Decimal Code System and the interrelationships within it. It's an interactive and beautiful graphic. The next three are depictions of the Lorentz force from physics, and I personally find them stunning images of natural science data.
Next is Hamlet, from Understanding Shakespeare by Stephan Thiel. What this shows is a selection of keywords from Hamlet arranged by scene, with the yellow highlighting the major characters. In this case it's Hamlet, so it shows where Hamlet speaks all the way through. You can see, for instance, that he doesn't speak very much in the second scene, but he comes back in the third and so on. These are also produced at wall size and are wonderful ways of getting the general feel of what's going on in a particular story.
This one, by Victor Vina and Nerea Calvillo, is In the Air, which depicts air pollution in Spain. And then finally, SNCF tag cloud by Xiaoji Chen shows tweets about the national rail system in France and the major topics we see; for instance, "fail" and "late" apparently come up a lot. And so I present this collection to give you an idea of what can be accomplished in Processing, the room for creativity and personal interpretation, and the insight into data that you can get by using this program.
- Exploring the need for creative data visualization
- Drawing basic lines and shapes
- Introducing variables, strings, and arrays
- Modifying drawing attributes such as color
- Making drawings more dynamic with animation loops and spirals
- Creating keyboard- and mouse-based interactions
- Adding images, video, and sound
- Reading in text or XML data
- Creating plots and charts
- Publishing and sharing your work