From the course: Ethics and Law in Data Analytics

The age of big data

- It may be tempting to think of data as alphanumeric digital expressions of some kind, but that's only one kind of data. Data are simply recorded facts, and they are neutral with regard to what counts as a fact and how those facts are recorded. One implication of this idea is that data have existed from the moment our ancient ancestors had the ability to record facts and found it useful to do so. Some very early examples of data creation could be a symbol etched on a cave wall, or scratchings on a tree indicating the number of foot paces to a burial site. For most of this age, data recording would have been done by hand or by some related physical means. In the middle of the 19th century, a new kind of data became possible when scientists discovered different ways to record sound using analog technology. Recorded images have an even more complicated history, with ancient societies already using principles of light to capture them. For our purposes we might put these and other technologies together and call them the pre-digital age of data.

In the last couple of decades, the term big data has been growing in popularity. There are some who don't like the term because it's not technical. They might note that big is a relative concept, and so what counts as big today will not be big tomorrow. There are also attempts to define big data in terms of the features and benefits of data. For example, you have probably heard of the so-called five Vs of big data. But we don't need to get too technical to understand the most fundamental difference of this new age of data. It is right to call data big simply because there is so much of it. One estimate puts the amount of all existing data from the beginning of recorded history until 2003 at five billion gigabytes. That might sound like a big number, except the same study estimates that just a few years later, by 2011, five billion gigabytes of data were being generated every two days.

Probably the best place to start the story of big data is the late 1940s, when the transistor was invented, which for reasons far too technical to describe here allowed us for the first time to record facts digitally, that is, encoded in ones and zeroes. This is the digital age. This is important in explaining the massive increase in data, because non-digital data storage is remarkably inefficient. Yes, you can create data by writing down someone's name on a piece of paper, but that requires a lot of space and does not record very much. Digital storage partly explains why data got so big.

The other part of the equation has to do with who, or what, is recording the data. When a human being enters data into a database, there is a natural barrier to how much data can be recorded, simply because of the biological limits of human recording abilities. But now, data about you and me is recorded automatically by machines. To take an easy example, your smartphone records your location every second of the day, along with the websites you visit, the songs you listen to, the texts you send, et cetera, et cetera. This means that you alone, even on your most unremarkable day, create a mind-numbing amount of data. Now all this data is recorded in databases, but not by a human. No person is furiously updating your location in a database each time it changes. Rather, a machine records it all automatically.
So what is important is that we have crossed two thresholds: the first in the ability to store data as ones and zeroes, and the second in the advent of machines that record that data automatically. Crossing the first threshold took us into the digital age, and crossing the second took us fully into the age of big data.
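To make "encoded in ones and zeroes" a little more concrete, here is a minimal Python sketch, not part of the course, of the two thresholds just described: a short written fact becomes a sequence of bits, and machine-recorded facts, such as a per-second location log, pile up on their own. The name, the record size, and the logging rate below are illustrative assumptions, not figures from the course.

```python
# A fact a person might once have written down by hand.
fact = "Ada visited the library"

# The same fact stored digitally: bytes, and then the ones and zeroes inside them.
as_bytes = fact.encode("utf-8")
as_bits = "".join(f"{b:08b}" for b in as_bytes)

print(len(as_bytes), "bytes")        # 23 bytes for this short sentence
print(as_bits[:32], "...")           # the first few of its ones and zeroes

# Machine recording: a phone logging one small location record every second.
bytes_per_record = 32                # assumed size of one location fix
records_per_day = 60 * 60 * 24       # one record per second, all day
megabytes_per_day = bytes_per_record * records_per_day / 1e6
print(megabytes_per_day, "MB per day, from a single sensor on one person")
```

Even with these deliberately small assumed numbers, one sensor on one unremarkable day produces far more recorded facts than any person could write down by hand, which is the point the transcript is making.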
