Big data is characterized by a high volume of data, the speed at which it arrives, or its great variety, all of which pose significant challenges for gathering, processing, and storing data.
- [Narrator] The one thing we know absolutely for sure about big data is that it's really big. There's a lot of data, it's big. But you know, what counts as big data changes with the times. Once upon a time, a bunch of punch cards might've been big data, or back in 1969, this was a massive amount of programming. This is Margaret Hamilton with the code for the Apollo Guidance Computer for which she eventually won the Presidential Medal of Freedom. And then what might be big at one time becomes normal or even small at another. So for example, back in 1992, I got my first computer, an Apple Macintosh Classic II with the optional larger 80 megabyte hard drive. I got all the way through grad school. I wrote my PhD thesis on that computer and now I have a relatively modest MacBook Pro and when I'm home it's connected to ten terabytes of external storage. That's 125,000 times as much storage as my first computer and truthfully, I've bumped up against the limits. So it's massive by comparison to my 1992 self but it's middling for consumers and really puny by any commercial standards. And so, if we want to think about really, what do we mean by big data given this relative shifting frame of reference, well, one thing that's pretty consistent is the definition that relies on what are called the three Vs of big data. And so, the first V, the first characteristic of big data, is volume, simply meaning there's a lot of it. And the second is velocity, having to do with the speed with which the data arrives. And the third one is the variety or the nature and the format of the data. Take those together and you have what most people would consider big data and I want to talk about each of these a little bit separately. So volume is the most obvious one. This is when you have more data than fits in a computer's RAM, its memory, or maybe you have more data than fits on a single hard drive and you have to use servers and distributed storage. 
I mean think, for example, about the data on Facebook's 2+ billion users. You can't put that on a single computer. Or the information on Amazon's 120+ million items that they sell online. Keeping track of all that is obviously going to overwhelm any one computer, or even a collection of computers. Next is velocity. You know, a gentle breeze through the trees is nice but a hurricane is a whole other situation. The velocity refers to data that comes in rapidly and changes frequently. So think about, for instance, it's been estimated that nearly 200 million emails are sent each minute of each day. Or that five billion videos are watched on YouTube every day. If you're trying to keep track of this stuff as it happens, it's going to be a completely overwhelming job. And then the third V is variety. Data comes in a lot of different formats and you can have video, photos, audio, you can have GPS coordinates with time and location, you can have social network connections between people, and all of these represent distinct kinds of data from the regular rows and columns of numbers and letters that you would expect to find in, like, a spreadsheet. And all of these require special ways of storing, managing, manipulating, and analyzing the data. Taken together, those usually constitute big data. Another way to think about it is big data is data that is hard to manage. It's the idea like some animals are a little more challenging to deal with and have not been domesticated, like the zebra. Big data's a little like the zebra of the data world. It's simply not easy to work with, not through conventional standards, and you're going to have to be very adaptable to get the value out of that data. Now, I do want to say something historical about the term. This is Google Trends data on the search popularity for the term "big data" on Google, and we have data from 2011 through 2019. And what you can see is there's an obvious peak right there in the middle at October of 2014. 
That is when the Google searches for "big data" were the most common overall. Now, this doesn't mean that people don't care about big data anymore. You see how it's gone down maybe a third, maybe even 50% since then. Well, there's a saying that a fish, or in this case, a seahorse, would be the last to discover water. That's because it's everywhere around them. It's literally the medium in which they live and move. The same thing is true for big data. While the searches for big data may have declined a little over the last few years, that's not because nobody cares about big data anymore; it is because big data has become the air that we now breathe, the water that we move in. Big data has become the new normal data for use in data science and machine learning and artificial intelligence, and because of that, understanding what big data means, the special challenges that it creates, and how to work with it is as relevant now as it ever has been.
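The figures quoted in the transcript can be checked with a little arithmetic. The sketch below is only illustrative: it assumes decimal (SI) units for megabytes and terabytes, takes the per-minute and per-day estimates at face value, and uses variable names that are not part of the course.

```python
# Assumption: decimal (SI) units, 1 MB = 10**6 bytes and 1 TB = 10**12 bytes.
MB = 10**6
TB = 10**12

# Volume: the narrator's 1992 hard drive versus the external storage used today.
first_drive_bytes = 80 * MB
current_storage_bytes = 10 * TB
storage_ratio = current_storage_bytes // first_drive_bytes
print(storage_ratio)  # 125000, matching the "125,000 times" in the transcript

# Velocity: convert the quoted estimates into per-second rates.
emails_per_minute = 200_000_000   # "nearly 200 million emails ... each minute"
videos_per_day = 5_000_000_000    # "five billion videos ... every day"
emails_per_second = emails_per_minute / 60
videos_per_second = videos_per_day / (24 * 60 * 60)
print(f"{emails_per_second:,.0f} emails per second")
print(f"{videos_per_second:,.0f} videos per second")
```

Expressed per second, the estimates come to a few million emails and tens of thousands of video views, which gives a rough sense of why tracking such data as it arrives is described as overwhelming.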
- Identify the components that make up big data.
- Examine how big data has grown over the last few years.
- Explain the importance of using big data in business organizations.
- Distinguish between knowledge requirements for using big data and for understanding data science.
- Justify the need for training on big data within an organization.
- Analyze the factors that go into utilizing big data on a project.
- Differentiate outcomes that are derived from big data from outcomes that are derived from observing behaviors.