In its simplest possible definition, big data is data that's just too big to work with on your computer. Obviously this is a relative definition: what's big for one system at one time is commonplace for another system at another time. That's the general point of Moore's Law, a well-known observation in computer science, usually summarized as the physical capacity and performance of computers doubling about every two years. So, for example, my Mac Classic II, which got me through graduate school, had two megabytes of RAM and an 80-megabyte hard drive, so as far as it was concerned, big data is something that would fit onto a one-dollar flash drive right now.
Similarly, in Excel the maximum number of rows that you can have in a single spreadsheet has changed over time. Previously it was about 65,000 (65,536, to be exact). Now it's over a million (1,048,576), which seems like a lot, but if you're logging internet activity where something can occur hundreds or thousands of times per second, you'll reach your million rows very, very quickly. On the other hand, if you're looking at photos or video and you need to have all of the information in memory at once, you have an entirely different issue.
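To put a rough number on "very, very quickly," here is a minimal sketch of the arithmetic. The row limits are Excel's documented worksheet limits; the event rates of 100 and 1,000 per second are assumed for illustration (they stand in for "hundreds or thousands of times per second"), and the function name is hypothetical.

```python
# Excel worksheet row limits (documented limits of the file formats).
LEGACY_ROW_LIMIT = 65_536      # Excel 97-2003 (.xls)
MODERN_ROW_LIMIT = 1_048_576   # Excel 2007 and later (.xlsx)


def seconds_to_fill(row_limit: int, events_per_second: float) -> float:
    """Return how many seconds of logging fit in one worksheet,
    assuming one row per logged event."""
    if events_per_second <= 0:
        raise ValueError("events_per_second must be positive")
    return row_limit / events_per_second


# Assumed, illustrative event rates; real traffic will vary.
for rate in (100, 1_000):
    minutes = seconds_to_fill(MODERN_ROW_LIMIT, rate) / 60
    print(f"{rate} events/s fills a worksheet in about {minutes:.1f} minutes")
```

Under these assumptions, a single modern worksheet holds under three hours of activity at 100 events per second, and under twenty minutes at 1,000 events per second.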
Even my iPhone takes photos at two or three megabytes per photo and video at about 18 megabytes per minute, or roughly one gigabyte per hour. That's on my iPhone. And if you have a Red Epic video camera, you could do up to 18 gigabytes per minute, and instantly you have very big data. Now, some people call this lots of data, meaning it's the same kind of data that we're generally used to; there's just a lot more of it. And that gets into the issues of velocity and variety.
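The per-minute and per-hour figures above can be checked with a short sketch. It takes the two quoted rates as given and assumes decimal units (1 GB = 1,000 MB, 1 TB = 1,000 GB); the function and variable names are hypothetical.

```python
def per_hour(rate_per_minute: float) -> float:
    """Convert a per-minute data rate to a per-hour rate (same unit)."""
    return rate_per_minute * 60


# Rates quoted in the talk.
IPHONE_MB_PER_MIN = 18     # phone video, megabytes per minute
RED_EPIC_GB_PER_MIN = 18   # cinema camera, gigabytes per minute (upper end)

# Assumes decimal units: 1,000 MB per GB and 1,000 GB per TB.
iphone_gb_per_hour = per_hour(IPHONE_MB_PER_MIN) / 1000
red_epic_tb_per_hour = per_hour(RED_EPIC_GB_PER_MIN) / 1000

print(f"Phone video: about {iphone_gb_per_hour:.2f} GB per hour")
print(f"Cinema camera: about {red_epic_tb_per_hour:.2f} TB per hour")
```

So the phone comes out at just over one gigabyte per hour, and the cinema camera at just over one terabyte per hour, a thousand times more for the same hour of footage.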
We'll talk about velocity next.