Join Barton Poulson for an in-depth discussion in this video, Big data, part of Data Science Foundations: Fundamentals.
- [Voiceover] One of the important things about data science is being able to distinguish it from other related fields. The first one I want to compare it to is big data, because a lot of people use the terms somewhat interchangeably. Now, the confusion makes sense if we do this. Let's begin by looking at the data science Venn Diagram. Data science is, by one definition, composed of three separate domains, or skills. There's hacking skills, that is, computer programming and coding; there's math and statistics; and there's substantive expertise.
Take those together, and you get data science. Now, you may also be familiar with the big data Venn Diagram. In effect, I can take the exact same three circles and create big data, where it's a combination of what are called the three Vs: there's a large volume of data, or the data comes at a high velocity, or there's overwhelming variety to the data. And, when you have those three Vs together, you get big data. So, to help distinguish these two, I'm going to create my own little Venn Diagram with just two circles.
I call it the big data and data science Venn Diagram, where we have big data on the left, data science on the right, and, in the middle, we have something called big data science. The point of this is that the two fields do in fact overlap. However, they are conceptually distinct. So, we'll take a closer look at the connections and the separations between the two. First, let's look at big data without data science. Good examples of this are machine learning and word counts.
These are relatively simple tasks conceptually. They do get complicated in their implementation, but you don't, for instance, necessarily have to have a lot of mathematics or statistics to do word counts. Now, on the other hand, data science needs at least two of those aspects at once. You can't do data science if you don't have at least two of the skills in the data science Venn Diagram. Next, we have data science without big data. Well, for instance, take a look at genetics data. These are huge data sets, but they tend to be static, and they have a consistent format.
So, depending on who you talk to, they may or may not count as big data. Or, streaming sensor data, where, again, it's very structured, but if it's streaming, maybe you're not even keeping large amounts of it. Or, facial recognition in photographs. There's an enormous amount of variety there, but it might be a relatively small number of photographs. The third combination is here in the middle. It's called big data science. And, this is when big data has all three of the Vs, the volume, the velocity, and the variety.
In this case, you have to have the full data science skill set. You have to have the coding, the statistics, and the domain expertise in order to make it work. So, the conclusions that we get from this are three. First, big data and data science differ. They're not the same thing, even though people often use the terms interchangeably. However, they share some goals, and they share some techniques, and there are a lot of people who are able to do both of them. Finally, big data science, as a distinct field, combines the challenges and the skills of the two domains.
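As a rough illustration of the word-count task that the transcript gives as an example of big data without data science, here is a minimal Python sketch. It assumes a small plain-text input held in memory. The function name `word_counts` and the sample sentence are made up for illustration and do not come from the video.

```python
import re
from collections import Counter


def word_counts(text):
    """Count how often each lowercased word appears in the text."""
    # Treat any run of letters or apostrophes as a word (a simplifying assumption).
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)


# Hypothetical sample text, invented for this example.
sample = "Big data is big, and data science is not big data."
counts = word_counts(sample)

# Show the three most frequent words with their counts.
print(counts.most_common(3))
```

The counting itself needs no statistics, which fits the point made in the transcript. The difficulty at big data scale would come from the implementation, for example spreading the same counting step across many machines, which this sketch does not attempt.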
- Assess the skills required for a career in data science.
- Evaluate different sources of data, including metrics and APIs.
- Explore data through graphs and statistics.
- Discover how data scientists use programming languages such as R, Python, and SQL.
- Assess the role of mathematics, such as algebra, in data science.
- Assess the role of applied statistics, such as confidence intervals, in data science.
- Assess the role of machine learning, such as artificial neural networks, in data science.
- Define the components of effective data visualization.