Learn the full breadth of the team it takes to do big data right. In short, data scientists are focused on delivering answers to the business, not architecting or implementing large-scale data platforms.
- [Instructor] When people think about big data, they often think that it's a thing for data scientists, that data scientists can take care of it. I'm here to tell you that that's a myth. Big data cannot be handled by your data scientists alone. So let's take a look at what a modern data scientist really is, and what their skill set is like. First and foremost, data scientists are known for their ability to perform statistical analysis. This usually goes along with a fondness for numbers and a good grasp of math. In this category, we have really exciting things like machine learning, experimental design, and Bayesian inference, all things that sit outside of big data.
They're just ways of understanding the world around us that don't necessarily correlate directly with big data systems, even though there's obviously data involved. Next, programming and databases. This is an area that is core to a data scientist's role. However, they're not typically interested in building large-scale applications. Rather, they want to use these skills to perform the task at hand and be done with it. We do things like script in Python, use statistical computing tools like R, and write SQL to extract data, but none of this really provides the knowledge required to stand up the infrastructure, design the architecture, or connect and network these things together.
Big data is an entire ecosystem of products. Data scientists are some of the key people who use it, but actually standing it up really isn't in their wheelhouse. Next, a data scientist can really benefit from having strong domain knowledge, and some soft skills are important too. One of the big challenges in finding a good data scientist is finding someone who knows your industry and can navigate the nuanced relationships of the corporate world. Lastly, a good data scientist is somebody who can communicate an idea, speak clearly, and make impactful data visualizations.
Notice that nowhere on this list did I mention system design, architecture, data streaming or batch processing, or any of the innumerable tasks that are required to do big data right. In my view, today's big data challenges require at least three major skill sets. I like to break them down into three categories: data engineering, data science, and data analytics. The data engineering team is the one that lives more on the IT side and is generally responsible for the architecture, implementation, and maintenance of the big data platform.
Your analytics team is focused on the day-to-day decision making of the company, and on injecting data into those processes to help everyone get smarter. The data analyst role is really at the front line of making the most of your big data platform at a systemic scale. Lastly, there are your data scientists, who have a more strategic role, focusing on complex problems and finding very specific answers to them. So the notion that big data can be handled entirely by your data science team is a myth.