1) Analysis and insight are not the same skill. 2) Good questions can come from the business.
- Right now, "data scientist" describes a group of qualities rather than a consistent skillset. The term data scientist was first used to change how statisticians think. In 2001, William S. Cleveland wrote "Data Science: An Action Plan for Expanding the Technical Areas of the Field of Statistics." This paper was the first attempt to merge the fields of statistics and computer science to create a new field. That same year, Leo Breiman published "Statistical Modeling: The Two Cultures."
This paper described how statisticians could change their mindset and embrace a diverse set of tools. Seven years later, some top data engineers from Facebook and LinkedIn got together to describe their day-to-day challenges. They saw their role as a crossover of many different disciplines. They decided to use the term data scientist to describe this role. A data scientist at this time was just a list of qualities. This person needed to understand data, statistics, and math, apply machine learning, and have knowledge of programming.
They needed to be curious and be a great communicator and hacker. They were programmers who crossed into many different fields. The problem is that this list of skills is not easily found in any one person. People often gravitate towards their talents. Then they try to refine their craft. A statistician will usually work to become a better statistician. A business analyst will want to be a better communicator. There's also a lot of organizational pressure to specialize. Many large organizations are divided by functional areas.
They want people to focus on being the best in their area. Another issue is that people are often pretty bad at self-assessment. In the famous Dunning-Kruger study, researchers found that people often overestimated their ability, especially in the areas where they were least skilled. A gifted statistician might think they're a great communicator, but that's not always true. That's why most organizations divide their work into teams. Each individual on the team will have their own areas of expertise.
A cross-functional team doesn't assume that everyone is an expert. Instead, it encourages individuals to balance out their strengths and cover each other's weaknesses. A team that only has data scientists might not identify those weaknesses. The team will often fumble and not see its own blind spots. I once worked for an organization that had a team of data scientists building out a cluster. People on the business side complained that they had no idea what the team was doing.
The business was frustrated. They didn't like paying for something that they didn't understand. I went to a meeting where the team of data scientists demonstrated a simple MapReduce job. The business managers stared blankly at the job and occasionally glanced at their smartphones. To an outsider, it seemed obvious that there was a communication breakdown. After the meeting, I wrote a list of skills on the whiteboard and labeled each one. The first was Data, then Development, then Machine Learning, Statistics, Math, and Communication.
I asked the team to rate their own strengths and weaknesses. They gave themselves lower marks for Data and Development, but a nine for Communication, one point away from perfect. I took the results to the business analysts and asked them how they would rate the team. They came up with perfect scores for Data and Development, but only a six for Communication. It was a classic Dunning-Kruger result. In the places where the data scientists rated themselves the highest, they dramatically overestimated their expertise.
The data scientists came from quantitative fields. They were statisticians, mathematicians, and data analysts. They would often miss each other's blind spots. It took someone from outside the field to shine a light on their challenges. If you're a large organization, it would be a mistake to rely only on teams of data scientists. Instead, try to create a varied team of people with different skillsets. Don't assume that key people with a quantitative background will have the best questions and insights.
Keep your team varied, and you're more likely to have great results.
Learn the holistic approach to building teams and deploying data science across disciplines. Identify the key roles and responsibilities, including research lead, data analyst, and project manager. Find out how to define areas of responsibility, foster effective communication, and build compelling reports and visualizations. Then see how to avoid the pitfalls of losing focus and arriving at false consensus. These techniques help you build highly skilled teams that produce deeper insights than you'll find from relying on data scientists alone.
- Creating a data-driven culture
- Defining team roles and areas of responsibility
- Finding wisdom in groups
- Presenting beautiful reports
- Thinking like a team
- Avoiding pitfalls