Join Barton Poulson for an in-depth discussion in this video, Metrics, part of Data Science Foundations: Fundamentals.
- [Voiceover] When you're getting ready to conduct a data science project, one of the most important things is to choose a metric or a measure of success. That's because data science is goal-directed. You need to know what your goal is in order to tell when you've gotten there or to know whether you have a viable answer. Specifically, data science is action-oriented, where the goal is to do something. That's in contrast to, say, academic research, which is knowledge-oriented, and the goal is understanding. And I say this as a practitioner.
Also, in data science, the goals must be explicit so they can guide your effort: you know what you're shooting for, you know if you're on your way to reaching it, and you know if you need to make corrections. And having a clear goal benefits the client. It prevents frustration, 'cause they know what to expect from you. And then, finally, it benefits the analyst, that's you, because it gives you an efficient use of your time. Now, the idea here is you want to be goal-directed, but, at the same time, do be open to serendipity. There are possibilities that may arise, but knowing what you're trying to accomplish is always an important first step.
Now, let's talk about specific metrics, or ways of measuring. I'm gonna review a few possible ways of doing this. Business metrics. Key performance indicators or KPIs. SMART goals, where that's an acronym. And classification accuracy. We can take a look at each one. Next is key performance indicators, or KPIs. This comes from David Parmenter. The idea here is that key performance indicators generally need to be nonfinancial, not just costs or revenues.
Those are ends, not particularly goals. They need to be timely, so you get the information weekly, daily or constantly. A CEO focus. Those are the people who act on the KPIs. They need to be simple, so that everybody in the organization understands these indicators. Team-based, so people can take joint responsibility for meeting these indicators. A significant impact. That means that the indicator should affect more than one important outcome, like profitability and market reach, or improved manufacturing time and fewer defects.
Next are SMART goals. SMART stands for specific; measurable; assignable, meaning you need to know who it goes to; realistic, meaning it needs to be achievable given the available resources; and time-bound, meaning you need to say when the goal should be achieved. That actually is an important part of conducting a data science project, that you have specific limits for times, or time boxes that you can use. And then from the research world, you can have classification tables, which look at whether a test for something is positive or negative, and whether the event is actually there or not.
And you get these combinations of true positives, false positives, and so on. Now you can take this information and from that you can calculate several summary statistics that are frequently used for assessing classification. They include things like sensitivity, that is, if the event is actually present, how likely are you to get a positive test result? Or specificity, that is, if the event is actually absent, how likely are you to get a negative test result? That means trying to avoid false positives. And then there are the positive predictive value and the negative predictive value, which ask the reverse question: given a positive or a negative test result, how likely is it that the event is actually present or absent?
In conclusion, we can say three things. First off, measurement boosts awareness. When you measure an outcome, then you're in a situation to actually do something about it. Next, awareness contributes to quality in the product or the service or whatever it is you're providing. And, finally, measure thoughtfully and measure sensitively.
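To make the classification statistics mentioned earlier a little more concrete, here is a minimal Python sketch of how they might be computed from the four cells of a classification table. The class name `ClassificationTable`, its field names, and the example counts are all hypothetical and chosen only for illustration; they do not come from the video. The sketch also assumes every row and column of the table contains at least one case, so no denominator is zero.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClassificationTable:
    """Counts from a 2x2 table of test result vs. actual event (hypothetical names)."""
    true_pos: int   # test positive, event present
    false_pos: int  # test positive, event absent
    false_neg: int  # test negative, event present
    true_neg: int   # test negative, event absent

    def sensitivity(self) -> float:
        # Of the cases where the event is present, the share the test flags as positive.
        return self.true_pos / (self.true_pos + self.false_neg)

    def specificity(self) -> float:
        # Of the cases where the event is absent, the share the test reports as negative.
        return self.true_neg / (self.true_neg + self.false_pos)

    def positive_predictive_value(self) -> float:
        # Of the positive test results, the share where the event is really present.
        return self.true_pos / (self.true_pos + self.false_pos)

    def negative_predictive_value(self) -> float:
        # Of the negative test results, the share where the event is really absent.
        return self.true_neg / (self.true_neg + self.false_neg)


if __name__ == "__main__":
    # Made-up counts, purely for illustration.
    table = ClassificationTable(true_pos=40, false_pos=10, false_neg=20, true_neg=130)
    print(f"sensitivity: {table.sensitivity():.3f}")
    print(f"specificity: {table.specificity():.3f}")
    print(f"PPV:         {table.positive_predictive_value():.3f}")
    print(f"NPV:         {table.negative_predictive_value():.3f}")
```

Which of these four numbers you would treat as the metric of success depends on the goal of the project, for instance whether a missed event or a false alarm is the more costly mistake.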