Join Doug Rose for an in-depth discussion in this video, Use a DSLC, part of Learning Data Science: Using Agile Methodology.
- You've seen how difficult it is to get data science teams working within a project management framework. Traditional projects rely on scope, cost, and schedule. Data science is about experiments and exploration. It's a challenge to tie down a defined scope. It's also a problem to use traditional project management language. Teams won't usually be able to find clear success criteria. It's not like a typical project. You can't plan the work and then work the plan. You won't necessarily have a set of clear objectives.
Most of your team's experiments will just be dead ends. Just because traditional project management doesn't fit your team, it doesn't mean that you have to work in chaos. The absence of a plan doesn't mean the absence of intent or value. Remember that data science increases organizational knowledge. New insights can decrease time to market, create new revenue, avoid costs, and even build goodwill. You might not know how you'll get there, but that doesn't mean that you shouldn't start working.
When you want to work differently, you have to use a different life cycle. A life cycle is a series of steps you take when you're approaching a challenge. There are two common types that you'll likely run into in large organizations. The first is the software development life cycle, or SDLC. This life cycle has six phases: plan, analyze, design, code, test, and deploy.
It's typically called the waterfall model. That's because each of the phases has to complete before the next begins. You plan the software and analyze the requirements. Then you create the basic designs and start your coding. Once the code is complete, you'll have your quality assurance people step in and test your software. Once it passes all the tests, it's deployed for people to use. The second life cycle you'll likely see is the Cross-Industry Standard Process for Data Mining, or CRISP-DM.
This process model is used for data instead of software. It's a little bit more flexible than the rigid waterfall model. It also has six phases: business understanding, data understanding, data preparation, modeling, evaluation, and deployment. What both of these life cycles have in common is that they're designed for big bang delivery. You spend a big chunk of your time either in the plan and analyze phases or in the business understanding phase.
The goal is to gather up as much information as you can before you get started. Then you deliver it all in one big bang. Not much is communicated in terms of deliverables before the final deployment phase. That's not necessarily the best approach when you're working in data science. Remember that data science is experimental and exploratory. Your team will use an empirical approach to try to understand the data. The idea is that you don't fully understand what you need until you start working.
Imagine a typical data science project. Let's say that your data science team is identifying typical customer behavior before customers decide to leave you for a competitor. Customers leaving like this is called churn, and how often it happens is your churn rate. Your data science team might be able to clearly state their intent: they want to see what a customer does before they leave, then they want to create a model to predict when someone might leave. What they won't be able to do is plan out the details of their work. Maybe they'll find their best models by looking through social network data.
They could get their best insights from their own sales data. There could be a promotion that your competitor offers that becomes unusually successful. The point is they won't know until they start looking. Your team will spend too much of their time planning if they're forced to use the SDLC or the CRISP-DM process. They won't be able to apply what they've learned while still working. That's because they're forced to plan out their work before they even begin modeling or coding. Remember that a defined process like the SDLC or CRISP-DM requires that most of the work be understood up front.
If you make a mistake, you have to deal with change requests in the SDLC or a reevaluation in CRISP-DM. If you want your data science team to be flexible and exploratory, then you can't apply these life cycles. Instead, you should look for a more lightweight approach to delivering insights. That way you'll have structure, but at the same time you'll have the flexibility to adapt to new ideas.
This course shows how to structure your work within a two-week sprint. See how to work within a data science life cycle (DSLC)—a methodology for cycling through questions, research, and reporting every two weeks. Explore key practices to help your team break down the work so it fits within a two-week sprint. Learn how to use tools like question boards to encourage discussion and find essential questions. And most importantly, learn how to grow your team's shared knowledge and avoid common pitfalls.
- Defining data science success
- Determining project challenges and criteria for success
- Using a DSLC
- Iterating through DSLC sprints
- Creating a question board
- Breaking down your work
- Adding to organizational knowledge
- Avoiding pitfalls