Join Doug Rose for an in-depth discussion in this video, "See how to use a DSLC," part of Learning Data Science: Using Agile Methodology.
- Data science doesn't fit well into existing process life cycles. It isn't enough like software development to fit into the software development life cycle (SDLC), and the CRISP-DM (Cross-Industry Standard Process for Data Mining) process is a little too rigid for getting quick results. That doesn't mean a data science team should simply work in whatever way feels right. These life cycles have real value. For one, they give you a high-level map of where you're going. This is especially useful when you're just starting a data science team: you get a clear sense of the path forward, so you start with the end in mind.
The danger is that the life cycle becomes the primary focus of your work. You want to use a life cycle as a way to do better data science, not follow the process for the sake of following the process. A good life cycle should be like a handrail. It's good to know it's there, but you don't want to cling to it with every step. After a while, you shouldn't even notice it. For data science projects, you can use a data science life cycle, or DSLC.
This framework is lightweight and less rigid than CRISP-DM. The DSLC has six steps: Identify, Question, Research, Results, Insights, and Learn. It is loosely based on the scientific method. As a data science team, you start by identifying your key roles. In the end, you'll want to be able to tell an interesting story, and what better way to start a story than by identifying its key roles?
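Because the DSLC is a fixed sequence of steps that repeats, it can be modeled as an ordered cycle. The sketch below is a minimal Python illustration, not something taken from the course. The `DSLCStep` enum and `next_step` helper are hypothetical names, and the sketch assumes that after Learn the team starts a new pass at Identify, which the course doesn't spell out here.

```python
from enum import Enum


class DSLCStep(Enum):
    """The six DSLC steps, in the order the course lists them."""
    IDENTIFY = 1
    QUESTION = 2
    RESEARCH = 3
    RESULTS = 4
    INSIGHTS = 5
    LEARN = 6


def next_step(step: DSLCStep) -> DSLCStep:
    """Return the step that follows `step`.

    Assumption: after LEARN the cycle wraps back to IDENTIFY.
    """
    steps = list(DSLCStep)  # Enum iteration preserves definition order
    return steps[(steps.index(step) + 1) % len(steps)]


if __name__ == "__main__":
    # Walk one full pass through the cycle, plus the wrap-around.
    step = DSLCStep.IDENTIFY
    for _ in range(7):
        print(step.name.title())
        step = next_step(step)
```

Modeling the steps as a cycle rather than a straight line reflects the handrail idea: the order is there for reference, and the team keeps going around it.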
It might be easier to think of this step as a scene in a play. Who walks into the room? Is there a main character? Maybe there's a backstory that helps make sense of their actions. Think about our running shoe website. You might start by identifying the key players. There's the runner. Maybe the runner has a running partner. They could also have a doctor, a blogger, or a trainer. Each of these players might be part of your data science story. Once you've identified your key roles, you can start asking interesting questions.
Your team's research lead might start by asking, "Is there a blogger who influences our runners?" Maybe trainers play a big role in what runners purchase. The research lead might ask, "Are CrossFit trainers recommending our products?" These questions are the first steps in exploring your data. Remember that data science is experimental and exploratory. When you start with a good set of questions, you're more likely to get interesting results. The best way to come up with new questions is by identifying key players.
In the research step, the data analyst works closely with the team to come up with simple strategies for researching the questions. Say the team decides to explore the relationship between runners and their partners. The research lead asks the data analyst how the team should gather this information. How can you tell from the website's data whether someone is a running partner? Maybe the team could send a find-a-friend promotion to customers in the same zip code. The data analyst could also try to cross-reference customer data with people who are friends on social networking sites.
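The cross-referencing idea can be sketched in a few lines of Python. This is a minimal illustration under assumed inputs: a hypothetical mapping of customer IDs to zip codes, and a hypothetical list of friendship pairs exported from a social network. Real data would first need the customer and social identities matched, which this sketch skips. It keeps only friendships where both people are customers and flags pairs that also share a zip code, treating that, as an assumption, as a stronger hint of a running partner.

```python
from typing import Dict, Iterable, List, Tuple


def likely_running_partners(
    customer_zips: Dict[str, str],
    friendships: Iterable[Tuple[str, str]],
) -> List[Tuple[str, str, bool]]:
    """Cross-reference customers with social-network friendships.

    customer_zips: hypothetical map of customer ID -> zip code.
    friendships: hypothetical (person_a, person_b) pairs from a social network.
    Returns sorted (customer_a, customer_b, same_zip) tuples for every
    friendship where both people are customers. same_zip is an assumed,
    stronger partner signal, not proof of a running partnership.
    """
    pairs = set()
    for a, b in friendships:
        if a != b and a in customer_zips and b in customer_zips:
            # Normalize order so (a, b) and (b, a) count as one pair.
            pairs.add(tuple(sorted((a, b))))
    return [(a, b, customer_zips[a] == customer_zips[b]) for a, b in sorted(pairs)]


if __name__ == "__main__":
    customers = {"ana": "97201", "ben": "97201", "cal": "10001", "dee": "97201"}
    friends = [("ana", "ben"), ("ben", "ana"), ("ana", "cal"), ("dee", "zed")]
    print(likely_running_partners(customers, friends))
    # [('ana', 'ben', True), ('ana', 'cal', False)]
```

In keeping with the quick-and-dirty spirit of the research step, this is a rough filter for the team to look at, not a finished model of who runs with whom.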
If the team can't research this question with the data it has, it can come up with strategies for the future. Maybe the website could create a special promotion just for running partners. After you've researched your topic, you'll want to create your first reports. These results are for the team, so they should be quick and dirty. Hopefully, your data science team will go through a lot of interesting questions. Many will be duds: interesting, but not interesting enough to explore further. You don't want your data analyst spending too much time perfecting the results.
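A quick-and-dirty first report can be as small as a single number. The hypothetical Python helper below computes the share of customers who appear in at least one suspected partner pair. The function name and input shapes are assumptions for illustration, and the number is only as good as the partner matching behind it.

```python
from typing import Iterable, Set, Tuple


def share_with_partner(
    customer_ids: Set[str],
    partner_pairs: Iterable[Tuple[str, str]],
) -> float:
    """Fraction of customers who appear in at least one suspected partner pair.

    customer_ids and partner_pairs are hypothetical inputs; pairs may come
    from any rough matching method the team tried during research.
    """
    if not customer_ids:
        return 0.0  # avoid dividing by zero on an empty customer list
    with_partner: Set[str] = set()
    for a, b in partner_pairs:
        with_partner.update((a, b))
    # Only count people who are actually customers.
    return len(with_partner & customer_ids) / len(customer_ids)


if __name__ == "__main__":
    share = share_with_partner({"ana", "ben", "cal", "dee"}, [("ana", "ben"), ("ana", "cal")])
    print(f"{share:.0%} of customers have an identified running partner")
    # 75% of customers have an identified running partner
```

A single percentage like this is enough for the team to decide whether the question deserves more work or is a dud.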
Next, your data science team looks at the results to see whether there are any insights. Maybe the data suggests that most of your customers run with partners. That insight could be very valuable to the marketing team. Finally, in the learn step, your team bundles these insights into organizational knowledge. This is where your team tells the customer's story, perhaps using data visualizations to back up its insights. This new knowledge is what really adds value to the rest of the organization. If you tell a compelling story, it might change the way your organization views its customers.
This course shows how to structure your work within a two-week sprint. See how to work within a data science life cycle (DSLC), a methodology for cycling through questions, research, and reporting every two weeks. Explore key practices to help your team break down the work so it fits within a two-week sprint. Learn how to use tools like question boards to encourage discussion and find essential questions. And most importantly, learn how to grow your team's shared knowledge and avoid common pitfalls.
- Defining data science success
- Determining project challenges and criteria for success
- Using a DSLC
- Iterating through DSLC sprints
- Creating a question board
- Breaking down your work
- Adding to organizational knowledge
- Avoiding pitfalls