Some data science teams explore using the Scrum framework for their projects. This is particularly true when the team lead has been trained and certified in Scrum, which is often the case for leads with a software development background, since Scrum is used extensively for software projects. In other words, it's not uncommon for data science leaders with a software background to try to adapt Scrum to a data science context.
Furthermore, data science teams that need to coordinate with other Scrum teams within their organization often try to use Scrum to manage their own process. Haydar Özler's post nicely outlines an approach to using Scrum in a data science context and addresses some of the common challenges and concerns.
In short, Scrum is a general adaptive framework for developing, delivering, and sustaining complex products by dividing a larger project into a series of mini-projects, called “sprints”, each of a consistent and fixed length, typically one to four weeks long. Scrum teams have three explicit roles: the product owner, the development team, and the scrum master.
Each sprint starts with a sprint planning meeting, where the product owner explains the top items from the product backlog, an ordered list of product development ideas. The development team forecasts which backlog items it can deliver by the end of the sprint and then makes a sprint plan to develop a product increment that includes those items. During the sprint, the team coordinates closely and holds daily standup meetings. At the end of each sprint, during the sprint review, the team demonstrates the newly developed product increment to stakeholders and solicits their feedback. This increment should be potentially releasable and meet the predefined definition of done. To close the sprint, the team holds a sprint retrospective, in which it inspects its own work and plans how to improve in the next sprint. Throughout the process, the scrum master acts as a servant leader and coach, helping everyone implement Scrum effectively.
Many Teams Succeed Using Scrum for Data Science
Using Scrum has the advantage that, in many organizations, the software teams have already been trained and certified in it. So if the data science team also uses Scrum, the processes and systems for interfacing with other Scrum teams are well understood by the rest of the organization. However, Scrum is not as prevalent in data science as it is in software development, largely due to specific challenges that some teams face when adopting it for data science.
Challenges When Using Scrum for Data Science
Some data science teams struggle to use Scrum. In fact, many data scientists have written about their challenges in using Scrum (such as Changhsinlee’s blog post).
One key challenge of using a sprint-based framework in a data science context is that task estimation is unreliable. If the team cannot accurately estimate task duration (e.g., how long a specific exploratory analysis will take), then the concept of a sprint, and of what can get done within one, becomes problematic. The difficulty of estimation is driven in part by the exploratory nature of many data science tasks. Changhsinlee noted this challenge, and it is also supported by research exploring the process frameworks used by data science teams.
Another key challenge is Scrum's fixed sprint length. Even if a team could estimate how long a specific analysis might take, a fixed-length sprint might force the team to bundle unrelated work items into one iteration. It can also delay feedback from an exploratory analysis, feedback that could help prioritize new work. In short, a sprint does not allow smaller (or longer) logical chunks of work to be completed and analyzed in a coherent fashion.
For more information on Scrum, check out the Scrum Guide.