Some research has shown the benefit of using Kanban for data science projects, and some data science teams do use Kanban to help streamline their projects. However, most do so without specific training and certification. With this in mind, we provide below a quick overview of Kanban, as well as the strengths and weaknesses of using it for data science projects.
Kanban defines a set of key principles: visualize the workflow, limit work in progress, measure and manage flow, make process policies explicit, implement feedback loops, and improve collaboratively.
When using Kanban, the data science team starts with a list of potential features or tasks, similar to the backlog concept in Scrum. These are placed in the initial “To Do” column of a Kanban board, which is a visual representation of the workflow. In a simple three-column board (see below), when the team decides to start working on a task, its Kanban card is moved from the “To Do” column to the “Doing” column. When the team completes the task, the card is moved to the “Done” column. Kanban boards often have additional columns. For example, teams often add a validation column before the “Done” column.
Uncompleted work, or work in progress (often referred to as WIP), is thought of as an investment whose value has yet to be realized and will not be realized until the work is completed. To help reduce WIP, Kanban teams set WIP limits that define the maximum number of tasks that can exist in a given column at the same time. For example, if a team places a WIP limit of three on the “Doing” column, it can work on at most three items at once. To bring a fourth item from “To Do” into “Doing”, the team would first have to complete one of the tasks and move it from “Doing” to “Done”, which frees up space for the new item.
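The WIP limit rule described above can be illustrated with a small sketch. This is not part of Kanban itself or of any particular tool: the `KanbanBoard` class, the `WipLimitExceeded` exception, and the task names are all hypothetical, chosen only to show how a limit of three on the “Doing” column blocks a fourth task until one is finished.

```python
from dataclasses import dataclass, field


class WipLimitExceeded(Exception):
    """Raised when a move would push a column past its WIP limit."""


@dataclass
class KanbanBoard:
    # Column names in workflow order (a validation column could be added before "Done").
    columns: tuple = ("To Do", "Doing", "Done")
    # Maximum number of cards per column; columns not listed are unlimited.
    wip_limits: dict = field(default_factory=lambda: {"Doing": 3})
    # Maps each card title to the column it currently sits in.
    cards: dict = field(default_factory=dict)

    def add(self, title):
        """Place a new card in the first column."""
        self.cards[title] = self.columns[0]

    def count(self, column):
        """Return how many cards are currently in the given column."""
        return sum(1 for c in self.cards.values() if c == column)

    def move(self, title, target):
        """Move a card, refusing if the target column is at its WIP limit."""
        if target not in self.columns:
            raise ValueError(f"unknown column: {target}")
        if self.cards[title] == target:
            return  # already there, nothing to do
        limit = self.wip_limits.get(target)
        if limit is not None and self.count(target) >= limit:
            raise WipLimitExceeded(f"{target} is at its limit of {limit}")
        self.cards[title] = target


# Hypothetical tasks for a data science project.
board = KanbanBoard()
for task in ["clean data", "build features", "train model", "write report"]:
    board.add(task)
for task in ["clean data", "build features", "train model"]:
    board.move(task, "Doing")

# A fourth task cannot enter "Doing" while three are in progress.
try:
    board.move("write report", "Doing")
    blocked = False
except WipLimitExceeded:
    blocked = True

# Finishing one task frees up space for the new item.
board.move("clean data", "Done")
board.move("write report", "Doing")
```

In this sketch the limit is enforced by raising an exception; a real team would more likely treat a full column as a prompt to finish or unblock existing work before starting something new.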
Strengths of Kanban
Two key strengths of Kanban are that (1) it represents work visually on a Kanban board, with work items flowing across columns (or bins) of increasing completion status, so that all data science team members can see the state of every piece of work at any time, and (2) it aims to minimize work in progress.
Minimizing WIP enables agility, since new knowledge is gained before more work starts. In other words, possible future tasks are re-prioritized each time a new task starts, and by minimizing how many tasks are in progress, a data science project can re-prioritize as needed without significant wasted effort.
Kanban proponents claim that Kanban offers improved project visibility, team communication and coordination. In fact, Kanban has been shown to be effective for data science. Its fluid and less rigorous processes provide data scientists with greater flexibility to execute their work without having to hit constant deadlines.
Challenges in using Kanban
Despite these benefits, there are also challenges to using Kanban for data science efforts. One key challenge is that Kanban has no specified process framework, nor any specifically defined roles or meetings. Furthermore, time boxes are not defined in Kanban. Rather, each data science team is free to use any process framework that supports or encourages the Kanban principles (minimizing WIP, etc.). While this is a positive for data science projects (in that an analysis is often difficult to scope), it is also a challenge, in that some managers want “specific deadlines”. In fact, Scrum teams often use Kanban as a secondary approach to manage workflow during the sprint, although often without WIP limits.
Since Kanban defines neither project roles nor process specifics, the freedom it provides can be part of the challenge in implementing it. In other words, the lack of process structure can be a strength (since it allows teams to implement Kanban within existing organizational practices) as well as a weakness, as each data science team that wants to use Kanban needs to figure out its own processes and artifacts. As a result, each team may implement Kanban in a different way. Hence, it is not surprising that there remains a lack of consensus around how to use Kanban.
The fact that Kanban does not explicitly specify a process framework also suggests that Kanban needs to be supported by additional practices and/or frameworks. For example, a data science team could use Kanban with CRISP-DM or TDSP. This lack of process definition also explains why teams that use Kanban have noted that “Kanban requires integration with existing agile techniques, which can be complicated, expensive, and time-consuming”.