Team Data Science Process (TDSP)
Microsoft’s Team Data Science Process (TDSP) is an iterative data science framework that was initially released in 2016 and most recently updated in early 2020. Because the framework is relatively new, few people are trained and certified in how to use it.
TDSP defines a high-level data science project life cycle as well as a standardized data science project structure (e.g., team roles). While parts of the framework leverage Microsoft tools and infrastructure, the rest of this discussion focuses on the more generic aspects of the framework that are not tied to Microsoft’s suite of products.
TDSP’s process lifecycle framework
This aspect of the framework focuses on “what to do” (not “how to do it”) by defining five stages for a project (sometimes known as project phases). The stages within TDSP’s lifecycle are similar to CRISP-DM’s phases: Business Understanding, Data Acquisition and Understanding, Modeling, Deployment, and Customer Acceptance.
The TDSP lifecycle is modeled as a sequence of iterated steps that provide guidance on the tasks needed to create and use predictive models. As with CRISP-DM, projects are expected to “loop back” (in other words, execute these phases iteratively). Also like CRISP-DM, the framework does not define when the team should iterate. For example, the team can iterate at the end of each complete lifecycle, or between phases within the lifecycle. Note that teams using TDSP are free to pick another lifecycle framework (such as directly using CRISP-DM or an organization’s custom set of phases).
TDSP’s team roles
While TDSP’s framework is similar to CRISP-DM’s, TDSP addresses CRISP-DM’s lack of team definition by defining four distinct roles (solution architect, project manager, data scientist, and project lead) and their responsibilities during each phase of the project lifecycle.
The tasks and artifacts of each stage are associated with specific project roles, so that the team knows who is responsible for completing them. Note that many aspects of data science, such as data engineering, are merged into the data scientist role.
Standardized Resources (project structure / tracking)
Independent of the actual lifecycle framework used, TDSP provides, for each stage, goals, artifacts, and guidance on how to complete the artifacts. Reflecting this focus on project document artifacts, TDSP suggests that all project documents use standard templates and that the documents (as well as project code) are stored in a version control system (such as Git). In fact, a key concept within TDSP is communicating tasks across the team and to stakeholders / customers by using a well-defined set of artifacts that employ standardized templates. With respect to task/feature tracking and prioritization, TDSP suggests using one of the many commonly available tracking systems (such as Jira or Rally). Furthermore, TDSP suggests using these tools to provide cost estimates as part of the project process.
Agile & TDSP
Microsoft provides a description of how to integrate the concepts of scrum within TDSP. Basically, one can define a backlog of work items, and then use that backlog for sprint planning and sprint execution. In the TDSP sprint planning framework, there are four typical work item types (Features, User Stories, Tasks, and Bugs), and there is one backlog for all the work items, which are tracked / managed at the project level. Just as with other scrum projects, these work items can be managed via the traditional scrum processes (such as sprint planning meetings). However, while Microsoft describes TDSP as supporting an agile approach, there are also waterfall-like aspects to the framework. For example, at the end of each stage, there are specific artifacts that need to be created, including:
- Business Understanding: Project Charter (project manager)
- Data Acquisition and Understanding: Data Summary Report (data scientist), Solution Architecture Diagram (solution architect)
- Modeling: Model Report (data scientist)
- Deployment: Dashboard (data scientist)
- Customer Acceptance: Project Final Report (project manager)
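To make the stage-gate nature of this list concrete, the mapping from stages to artifacts and owning roles can be sketched as a small lookup table. This is only an illustration, not part of TDSP or any Microsoft tooling: the names Artifact, STAGE_ARTIFACTS, and missing_artifacts are hypothetical, and the stage, artifact, and role names are taken from the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Artifact:
    """A document expected at the end of a stage (hypothetical helper type)."""
    name: str
    owner: str  # role responsible for producing the artifact


# Stage -> expected artifacts, transcribed from the list above.
STAGE_ARTIFACTS = {
    "Business Understanding": [
        Artifact("Project Charter", "project manager"),
    ],
    "Data Acquisition and Understanding": [
        Artifact("Data Summary Report", "data scientist"),
        Artifact("Solution Architecture Diagram", "solution architect"),
    ],
    "Modeling": [
        Artifact("Model Report", "data scientist"),
    ],
    "Deployment": [
        Artifact("Dashboard", "data scientist"),
    ],
    "Customer Acceptance": [
        Artifact("Project Final Report", "project manager"),
    ],
}


def missing_artifacts(stage, completed):
    """Return the artifacts of `stage` whose names are not in `completed`."""
    return [a for a in STAGE_ARTIFACTS[stage] if a.name not in completed]


if __name__ == "__main__":
    # Example: only the data summary has been written so far.
    done = {"Data Summary Report"}
    for artifact in missing_artifacts("Data Acquisition and Understanding", done):
        print(f"Still needed: {artifact.name} (owner: {artifact.owner})")
```

A team could run a check like this at the end of a stage, or at the end of a sprint, to see which documents are still outstanding and who owns them.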
Teams are free to use TDSP stages, CRISP-DM phases, or any other project lifecycle they deem appropriate. Furthermore, when using TDSP, the team is free to use scrum sprints or more traditional project deadlines. In addition, when using an agile version of TDSP, it is up to the team how to think about work tasks: either via feature / user story work items, or via the TDSP lifecycle stages. In other words, a team that uses scrum within TDSP will use sprints, but will think of tasks either via the project phases or via tracking features / user stories (but not both).
Hence, there is a fair amount of freedom in how to use TDSP, which means that there can be a fair amount of variation in how teams use it. This can be good or bad, but it suggests that each team needs to determine its own set of best practices.