How to Know Your Data Science/ML Project Will Fail Before You Begin
Written with Stephen Pettinato.
Data science is a paradox. It’s called the “sexiest job of the 21st century” yet has a 70-85% project failure rate. Still, the demand for data professionals far exceeds supply.
The combination of high demand and high failure rates is counterintuitive. Why do businesses keep investing in data resources? Companies keep investing because they have to. According to McKinsey Research, the gap between leaders and laggards in data adoption is growing, and only a tiny fraction of the potential value is unlocked. Frustration builds as businesses invest in data teams and don’t feel they get a good return.
Data professionals are frustrated, too. A peculiar trait of data folks is their intense curiosity. We want to know why things work at a deeper level. We also want to build products and make recommendations that drive impact. It’s soul-crushing when we invest deeply in a project only to watch it not reach its full potential.
Given the high stakes and frequent failures in data science projects, it’s crucial to identify the root causes of these failures early. Through our experience overseeing hundreds of data projects, we’ve observed recurring patterns that correlate with outcomes. Surprisingly, many determinants of success or failure are evident before a project begins. We’ve identified ten critical issues that are early warning signs for project failure. Understanding and addressing these factors can improve your chances of success.
Failure Drivers
Issues causing a 90%+ failure rate:
- You can’t explain why we are doing this work. “Because [insert important stakeholder] says so” is not a good answer. The original request is rarely the right problem to solve, so it’s important to iterate and clearly define the opportunity.
- You can’t explain why this work is meaningful or discuss the opportunity cost of your time. A current “good enough” solution is often “good enough” compared to other opportunities. Remember that a yes on your time is a no to others.
- You aren’t clear on how you measure success. Be specific.
- The deliverables are fuzzy, placeholders, or unclear. This indicates a misalignment between you and your stakeholders.
- Your stakeholders can’t answer any of the above questions. Like fuzzy deliverables, this indicates a misalignment between you and your stakeholders.
Issues causing a 70%+ failure rate:
- You don’t have semi-frequent milestones. Splitting a project into bite-sized pieces is helpful in many ways.
- Your stakeholder section resembles the company’s About Us page. Too many stakeholders create a mess, and too few are ineffective. Find what works best for your culture and adjust over time.
- You don’t discuss the project’s end. Document it so your team accounts for maintenance in planning. Skipping this leads to failure because, without clear priorities and a clear view of where your time goes, maintenance work piles up and overburdens the organization.
- You shared a slide deck with me. Slides make it too easy to avoid thinking deeply. Slide culture is ingrained at some companies, and you may not have a choice. Even then, write out your plan in detail first to force deeper thinking, then summarize it for the presentation. It’s also easier to collaborate on a doc than on slides.
- Your design document points to little or no internal resources. I am skeptical of any project pitch that relies little on current architecture and systems. It is a warning sign that you are duplicating work or didn’t get enough feedback on your design. If the system is bespoke, specify why.
After identifying the key drivers of project failure, you may wonder how to address these challenges in practice. The project template we’ve developed targets these points. Below is a quick reference showing how a specific section of the template handles each driver:
Failure Driver | Template Section | How the Template Solves It |
---|---|---|
You can’t explain why we are doing this work. | Problem Statement | Defines the core problem in alignment with business value, ensuring clear project motivation. |
You can’t explain why this work is meaningful or discuss the opportunity cost of your time. | Objectives and Outcomes | Specifies measurable outcomes with clear KPIs to track success and progress, helping you focus on high-impact work. |
You aren’t clear on how you measure success. | Objectives and Outcomes | Clarifies success metrics and defines KPIs to ensure measurable progress is tracked. |
The deliverables are fuzzy, placeholders, or unclear. | Deliverables | Clearly outlines deliverables tied directly to project objectives, preventing ambiguity. |
Your stakeholders can’t answer any of the above questions. | Stakeholders and Roles | Clarifies roles and responsibilities to ensure alignment with stakeholder input. |
You don’t have semi-frequent milestones. | Milestone Plan | Breaks the project into actionable steps with consistent updates and opportunities for feedback. |
Your stakeholder section resembles the company’s About Us page. | Stakeholders and Roles | Streamlines involvement, balancing stakeholders while avoiding overload. |
You don’t discuss the project’s end. | Close-Out Plan | Includes post-project maintenance and handover details to ensure long-term viability. |
You shared a slide deck with me. | System Design and Architecture | Provides detailed system design documentation, ensuring alignment between architecture and deliverables. |
Your design document points to little or no internal resources. | System Design and Architecture | Connects to existing systems and resources to avoid redundancy and ensure smooth integration. |
Project Template
Overview
This project template, also on GitHub (link), helps teams navigate each phase of a data project, from planning to deployment and post-launch evaluation. It emphasizes adaptability, feedback, and continuous improvement in project management.
This template isn’t just a document to fill out. It’s a tool to guide your thinking, facilitate crucial conversations with stakeholders, and ensure you’ve considered all critical aspects of your project before starting.
This template is a guide, not a rigid set of instructions. It provides structure but does not dictate every step. Projects evolve, and so should your approach. Remain flexible and adapt the template to fit your project’s needs rather than forcing it to fit the template.
Project lead tips:
- Be flexible and adjust the template as needed. This template is a starting point and guide, not a prescriptive destination.
- Not all feedback is useful, but you can never learn less by listening and gathering input.
- Collaborate on the design. As the project lead, you should own the overall structure and voice, but don’t write everything yourself. Involve your team by assigning section ownership.
- This document needs to stand alone. It helps with onboarding, meeting new stakeholders, and project reviews.
- Incorporate feedback loops in key phases, particularly during development and testing, to ensure the project aligns with technical and business goals.
Leadership tips:
- Read the document thoroughly, then ask your team to walk you through it. Don’t use slides.
- Avoid setting OKRs and goals that focus on completing the document. There’s a fine line between incentivizing your team to gather feedback and think through solutions and incentivizing them to fill it out just to complete a task.
- Provide your team with as much context upfront as possible. It’s demotivating for a group to work on a design for weeks only to hear they went in a wildly different direction than expected. It’s your responsibility when this happens, so seek further clarity and set better expectations.
- Incentivize your team to keep this document updated throughout the project. If your culture allows, encourage them to make projects discoverable so others can build on their work.
- Encourage ownership, not just accountability. While accountability ensures tasks are completed, ownership fosters a deeper connection to the project’s success. Involve your team members in decision-making and give them autonomy to solve problems. When people feel ownership, they’re more likely to proactively address challenges and innovate rather than complete tasks to meet deadlines.
Project Name
Provide a one-sentence overview that emphasizes customer value and defines success.
Problem Statement
Clearly understand the project’s rationale and value. Think expansively rather than reductively at first. For example, ask questions like, “In a perfect world with this data and a great prediction, what would you do?” Rarely is a specific stakeholder ask the right problem to solve. This statement may evolve as you gain insights throughout the project. Here’s a starting framework:
- Situation: Describe the current context, e.g., “Our platform has experienced increased traffic, but we are not seeing a corresponding rise in conversion rates.”
- Complication: Articulate the core challenge, e.g., “Despite the increased traffic, conversion rates have stagnated, affecting revenue growth.”
- Solution: Outline the high-level solution, e.g., “We need to develop a recommendation engine to improve the user experience and boost conversions.”
Current Solutions and Lessons Learned
Existing Methods
- Current Solutions: Summarize past problem-solving efforts, including current solutions. What happens if you don’t do this work? A current “good enough” solution is often “good enough.” Remember that a yes on your time is a no to other opportunities.
- Limitations: Highlight the shortcomings of these methods and explain why new approaches are needed.
Insights From Related Past Projects
- What Worked: Reflect on successful past projects and how those lessons apply here.
- What Didn’t Work: Acknowledge past issues to avoid them in this project.
Objectives and Outcomes
Objectives
Define and link the project’s strategic, long-term goals to broader company goals.
- Objective 1: e.g., “Increase revenue by improving customer experience.”
- Objective 2: Additional objectives as necessary.
Outcomes
Specify the measurable, short-term outcomes for success and corresponding Key Performance Indicators (KPIs).
- Outcome 1: e.g., “Achieve a 10% increase in conversion rates in six months.”
- Outcome 2: Additional outcomes as necessary.
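An outcome like the example above is worth pinning down numerically, because “a 10% increase” can mean a relative lift or a percentage-point change. The following is a minimal sketch that assumes the relative reading; the function names and traffic numbers are hypothetical and only illustrate how the check could be written down alongside the KPI.

```python
def conversion_rate(conversions: int, visitors: int) -> float:
    """Fraction of visitors who converted."""
    if visitors <= 0:
        raise ValueError("visitors must be positive")
    return conversions / visitors


def relative_lift(baseline: float, current: float) -> float:
    """Relative change of current versus baseline (0.10 means +10%)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (current - baseline) / baseline


def outcome_met(baseline: float, current: float, target_lift: float = 0.10) -> bool:
    """True when the relative lift reaches the target in the outcome statement."""
    return relative_lift(baseline, current) >= target_lift


# Hypothetical numbers: a 2.0% baseline and 2.4% six months later.
baseline = conversion_rate(200, 10_000)
current = conversion_rate(240, 10_000)
lift = relative_lift(baseline, current)
met = outcome_met(baseline, current)
```

Under the percentage-point reading, the same move from 2.0% to 2.4% would be only 0.4 points, so stating the interpretation in the outcome avoids a disagreement at evaluation time.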
Scope
In Scope
Clearly define the project’s scope to ensure focus and clarity.
- Feature 1: e.g., “Developing a machine learning model for personalized recommendations.”
- Feature 2: Additional in-scope features.
Out of Scope
Define what is deliberately excluded to prevent scope creep.
- Exclusion 1: e.g., “Redesigning the entire user interface.”
- Exclusion 2: Additional exclusions as necessary.
Stakeholders and Roles
Stakeholders
List the key stakeholders involved in the project. The full design must be reviewed with them. They are essential for establishing guidelines, approving requirements and timelines, and giving feedback on the details critical to the project’s success.
Stakeholder | Name | Responsibilities | Approval Status |
---|---|---|---|
Stakeholder 1 | | | |
Stakeholder 2 | | | |
Stakeholder 3 | | | |
Roles
List the key roles required for the project.
Role | Name(s) | Responsibilities |
---|---|---|
Role 1 | | |
Role 2 | | |
Role 3 | | |
Requirements
Functional Requirements
What does the system do? Define the business and technical functionalities required to meet the project objectives.
- Business Functionality: e.g., “Business stakeholders need weekly reports on customer churn predictions.”
- Technical Functionality: e.g., “The system shall generate personalized product recommendations based on user activity.”
Non-Functional Requirements
How should the system do it? Outline the performance, scalability, and other critical constraints it must meet.
- Performance: e.g., “System must generate recommendations within 100ms.”
- Scalability: e.g., “System should support up to 10,000 concurrent users.”
- Compliance: e.g., “Must comply with GDPR regulations for data privacy.”
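A non-functional requirement such as the 100ms example only becomes testable once you decide how it is measured. The sketch below assumes the budget applies to the 95th percentile of wall-clock latency, which the example requirement does not specify; `fake_recommend` is a hypothetical stand-in for the real call, and the helper names are illustrative.

```python
import math
import time
from typing import Callable, List


def percentile(samples: List[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty list, with pct in (0, 100]."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def measure_latencies_ms(fn: Callable[[], object], runs: int = 200) -> List[float]:
    """Call fn repeatedly and record each wall-clock duration in milliseconds."""
    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies


def meets_latency_budget(latencies_ms: List[float],
                         budget_ms: float = 100.0,
                         pct: float = 95.0) -> bool:
    """True when the chosen percentile is within the latency budget."""
    return percentile(latencies_ms, pct) <= budget_ms


def fake_recommend() -> List[int]:
    # Hypothetical stand-in for the real recommendation call.
    return sorted(range(1000), reverse=True)[:3]


latencies = measure_latencies_ms(fake_recommend, runs=50)
within_budget = meets_latency_budget(latencies)
```

Whether the budget should apply to the median, the 95th percentile, or the maximum is a decision to record in the requirement itself.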
System Design and Architecture
This section outlines a general approach for most data and engineering projects. What will this look like when finished? Be specific—diagrams, a picture or handwritten design, dashboard mocks, etc. Ideally, link to pre-existing resources. The design should evolve as the project progresses.
Architecture Overview
Provide an overview of the system’s architecture and design. Encourage early sketches and refinement over time.
- System Diagram: Include an early sketch of the system design. This diagram can evolve with user feedback and testing results.
- Components: Describe the main components, such as data storage, processing units, and user interfaces.
Inputs, Algorithms, and Outputs
Define the inputs, algorithm, and outputs for the system.
- Inputs: e.g., “User activity logs from `/data/logs/user_activity/` in JSON format.” Document if sources are unknown.
- Algorithms: e.g., “The recommendation system uses collaborative filtering, specifically matrix factorization, to generate personalized product recommendations. It processes user-item interaction data, factorizes it into lower-dimensional user and item matrices, and uses these matrices to predict user preferences for unseen items. The model is retrained weekly on historical data and updated daily with incremental learning on new user interactions.”
- Outputs: e.g., “Recommendation lists stored in Redis using the following structure:
  - Key: `user:<user_id>:recommendations`
  - Value: JSON string of recommendation data
  - Example:

        Key: user:12345:recommendations
        Value: {
          "timestamp": "2023-06-15T14:30:00Z",
          "rec_items": ["item1", "item2", "item3"],
          "rec_scores": [0.95, 0.85, 0.75],
          "model_version": "v1.2"
        }

  - Each Redis entry represents a set of personalized recommendations for a specific user at a given time. The JSON string contains the following fields:
    - `timestamp`: Time recommendations were generated.
    - `rec_items`: List of recommended item IDs.
    - `rec_scores`: Corresponding recommendation scores.
    - `model_version`: Version of the model used for recommendations.”
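To make the example inputs, algorithm, and outputs above concrete, here is a minimal sketch of matrix factorization followed by construction of the Redis key and JSON value. It rests on several assumptions: plain stochastic gradient descent is used as the solver (the example names only matrix factorization), the ratings are toy data, all function names are hypothetical, and no Redis client is called. The weekly retraining and daily incremental updates are not shown.

```python
import json
import random
from datetime import datetime, timezone


def predict(P, Q, u, i):
    """Predicted preference: dot product of user row u and item row i."""
    return sum(p * q for p, q in zip(P[u], Q[i]))


def factorize(ratings, n_users, n_items, k=2, epochs=300, lr=0.02, reg=0.02, seed=0):
    """Learn user matrix P and item matrix Q from (user, item, rating) triples via SGD."""
    rng = random.Random(seed)
    P = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - predict(P, Q, u, i)
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


def mse(ratings, P, Q):
    """Mean squared error over the observed ratings."""
    return sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in ratings) / len(ratings)


def top_unseen(P, Q, u, seen, n=3):
    """Highest-scoring items the user has not interacted with, as (score, item_index)."""
    scored = [(predict(P, Q, u, i), i) for i in range(len(Q)) if i not in seen]
    scored.sort(reverse=True)
    return scored[:n]


def build_redis_entry(user_id, ranked, item_ids, model_version):
    """Return the (key, value) pair in the structure described above."""
    key = f"user:{user_id}:recommendations"
    value = json.dumps({
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "rec_items": [item_ids[i] for _, i in ranked],
        "rec_scores": [round(score, 4) for score, _ in ranked],
        "model_version": model_version,
    })
    return key, value


# Toy interaction data (hypothetical): (user_index, item_index, rating).
ITEM_IDS = ["item1", "item2", "item3", "item4", "item5"]
RATINGS = [
    (0, 0, 5.0), (0, 1, 4.0), (0, 2, 1.0),
    (1, 0, 4.0), (1, 1, 5.0), (1, 2, 1.0),
    (2, 2, 5.0), (2, 3, 4.0), (2, 4, 4.0),
]

P, Q = factorize(RATINGS, n_users=3, n_items=len(ITEM_IDS))
seen_by_user0 = {i for u, i, _ in RATINGS if u == 0}
ranked = top_unseen(P, Q, 0, seen_by_user0)
key, value = build_redis_entry(12345, ranked, ITEM_IDS, "v1.2")
```

The returned pair is what you would hand to a Redis client’s set operation. Keeping payload construction separate from the storage call makes the output format easy to review with stakeholders before any infrastructure exists.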
Product Usage
Explain how the system will be used and integrated into the broader ecosystem.
- Usage Scenarios: e.g., “After logging in, users receive personalized product recommendations on their homepage.”
- Interacting Systems: e.g., “E-commerce platform frontend for displaying recommendations.”
Deliverables
What are the final deliverables? They should be linked to objectives and outcomes and agreed upon with stakeholders. Based on feedback and project progress, they may be adjusted or reprioritized.
- Deliverable 1: e.g., “Deployed recommendation engine integrated into the e-commerce platform.”
- Deliverable 2: e.g., “Comprehensive documentation, including technical specifications and user guides.”
- Deliverable 3: e.g., “Training sessions for relevant teams on system usage and maintenance.”
Milestone Plan
This plan outlines key project milestones with target dates. The team will determine how to achieve these through their preferred sprint or work cycle structure. Target dates may shift as the project progresses and the team learns more, notably in Agile environments. Regular check-ins will help stay aligned and adjust the plan. Covering the “what” and “when” in a milestone plan is important.
Milestone | Description | Jira Epic(s) | Deliverable(s) | Due Date |
---|---|---|---|---|
Planning | Finalize project plan and gather approvals | [Jira Epic Link] | Project Plan Document | [Date] |
Design | Complete system design and architecture | [Jira Epic Link] | System Architecture Diagram | [Date] |
Development | Develop core functionalities | [Jira Epic Link] | Working Prototype | [Date] |
Testing | Perform unit and integration testing; incorporate feedback | [Jira Epic Link] | Test Reports | [Date] |
Deployment | Deploy system to production | [Jira Epic Link] | Deployed System | [Date] |
Training | Train users and stakeholders | [Jira Epic Link] | Training Materials | [Date] |
Evaluation | Assess performance against success metrics; refine based on feedback | [Jira Epic Link] | Evaluation Report | [Date] |
Closure | Complete documentation and handover | [Jira Epic Link] | Project Close-Out Document | [Date] |
Risk Management
Identify potential risks and mitigation strategies.
- Risk 1: e.g., “Data privacy breaches from improper handling of user data.”
- Likelihood: Medium
- Impact: High
- Mitigation Plan: Implement strict access controls, data encryption, and regular security audits.
- Risk 2: e.g., “Model accuracy not meeting performance criteria.”
- Likelihood: Medium
- Impact: Medium
- Mitigation Plan: Allocate time for model tuning, use cross-validation, and plan for multiple iterations.
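If the risk list grows, it helps to rank entries consistently. The sketch below assumes a simple likelihood-times-impact score on a three-level scale, which is a common convention but not something the template prescribes; the class and field names are hypothetical, and the two entries mirror the examples above.

```python
from dataclasses import dataclass
from typing import List

# Assumed ordinal scale; adjust the levels to your organization's convention.
LEVELS = {"Low": 1, "Medium": 2, "High": 3}


@dataclass(frozen=True)
class Risk:
    name: str
    likelihood: str
    impact: str
    mitigation: str

    @property
    def score(self) -> int:
        """Likelihood multiplied by impact on the ordinal scale above."""
        return LEVELS[self.likelihood] * LEVELS[self.impact]


def prioritize(risks: List[Risk]) -> List[Risk]:
    """Sort risks from highest to lowest score."""
    return sorted(risks, key=lambda r: r.score, reverse=True)


RISKS = [
    Risk(
        name="Model accuracy not meeting performance criteria",
        likelihood="Medium",
        impact="Medium",
        mitigation="Allocate time for model tuning, use cross-validation, "
                   "and plan for multiple iterations.",
    ),
    Risk(
        name="Data privacy breaches from improper handling of user data",
        likelihood="Medium",
        impact="High",
        mitigation="Implement strict access controls, data encryption, "
                   "and regular security audits.",
    ),
]

ranked_risks = prioritize(RISKS)
```

The score is only an ordering aid; the mitigation plan attached to each risk remains the part stakeholders need to review.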
Ethics, Security, and Compliance
Ethical Considerations
Evaluate the ethical implications of the project.
- Bias and Fairness: e.g., “Ensure the recommendation engine doesn’t favor certain user groups.”
Security Measures
Ensure the system follows security best practices.
- Data Protection: e.g., “Encrypt all stored user data using AES-256.”
Compliance Requirements
Ensure the system complies with regulatory standards.
- Compliance: e.g., “Ensure GDPR and CCPA compliance for user data handling.”
Close-Out Plan
Project Completion
Ensure all agreed-upon deliverables are completed and accepted by stakeholders.
Maintenance and Handover
- Ongoing Maintenance: Plan for system monitoring, updates, runbooks, and support.
- Handover Plan: Provide thorough documentation and training if the project is handed off to another team.