CAIR Framework: Evaluating AI for Software Planning

In 2021, Spotify's engineering team documented a sobering reality about their planning process.

One squad discovered their average work item completion time had ballooned to 8.1 days, with engineers reporting "we never finished our sprints" and having "virtually no gauge of how much work we were bringing in." Product managers struggled with "a difficult time knowing what could reasonably be expected to be delivered."

This wasn't just a Spotify problem. Industry research reveals that 60-80% estimation error rates are standard across software teams, with only 39% of IT projects succeeding according to their original plans.

ThoughtWorks analyzed their own internal projects and found something startling: story point burn-up charts were nearly identical to simple story count charts, suggesting that complex estimation efforts provide no additional value.

The stakes extend far beyond missed sprints.

Knight Capital Group lost $440 million in 45 minutes due to planning failures in their deployment process: they had allocated only 30 days to implement, test, and deploy critical trading system changes.

Microsoft Azure suffered a 12-hour global outage because their planning failed to account for leap year edge cases. These aren't just technical failures. They're planning failures with documented root causes.

Meanwhile, engineering managers spend 17.9 hours per week in meetings (44.75% of their work time), much of it on planning activities that research suggests may provide little value.

AI promises to solve these planning problems, but which applications actually deliver results versus creating new headaches?

Use Case #1: Autonomous Sprint Goal Generation

Description: AI analyzes the product roadmap, previous sprint outcomes, team capacity, and stakeholder priorities to automatically generate sprint goals. The system examines backlog items, business objectives, and team velocity to create focused sprint objectives without human input.

DevOps Phase: Plan

Current Pain Point: Product managers spend significant time crafting sprint goals that align with business priorities while being realistic for team capacity.

Spotify documented this problem: before implementing proper planning practices, teams had "little sense of how our day-to-day work fit into a larger quarterly picture" and product managers had "a difficult time knowing what could reasonably be expected to be delivered."

Success Scenario: AI generates clear, achievable sprint goals that perfectly balance business value with technical reality, freeing PMs to focus on longer-term strategy.

Failure Scenario: AI creates sprint goals that sound reasonable but are technically impossible, strategically misaligned, or miss critical dependencies, causing teams to work toward the wrong outcomes for entire sprints.

Knight Capital's case shows the extreme version: poor planning led to $440 million in losses in 45 minutes when their deployment process failed.

CAIR Evaluation

Value of Success: 4/5 - Saves PM time and potentially creates better-aligned goals
Perceived Risk: 5/5 - Wrong goals misdirect entire team for 2 weeks, affecting delivery and morale
Effort to Correct: 4/5 - Hard to detect until sprint is underway, requires mid-sprint replanning

CAIR Calculation: 4 ÷ (5 × 4) = 0.2
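The calculation above implies that the score is Value of Success divided by the product of Perceived Risk and Effort to Correct. A minimal Python sketch of that arithmetic follows; the function name `cair_score` and the 1-to-5 range check are assumptions made for illustration, not part of the framework as stated here.

```python
def cair_score(value: int, risk: int, effort: int) -> float:
    """Return value / (risk * effort), the ratio used in the CAIR calculations.

    The 1-5 bounds are an assumption based on the "x/5" ratings in the article.
    """
    for name, rating in (("value", value), ("risk", risk), ("effort", effort)):
        if not 1 <= rating <= 5:
            raise ValueError(f"{name} must be between 1 and 5, got {rating}")
    return value / (risk * effort)


# Use Case #1: Value 4, Perceived Risk 5, Effort to Correct 4
print(cair_score(4, 5, 4))  # 0.2
```

Because risk and effort are multiplied, a use case that is both risky and hard to correct is penalized far more heavily than one that is high on only one of the two.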

Use Case #2: Meeting Notes Summarization and Action Item Extraction

Description: AI listens to planning meetings and automatically generates summaries with extracted action items, decisions made, and next steps.

The system identifies who committed to what and creates trackable tasks from meeting discussions.

DevOps Phase: Plan

Current Pain Point: Someone needs to take detailed notes during planning meetings, which distracts from participation.

Action items get lost, or unclear ownership leads to dropped commitments. Engineering managers already spend 17.9 hours per week in meetings (44.75% of their work time), leaving limited focus time for meaningful work.

Success Scenario: Perfect meeting summaries capture all decisions and action items with clear ownership, allowing everyone to fully participate in discussions.

Failure Scenario: AI misses critical action items, misattributes ownership, or summarizes decisions incorrectly, leading to confusion about commitments and next steps.

CAIR Evaluation

Value of Success: 3/5 - Good time savings and ensures nothing falls through the cracks
Perceived Risk: 1/5 - Missing items are easily caught, no production impact
Effort to Correct: 1/5 - Simple to add missed items or fix incorrect summaries

CAIR Calculation: 3 ÷ (1 × 1) = 3.0

Use Case #3: Story Point Estimation Assistance

Description: AI analyzes user stories and suggests story point estimates based on similar past work, code complexity analysis, and team velocity patterns. It provides estimates as starting points for team discussion rather than final decisions.

DevOps Phase: Plan

Current Pain Point: Estimation sessions are time-consuming and heavily dependent on senior engineers. Junior developers can't assess technical complexity, leading to either silence or wildly inaccurate estimates. Industry research shows 60-80% estimation error rates are common, with ThoughtWorks finding that complex story point estimation provides no additional value over simple story counts.

Success Scenario: AI provides consistent baseline estimates that speed up planning and help junior developers understand effort drivers, making estimation sessions more efficient and inclusive.

Failure Scenario: AI suggests misleading estimates that anchor team discussions in the wrong direction, leading to systematic over- or underestimation and sprint planning failures.

Spotify achieved a 52% improvement in completion time (from 8.1 to 3.9 days) when they fixed their planning processes.

CAIR Evaluation

Value of Success: 4/5 - Significant time savings and improves team estimation consistency
Perceived Risk: 2/5 - Bad estimates affect sprint planning but team can override
Effort to Correct: 2/5 - Easy to adjust estimates through team discussion

CAIR Calculation: 4 ÷ (2 × 2) = 1.0

Results Summary

High Priority (CAIR ≥ 3): Invest immediately
Medium Priority (1 ≤ CAIR < 3): Pilot carefully with safety measures
Low Priority (CAIR < 1): Avoid or delay until you can improve the score
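The three evaluations can be scored and ranked together with a short sketch like the one below. The `UseCase` class and `priority` function are hypothetical names introduced for illustration. The sketch treats a score of exactly 3 as High Priority and a score of exactly 1 as Medium Priority, an assumption that matches how the 3.0 and 1.0 results are handled in the recommendations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UseCase:
    name: str
    value: int   # Value of Success (1-5)
    risk: int    # Perceived Risk (1-5)
    effort: int  # Effort to Correct (1-5)

    @property
    def cair(self) -> float:
        return self.value / (self.risk * self.effort)


def priority(score: float) -> str:
    # Assumption: boundary scores (exactly 3 or exactly 1) fall in the higher band.
    if score >= 3:
        return "High"
    if score >= 1:
        return "Medium"
    return "Low"


# Ratings copied from the three CAIR evaluations in this article.
USE_CASES = [
    UseCase("Autonomous sprint goal generation", 4, 5, 4),
    UseCase("Meeting notes summarization", 3, 1, 1),
    UseCase("Story point estimation assistance", 4, 2, 2),
]

ranked = sorted(USE_CASES, key=lambda uc: uc.cair, reverse=True)
for uc in ranked:
    print(f"{uc.name}: CAIR {uc.cair:.1f} ({priority(uc.cair)})")
```

Sorting by score puts meeting notes summarization first, story point estimation assistance second, and autonomous sprint goal generation last.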

The planning stage reveals a clear pattern: AI works best when it handles information processing rather than decision-making. Meeting notes summarization scored highest because mistakes are easily caught and corrected, while autonomous sprint goal generation scored lowest because wrong decisions can misdirect entire teams for weeks.

Start Here: Meeting notes summarization offers the best immediate opportunity.

Tools like Otter.ai or meeting bots provide clear value with minimal risk. The 17.9 hours per week that engineering managers spend in meetings represents massive potential time savings.

Pilot Carefully: Story point estimation assistance can provide value, but requires human oversight.

Given that industry research shows 60-80% estimation error rates and ThoughtWorks found story points provide no additional value over story counts, AI suggestions should supplement rather than replace human judgment.

Avoid For Now: Autonomous sprint goal generation received the lowest CAIR score because the stakes are too high.

Knight Capital's $440 million loss in 45 minutes demonstrates what happens when planning processes fail in high-stakes environments. Until AI becomes significantly more reliable, keep humans in the decision-making loop.

The key insight from the planning stage is that AI adoption should mirror the documented improvement patterns: Spotify achieved a 52% improvement in completion time not by replacing human planning, but by implementing better structured processes.

AI should enhance these processes rather than replace human judgment.

References

Spotify Engineering. "Achieving Team Purpose and Pride with Scrum." Engineering at Spotify Blog, May 2021. https://engineering.atspotify.com/2021/05/achieving-team-purpose-and-pride-with-scrum

Clockwise. "Software Engineering Meeting Benchmarks." 2024. https://www.getclockwise.com/eng-meeting-benchmarks

ThoughtWorks. "Perspectives on Estimation." 2013. https://info.thoughtworks.com/rs/thoughtworks2/images/twebook-perspectives-estimation_1.pdf

ACM/IEEE. "Effort estimation in agile software development." Proceedings of the 19th International Conference on Evaluation and Assessment in Software Engineering, 2015.

Henrico Dolfing. "Case Study 4: The $440 Million Software Error at Knight Capital." 2019. https://www.henricodolfing.com/2019/06/project-failure-case-study-knight-capital.html

Microsoft Azure. "Summary of Windows Azure Service Disruption on Feb 29th, 2012." Microsoft Azure Blog, March 2012. https://azure.microsoft.com/en-us/blog/summary-of-windows-azure-service-disruption-on-feb-29th-2012/

GitHub. "October 21 post-incident analysis." The GitHub Blog, October 2018. https://github.blog/2018-10-30-oct21-post-incident-analysis/

Code Climate. "Knowledge Silos are Holding Back Your Engineering Team." 2024. https://codeclimate.com/blog/engineering-knowledge-silos

VentureBeat. "Stack Overflow data reveals the hidden productivity tax of 'almost right' AI code." 2024. https://venturebeat.com/ai/stack-overflow-data-reveals-the-hidden-productivity-tax-of-almost-right-ai-code/
