Rob Finn
How DORA Metrics Improve Talent Retention, Enhance Employee Well-being and Boost Reliability

In our combined 20 years of DevOps experience, we have seen metrics like deployment frequency and lead time take centre stage. However, the 2023 State of DevOps Report reveals a link between reliability, employee well-being, and overall organisational performance. In this post, we'll explore how prioritising reliability through DORA metrics can create a positive feedback loop that benefits your teams and your bottom line, while addressing the challenges of employee retention, knowledge transfer, and onboarding.

The High Cost of Turnover & Unreliability

Did you know that the average tenure of a DevOps Engineer at Google is a mere 1.1 years, as reported by Dice? Similarly, data from LinkedIn Talent Insights (February 2021) reveals an average tenure of 1.1 years for DevOps engineers in London. This rapid turnover, combined with the steep learning curve of team-specific knowledge, creates a significant challenge. New hires often struggle with productivity due to the reliance on this internal knowledge, which isn't easily documented or transferred. This forces experienced engineers to dedicate valuable time to mentorship, hindering overall team efficiency.

Adding to this challenge, the 2023 State of DevOps Report found that 49% of respondents experience a change failure rate of 15% or higher on deployments, with 17% experiencing a failure rate of 64% or more. At a 15% failure rate, this translates to 7.7% of a team's time spent on failed deployments; for a team of four DevOps engineers in Chicago, that costs $46,200 per year. This high rate of failure not only impacts system reliability but also contributes to the stress and burnout that can drive turnover.
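
The arithmetic behind that cost figure can be sketched as follows. The post does not state the salary it used. The $150,000 annual cost per engineer below is an assumption, back-calculated so that the result matches the $46,200 quoted above. The function and parameter names are illustrative only.

```python
def failed_deployment_cost(team_size, annual_cost_per_engineer, time_lost_fraction):
    """Yearly cost of engineering time spent on failed deployments."""
    return round(team_size * annual_cost_per_engineer * time_lost_fraction, 2)


# Assumption: $150,000 per engineer per year (not stated in the post).
# 0.077 is the 7.7% share of time lost to failed deployments quoted above.
cost = failed_deployment_cost(
    team_size=4,
    annual_cost_per_engineer=150_000,
    time_lost_fraction=0.077,
)
print(f"${cost:,.0f} per year")  # prints $46,200 per year
```

Swapping in your own team size, salary data, and measured time lost gives a rough estimate for your organisation. It counts engineering time only and leaves out customer impact and the cost of turnover.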

The report further highlights that certain types of work, particularly unplanned work and rework caused by failures, are significant predictors of burnout. When engineers are constantly pulled into firefighting mode to address incidents, it leaves little room for innovation, knowledge sharing, or professional development.

(Note: The 2024 State of DevOps Report is currently open for respondents. Your insights can help shape the future of DevOps. Visit https://dora.dev/ for details)

Why Reliability Matters (Even More Than You Think)

  • Reduced Stress and Burnout: When systems are unreliable and change failures are frequent, it creates a constant state of firefighting and stress, further fuelling turnover. By prioritising reliability, you minimise unplanned work, reduce burnout, and foster a more sustainable work environment.
  • Improved Morale and Collaboration: Reliability builds trust. Engineers are less likely to fear deployments when they have confidence in their systems and processes. This leads to better collaboration, a more positive, supportive atmosphere, and a willingness to share knowledge – key factors for retaining talent and encouraging mentorship.
  • Increased Innovation (and Knowledge Transfer): With fewer fires to put out and a less stressful environment, experienced engineers have more time to mentor new hires, share valuable team-specific knowledge, and focus on innovative solutions that drive continuous improvement and reduce change failures.

DORA Metrics and Employee Well-being: The Evidence

Google's DORA research shows that elite and high-performing teams (those excelling in DORA metrics, who made up 49% of respondents in 2023) report:

  • Lower burnout rates: Teams focused on reliability experience significantly less burnout, reducing the likelihood of employees seeking greener pastures.
  • Higher job satisfaction: Engineers feel more fulfilled when they can trust their systems and aren't constantly battling fires, contributing to higher retention rates.
  • Stronger psychological safety: These teams foster environments where it's safe to ask questions, learn, and grow, making them more attractive to both new and experienced talent.

Practical Strategies for DevOps Leaders

  1. Make Reliability a Cultural Cornerstone:
    • Communicate the link between reliability, employee well-being, knowledge transfer, and change failure rates.
    • Celebrate successes in improving reliability metrics, knowledge sharing, successful onboarding, and reducing change failures.
    • Invest in tools, training, and mentorship programs that support reliability practices and knowledge transfer.
  2. Empower Teams to Own Reliability (and Onboarding):
    • Give teams ownership of their systems, processes, onboarding initiatives, knowledge documentation, and change management practices.
    • Encourage a blameless culture where focus is on learning from failures and continuous improvement, not assigning blame for knowledge gaps or mistakes.
    • Teams should be able to discover what's running and what its dependencies are without relying on out-of-date documentation. (See Overmind, which continuously discovers your AWS environments so that teams can anticipate the impact of changes before they happen.)
  3. Measure and Track DORA Metrics (with Empathy and Context):
    • Use DORA metrics to gauge progress, but avoid creating a pressure-cooker environment that hinders knowledge sharing or leads to risky shortcuts.
    • Balance metrics with qualitative feedback from teams about their well-being, onboarding experiences, and knowledge gaps.
    • Factor in the ramp-up time for new hires and the impact of change failures when evaluating team performance.
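
As a starting point for the measurement step above, here is a minimal sketch of computing two DORA metrics, change failure rate and lead time, from a list of deployment records. The `Deployment` record, its fields, and the sample data are hypothetical. In practice the records would come from your CI/CD or incident tooling, and what counts as a "failed" change is something each team needs to define for itself.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median


@dataclass
class Deployment:
    committed_at: datetime  # when the change was committed
    deployed_at: datetime   # when it reached production
    failed: bool            # True if it needed remediation (assumed definition)


def change_failure_rate(deployments):
    """Share of deployments that failed, as a fraction between 0 and 1."""
    if not deployments:
        return 0.0
    return sum(d.failed for d in deployments) / len(deployments)


def median_lead_time(deployments):
    """Median time from commit to production."""
    return median(d.deployed_at - d.committed_at for d in deployments)


# Hypothetical sample data: four deployments, one of which failed.
base = datetime(2024, 1, 1, 9, 0)
deployments = [
    Deployment(base, base + timedelta(hours=2), False),
    Deployment(base + timedelta(days=1), base + timedelta(days=1, hours=4), True),
    Deployment(base + timedelta(days=2), base + timedelta(days=2, hours=6), False),
    Deployment(base + timedelta(days=3), base + timedelta(days=3, hours=8), False),
]

print(f"Change failure rate: {change_failure_rate(deployments):.0%}")  # 25%
print(f"Median lead time: {median_lead_time(deployments)}")            # 5:00:00
```

Numbers like these are most useful as a trend for the team itself. As noted above, they should be read alongside qualitative feedback and with allowance for new-hire ramp-up, not used as individual targets.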

Reliability is not just about uptime. It's about creating a work environment where engineers thrive, knowledge flourishes, change failures are minimised, and your business excels. By embracing the insights from the State of DevOps Reports and prioritising reliability through DORA metrics, you can cultivate a high-performing, resilient culture that attracts and retains top talent.

Want to learn more about avoiding costly disruptions? Download our latest white paper, "What We Can Learn From 5 High Profile Outages." Dive into real-world case studies like Klarna's payment halt and Reddit's 314-minute outage to understand the complex vulnerabilities and behaviours that led to these events. Learn actionable insights from the STELLA report and see how Overmind is pioneering solutions to tackle these challenges.

Authors: Rob Finn & Dylan Ratcliffe
