James Lane
Productivity
Why are your platform teams reviewing the same infrastructure changes every week?

We've sat down with a number of our customers recently, and they keep describing the same challenge: their most experienced engineers are spending too much time reviewing routine infrastructure changes. These conversations reflect a pain felt across the industry. Senior engineers with deep tribal knowledge of systems and dependencies have become essential gatekeepers for every change, regardless of complexity. The challenge isn't just the volume of changes requiring review. It's that organisations are burning their most valuable engineering time on routine updates that experienced team members have already approved dozens of times before.

Death by a Thousand Routine Changes

While various industry reports don't specifically categorise what percentage of changes are "routine," we can derive insights from their research. The 2022 State of DevOps report by Puppet found that in organisations with mature DevOps practices, approximately 41% of deployments were automated end-to-end without manual intervention. This suggests that, at minimum, these organisations had identified a substantial portion of changes that were predictable enough to automate.

Further evidence comes from Gartner research, which found that through 2023, I&O teams that use AIOps and automated remediation tools will reduce operational incidents by as much as 50%, indicating that a significant portion of operational work (including changes) follows patterns that can be identified and addressed systematically.

Some examples of ‘routine changes’ might include:

  • AMI updates within the same major version
  • Non-functional documentation updates
  • Parameter adjustments within pre-approved thresholds
  • Dependency bumps for patches and security fixes
  • Resource scaling adjustments within defined parameters

Despite being low-risk, each change typically goes through the same review process, creating a significant bottleneck. A 2023 survey by GitLab found that developers spend approximately 15-20% of their time on code review, with respondents reporting that up to 30% of these reviews were for routine or minor changes that could potentially be automated or expedited.

To quantify this in hours: In a standard 40-hour work week, a developer might spend 6-8 hours on code review (15-20%). Of that time, approximately 1.8-2.4 hours per developer per week (30% of review time) is spent on routine changes. For a team of 10 developers, this translates to 18-24 hours weekly spent reviewing routine changes.
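To make that arithmetic concrete, here is a quick back-of-the-envelope sketch in Python. The percentages are the survey figures quoted above; the 10-person team is simply an illustrative assumption.

```python
# Back-of-the-envelope estimate of time spent reviewing routine changes,
# using the survey figures above: 15-20% of a 40-hour week goes to review,
# and ~30% of that review time goes to routine changes. Team size is assumed.

WORK_WEEK_HOURS = 40
REVIEW_SHARE = (0.15, 0.20)   # fraction of the week spent on code review
ROUTINE_SHARE = 0.30          # fraction of review time spent on routine changes
TEAM_SIZE = 10                # illustrative team size

def routine_review_hours(review_fraction: float) -> float:
    """Hours per developer per week spent reviewing routine changes."""
    return WORK_WEEK_HOURS * review_fraction * ROUTINE_SHARE

low, high = (routine_review_hours(f) for f in REVIEW_SHARE)
print(f"Per developer: {low:.1f}-{high:.1f} hours/week")      # 1.8-2.4
print(f"Team of {TEAM_SIZE}: {low * TEAM_SIZE:.0f}-{high * TEAM_SIZE:.0f} hours/week")  # 18-24
```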

The Answer Is Almost Never More Rules

Infrastructure resources rarely exist in isolation; they’re usually part of a bigger system with lots of moving parts and dependencies. Traditional policy tools have a hard time making sense of those complex relationships. With something like OPA, you have to manually spell out all the context for each decision: what’s connected to what, which resources depend on others, and so on. The tool can’t make these connections on its own unless you take the time to write out every possible scenario.

Take a simple OPA rule like “All S3 buckets must have versioning enabled.” At first glance, this sounds straightforward, but reality isn’t always so clear cut. Say you spin up a temporary test bucket for some quick experiments and don’t need versioning enabled because it adds unnecessary cost or complexity. The rule doesn’t care; OPA will flag or block the change, even though it makes sense in context.
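To see how quickly this rigidity bites, here is a minimal sketch of that kind of static check, written as plain Python rather than actual Rego, with made-up bucket definitions:

```python
# Illustrative version of a static "all S3 buckets must have versioning" check.
# This is NOT OPA/Rego itself, just the same rigid logic expressed in Python;
# the bucket definitions are hypothetical examples.

buckets = [
    {"name": "prod-artifacts", "versioning": True,  "purpose": "production data"},
    {"name": "tmp-load-test",  "versioning": False, "purpose": "temporary experiment"},
]

def versioning_violations(resources):
    """Flag every bucket without versioning, regardless of its context."""
    return [b["name"] for b in resources if not b["versioning"]]

# The temporary test bucket is flagged even though skipping versioning is a
# deliberate, sensible choice for a short-lived experiment.
print(versioning_violations(buckets))  # ['tmp-load-test']
```

The only way to let the test bucket through is to hard-code an exception into the policy, and then another for the next special case, and the next.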

As these kinds of one-off scenarios keep popping up, your policy files start to swell with exceptions and edge cases. Before long, what began as a tidy set of rules is a mess that nobody wants to touch or maintain.

Trying to capture that kind of temporary, dynamic context in a set of static rules is next to impossible. The end result? Policies that are either too rigid or full of holes, because they just can’t keep up with reality.

So what is the answer?

Rather than trying to predict every possible scenario upfront, we need systems that observe real team behaviour and understand what "routine" means for each specific organisation. A weekly AMI update might be completely routine for one team's well-established deployment pipeline, but risky for another team that rarely touches infrastructure. The same change, different contexts, different risk profiles.

This is where predictive change intelligence comes in. Instead of rigid policies, you analyse patterns in your actual change history. When your platform team updates Redis timeouts every Tuesday for the past eight weeks with 100% success rate, that's not just a pattern, that's operational intelligence. When your data team scales RDS instances bi-weekly during the same maintenance window, that's routine for them, even if it would be unusual for other teams.
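As a rough illustration of what analysing change history can look like, here is a hedged sketch in Python. The data shape, thresholds, and scoring are illustrative assumptions, not how Overmind actually models this:

```python
from dataclasses import dataclass

@dataclass
class Change:
    team: str
    kind: str          # e.g. "redis-timeout-update", "rds-scale"
    succeeded: bool

def routine_for_team(history: list[Change], team: str, kind: str,
                     min_occurrences: int = 6, min_success_rate: float = 0.95) -> bool:
    """Treat a change as routine only if *this* team has done it often and reliably.

    The thresholds are illustrative assumptions; the point is that "routine"
    is derived per team from observed history, not from a global rule.
    """
    relevant = [c for c in history if c.team == team and c.kind == kind]
    if len(relevant) < min_occurrences:
        return False
    success_rate = sum(c.succeeded for c in relevant) / len(relevant)
    return success_rate >= min_success_rate

# Eight successful weekly Redis timeout updates by the platform team...
history = [Change("platform", "redis-timeout-update", True) for _ in range(8)]
print(routine_for_team(history, "platform", "redis-timeout-update"))   # True
print(routine_for_team(history, "frontend", "redis-timeout-update"))   # False: no track record
```

The same change type comes out as routine for one team and not for another purely because of their respective track records, which is exactly the point.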

Another good example is security group rule modifications (which nobody likes). For a DevOps team that manages network access daily, adding port 5432 access between specific subnets follows their established database connectivity patterns. They have that hard-earned tribal knowledge of the network topology, have made similar changes hundreds of times, and know exactly which services need access. For a frontend team focused on React components, that same security group change could accidentally expose databases or create unintended network paths they don't fully understand.

This is why many organisations end up relying on their most senior engineers to review every infrastructure change, regardless of complexity. These experienced team members have accumulated the tribal knowledge about which patterns are safe, which teams know what they're doing, and what could go wrong in specific contexts. But this creates an obvious bottleneck: your most valuable engineers spend their time rubber-stamping routine changes they've seen dozens of times before, instead of focusing on architecture decisions and complex problem-solving where their expertise actually matters.

At Overmind, we believe that infrastructure change shouldn't be something teams fear or spend endless cycles reviewing. As the change company, our mission is to give engineering teams the intelligence they need to deploy confidently and move fast without compromising safety.

If you are interested in trying us out today or following along on our journey, sign up or register your interest below.

Prevent Outages from Config Changes

Try out the new Overmind CLI today for free.
No agents, 3-minute deployment.