James Lane
The Hidden Complexity in Your Cloud Architecture Diagrams

As companies migrate their applications to the cloud, they may find that their architecture diagrams become increasingly complex. These diagrams provide a visual representation of the various components and how they interact with each other. However, they may not accurately reflect the true complexity of the system.

During early access, our team had the opportunity to analyse 2.3 million AWS resources and dependencies to quantify the complexity hidden in architecture diagrams. On average, we found 3 links for every resource, a ratio rarely seen even in the most complex architecture diagrams. Let’s take a look at why that’s a problem.

The problem with complexity (explained by looking at houses)
House Plan

A house plan like the one above is great for giving you a visual representation of the layout and features (number of rooms, amenities). However, say you wanted to add an extension or even drill a hole in the wall. Would you feel confident that you have everything required to avoid knocking down a structural wall or drilling through a gas line?

Detailed House Blueprint

Instead, you might consider consulting the building plans or blueprint. They contain the information you need to make your decisions confidently. However, while very useful, blueprints can be complex, containing lots of measurements and annotations, and if you don’t know what you’re looking for, they may cause more problems than they solve.

The same can be said for architecture diagrams…
AWS Overmind Diagram

AWS diagrams like the one above are a great tool for onboarding new engineers or communicating a high-level overview to stakeholders. They give a clear, often concise representation of an application that does not require much prior experience or context to understand. But knowing what we’ve already said about hidden complexity, would you feel confident making a change to your application based on the diagram above? Even changing something as simple as a security group could be problematic. The architecture diagram may show you some connections, but other EC2 instances or RDS databases could also be using that security group. If you change it, those resources could be affected too.
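To make the security group example concrete: in AWS, security groups are attached to network interfaces (ENIs), and services such as RDS, Lambda functions in a VPC and load balancers create ENIs too. A minimal sketch, assuming boto3 and credentials allowed to call ec2:DescribeNetworkInterfaces, lists every ENI that has a given group attached, using the EC2 DescribeNetworkInterfaces call with its group-id filter. The function name and the sg-... ID are hypothetical, and the sketch only finds direct attachments; it does not find rules in other security groups that reference this one.

```python
from typing import Any, Dict, List


def enis_using_security_group(ec2_client: Any, group_id: str) -> List[Dict[str, str]]:
    """Summarise every network interface that has `group_id` attached.

    `ec2_client` is expected to be a boto3 EC2 client (or anything with the
    same describe_network_interfaces method). The name is illustrative.
    """
    results: List[Dict[str, str]] = []
    kwargs: Dict[str, Any] = {
        "Filters": [{"Name": "group-id", "Values": [group_id]}],
    }
    while True:
        page = ec2_client.describe_network_interfaces(**kwargs)
        for eni in page.get("NetworkInterfaces", []):
            results.append({
                "eni": eni["NetworkInterfaceId"],
                "type": eni.get("InterfaceType", ""),
                # e.g. RDS-managed ENIs usually carry a descriptive string here
                "description": eni.get("Description", ""),
                # Only ENIs attached to EC2 instances have an InstanceId
                "instance": eni.get("Attachment", {}).get("InstanceId", ""),
            })
        # Follow NextToken in case the results come back in several pages
        token = page.get("NextToken")
        if not token:
            break
        kwargs["NextToken"] = token
    return results
```

With boto3 installed, you would call it as enis_using_security_group(boto3.client("ec2"), "sg-0123456789abcdef0"). Any interface in the result that isn't on your diagram is a resource the diagram was hiding.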

More is not always the answer

Does that mean the answer is to generate a diagram mapping out every link and resource related to the application we are changing? To show what that would look like for the same EKS cluster, we can run a query using Overmind’s Explore feature, setting the link depth so that it discovers all the relationships and links to other resources.

All linked items found using Overmind's Explore feature.

What you can see is that the same application actually has:

  • 164 related items
  • 39 related resource types

That is far more than our diagram was telling us. It means that if we wanted to make a change, we could now see everything that might be impacted (the resources, item links, and metadata) all in one diagram.
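As a rough mental model only (not Overmind’s implementation), a link-depth setting like the one used in the Explore query behaves like a depth-limited breadth-first search over a graph of resources. The sketch below assumes a hypothetical in-memory graph whose items are (resource type, resource ID) pairs; the function names and resource names are made up. Counting the items it returns, and the distinct types among them, gives the same kind of summary as the item and type counts above.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

# Hypothetical item identity: (resource type, resource id)
Item = Tuple[str, str]


def related_items(graph: Dict[Item, List[Item]], start: Item, link_depth: int) -> Set[Item]:
    """Return every item reachable from `start` within `link_depth` links."""
    seen: Set[Item] = {start}
    queue = deque([(start, 0)])
    while queue:
        item, depth = queue.popleft()
        if depth >= link_depth:
            continue  # don't follow links beyond the requested depth
        for neighbour in graph.get(item, []):
            # Links can form cycles, so each item is visited only once
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, depth + 1))
    seen.discard(start)  # report only the related items, not the starting one
    return seen


def summarise(items: Set[Item]) -> Tuple[int, int]:
    """Return (number of related items, number of distinct resource types)."""
    return len(items), len({item_type for item_type, _ in items})
```

Each extra level of depth can multiply the number of items found, which is why the full picture grows so quickly once every link is followed.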

But at this level of detail, it becomes a challenge to display and navigate the graph easily in an interactive GUI, let alone replicate it in a hand-drawn, static architecture diagram.

More or Less?

This leaves us in a difficult position. To make changes confidently, we need to know what will be impacted, and to know that, we need to map out every link to the resource we are changing. But as we’ve seen, when even a simple application has that many related resources and links, the result quickly becomes difficult to work with.

The solution

With Overmind’s Risks, you can surface incident-causing config changes as part of your pull request. When a pull request is opened and a Terraform plan is run, Overmind calculates the potential impact (or blast radius) of your change. It parses the Terraform plan output and then, using only read-only AWS credentials, maps out your infrastructure: it queries AWS directly and discovers relationships automatically to work out the actual impact of your change, even for resources not managed by Terraform.
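Overmind’s own parser isn’t shown here, but the first step can be illustrated with Terraform’s documented JSON plan format. Running terraform plan -out=tfplan followed by terraform show -json tfplan produces a document whose resource_changes array records each resource’s address, type and planned change.actions. A minimal sketch (the function name is hypothetical) that picks out the resources a plan will actually modify, which would be the starting points for a blast-radius search:

```python
import json
from typing import Any, Dict, List


def changed_resources(plan_json: str) -> List[Dict[str, Any]]:
    """List resources that a Terraform JSON plan will create, update, replace or delete."""
    plan = json.loads(plan_json)
    changes: List[Dict[str, Any]] = []
    for rc in plan.get("resource_changes", []):
        actions = rc.get("change", {}).get("actions", [])
        # "no-op" and "read" entries don't modify any infrastructure
        if actions in (["no-op"], ["read"]):
            continue
        changes.append({
            "address": rc["address"],  # e.g. aws_security_group.web
            "type": rc["type"],        # e.g. aws_security_group
            "actions": actions,        # e.g. ["update"] or ["delete", "create"]
        })
    return changes
```

These addresses only cover what Terraform manages; finding everything else they touch still requires querying the cloud provider, as described above.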

From this, you can check the affected items for anything unexpected. If you notice that the change might affect more than you thought, you can modify either your code or the way you plan to roll out and monitor the change to account for it. You can then share the change and its graph with your team or the change advisory board.

From the blast radius, Overmind also produces a list of human-readable risks that can be reviewed before running terraform apply. These risks can either be posted back as comments by your CI/CD pipeline or viewed in the app. Using our GitHub Action, you can make this part of your workflow: the action comments on the pull request with the blast radius (everything that might be affected by the given change).

Overmind Risks

Inside the app, you can see the full blast radius in an interactive graph, along with any metadata Overmind was able to get from AWS. When you’re ready to start the change, Overmind takes a snapshot before and after it to validate that the change went through as intended.
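The idea behind before-and-after validation can be sketched in a few lines. This is a hypothetical illustration, not Overmind’s snapshot format: it assumes each snapshot is a mapping from item ID to that item’s attributes, diffs the two, and compares what actually changed with the set of items the change was planned to touch. The function name, snapshot shape and item IDs are all assumptions.

```python
from typing import Any, Dict, Set, Tuple

# Hypothetical snapshot shape: item ID -> attributes captured at that moment
Snapshot = Dict[str, Dict[str, Any]]


def validate_change(before: Snapshot, after: Snapshot, planned: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Compare two snapshots with the items a change was meant to touch.

    Returns (unexpected, not_applied): items that changed without being
    planned, and planned items that show no difference between snapshots.
    """
    added = set(after) - set(before)
    removed = set(before) - set(after)
    modified = {k for k in set(before) & set(after) if before[k] != after[k]}
    touched = added | removed | modified
    return touched - planned, planned - touched
```

A non-empty first set points to side effects worth investigating; a non-empty second set suggests part of the change didn’t take effect.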

Don’t just take our word for it…

We want to make it as easy as possible to get started, so we have created an example repository. It shows how to run Terraform on GitHub Actions, automatically submit each PR’s changes to Overmind, and report the blast radius back as a comment on the PR. This way, you can get started easily with either your personal or your organisation’s AWS account.

  • Check out the example Terraform repo here.
  • Get started with Overmind for free here.
  • Or join our Discord to take part in the next wave of DevOps tools.