Tyler Bird
Automatically Discover Risks With Overmind and GitHub Actions

In a few of our previous articles, we discussed how Overmind can offer a new perspective on catching errors before they become production outages.

  • Like when Reddit had an outage on Pi Day and we looked at the unforeseen consequences of system dependencies.
  • Or when Loom fixed an incident in AWS CloudFront that confused users trying to sign in.

But both of these articles look at an incident after it has already occurred. That led us to think about how Overmind could help spot risks earlier in the development cycle. Even more importantly, we want to provide this information automatically and as soon as possible.

In other words, what if we could see the impact before the deploy?

  • You could run a scan with the Overmind CLI, sure.  But that’s a manual process, unless…
  • You could make the scan run automatically in your CI/CD workflow!
  • And you see the results as soon as possible, because the scan is simply part of what happens when you create a pull request.

Risk Analysis

What is the Risk Analysis CI integration?

It’s a GitHub Action that takes your change and automatically submits the plan to Overmind for review when you open a PR.

Create Pull Request

Let’s take a look at this pull request in the terraform-example repository.

Just after the PR is created, the automated jobs kick off. Once they succeed, a comment is added to the PR listing any “Expected Changes” you need to know about.

What is Changing?

We can see that our change is pretty basic, just a port reconfiguration. What could go wrong?

Well, apparently something could, because Overmind flagged a high-risk issue.

Overmind discovers that if we only change the health check port and we don’t update the ECS task definition, then the health check will fail.  Not only that, it tells us where we’ll need to configure the portMappings array in ECS to fix the problem.
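
To make the finding concrete, here is a minimal Python sketch of the kind of consistency check involved. It is only an illustration and not how Overmind detects the risk; the function name `health_check_port_is_served` and the port numbers are hypothetical. The `portMappings` and `containerPort` field names follow the ECS task definition format.

```python
def health_check_port_is_served(health_check_port, port_mappings):
    """Return True if any container port mapping exposes the health check port."""
    return any(m.get("containerPort") == health_check_port for m in port_mappings)


# Hypothetical values: the task definition still exposes 8080,
# while the pull request moves the health check to 9090.
port_mappings = [{"containerPort": 8080, "protocol": "tcp"}]

print(health_check_port_is_served(8080, port_mappings))  # before the change: True
print(health_check_port_is_served(9090, port_mappings))  # after the change: False
```

With the health check moved and `portMappings` left alone, nothing answers on the new port, which is exactly the failure the risk describes.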

How the Risk Analysis Works

What does it take to pull this off? We’ll cover some of the high-level prerequisites quickly and then dig deeper when it comes to the specifics of how Overmind helps.

💡 NOTE: This repo is inspired by the conditions of the Loom outage. It creates resources like CloudFront, Application Load Balancers, task definitions in ECS Fargate, and so on.

1. You’ve got your GitHub Actions set up in your .github folder.  Let’s use our terraform-example repository and take a look at the automatic.yml  workflow.

2. Beyond the usual boilerplate actions to check the Terraform plan, we use the GitHub Actions secret store to provide the API key for the Overmind CLI.

3. Then the workflow has the actions to install the CLI and submit the plan to the Overmind app.  Once we’ve signed in and sent the plan, this is where the fun begins.

- uses: overmindtech/actions/install-cli@main
  with:
    version: latest
    github-token: ${{ secrets.GITHUB_TOKEN }}

- uses: overmindtech/actions/submit-plan@main
  if: github.event.action != 'closed'
  id: submit-plan
  with:
    ovm-api-key: ${{ secrets.OVM_API_KEY }}
    plan-json: ./tfplan.json

4. When you submit the plan you are running the custom action overmindtech/actions/submit-plan we’ve created.  That action sends your code changes and the Terraform plan to the Overmind app via the CLI.

./overmindtech/ovm-cli submit-plan \
    --title "$title" \
    --description "$description" \
    --ticket-link "$ticket_link" \
    $code_changes_arg \
    $tf_plan_output_arg \
    --log '${{ inputs.log }}' \
    ${{ inputs.plan-json }} \
    > ./overmindtech/change-url

5. Overmind ingests the plan and discovers the resources in AWS that will be affected.

6. Yet we don’t stop there.  We create a blast radius by taking the affected resources and scanning for everything that depends on those resources.

7. Finally, the overmindtech/actions/submit-plan action takes the change and the blast radius, feeds them into Overmind to summarize the risks, and adds that report directly as a comment on the pull request.

💡 NOTE: For full details, check out the action on GitHub: https://github.com/overmindtech/actions/blob/main/submit-plan/action.yml
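
Steps 5 and 6 above can be sketched in a few lines of Python. This is a simplified illustration of the idea, not Overmind's implementation: the function names, the sample plan, and the hand-written `dependents` map are all hypothetical. The only assumption taken from Terraform itself is the `resource_changes` list (with `address` and `change.actions`) found in the JSON plan produced by `terraform show -json`. In the real workflow the dependencies are discovered from your AWS account rather than typed in by hand.

```python
from collections import deque


def changed_addresses(plan):
    """List addresses of resources whose planned actions are not a no-op."""
    return [
        rc["address"]
        for rc in plan.get("resource_changes", [])
        if rc["change"]["actions"] != ["no-op"]
    ]


def blast_radius(changed, dependents):
    """Breadth-first walk from the changed resources through everything
    that directly or transitively depends on them."""
    seen = set(changed)
    queue = deque(changed)
    while queue:
        current = queue.popleft()
        for dep in dependents.get(current, []):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen


# Hypothetical plan excerpt in the shape of a Terraform JSON plan.
plan = {
    "resource_changes": [
        {"address": "aws_lb_target_group.app", "change": {"actions": ["update"]}},
        {"address": "aws_ecs_cluster.main", "change": {"actions": ["no-op"]}},
    ]
}

# Hypothetical reverse-dependency map: resource -> things that depend on it.
dependents = {
    "aws_lb_target_group.app": ["aws_lb_listener.https", "aws_ecs_service.app"],
    "aws_lb_listener.https": ["aws_cloudfront_distribution.cdn"],
}

changed = changed_addresses(plan)
print(sorted(blast_radius(changed, dependents)))
```

Only one resource changes in the plan, yet the walk reaches the listener, the ECS service, and the CloudFront distribution as well. That wider set is what the risk summary has to reason about.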

Now as soon as you create a PR, Overmind gets straight to work and puts anything important front and center.

Key takeaways

  • Now every PR uses Overmind to automatically scan your infrastructure and identify risks
  • You can see risks earlier than ever before
  • It’s straightforward to add Overmind automation to any CI/CD workflow
  • Coming soon to the Overmind CLI: create changes straight from your command line by running `Overmind Terraform Plan / Apply`

Caveats

We realize that this example works best when you are using Terraform, AWS, and GitHub Actions.  What if that’s nothing like your workflow?

💡 We want to hear from you!

We’ve only begun to scratch the surface of what we can do to help scan for and reveal problems you may not be expecting.  That’s why we are also proud to announce our Design Partner Program.

If you can see the potential for Overmind and need to try it in your workflow, talk to us immediately!

Watch it in action