Does it make sense to apply observability practices to Terraform? And I don’t mean using Terraform to configure your alerts in Datadog, I mean actually observing what Terraform is doing. Unless we’re willing to expand the definition of observability, I’d say no, it doesn’t. But if we think outside observability’s pillars, it becomes clear that we’re missing some obvious opportunities to prevent outages.
I think we’re all in agreement now that the three pillars of observability are metrics, traces and logs, and that the practice of observability involves working out what’s going on inside a system by looking at what’s coming out of it (metrics, traces and logs). This idea doesn’t make much sense when applied to Terraform, though, for a few reasons:
Continuously collecting and aggregating metrics, traces and logs simply doesn’t help us to see what Terraform is doing and what its effects are.
So what does?
Terraform is primarily concerned with managing config: this could be the size of a volume, the value of an environment variable, or the attributes of a security group. Terraform also affects “state” information that is often overlooked. This includes things like:
These two new pillars, config and state, are currently not collected in any meaningful way by any observability tools (that I know of), and they are what we are focusing on at Overmind.
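To make the config pillar a bit more concrete, here is a minimal sketch of how config changes could be read out of a Terraform plan. It assumes the plan has been exported with `terraform show -json` and relies on the `resource_changes` entries in that JSON (each with an `address` and a `change` holding `actions`, `before` and `after`). The function name `summarize_config_changes` and the sample data are made up for illustration. This is not how Overmind works internally, just one way to look at the same information.

```python
def summarize_config_changes(plan: dict) -> list[str]:
    """Return one line per top-level attribute that the plan would change."""
    lines = []
    for rc in plan.get("resource_changes", []):
        change = rc.get("change", {})
        actions = change.get("actions", [])
        # Skip resources that Terraform will not modify.
        if actions in (["no-op"], ["read"]):
            continue
        # "before" is null on create and "after" is null on delete.
        before = change.get("before") or {}
        after = change.get("after") or {}
        for key in sorted(set(before) | set(after)):
            if before.get(key) != after.get(key):
                lines.append(
                    f"{rc['address']}: {key} "
                    f"{before.get(key)!r} -> {after.get(key)!r}"
                )
    return lines


# Hand-written sample in the shape of `terraform show -json` output
# (hypothetical resources, trimmed to the fields used above).
sample_plan = {
    "resource_changes": [
        {
            "address": "aws_ebs_volume.data",
            "change": {
                "actions": ["update"],
                "before": {"size": 100, "type": "gp3"},
                "after": {"size": 200, "type": "gp3"},
            },
        },
        {
            "address": "aws_security_group.web",
            "change": {
                "actions": ["no-op"],
                "before": {"name": "web"},
                "after": {"name": "web"},
            },
        },
    ]
}

for line in summarize_config_changes(sample_plan):
    print(line)  # aws_ebs_volume.data: size 100 -> 200
```

With a real plan you would load the JSON with `json.load` and pass the resulting dict in. Note that this only covers config that Terraform itself manages and only compares top-level attributes. The “state” side (what is actually happening to the resources at runtime) does not appear in the plan at all, which is exactly the gap described above.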
Overmind tracks your Terraform changes at every point in your workflow, allowing you to move faster with more confidence:
When running terraform plan:
When running terraform apply: