AI Tools Benchmark: Terraform Code Generation

Today, a variety of code generation tools are available, including low-code and no-code platforms, code completion, code refactoring utilities, and automatic API generators. While these tools use different techniques and algorithms, their main goal is the same: to speed up the development process while keeping the code usable, maintainable and compatible.

In this post we try some of the most popular ‘free to use’, publicly accessible tools to see how well they generate Terraform code and transform existing configuration into Terraform. First we look at what they cost, then we compare them.

The tools we chose for the comparison are listed in the cost table further down.

There are many more AI tools out there, such as GitHub Copilot, but at $10 per month it did not meet our ‘free to use’ criterion. If you would like to see any others tested, please drop us a message in our Discord.

The table below lists the tools tested and their costs at the time of writing (Thursday 17th October).

Tool	Cost / Usage
ChatGPT (GPT-4)	Free limited use.
GPT Marketplace - Terraform Expert	Free limited use.
Claude 3.5	Free of charge. Unlimited use costs $3 per million input tokens and $15 per million output tokens.
Gemini	Free of charge for unlimited use. Rate limited to 1 million tokens per minute.
Perplexity	Free unlimited use of Quick Search. $20 per month for Pro Search.
Stakpak	Free of charge for ‘limited use.’ $20 per month for unlimited use.
Overmind	Free of charge for unlimited assistant use.
Amazon Q	Free of charge for ‘free tier’ Amazon Q unlimited use.
Amazon Q Developer	Free tier limited to 50 interactions per month. $19/mo for pro tier with limits.
GPT-Script	Uses personal API keys so subject to provider costs.


Comparison Test #1: Generating Terraform Code for EKS Clusters with Audit Logging

For our first test we asked each tool to create an Amazon Elastic Kubernetes Service (EKS) cluster using Terraform, with audit logging enabled.

Objective: Evaluate whether our AI tools can effectively generate Terraform configurations that:

Enable audit logging for EKS clusters.
Create a CloudWatch log group for logging management.
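To make the target concrete, here is a minimal sketch of what a passing answer could contain. It is our own illustration, not the output of any tool tested. The variable names, defaults and retention period are assumptions, and the IAM role and subnets are assumed to exist already.

```hcl
# Hypothetical inputs: names and defaults are illustrative only.
variable "cluster_name" {
  type    = string
  default = "example-cluster"
}

variable "cluster_role_arn" {
  type        = string
  description = "ARN of an existing IAM role that EKS can assume"
}

variable "subnet_ids" {
  type        = list(string)
  description = "Existing subnets in at least two availability zones"
}

# EKS writes control plane logs to /aws/eks/<cluster-name>/cluster.
# Declaring the group in Terraform lets you manage its retention.
resource "aws_cloudwatch_log_group" "eks" {
  name              = "/aws/eks/${var.cluster_name}/cluster"
  retention_in_days = 7 # assumed value, adjust to your needs
}

resource "aws_eks_cluster" "this" {
  name     = var.cluster_name
  role_arn = var.cluster_role_arn

  # Turns on the audit log type for the control plane.
  enabled_cluster_log_types = ["audit"]

  vpc_config {
    subnet_ids = var.subnet_ids
  }

  # Create the log group first so EKS does not create an unmanaged one.
  depends_on = [aws_cloudwatch_log_group.eks]
}
```

The depends_on is the detail that is easy to miss: without it, EKS may create the log group itself before Terraform does, and the apply can then fail because the group already exists.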

Evaluation Criteria:

Correctness: Ensures audit logging is enabled and a CloudWatch log group is created.
Completeness: A comprehensive setup of necessary resources and configurations.
Usability: Code structure should be readable, reusable, and maintainable.

For each of the tests we will ask the tool a variation of the below question:


Results

For each tool we have provided summarised pros, cons and a conclusion. If you would like to see the full code snippets, they are in a public GitHub repo linked under each tool.

ChatGPT (GPT-4)

Pros: Provides detailed setups beneficial for learning.
Cons: Lacks modularity and missed creating a CloudWatch log group.
Conclusion: Falls short by missing critical elements like the log group.


GPT Marketplace - Terraform Expert

Essentially a system prompt on top of GPT-4.

Pros: Utilises Terraform AWS modules for VPC and EKS, which are well-tested and maintained by the community.
Cons: Slightly more complex than necessary.
Conclusion: An improvement over using GPT-4 alone.
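For readers who have not used the community modules, a module-based setup looks roughly like the sketch below. This is our own illustration rather than the tool's output. We have assumed the 20.x series of terraform-aws-modules/eks/aws, and the input names shown belong to that series and differ in other major versions. The variables and cluster version are hypothetical.

```hcl
# Hypothetical inputs: an existing VPC and its subnets.
variable "vpc_id" {
  type = string
}

variable "subnet_ids" {
  type = list(string)
}

module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 20.0" # assumption: input names below match the 20.x series

  cluster_name    = "example-cluster"
  cluster_version = "1.30" # assumed Kubernetes version

  # Ask the module to enable the audit log type on the control plane.
  cluster_enabled_log_types = ["audit"]

  vpc_id     = var.vpc_id
  subnet_ids = var.subnet_ids
}
```

The trade-off is that the module hides many resources behind a handful of inputs, which is why pinning the version matters.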


Claude 3.5

Pros: Accurately created the CloudWatch log group and enabled audit logging.
Cons: Limited in advanced features compared to more comprehensive solutions.
Conclusion: Suitable for quick setups with best-practice naming conventions.

Gemini

Pros: Covers the key components including audit logging and IAM roles.
Cons: The IAM policy uses a wildcard "Resource": "*", which can pose significant security risks. It's crucial to restrict permissions to only what is necessary.
Conclusion: Gemini offers a solid foundation for setting up an EKS cluster, but it needs improvements in security and modularity.
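As an illustration of what restricting permissions can look like, the sketch below scopes a policy to a single CloudWatch log group instead of "*". It is not a corrected version of Gemini's policy. The actions, names and the log_group_arn variable are assumptions chosen for the example.

```hcl
# Hypothetical input: the ARN of the log group the policy should cover.
variable "log_group_arn" {
  type = string
}

data "aws_iam_policy_document" "logs_scoped" {
  statement {
    effect = "Allow"

    # Only the actions that are actually needed (assumed for this example).
    actions = [
      "logs:CreateLogStream",
      "logs:PutLogEvents",
    ]

    # Limit the statement to streams inside one log group, not "*".
    resources = ["${var.log_group_arn}:*"]
  }
}

resource "aws_iam_policy" "logs_scoped" {
  name   = "example-eks-logs-scoped" # illustrative name
  policy = data.aws_iam_policy_document.logs_scoped.json
}
```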


Perplexity

A ‘free-to-use’ AI-powered search engine.

Pros: Correctly activated audit logging.
Cons: Omitted setting up the vital CloudWatch log group.
Conclusion: Lacks key components, making it a less reliable choice.

Stakpak

Stakpak is an AI-powered DevOps IDE that helps you build, maintain and self-serve software infrastructure.

Pros: Most comprehensive with parameterised variables and proper dependency management.
Cons: Overlooked creating a role and VPC configuration, requiring manual setup. Node scaling is also fixed: the node group scaling settings pin the size to node_count, which doesn't allow for autoscaling.
Conclusion: Best choice for complex setups with flexible, reusable modules.
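To show what we mean by fixed node scaling, here is a sketch of a node group whose minimum, maximum and desired sizes differ, which leaves room for an autoscaler to act. This is our own example, not Stakpak's output. The variable names and sizes are assumptions, and the range only takes effect if something such as the Kubernetes Cluster Autoscaler is running in the cluster.

```hcl
# Hypothetical inputs: an existing cluster, node role and subnets.
variable "cluster_name" {
  type = string
}

variable "node_role_arn" {
  type = string
}

variable "subnet_ids" {
  type = list(string)
}

resource "aws_eks_node_group" "workers" {
  cluster_name    = var.cluster_name
  node_group_name = "example-workers" # illustrative name
  node_role_arn   = var.node_role_arn
  subnet_ids      = var.subnet_ids

  scaling_config {
    min_size     = 1 # assumed values: a range rather than one fixed size
    desired_size = 2
    max_size     = 4
  }

  # Let an external autoscaler change desired_size without Terraform
  # reverting it on the next apply.
  lifecycle {
    ignore_changes = [scaling_config[0].desired_size]
  }
}
```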


Overmind

Overmind Assistant is an interactive, LLM-powered chat tool that can help you troubleshoot incidents, explore applications, write documentation and generate Terraform code.

Pros: Offers a straightforward setup that allows for customisation due to its more detailed resource configuration.
Cons: Manual configuration doesn't benefit from the ongoing improvements and testing found in community-maintained modules.
Conclusion: Overmind offers customisable resource configurations ideal for tailored setups but requires more maintenance due to its lack of modular abstraction.

Amazon Q

Amazon Q is Amazon's generative AI-powered assistant that can answer questions, provide summaries and generate content.

Pros: N/A
Cons: Failed to write any code.
Conclusion: At the time of testing, it was not able to write any Terraform code.


Amazon Q Developer

Amazon Q Developer provides real-time code suggestions, from snippets to full functions, based on your comments and existing code.

Pros: It delivers a complete EKS setup with VPC, subnets, IAM roles, and CloudWatch logging, while employing modular design and dynamic availability zones for scalability.
Cons: The config lacks internet gateways (IGWs) and NAT gateways, uses hardcoded values, and has limited tags and minimal IAM policy attachments.
Conclusion: Strong in setup and logging, but the configuration needs better networking, flexible parameters, expanded IAM policies, and detailed tagging to be more adaptable.
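The sketch below shows the kind of networking and parameterisation we had in mind: availability zones looked up dynamically, values passed in as variables, shared tags, and an internet gateway with a public route. It is our own illustration, not Amazon Q Developer's output. All names and defaults are assumptions, and NAT gateways are left out to keep it short.

```hcl
# Hypothetical inputs replacing hardcoded values.
variable "vpc_cidr" {
  type    = string
  default = "10.0.0.0/16"
}

variable "az_count" {
  type    = number
  default = 2
}

variable "tags" {
  type    = map(string)
  default = { Project = "example" }
}

# Look up the zones available in the current region.
data "aws_availability_zones" "available" {
  state = "available"
}

resource "aws_vpc" "this" {
  cidr_block = var.vpc_cidr
  tags       = var.tags
}

resource "aws_internet_gateway" "this" {
  vpc_id = aws_vpc.this.id
  tags   = var.tags
}

# One public subnet per zone, each a /24 carved out of the VPC range.
resource "aws_subnet" "public" {
  count                   = var.az_count
  vpc_id                  = aws_vpc.this.id
  cidr_block              = cidrsubnet(var.vpc_cidr, 8, count.index)
  availability_zone       = data.aws_availability_zones.available.names[count.index]
  map_public_ip_on_launch = true
  tags                    = var.tags
}

# Send outbound traffic from the public subnets through the gateway.
resource "aws_route_table" "public" {
  vpc_id = aws_vpc.this.id
  tags   = var.tags

  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.this.id
  }
}

resource "aws_route_table_association" "public" {
  count          = var.az_count
  subnet_id      = aws_subnet.public[count.index].id
  route_table_id = aws_route_table.public.id
}
```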


GPT-Script

GPTScript is a framework enabling Large Language Models to interact with local or remote systems, including executables, applications with OpenAPI schemas, SDKs, and RAG-based solutions, with minimal prompt requirements.

Pros: Offers a streamlined EKS setup with necessary logging, IAM roles, an extra policy for broader operations, and flexible subnet calculations using cidrsubnet.
Cons: Lacks detailed subnet tagging, modular VPC setups, and parameterisation, which can limit flexibility across environments.
Conclusion: GPTScript provides an efficient EKS setup for quick deployments but may need customisation for complex environments and network adaptability. Its output differs from that of plain GPT-4, which suggests a system prompt is in place to aid results.
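For anyone unfamiliar with cidrsubnet, here is a small standalone sketch of how it derives subnet ranges from a parent range. The VPC range and subnet count are assumptions for the example and are not taken from the GPTScript output.

```hcl
locals {
  vpc_cidr = "10.0.0.0/16" # assumed parent range

  # cidrsubnet(prefix, newbits, netnum): adding 8 bits to a /16 yields /24
  # blocks, and netnum selects which block.
  subnet_cidrs = [for i in range(3) : cidrsubnet(local.vpc_cidr, 8, i)]
}

# Evaluates to ["10.0.0.0/24", "10.0.1.0/24", "10.0.2.0/24"].
output "subnet_cidrs" {
  value = local.subnet_cidrs
}
```

Because the subnets are calculated rather than typed in, changing the parent range or the count updates every subnet consistently.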


Comparison Test #2: Transform unmanaged AWS config into Terraform code

In this test we compare our tools' generation capabilities by transforming existing configuration into Terraform code. The reason for this test is that you may not always be provisioning new infrastructure; you may instead want to take what you already have and bring it under Terraform management. Tools like firefly.ai offer a fully fledged platform for exactly this and would suit much larger transformation projects. The aim of this test is to look at tools that are freely available, or available at a low cost.

There are also free ‘non-AI’ tools out there, such as Terraformer by Google, which can help you, but for the scope of this blog we are only focusing on AI ones.
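Whichever tool writes the code, the generated resources still have to be tied to the real infrastructure before Terraform will manage them. As a rough sketch of that last step, Terraform 1.5 and later support import blocks. The bucket name and resource address below are hypothetical.

```hcl
# Tell Terraform that an existing bucket belongs to this resource address.
import {
  to = aws_s3_bucket.legacy
  id = "example-legacy-bucket" # hypothetical name of an existing bucket
}

# The configuration that will manage the bucket once it is imported.
resource "aws_s3_bucket" "legacy" {
  bucket = "example-legacy-bucket"
}
```

Running terraform plan then shows the import alongside any differences between the written configuration and the real resource. If the resource block is left out, terraform plan -generate-config-out=generated.tf can draft one for you.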

Objective: To evaluate how effectively AI tools can assist in generating Terraform code from unmanaged AWS infrastructure, ensuring all resources are identified and correctly configured.

Evaluation Criteria:

Discovery Completeness: Ability to identify all existing AWS resources within an account.
Accuracy: Correctly translating configurations, including inter-resource dependencies.
Efficiency: Providing neatly organised and human-readable output.
Customisability: Facilitating future adaptations through modular and reusable code.


Testing Method

For this test, we prompted each tool with a similar set of AWS configurations, asking for a complete Terraform translation. Each tool was evaluated on its output quality and resource management efficiency.

For this test we had to rule out the following tools, as they cannot discover AWS resources and so can only give general advice rather than specific results:

ChatGPT (GPT-4)
GPT Marketplace - Terraform Expert
Claude 3.5
Gemini
Perplexity
Stakpak
Amazon Q Developer
GitHub Copilot
GPT-Script

This left the following tools to test:

Overmind
Amazon Q


Amazon Q

Pros: N/A
Cons: Failed to write any code.
Conclusion: Amazon Q can access your AWS infrastructure, and therefore in theory has the data required to transform it into Terraform. However, it tells us that it cannot do this, even though it does have that access. We suspect this is down to the model not understanding the question and the safety features being overzealous.


Overmind

Pros: Can discover and identify all existing AWS resources within an account, mapping these out in a graph. This means that it does a great job of including dependencies that exist in your AWS, even if not prompted.
Cons: Currently only available in the in-browser app, so the output has to be copied into your IDE.
Conclusion: As the only tool on the list that can take any existing AWS infrastructure and convert it to Terraform (in this case using GPT-4o), it comes out on top for this test.


We hope you found this helpful in deciding. Our aim is to keep this comparison up to date and add new tools as the space continues to evolve. If you have other tools you're curious about or tips you'd like to share, please reach out on our Discord or drop us a PR on the GitHub repo:

https://github.com/jameslaneovermind/ai-tools-for-devops-comparisons
