Terraform and CI/CD - keeping IaC safe
Terraform is a powerful IaC tool, but it can also be dangerous if not managed properly. In this article, we will explore some best practices for using Terraform in a CI/CD pipeline to keep your infrastructure safe.
Introduction
Terraform uses a declarative language to define and provision infrastructure and stores the state of your infrastructure in a state file. When you define resources in your Terraform configuration files, Terraform compares the desired state with the current state of your infrastructure and makes the necessary changes to bring the two into alignment. This means that if you make a change to your configuration files, the next terraform apply will update your infrastructure to match the new configuration.
Into the danger zone
In some DevOps communities, Terraform has a reputation for being dangerous. The concerns are absolutely legitimate. If you are not careful, you can easily destroy your entire infrastructure with a single command. I'll explain some of the reasons why Terraform can be dangerous with a horror story.
Horror story
Your infrastructure is growing, and more people are joining the team. You want
to refactor your code to make it more readable and easier to maintain. You
decide to move Azure Resource Groups into their own module. You create a new
module and move the Resource Group definitions into the new module. You update
the references to the Resource Groups in your main configuration files. You run
terraform apply --auto-approve to apply the changes. Minutes later, you get a
call from your colleague. "The Resource Groups are gone! All the resources in
the Resource Groups are gone! What did you do?"
It sounds crazy, but this is a real story. This was the nature of the problem:
- Our villain, the dreaded Terralith. Everything was in one big state file, so the blast radius was huge when something went south.
- The resources were not moved in the state file. Terraform tracks every resource by its address, so when a resource disappears from its original address in the configuration and shows up under a new one, Terraform plans to destroy the old resource and create a new one.
- Not following best practices. No code reviews. No terraform plan reviews. Just blindly running terraform apply --auto-approve in production.
Terraform Best Practices
Split your state files
Slay the Terralith! Split your state files into smaller, more manageable pieces. This will reduce the blast radius if something goes south. To prepare for CI/CD, you should use backend config files to point to different state files for different environments.
Example dev.tfbackend:
storage_account_name = "tfstate18373"
container_name       = "tfstate"
key                  = "HillestadTech/dev.tfstate"
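A backend config file like this only supplies values; the configuration still needs a backend block for them to land in. A minimal sketch, assuming the azurerm backend and a file name of my choosing (backend.tf is an assumption, any .tf file in the root module would do):

```hcl
# backend.tf (hypothetical file name)
terraform {
  # Partial configuration: storage account, container and key are left out
  # on purpose and supplied per environment at init time.
  backend "azurerm" {}
}
```

Running terraform init -backend-config="dev.tfbackend" then merges the values from the file into the empty block. Depending on how you authenticate, the backend may also need resource_group_name or other settings in the same file.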
Use a remote backend for your state file
As soon as you have more than one person working on your infrastructure, you should use a remote backend for your state file. This will ensure that everyone is working on the same state file and prevent conflicts. In Azure, you can use Azure Blob Storage, which is a great option for storing your state file. It supports state locking natively to prevent race conditions in your team. Make sure to enable blob versioning on the storage account to be able to recover from accidental deletions or corruptions.
Please see my previous article on Terraform state in Azure Storage Container
Lock down your state files
Make sure access to the state files is as strict as possible. I use service principals and GitHub Secrets to access my state files. For more security, please consider using SAS tokens with limited permissions and expiration time. Another option is to use OIDC with workload identity federation to access your state files. They are more cumbersome, but more secure in real-world production environments.
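As a rough sketch of the OIDC option, assuming the azurerm backend and a federated credential that has already been set up for the GitHub repository (neither is shown in this article), the backend config file could gain two settings:

```hcl
# dev.tfbackend using OIDC instead of a client secret (illustration only)
storage_account_name = "tfstate18373"
container_name       = "tfstate"
key                  = "HillestadTech/dev.tfstate"

use_oidc         = true # exchange the GitHub-issued OIDC token for an Azure AD token
use_azuread_auth = true # reach the blob through Azure AD rather than storage access keys
```

With this approach the workflow needs the id-token: write permission so the runner can request a token, and the identity needs a data-plane role such as Storage Blob Data Contributor on the state container.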
Use Prevent Destroy for critical resources
You can use the lifecycle block to prevent critical resources from being destroyed.
resource "azurerm_resource_group" "example" {
  name     = "example-resources"
  location = "West Europe"

  lifecycle {
    prevent_destroy = true
  }
}
Please see HashiCorp Doc - Resource Lifecycle
Format your code properly
Use terraform fmt to format your code properly. This will make it easier to read
and maintain. You can also use a pre-commit hook to automatically format your code
before committing. CI/CD pipelines support enforcing this as well.
Use a linter
In a team, it is important to have a consistent coding style. You can use a linter to enforce a consistent coding style. The most popular linter is tflint.
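tflint is configured through a .tflint.hcl file in the directory you lint. A minimal sketch, assuming you want the bundled Terraform ruleset plus the Azure ruleset; the version number below is an assumption, so check the ruleset's releases page and pin one you have verified:

```hcl
# .tflint.hcl
plugin "terraform" {
  enabled = true
  preset  = "recommended" # rules shipped with tflint itself
}

plugin "azurerm" {
  enabled = true
  version = "0.27.0" # assumed version, replace with a verified release
  source  = "github.com/terraform-linters/tflint-ruleset-azurerm"
}
```

After running tflint --init once to download the plugin, tflint can run locally or as an extra step in the pipeline.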
Validate your code
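Use terraform validate to validate your code. This will check that your code is
syntactically correct and internally consistent, for example that every variable and
resource you reference is actually declared. However, it does not contact your remote
state or your cloud provider, so it cannot catch errors related to the state or your
cloud provider configuration.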
Always run terraform plan before terraform apply
Terraform plan will show you what changes will be made to your infrastructure before actually applying them. This is a crucial step in the process and should never be skipped. If your refactoring will destroy and recreate resources, you will see it in the plan output.
Move resources in the state file when refactoring
If your terraform plan shows that resources will be destroyed and recreated, you need to
move the resources in the state file. The "old school" approach is to use terraform state mv
to move resources in the state file. The new way, available since Terraform 1.1, is to use
the moved block in your configuration files.
This will tell Terraform that the resource has been moved and it will update the state file
accordingly. Here is an example of how to use the moved block:
moved {
  from = azurerm_resource_group.old_name
  to   = module.resource_groups.azurerm_resource_group.new_name
}
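For context, the to address in a moved block is the module call's name followed by the resource address inside that module. A sketch of a layout that would produce the address above, assuming a local module directory (the path is hypothetical) and reusing the example Resource Group values from earlier:

```hcl
# main.tf in the root module
module "resource_groups" {
  source = "./modules/resource_groups" # hypothetical path
}

# modules/resource_groups/main.tf
resource "azurerm_resource_group" "new_name" {
  name     = "example-resources"
  location = "West Europe"
}
```

With the moved block in place next to the module call, terraform plan should report the Resource Group as moved instead of listing a destroy and a create. If it still shows a destroy, stop and check the addresses before applying.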
CI/CD pipeline setup - our hero
For production-ready IaC, you should use code versioning repositories and CI/CD pipelines. This will ensure that your code is always in a known state and that changes are reviewed before being applied. CI/CD pipelines also make sure your environment is built in the correct order and that dependencies are handled properly when you have multiple state files.
Protect your main branch
Make sure to lock down your main branch so only code that has been reviewed and approved in a Pull Request can be merged. This will ensure that all changes are reviewed before being applied to your infrastructure.
Edit your infrastructure in a feature branch
Create a new branch for each feature or bug fix. This will allow you to work on multiple features
at the same time without affecting the main branch. Traceability is important in a team, so make sure to
use descriptive branch names and commit messages. When commits are pushed to the feature branch, the CI/CD
pipeline should run terraform fmt, terraform validate and optionally tflint to ensure that the code is
properly formatted and valid.
Use Pull Requests for code reviews and running terraform plan
When the feature is complete, create a Pull Request to merge the feature branch into the main branch.
The Pull Request should be reviewed by at least one other person in the team. The CI/CD pipeline should run
terraform plan to show what changes will be made to the infrastructure. The Pull Request should not be merged
until the plan has been reviewed and approved.
This GitHub Actions workflow runs when a Pull Request is opened against the main branch:
name: "Pull Request on Dev environment"

on:
  pull_request:
    branches:
      - main
    paths:
      - "dev/**"

env:
  TF_LOG: INFO
  TF_INPUT: false

permissions:
  id-token: write
  issues: write
  pull-requests: write
  contents: read

jobs:
  pr-infra-check:
    runs-on: ubuntu-latest
    defaults:
      run:
        shell: bash
        working-directory: ./dev
    env:
      ARM_CLIENT_ID: ${{ secrets.ARM_CLIENT_ID }}
      ARM_CLIENT_SECRET: ${{ secrets.ARM_CLIENT_SECRET }}
      ARM_SUBSCRIPTION_ID: ${{ secrets.ARM_SUBSCRIPTION_ID }}
      ARM_TENANT_ID: ${{ secrets.ARM_TENANT_ID }}
    steps:
      # Checkout the repository to the GitHub Actions runner
      - name: Checkout
        uses: actions/checkout@v4

      # Install the latest version of Terraform CLI
      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v3

      # Run Terraform init
      - name: Terraform Init
        id: init
        run: terraform init -backend-config="dev.tfbackend"

      # Run a Terraform fmt
      - name: Terraform format
        id: fmt
        run: terraform fmt -check

      # Run a Terraform validate
      - name: Terraform validate
        id: validate
        if: success() || failure()
        run: terraform validate -no-color

      # Run a Terraform plan
      - name: Terraform plan
        id: plan
        run: terraform plan -no-color
This GitHub Action step, borrowed from Ned In The Cloud, adds a comment to the Pull Request with the output of terraform plan:
      # Add a comment to Pull Requests with plan results weeeeeee
      - name: Add Plan Comment
        id: comment
        uses: actions/github-script@v6
        env:
          PLAN: "terraform\n${{ steps.plan.outputs.stdout }}"
        with:
          github-token: ${{ secrets.GITHUB_TOKEN }}
          script: |
            const output = `#### Terraform Format and Style 🖌\`${{ steps.fmt.outcome }}\`
            #### Terraform Initialization ⚙️\`${{ steps.init.outcome }}\`
            #### Terraform Validation 🤖${{ steps.validate.outputs.stdout }}
            #### Terraform Plan 📖\`${{ steps.plan.outcome }}\`
            <details><summary>Show Plan</summary>
            \`\`\`${process.env.PLAN}\`\`\`
            </details>
            *Pusher: @${{ github.actor }}, Action: \`${{ github.event_name }}\`, Working Directory: \`${{ env.tf_actions_working_dir }}\`, Workflow: \`${{ github.workflow }}\`*`;
            github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body: output
            })
Changes in your environment are applied on Pull Request Merge
When the Pull Request has been reviewed by stakeholders, it is time to apply your infrastructure changes by merging the Pull Request. Make sure you have strict branch protection rules to keep these powers properly harnessed.
Here is my GitHub Actions workflow for the Pull Request Merge. The paths filter makes sure that only changes in the dev environment trigger it:
name: "Push on Dev environment"

on:
  push:
    paths:
      - "dev/**"

env:
  TF_LOG: INFO
  TF_INPUT: false

permissions:
  id-token: write
  contents: read

jobs:
  terraform:
    name: "Terraform Push"
    runs-on: ubuntu-latest
    # Use the Bash shell
    defaults:
      run:
        shell: bash
        working-directory: ./dev
    env:
      ARM_CLIENT_ID: ${{ secrets.ARM_CLIENT_ID }}
      ARM_CLIENT_SECRET: ${{ secrets.ARM_CLIENT_SECRET }}
      ARM_SUBSCRIPTION_ID: ${{ secrets.ARM_SUBSCRIPTION_ID }}
      ARM_TENANT_ID: ${{ secrets.ARM_TENANT_ID }}
    steps:
      # Checkout the repository to the GitHub Actions runner
      - name: Checkout
        uses: actions/checkout@v4

      # Install the preferred version of Terraform CLI
      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v3

      # Run Terraform init for Main branch
      - name: Terraform Init for Main branch
        id: init-main
        if: github.ref == 'refs/heads/main'
        run: terraform init -backend-config="dev.tfbackend"

      # Run Terraform apply if branch is main
      - name: Terraform Apply for Main branch
        if: github.ref == 'refs/heads/main'
        id: apply
        run: terraform apply -auto-approve

      # Run formatting and validate if the branch is not main
      - name: Terraform Init for other branches
        id: init
        if: github.ref != 'refs/heads/main'
        run: terraform init -backend=false

      # Run a Terraform format
      - name: Terraform format for other branches
        if: github.ref != 'refs/heads/main'
        id: fmt
        run: terraform fmt -check

      # Run a Terraform validate
      - name: Terraform validate for other branches
        id: validate
        if: (success() || failure()) && github.ref != 'refs/heads/main'
        run: terraform validate -no-color
I have similar YAML files for Staging and Prod, keeping their states separate.