Terraform and CI/CD - keeping IaC safe

Terraform is a powerful IaC tool, but it can also be dangerous if not managed properly. In this article, we will explore some best practices for using Terraform in a CI/CD pipeline to keep your infrastructure safe.


Introduction

"With great power comes great responsibility."

Terraform uses a declarative language to define and provision infrastructure, and it records what it manages in a state file. When you define resources in your configuration files, Terraform compares that desired state with the state file and the real infrastructure, then proposes the changes needed to bring them into alignment. This means that if you change your configuration files, the next terraform apply will update your infrastructure to match the new configuration, including destroying anything the configuration no longer describes.

Into the danger zone

In some DevOps communities, Terraform has a reputation for being dangerous. The concerns are absolutely legitimate. If you are not careful, you can easily destroy your entire infrastructure with a single command. I'll explain some of the reasons why with a horror story.

Horror story

Your infrastructure is growing, and more people are joining the team. You want to refactor your code to make it more readable and easier to maintain. You decide to move Azure Resource Groups into their own module. You create a new module and move the Resource Group definitions into the new module. You update the references to the Resource Groups in your main configuration files. You run terraform apply --auto-approve to apply the changes. Minutes later, you get a call from your colleague. "The Resource Groups are gone! All the resources in the Resource Groups are gone! What did you do?"

It sounds crazy, but this is a real story. This was the nature of the problem:

  • Our villain, the dreaded Terralith. Everything was in one big state file, so the blast radius was huge when something went south.
  • The resources were not moved in the state file. Terraform tracks resources by their address. After the refactoring, the old addresses were still in the state but no longer in the configuration, so Terraform planned to destroy them. The new module addresses were in the configuration but not in the state, so Terraform planned to create them as new resources.
  • Not following best practices. No code reviews. No terraform plan reviews. Just blindly running terraform apply --auto-approve in production.

Terraform Best Practices

Split your state files

Slay the Terralith! Split your state files into smaller, more manageable pieces. This will reduce the blast radius if something goes south. To prepare for CI/CD, you should use backend config files to point to different state files for different environments.

Example dev.tfbackend:

storage_account_name = "tfstate18373"
container_name       = "tfstate"
key                  = "HillestadTech/dev.tfstate"
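
A backend config file like this is normally paired with a partial backend block in the configuration itself. The sketch below assumes a file named backend.tf (the file name is my assumption) and leaves the block empty on purpose, so the same code can point at a different state file per environment.

```hcl
# backend.tf - partial backend configuration (a sketch, file name assumed)
terraform {
  backend "azurerm" {
    # Intentionally empty: storage_account_name, container_name and key
    # are supplied at init time from the environment's .tfbackend file.
  }
}
```

The values are then filled in with terraform init -backend-config="dev.tfbackend", which is the same command the workflows later in this article use.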

Use a remote backend for your state file

As soon as more than one person works on your infrastructure, you should use a remote backend for your state file. This ensures that everyone works against the same state and prevents conflicts. In Azure, Blob Storage is a great option for storing your state file. It supports state locking natively (through blob leases) to prevent race conditions in your team. Make sure to enable blob versioning on the storage account so you can recover from accidental deletions or corruption.

Please see my previous article on Terraform state in Azure Storage Container

Lock down your state files

Make sure access to the state files is as strict as possible, because state files can contain sensitive values in plain text. I use service principals and GitHub Secrets to access my state files. For tighter security, consider SAS tokens with limited permissions and a short expiration time. Another option is OIDC with workload identity federation, which removes the need to store a long-lived client secret in GitHub. Both are more cumbersome to set up, but more secure in real-world production environments.
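
As a rough sketch of the OIDC option, the azurerm backend and provider both accept a use_oidc argument. The snippet assumes the service principal already has a federated credential for the GitHub repository and a blob data role on the state container (check the backend documentation for the exact role); neither of those is shown here.

```hcl
# Sketch only: OIDC instead of a client secret (assumes a federated
# credential is already configured for the GitHub repository).
terraform {
  backend "azurerm" {
    use_oidc         = true # authenticate with the workflow's OIDC token
    use_azuread_auth = true # use Entra ID auth for blob access instead of storage access keys
    # storage_account_name, container_name and key still come from the .tfbackend file
  }
}

provider "azurerm" {
  features {}
  use_oidc = true
}
```

With this in place, ARM_CLIENT_ID, ARM_TENANT_ID and ARM_SUBSCRIPTION_ID are still needed, but ARM_CLIENT_SECRET should no longer be. The workflows below already request the id-token: write permission that OIDC relies on.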

Use Prevent Destroy for critical resources

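You can use the lifecycle block to prevent critical resources from being destroyed. With prevent_destroy set, Terraform rejects any plan that would destroy the resource and exits with an error. Note that the guard only works while the resource block is still in the configuration; if the block is removed entirely, the setting is removed with it.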

resource "azurerm_resource_group" "example" {
  name     = "example-resources"
  location = "West Europe"
 
  lifecycle {
    prevent_destroy = true
  }
}
 

Please see HashiCorp docs - Resource Lifecycle

Format your code properly

Use terraform fmt to format your code properly. This will make it easier to read and maintain. You can also use a pre-commit hook to automatically format your code before committing. CI/CD pipelines can enforce this as well, for example by running terraform fmt -check, which fails when a file is not formatted.

Use a linter

In a team, it is important to have a consistent coding style. A linter can enforce it, and it can also flag problems that terraform validate does not look for, such as unused declarations. The most popular linter is TFLint.
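
TFLint is configured through a .tflint.hcl file in the directory where it runs. The following is a minimal sketch, not a recommended baseline: it enables the bundled Terraform ruleset with its recommended preset and, as an example of a style rule, enforces snake_case names. The choice of rules is my assumption, so adjust it to your team's conventions.

```hcl
# .tflint.hcl - minimal sketch
plugin "terraform" {
  enabled = true
  preset  = "recommended" # rule set bundled with TFLint
}

# Example style rule: resource, variable and output names must be snake_case
rule "terraform_naming_convention" {
  enabled = true
  format  = "snake_case"
}
```

Provider-specific checks for Azure live in a separate ruleset plugin that has to be declared with its own source and pinned version, which is left out of this sketch.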

Validate your code

Use terraform validate to validate your code. It checks that the configuration is syntactically correct and internally consistent, for example that every referenced variable and resource is actually declared. It does not look at variable values, the state, or your cloud provider, so it cannot catch errors that only show up at plan or apply time.

Always run terraform plan before terraform apply

terraform plan shows you which changes will be made to your infrastructure before they are applied. This is a crucial step and should never be skipped. If your refactoring will destroy and recreate resources, you will see it in the plan output: look at the action listed for each resource and at the summary line that counts resources to add, change, and destroy.

Move resources in the state file when refactoring

If terraform plan shows that resources will be destroyed and recreated only because their address changed, you need to move them in the state file. The "old school" approach is to run terraform state mv, which edits the state directly. The newer way, available since Terraform 1.1, is to declare a moved block in your configuration files. It tells Terraform that the resource has a new address, so the next apply updates the state instead of destroying and recreating the resource. Because the move lives in code, it also goes through code review and shows up in the plan. Here is an example of how to use the moved block:

moved {
  from = azurerm_resource_group.old_name
  to   = module.resource_groups.azurerm_resource_group.new_name
}
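
For context, the moved block above assumes a layout roughly like the following after the refactoring. The module path and the resource arguments are hypothetical and only there to make the addresses concrete. The moved block itself belongs in the root module, next to the module call.

```hcl
# main.tf (root module), after the refactoring
module "resource_groups" {
  source = "./modules/resource_groups" # hypothetical path
}

# modules/resource_groups/main.tf
# This resource now has the address
# module.resource_groups.azurerm_resource_group.new_name
resource "azurerm_resource_group" "new_name" {
  name     = "example-resources" # illustrative values
  location = "West Europe"
}
```

With the moved block in place, terraform plan should report the resource group as moved to its new address rather than as one resource to destroy and one to add.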

CI/CD pipeline setup - our hero

For production-ready IaC, you should keep your code in a version control repository and apply it through CI/CD pipelines. This ensures that your code is always in a known state and that changes are reviewed before being applied. CI/CD pipelines also make sure your environment is built in the correct order and that dependencies are handled properly when you have multiple state files.
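
One common way to express a dependency between two state files is the terraform_remote_state data source, which reads the outputs of another state. This is a sketch under assumptions, not a description of my setup: the network.tfstate key and the location output are hypothetical, and the other configuration would have to declare that output.

```hcl
# Read outputs from another state file (hypothetical key and output name)
data "terraform_remote_state" "network" {
  backend = "azurerm"

  config = {
    storage_account_name = "tfstate18373"
    container_name       = "tfstate"
    key                  = "HillestadTech/network.tfstate" # hypothetical
  }
}

resource "azurerm_resource_group" "app" {
  name     = "app-resources" # illustrative
  location = data.terraform_remote_state.network.outputs.location # hypothetical output
}
```

The pipeline then has to apply the state that publishes the output before the state that reads it, which is the ordering concern mentioned above.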

Protect your main branch

Make sure to lock down your main branch so only code that has been reviewed and approved in a Pull Request can be merged. This will ensure that all changes are reviewed before being applied to your infrastructure.

Edit your infrastructure in a feature branch

Create a new branch for each feature or bug fix. This will allow you to work on multiple features at the same time without affecting the main branch. Traceability is important in a team, so make sure to use descriptive branch names and commit messages. When commits are pushed to the feature branch, the CI/CD pipeline should run terraform fmt, terraform validate and optionally tflint to ensure that the code is properly formatted and valid.

Use Pull Requests for code reviews and running terraform plan

When the feature is complete, create a Pull Request to merge the feature branch into the main branch. The Pull Request should be reviewed by at least one other person in the team. The CI/CD pipeline should run terraform plan to show what changes will be made to the infrastructure. The Pull Request should not be merged until the plan has been reviewed and approved.

This GitHub Actions workflow runs when a Pull Request targeting main is opened or updated:

name: "Pull Request on Dev environment"
 
on:
  pull_request:
    branches:
      - main
    paths:
      - "dev/**"
 
env:
  TF_LOG: INFO
  TF_INPUT: false
 
permissions:
  id-token: write
  issues: write
  pull-requests: write
  contents: read
 
jobs:
  pr-infra-check:
    runs-on: ubuntu-latest
 
    defaults:
      run:
        shell: bash
        working-directory: ./dev
 
    env:
      ARM_CLIENT_ID: ${{ secrets.ARM_CLIENT_ID }}
      ARM_CLIENT_SECRET: ${{ secrets.ARM_CLIENT_SECRET }}
      ARM_SUBSCRIPTION_ID: ${{ secrets.ARM_SUBSCRIPTION_ID }}
      ARM_TENANT_ID: ${{ secrets.ARM_TENANT_ID }}
 
    steps:
      # Checkout the repository to the GitHub Actions runner
      - name: Checkout
        uses: actions/checkout@v4
 
      # Install the latest version of Terraform CLI
      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v3
 
      # Run Terraform init
      - name: Terraform Init
        id: init
        run: terraform init -backend-config="dev.tfbackend"
 
      # Run a Terraform fmt
      - name: Terraform format
        id: fmt
        run: terraform fmt -check
 
      # Run a Terraform validate
      - name: Terraform validate
        id: validate
        if: success() || failure()
        run: terraform validate -no-color
 
      # Run a Terraform plan
      - name: Terraform plan
        id: plan
        run: terraform plan -no-color

This GitHub Action step borrowed from Ned In The Cloud adds a comment to the Pull Request with the output of terraform plan:

# Add a comment to the Pull Request with the plan results
- name: Add Plan Comment
  id: comment
  uses: actions/github-script@v6
  env:
    PLAN: "terraform\n${{ steps.plan.outputs.stdout }}"
  with:
    github-token: ${{ secrets.GITHUB_TOKEN }}
    script: |
      const output = `#### Terraform Format and Style 🖌\`${{ steps.fmt.outcome }}\`
      #### Terraform Initialization ⚙️\`${{ steps.init.outcome }}\`
      #### Terraform Validation 🤖\`${{ steps.validate.outcome }}\`
      #### Terraform Plan 📖\`${{ steps.plan.outcome }}\`

      <details><summary>Show Plan</summary>

      \`\`\`${process.env.PLAN}
      \`\`\`

      </details>

      *Pusher: @${{ github.actor }}, Action: \`${{ github.event_name }}\`, Working Directory: \`./dev\`, Workflow: \`${{ github.workflow }}\`*`;
 
      github.rest.issues.createComment({
        issue_number: context.issue.number,
        owner: context.repo.owner,
        repo: context.repo.repo,
        body: output
      })

Changes in your environment are applied on Pull Request Merge

Once the Pull Request has been reviewed and approved by the stakeholders, merging it triggers your infrastructure changes. Make sure you have strict branch protection rules to keep these powers properly harnessed.

Here is my GitHub Actions workflow for the Pull Request merge. The paths filter makes sure it only runs for changes under dev/:

name: "Push on Dev environment"
 
on:
  push:
    paths:
      - "dev/**"
 
env:
  TF_LOG: INFO
  TF_INPUT: false
 
permissions:
  id-token: write
  contents: read
 
jobs:
  terraform:
    name: "Terraform Push"
    runs-on: ubuntu-latest
 
    # Use the Bash shell
    defaults:
      run:
        shell: bash
        working-directory: ./dev
 
    env:
      ARM_CLIENT_ID: ${{ secrets.ARM_CLIENT_ID }}
      ARM_CLIENT_SECRET: ${{ secrets.ARM_CLIENT_SECRET }}
      ARM_SUBSCRIPTION_ID: ${{ secrets.ARM_SUBSCRIPTION_ID }}
      ARM_TENANT_ID: ${{ secrets.ARM_TENANT_ID }}
 
    steps:
      # Checkout the repository to the GitHub Actions runner
      - name: Checkout
        uses: actions/checkout@v4
 
      # Install the preferred version of Terraform CLI
      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v3
 
      # Run Terraform init for Main branch
      - name: Terraform Init for Main branch
        id: init-main
        if: github.ref == 'refs/heads/main'
        run: terraform init -backend-config="dev.tfbackend"
 
      # Run Terraform apply if branch is main
      - name: Terraform Apply for Main branch
        if: github.ref == 'refs/heads/main'
        id: apply
        run: terraform apply -auto-approve
 
      # Run formatting and validate if the branch is not main
      - name: Terraform Init for other branches
        id: init
        if: github.ref != 'refs/heads/main'
        run: terraform init -backend=false
 
      # Run a Terraform format
      - name: Terraform format for other branches
        if: github.ref != 'refs/heads/main'
        id: fmt
        run: terraform fmt -check
 
      # Run a Terraform validate
      - name: Terraform validate for other branches
        id: validate
        if: (success() || failure()) && github.ref != 'refs/heads/main'
        run: terraform validate -no-color

I have similar YAML files for Staging and Prod, keeping their states separate.