Terraform for Network Engineers
Most Terraform tutorials assume you're deploying web servers. But the tool's real superpower for a network engineer is different: your network's intended state, in files, in git, with a diff before every change. No more "who changed the route table," no more snowflake VPCs, no more clicking through three consoles to stand up a branch.
This is a practical tour: core concepts, network-relevant providers, and the patterns that keep you out of trouble.
The Mental Model
Terraform is declarative. You describe what should exist; Terraform figures out the create/update/delete operations by comparing three things:
- Configuration: your `.tf` files (desired state)
- State: Terraform's record of what it created (`terraform.tfstate`)
- Reality: what the provider API reports right now
The workflow never changes:
```shell
terraform init   # download providers, configure backend
terraform plan   # show the diff (READ THIS)
terraform apply  # execute it
```
`terraform plan` is the killer feature for network work. It's a machine-generated change-control document: "this will destroy and recreate the subnet" is the kind of sentence you want to read before Friday at 5pm.
Example: A Real VPC, Not a Toy
```hcl
terraform {
  required_providers {
    aws = { source = "hashicorp/aws", version = "~> 5.0" }
  }

  # Remote state in S3, with a DynamoDB table for state locking.
  backend "s3" {
    bucket         = "acme-tfstate"
    key            = "network/prod/vpc.tfstate"
    region         = "us-west-2"
    dynamodb_table = "tf-locks"
    encrypt        = true
  }
}

provider "aws" {
  region = "us-west-2"
}

resource "aws_vpc" "main" {
  cidr_block           = "10.20.0.0/16"
  enable_dns_support   = true
  enable_dns_hostnames = true
  tags                 = { Name = "prod-vpc", ManagedBy = "terraform" }
}

# One private subnet per availability zone suffix: us-west-2a and us-west-2b.
resource "aws_subnet" "private" {
  for_each          = { a = "10.20.1.0/24", b = "10.20.2.0/24" }
  vpc_id            = aws_vpc.main.id
  availability_zone = "us-west-2${each.key}"
  cidr_block        = each.value
  tags              = { Name = "prod-private-${each.key}", Tier = "private" }
}

# Destination bucket for the flow logs; the bucket name is a placeholder.
resource "aws_s3_bucket" "flow_logs" {
  bucket = "acme-prod-vpc-flow-logs"
}

resource "aws_flow_log" "vpc" {
  vpc_id               = aws_vpc.main.id
  traffic_type         = "ALL"
  log_destination      = aws_s3_bucket.flow_logs.arn
  log_destination_type = "s3"
}
```
Notes on what's deliberate here:
- Remote state with locking. Local state on a laptop is how two engineers apply conflicting changes. S3 + DynamoDB locking (or Terraform Cloud) is table stakes.
- `for_each` over copy-paste. Subnets, route tables, and TGW attachments are repetitive; `for_each` keeps one definition and N instances.
- Flow logs in the same module as the VPC. Observability is part of the network definition, not an afterthought; it's the same philosophy as wiring New Relic collectors into every site.
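As a rough sketch of the `for_each` point, route tables can be chained off the subnet resource so that adding a subnet to the map adds its route table and association too. This assumes the `aws_vpc.main` and `aws_subnet.private` resources from the example above; the `private` labels on the new resources are arbitrary choices, not anything the provider requires.

```hcl
# One route table per private subnet. Iterating over the subnet resource itself
# reuses its keys ("a", "b"), so the two sets of instances stay in step.
resource "aws_route_table" "private" {
  for_each = aws_subnet.private

  vpc_id = aws_vpc.main.id
  tags   = { Name = "prod-private-${each.key}", Tier = "private" }
}

# Attach each subnet to the route table that shares its key.
resource "aws_route_table_association" "private" {
  for_each = aws_subnet.private

  subnet_id      = each.value.id
  route_table_id = aws_route_table.private[each.key].id
}
```

Instances are addressed by key (`aws_route_table.private["a"]`), so removing one subnet from the map removes only that subnet's route table and association.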
It's Not Just Cloud: Meraki, Mist, Junos
The part network engineers miss: Terraform providers exist for the platforms you already run.
- Meraki: the official Meraki provider (published under `cisco-open/`) manages organizations, networks, VLANs, SSIDs, firewall rules, and SD-WAN settings. Branch-in-a-module: one `module` block per site, variables for the subnet plan, and your 40 branches are identical by construction.
- Juniper Mist: a provider covers orgs, sites, WLANs, and switch templates, mirroring the API-first design of the platform.
- Junos directly: providers exist for pushing config to individual devices, though honestly this is where Ansible often fits better (see below).
- DNS, IPAM, certificates: Cloudflare/Route 53, NetBox, ACM. Your entire service edge can be one repo.
```hcl
resource "meraki_networks_appliance_vlans" "corp" {
  network_id   = module.branch_denver.network_id
  vlan_id      = "10"
  name         = "CORP"
  subnet       = "10.31.10.0/24"
  appliance_ip = "10.31.10.1"
}
```
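The same VLAN definition can be driven from a map so that a site's VLAN plan lives in one place. The sketch below is illustrative only: the attribute names are copied from the example above and have not been re-checked against the provider schema, and the `site_cidr` supernet, the VLAN names, and the convention that the VLAN ID doubles as the third octet are all assumptions.

```hcl
locals {
  site_cidr = "10.31.0.0/16" # hypothetical per-site supernet

  # VLAN name => VLAN ID. By assumption, the ID is also the third octet.
  vlans = {
    CORP  = 10
    VOICE = 20
    GUEST = 30
  }
}

resource "meraki_networks_appliance_vlans" "site" {
  for_each = local.vlans

  network_id = module.branch_denver.network_id
  vlan_id    = tostring(each.value)
  name       = each.key

  # cidrsubnet adds 8 bits to the /16 and picks network number each.value,
  # so CORP (10) becomes 10.31.10.0/24.
  subnet = cidrsubnet(local.site_cidr, 8, each.value)

  # cidrhost picks host number 1 in that subnet, so CORP gets 10.31.10.1.
  appliance_ip = cidrhost(cidrsubnet(local.site_cidr, 8, each.value), 1)
}
```

Deriving subnets with `cidrsubnet` and `cidrhost` means a new site needs only a different `site_cidr`; the per-VLAN addressing follows from it.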
Modules: Your Site Design as a Product
Once two sites share a design, extract a module:
```
modules/
  branch/            # takes site_name, cidr_base, hub_ids...
    main.tf          # meraki network, VLANs, SD-WAN policy, alerts
    variables.tf
    outputs.tf
envs/
  prod/
    denver.tf        # module "branch_denver" { source = "../../modules/branch" ... }
    austin.tf
```
The module is your network standard. Design review happens once, on the module; site turn-ups become a five-line PR. When the standard changes (new VLAN, new alert webhook), you bump the module version and roll it out site by site — with a plan diff at every step.
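Here is a sketch of what the module interface and a site turn-up might look like. The input names `site_name`, `cidr_base`, and `hub_ids` come from the layout above; their types, the validation rule, the example values, and the git URL in the comment are assumptions made for illustration.

```hcl
# modules/branch/variables.tf
variable "site_name" {
  type        = string
  description = "Short site identifier, used in resource names."
}

variable "cidr_base" {
  type        = string
  description = "Supernet for the site; VLAN subnets are carved from it."

  validation {
    # cidrhost fails on anything that is not a valid CIDR prefix.
    condition     = can(cidrhost(var.cidr_base, 0))
    error_message = "cidr_base must be a valid CIDR block, for example 10.31.0.0/16."
  }
}

variable "hub_ids" {
  type        = list(string)
  description = "Identifiers of the hubs this branch connects to."
}

# envs/prod/denver.tf
module "branch_denver" {
  source = "../../modules/branch"
  # A local path always tracks the current checkout. To pin a released version
  # per site, a git source with a ref is one option (hypothetical URL):
  # source = "git::https://git.example.com/acme/network-modules.git//branch?ref=v1.4.0"

  site_name = "denver"
  cidr_base = "10.31.0.0/16"
  hub_ids   = ["hub-west-1"] # placeholder value
}
```

Module block names must be unique within a directory, so each site file declares its own name (`branch_denver`, `branch_austin`) while sharing the same `source`.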
Terraform vs. Ansible
The perennial question. The clean split:
| | Terraform | Ansible |
|---|---|---|
| Model | Declarative, stateful | Procedural, (mostly) stateless |
| Sweet spot | Things with CRUD APIs: cloud, Meraki, Mist, DNS | Device CLIs/NETCONF: Junos, IOS, NX-OS; OS config; orchestrated change sequences |
| Deletion | Tracks and removes what you delete from code | Doesn't know what it's no longer managing |
| Drift | terraform plan shows it | Needs check-mode runs to spot it |
They compose well: Terraform builds the VPC and the Meraki org; Ansible pushes the Junos config on the boxes the cloud doesn't own. I cover the other half in my Ansible network automation article.
Guardrails Learned the Hard Way
- Never `apply` from a laptop against prod. CI runs `plan` on the PR, posts the diff for review, applies on merge. GitHub Actions + OIDC to AWS means no long-lived credentials either.
- `prevent_destroy` on the load-bearing stuff. A refactor that renames a resource looks like destroy-and-create to Terraform unless a `moved` block records the rename. Put `lifecycle { prevent_destroy = true }` on VPCs, TGWs, and anything with an IP your users know.
- Import the brownfield, don't ignore it. `terraform import` (and `import` blocks in Terraform 1.5 and later) bring existing resources under management. Unmanaged-but-related resources are landmines for future refactors.
- Small states. One giant state file for the whole company means every plan takes ten minutes and every mistake has a blast radius of everything. Split by environment and domain (`network/prod`, `network/dev`, `dns/`).
- Pin versions. Providers ship breaking changes; `~>` constraints and a lockfile keep Friday deploys boring.
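Several of these guardrails are a few lines of HCL each. The fragment below is a sketch, not a drop-in file: `aws_vpc.main` follows the earlier VPC example, while the old address `aws_vpc.prod`, the `aws_vpc.legacy` resource, its CIDR, and the VPC ID are made-up placeholders.

```hcl
terraform {
  # import blocks need Terraform 1.5 or later; moved blocks need 1.1 or later.
  required_version = ">= 1.5.0"
}

resource "aws_vpc" "main" {
  cidr_block = "10.20.0.0/16"

  lifecycle {
    # Any plan that would destroy this VPC fails with an error instead.
    prevent_destroy = true
  }
}

# Records a rename so Terraform rewrites the state address instead of
# planning a destroy-and-create. "aws_vpc.prod" is a hypothetical old name.
moved {
  from = aws_vpc.prod
  to   = aws_vpc.main
}

# Brings a hand-built VPC under management on the next apply.
# The ID is a placeholder; the resource block must describe the real VPC.
import {
  to = aws_vpc.legacy
  id = "vpc-0123456789abcdef0"
}

resource "aws_vpc" "legacy" {
  cidr_block = "10.30.0.0/16"
}
```

After a `moved` or `import` block has been applied everywhere the configuration is used, it can usually be removed; until then the plan output shows the move or import explicitly, which is what a reviewer wants to see.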
Where to Start
Don't boil the ocean. Pick the next new thing you'd have built by hand — one VPC, one branch network — and build it in Terraform instead. Keep the plan output in the change ticket. Within a quarter you'll have modules, and the click-ops version of you will look as dated as the guy pasting configs from Notepad.