Foundational Practices
1. Remote State is Mandatory
DON'T: Store state locally in production
# ❌ BAD: Local state only
# terraform.tfstate in working directory
DO: Use remote, shared backends
# ✅ GOOD: Remote S3 backend
terraform {
  backend "s3" {
    bucket         = "company-terraform-state"
    key            = "prod/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-locks"
  }
}
Why: Team collaboration, locking, backup, security, audit trail
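The S3 backend does not create its own bucket or lock table, so both must exist before `terraform init`. The following is a minimal sketch of how they might be provisioned once from a separate bootstrap configuration. The resource labels `tf_state` and `tf_locks` are illustrative; the bucket and table names are taken from the backend block. The lock table needs a string partition key named `LockID`.

```hcl
# Bootstrap configuration (applied once, kept separate from the stacks that use the backend)
resource "aws_s3_bucket" "tf_state" {
  bucket = "company-terraform-state"
}

# Keep earlier state versions so a bad write can be rolled back
resource "aws_s3_bucket_versioning" "tf_state" {
  bucket = aws_s3_bucket.tf_state.id
  versioning_configuration {
    status = "Enabled"
  }
}

# Lock table used by the S3 backend; the key must be named LockID
resource "aws_dynamodb_table" "tf_locks" {
  name         = "terraform-locks"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }
}
```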
2. Version Everything
DON'T: Use latest versions
# ❌ BAD: Unpinned versions
terraform {
  required_providers {
    aws = {
      source = "hashicorp/aws"
    }
  }
}
DO: Pin major and minor versions
# ✅ GOOD: Locked versions
terraform {
  required_version = "~> 1.5.0"
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.20"
    }
    kubernetes = {
      source  = "hashicorp/kubernetes"
      version = "~> 2.23"
    }
  }
}
Why: Prevents breaking changes in production
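The `~>` operator only lets the rightmost version component increase, which is easy to misread. As a rough illustration, the same constraints can be spelled out as explicit ranges:

```hcl
terraform {
  # "~> 1.5.0" allows patch releases only: 1.5.0 <= version < 1.6.0
  required_version = ">= 1.5.0, < 1.6.0"

  required_providers {
    aws = {
      source = "hashicorp/aws"
      # "~> 5.20" allows newer minor releases: 5.20 <= version < 6.0
      version = ">= 5.20.0, < 6.0.0"
    }
  }
}
```

Writing a third component (`~> 5.20.0`) would narrow the provider constraint to patch releases of 5.20 only.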
3. Separate Concerns
DON'T: Build everything in one folder
Networking/ + Compute/ + Database/ = terraform/
(Monolithic root module - HARD TO MANAGE)
DO: Organize by logical layers
terraform/
├─ core/ # Networking, VPC, subnets
├─ database/ # RDS, data layer
├─ compute/ # EC2, load balancers
└─ monitoring/ # CloudWatch, alerts
Why: Easy to modify, test, and deploy independently
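Splitting into layers means one layer has to read values produced by another. One common way to do this is the `terraform_remote_state` data source. The sketch below assumes the `core/` layer stores its state under the key `prod/core/terraform.tfstate` and declares an output named `vpc_id`; both names are hypothetical.

```hcl
# In compute/: read outputs published by the core/ layer
data "terraform_remote_state" "core" {
  backend = "s3"
  config = {
    bucket = "company-terraform-state"
    key    = "prod/core/terraform.tfstate" # assumed state key for core/
    region = "us-east-1"
  }
}

resource "aws_security_group" "web" {
  name   = "web"
  vpc_id = data.terraform_remote_state.core.outputs.vpc_id # assumes core/ declares output "vpc_id"
}
```

Only root-module outputs of the other layer are visible this way, so each layer's `outputs.tf` effectively becomes its public interface.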
4. Use Meaningful Naming
| Item | Naming | Example |
|---|---|---|
| Resources | descriptive_type | aws_security_group_web |
| Variables | snake_case | instance_type |
| Outputs | noun_describing_value | alb_dns_name |
| Local names | use_type | security_group_web |
| Modules | domain_functionality | networking_vpc |
Code Organization
5. Follow Standard File Structure
Recommended layout:
project/
├─ main.tf # Primary resources
├─ variables.tf # Variable declarations
├─ outputs.tf # Output definitions
├─ locals.tf # Local values
├─ data.tf # Data sources
├─ terraform.tf # Provider and backend
├─ locals_override.tf # Local overrides (git-ignored)
└─ modules/ # Reusable modules
   └─ vpc/
      ├─ main.tf
      ├─ variables.tf
      └─ outputs.tf
Why: Team members immediately know where to find things
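To make the layout concrete, here is a minimal sketch of what the three files of the `vpc` module and the root-level call might contain. The variable `cidr_block`, the resource label `this`, and the CIDR value are illustrative assumptions; the `modules/` path follows the `source` convention used in this guide's module examples.

```hcl
# modules/vpc/variables.tf
variable "cidr_block" {
  description = "CIDR range for the VPC"
  type        = string
}

# modules/vpc/main.tf
resource "aws_vpc" "this" {
  cidr_block = var.cidr_block
}

# modules/vpc/outputs.tf
output "vpc_id" {
  description = "ID of the created VPC"
  value       = aws_vpc.this.id
}

# main.tf (root module)
module "vpc" {
  source     = "./modules/vpc"
  cidr_block = "10.0.0.0/16"
}
```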
6. Use Modules for Reusability
DON'T: Copy-paste resource definitions
# ❌ BAD: Duplicated in dev, staging, prod
resource "aws_security_group" "web_dev" {
  # 50 lines of config
}
resource "aws_security_group" "web_staging" {
  # 50 lines identical config
}
resource "aws_security_group" "web_prod" {
  # 50 lines identical config
}
DO: Create a reusable module
# ✅ GOOD: Single module for all environments
module "web_sg" {
  source        = "./modules/security_group"
  for_each      = toset(["dev", "staging", "prod"])
  environment   = each.value
  vpc_id        = var.vpc_ids[each.value]
  ingress_ports = var.ingress_ports
}
7. Document Everything
variable "instance_type" {
  description = "EC2 instance type for web servers"
  type        = string
  default     = "t3.micro"
  # Document why this default exists
  # and what each type means for performance/cost
}
output "alb_dns_name" {
  description = "DNS name of Application Load Balancer"
  value       = aws_lb.main.dns_name
  # Explain what this URL is used for
  # and how to access the application
}
Input Validation & Safety
8. Use Variable Validation
variable "environment" {
  type = string
  validation {
    condition     = contains(["dev", "staging", "prod"], var.environment)
    error_message = "Environment must be dev, staging, or prod."
  }
}
variable "instance_count" {
  type = number
  validation {
    condition     = var.instance_count >= 1 && var.instance_count <= 100
    error_message = "Instance count must be between 1 and 100."
  }
}
Why: Catch errors early before infrastructure changes
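Validation is not limited to membership and range checks. The sketch below shows two further patterns built on `can()`, which turns an evaluation error into `false`. The variable `vpc_cidr` and the naming pattern for `project_name` are assumptions made for illustration.

```hcl
variable "vpc_cidr" {
  type = string
  validation {
    # cidrhost() raises an error on malformed input; can() converts that error to false
    condition     = can(cidrhost(var.vpc_cidr, 0))
    error_message = "The vpc_cidr value must be a valid CIDR block, such as 10.0.0.0/16."
  }
}

variable "project_name" {
  type = string
  validation {
    # regex() raises an error when nothing matches, so can() yields false for bad names
    condition     = can(regex("^[a-z][a-z0-9-]{2,20}$", var.project_name))
    error_message = "Project name must be 3 to 21 lowercase letters, digits, or hyphens, starting with a letter."
  }
}
```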
9. Use Sensitive Data Properly
# ❌ WRONG: Hardcoded secrets
variable "db_password" {
  type    = string
  default = "mypassword123" # NEVER EVER
}
# ✅ RIGHT: Use AWS Secrets Manager
data "aws_secretsmanager_secret_version" "db" {
  secret_id = "rds/prod/password"
}
resource "aws_db_instance" "main" {
  password = data.aws_secretsmanager_secret_version.db.secret_string
}
Why: Never expose secrets in code or logs
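When a secret does have to pass through a variable or an output (for example, supplied through the `TF_VAR_db_password` environment variable), marking it `sensitive` keeps it out of plan and apply output. This is a sketch only; the output name and the connection string format are hypothetical.

```hcl
variable "db_password" {
  description = "Master password for the database, supplied at runtime"
  type        = string
  sensitive   = true # value is redacted in plan and apply output
}

output "db_connection_string" {
  # Hypothetical connection string; aws_db_instance.main.endpoint is host:port
  value     = "postgres://app:${var.db_password}@${aws_db_instance.main.endpoint}/app"
  sensitive = true # required because the value derives from a sensitive input
}
```

Note that `sensitive` only affects CLI output. The value is still written to the state file in plain text, which is one more reason to encrypt and restrict access to state.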
10. Use State Locking
# Locking is on by default; -lock=true only makes that explicit
terraform apply -lock=true
# In automation, wait up to 30 seconds for a lock held by another run, then fail
terraform apply -lock-timeout=30s
Deployment Practices
11. Always Run Plan First
# Read-only operation shows what will change
terraform plan -out=tfplan
# Review changes carefully
# Then apply only approved changes
terraform apply tfplan
Why: Prevents surprises in production
12. Use Workspaces for Environments
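DON'T: Copy the same configuration into one folder per environment
cd terraform-dev && terraform apply # Dev environment
cd ../terraform-staging && terraform apply # Staging
cd ../terraform-prod && terraform apply # Production
DO: Use workspaces with a single codebase
terraform workspace select dev && terraform apply -var-file=terraform.dev.tfvars
terraform workspace select staging && terraform apply -var-file=terraform.staging.tfvars
terraform workspace select prod && terraform apply -var-file=terraform.prod.tfvars
# Variables differ per workspace and must be passed with -var-file,
# because only terraform.tfvars and *.auto.tfvars are loaded automatically:
# terraform.dev.tfvars, terraform.staging.tfvars, terraform.prod.tfvars
13. Automate with CI/CD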
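# .github/workflows/terraform.yml
name: Terraform
on: [pull_request, push]
jobs:
  terraform:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: hashicorp/setup-terraform@v2
      - run: terraform fmt -check -recursive
      # init must run before validate and plan; backend credentials are assumed to be configured
      - run: terraform init -input=false
      - run: terraform validate
      - run: terraform plan -input=false -out=tfplan
      - name: Comment on PR
        run: |
          terraform show -no-color tfplan > /tmp/plan.txt
          # Use GitHub API to post comment with plan
Code Quality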
14. Format Code Consistently
# Auto-format all .tf files
terraform fmt -recursive
# Check if formatting is correct (for CI/CD)
terraform fmt -recursive -check
Why: Consistency across team, easier reviews
15. Validate Syntax
# Check syntax without applying
terraform validate
# Good for CI/CD pipelines
16. Use Linting Tools
# Install tflint
brew install tflint
# Check for issues
tflint
Common rules:
- Naming conventions
- Unused variables
- Deprecated attributes
- Best practice violations
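These rule categories are switched on through a `.tflint.hcl` file in the project root. The following is a minimal sketch that uses rule names from TFLint's bundled Terraform ruleset; the names available depend on the installed TFLint version, so treat the selection as an assumption to check against your setup.

```hcl
# .tflint.hcl
plugin "terraform" {
  enabled = true
  preset  = "recommended"
}

# Naming conventions (snake_case by default)
rule "terraform_naming_convention" {
  enabled = true
}

# Unused variables, locals, and data sources
rule "terraform_unused_declarations" {
  enabled = true
}

# Deprecated "${var.x}"-only interpolation syntax
rule "terraform_deprecated_interpolation" {
  enabled = true
}
```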
17. Security Scanning
# Install tfsec
brew install tfsec
# Find security issues
tfsec .
# Example: Unencrypted S3, exposed RDS
State Management
18. Protect State Files
# Enable encryption
terraform {
  backend "s3" {
    encrypt    = true
    kms_key_id = "arn:aws:kms:region:account:key/id"
  }
}
# Block public access
aws s3api put-public-access-block \
  --bucket terraform-state \
  --public-access-block-configuration \
  "BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true"
19. Regular State Backups
# Manual backup
terraform state pull > terraform.tfstate.backup
# Enable S3 versioning so earlier state versions can be restored
aws s3api put-bucket-versioning \
  --bucket terraform-state \
  --versioning-configuration Status=Enabled
20. Never Edit State Directly
# ❌ WRONG
vi terraform.tfstate
# Then upload somehow
# ✅ RIGHT: Use Terraform
terraform state rm old_resource
terraform state mv old_name new_name
terraform import aws_instance.web i-1234567890abcdef0
Why: State corruption = infrastructure corruption
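Renames and imports can also be declared in configuration, so they go through plan and code review like any other change. The sketch below uses `moved` blocks (Terraform 1.1 and later) and `import` blocks (Terraform 1.5 and later). The resource addresses are hypothetical, and a matching `resource` block must exist for each target.

```hcl
# Rename a resource without destroying and recreating it
moved {
  from = aws_instance.old_name
  to   = aws_instance.new_name
}

# Bring an existing instance under Terraform management
import {
  to = aws_instance.web
  id = "i-1234567890abcdef0"
}
```

Both blocks can be deleted once the change has been applied everywhere the configuration is used.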
Scaling Practices
21. Use Variables for Configuration
DON'T: Hardcode values
resource "aws_instance" "web" {
  instance_type     = "t2.large"   # Hardcoded
  availability_zone = "us-east-1a" # Hardcoded
}
DO: Use variables
resource "aws_instance" "web" {
  instance_type     = var.instance_type
  availability_zone = var.availability_zone
}
22. Use Locals for Computed Values
locals {
  common_tags = {
    Environment = var.environment
    ManagedBy   = "Terraform"
    CostCenter  = var.cost_center
  }
  name_prefix = "${var.project_name}-${var.environment}"
}
resource "aws_instance" "web" {
  tags = merge(
    local.common_tags,
    { Name = "${local.name_prefix}-web-server" }
  )
}
23. Use count and for_each Wisely
# count: When number is known
resource "aws_instance" "web" {
  count         = var.instance_count
  instance_type = var.instance_type
  tags          = { Name = "web-${count.index + 1}" }
}
# for_each: When using map with different configs
resource "aws_instance" "servers" {
  for_each          = var.server_configs
  instance_type     = each.value.type
  availability_zone = each.value.az
  tags              = { Name = each.key }
}
Performance & Costs
24. Use data sources to reference existing infrastructure
# DON'T hardcode IDs
# DO fetch them
data "aws_vpc" "main" {
  tags = { Name = "production" }
}
resource "aws_subnet" "app" {
  vpc_id = data.aws_vpc.main.id
}
25. Avoid Common Mistakes
| Mistake | Problem | Solution |
|---|---|---|
| Modifying state directly | Inconsistency | Use Terraform operations |
| Not using remote state | No collaboration | Set up S3 + DynamoDB |
| Hardcoding values | Non-reusable | Extract to variables |
| No variable validation | Bad inputs | Add validation blocks |
| Ignoring plan output | Surprises | Always review the plan before applying |
| Not versioning providers | Breaking changes | Pin versions |
| Large root modules | Hard to manage | Use modules |
| Secrets in code | Security breach | Use AWS Secrets Manager |
Team Workflow
26. Pull Request Process
1. Developer creates feature branch
2. Makes Terraform changes
3. Runs: terraform fmt, validate, plan
4. Creates PR with plan output
5. Team reviews the plan output (terraform show tfplan)
6. Approves if safe
7. Merge to main
8. CI/CD automatically applies to staging
9. Manual approval for production apply
27. Documentation as Code
# README.md in each module explains:
# - What this module does
# - Input variables
# - Outputs
# - Example usage
# - Common modifications
# Provided in every module directory
Summary
| Priority | Practice | Impact |
|---|---|---|
| Critical | Remote state | Team safety |
| Critical | Version pinning | Stability |
| Critical | Validation | Error prevention |
| High | Modular structure | Maintainability |
| High | Documentation | Team productivity |
| Medium | Code formatting | Consistency |
| Medium | CI/CD automation | Reliability |
| Medium | Naming conventions | Code clarity |