The 10 Commandments of CI/CD
These are the foundational rules that separate amateur pipelines from production-grade ones. Follow these, and your CI/CD will be reliable, fast, and secure.
1. Build once, deploy everywhere
2. Everything in code (pipeline as code)
3. Keep pipelines fast (under 10 minutes)
4. Fail fast, fail loudly
5. Never skip security
6. Immutable artifacts only
7. Automate everything (except approval)
8. Monitor deployments actively
9. Practice rollbacks regularly
10. Secure the pipeline itself
1. Build Once, Deploy Everywhere
The single most important rule of CI/CD. Never rebuild your application for different environments. Build once, then deploy the same artifact to staging, pre-production, and production.
❌ Wrong: Build Per Environment
# BAD: Building separately for each environment
deploy-staging:
  steps:
    - run: npm run build   # Build #1 for staging
    - run: docker build -t myapp:staging .
    - run: kubectl apply ...
deploy-production:
  steps:
    - run: npm run build   # Build #2 for production ← DIFFERENT BINARY!
    - run: docker build -t myapp:production .
    - run: kubectl apply ...

Problem: Build #1 and Build #2 might produce different results due to timing, dependency updates, or environment differences.
✅ Correct: Build Once, Deploy Same Artifact
# GOOD: Build once, deploy the exact same artifact everywhere
build:
  steps:
    - run: npm run build
    - run: docker build -t myapp:${{ github.sha }} .
    - run: docker push myapp:${{ github.sha }}
deploy-staging:
  needs: build
  steps:
    - run: kubectl set image deployment/myapp myapp=myapp:${{ github.sha }} -n staging
deploy-production:
  needs: deploy-staging
  steps:
    - run: kubectl set image deployment/myapp myapp=myapp:${{ github.sha }} -n production
    # ← EXACT same image as staging ✅

Environment-Specific Configuration
Use environment variables or ConfigMaps/Secrets for per-environment configuration — never bake configuration into the artifact:
# Kubernetes ConfigMap per environment
apiVersion: v1
kind: ConfigMap
metadata:
  name: myapp-config
  namespace: staging   # or production
data:
  DATABASE_URL: "postgres://staging-db:5432/myapp"
  LOG_LEVEL: "debug"
  API_URL: "https://staging-api.myapp.com"
---
# Production uses different values, same image
apiVersion: v1
kind: ConfigMap
metadata:
  name: myapp-config
  namespace: production
data:
  DATABASE_URL: "postgres://prod-db:5432/myapp"
  LOG_LEVEL: "warn"
  API_URL: "https://api.myapp.com"

2. Pipeline as Code
Your pipeline definition must live in version control, alongside the application code. Never configure pipelines through a web UI.
Why Pipeline as Code?
| Click-Based (UI) | Pipeline as Code |
|---|---|
| Not reproducible | Fully reproducible |
| No history/audit trail | Git history shows all changes |
| Can't code-review | Pull requests for pipeline changes |
| One person knows the config | Team-visible and documented |
| Disaster = rebuild from memory | Disaster = git clone + done |
File Convention by Tool
GitHub Actions: .github/workflows/ci.yml
GitLab CI: .gitlab-ci.yml
Jenkins: Jenkinsfile
CircleCI: .circleci/config.yml
Azure Pipelines: azure-pipelines.yml
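A quick way to confirm that a checkout follows one of these conventions is to look for the files directly. The sketch below is illustrative only: the function name `detect_pipeline_files` is hypothetical, and it checks for the `.github/workflows` directory rather than a specific `ci.yml`, on the assumption that workflow file names vary between repositories.

```shell
# Print each known pipeline-as-code path that exists under a repository checkout.
# Returns 0 if at least one was found, 1 otherwise.
detect_pipeline_files() {
  local repo_dir="${1:-.}"
  local found=1   # 1 = nothing found yet
  local path
  for path in \
    ".github/workflows" \
    ".gitlab-ci.yml" \
    "Jenkinsfile" \
    ".circleci/config.yml" \
    "azure-pipelines.yml"
  do
    if [ -e "$repo_dir/$path" ]; then
      echo "$path"
      found=0
    fi
  done
  return "$found"
}
```

Calling `detect_pipeline_files .` from the repository root lists the matching paths, and a non-zero exit status suggests the pipeline may be configured only through a UI.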
Organized Workflow Structure
.github/
├── dependabot.yml            # Dependency updates (Dependabot config, not a workflow)
└── workflows/
    ├── ci.yml                # Lint, build, test (on every PR)
    ├── cd-staging.yml        # Deploy to staging (on develop merge)
    ├── cd-production.yml     # Deploy to production (on main merge)
    ├── security.yml          # Nightly security scans
    └── cleanup.yml           # Weekly cleanup of old images
3. Keep Pipelines Fast
Slow pipelines kill productivity. If developers wait 30 minutes for a pipeline, they context-switch, lose focus, and batch changes — leading to bigger, riskier deployments.
Target Timelines
| Pipeline Stage | Target | Action if Exceeded |
|---|---|---|
| Lint | < 30 sec | Fewer rules or parallel linters |
| Build | < 2 min | Caching, multi-stage Docker builds |
| Unit tests | < 2 min | Parallel test suites |
| Integration tests | < 5 min | Run tests in parallel |
| Full pipeline | < 10 min | Optimize or parallelize |
| Deploy | < 2 min | Pre-pull images, rolling updates |
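The targets in the table can be enforced mechanically rather than by eye. As a minimal sketch, assuming stage durations are already available as whole seconds (how they are measured is left open), a hypothetical helper `check_stage_budget` could flag any stage that exceeds its budget:

```shell
# Compare a stage's measured duration with its budget, both in whole seconds.
# Prints a one-line verdict and returns 1 when the budget is exceeded.
check_stage_budget() {
  local stage="$1" actual="$2" budget="$3"
  if [ "$actual" -gt "$budget" ]; then
    echo "OVER: $stage took ${actual}s (budget ${budget}s)"
    return 1
  fi
  echo "OK: $stage took ${actual}s (budget ${budget}s)"
}

# Example usage with the budgets from the table:
#   check_stage_budget lint 22 30
#   check_stage_budget build 150 120
```

Failing the pipeline on an exceeded budget is a design choice; many teams may prefer to only warn at first and tighten the check once the numbers are stable.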
Speed Optimization Techniques
Cache Dependencies
# GitHub Actions: npm cache
- uses: actions/setup-node@v4
  with:
    node-version: '20'
    cache: 'npm'   # Caches ~/.npm automatically
# Docker layer caching
- uses: docker/build-push-action@v5
  with:
    cache-from: type=gha
    cache-to: type=gha,mode=max

Parallelize Independent Jobs
# Run independent jobs simultaneously
jobs:
  lint:        # ┐
    ...        # ├── These 3 run in PARALLEL
  test-unit:   # │
    ...        # │
  security:    # ┘
    ...
  deploy:
    needs: [lint, test-unit, security]   # Wait for all 3

Skip Unnecessary Work
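on:
  push:
    # `paths` and `paths-ignore` cannot be combined on the same event,
    # so exclusions are written as negated patterns inside `paths`.
    paths:
      - 'src/**'                       # Only run if source code changed
      - 'package.json'
      - 'Dockerfile'
      - '!**.md'                       # Skip on doc changes
      - '!.github/ISSUE_TEMPLATE/**'

Cancel Redundant Runs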
# Cancel previous pipeline if new commit pushed
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true

4. Fail Fast, Fail Loudly
The cheapest checks should run first. If linting takes 15 seconds and catches a bug, don't waste 5 minutes building before discovering it.
Optimal Stage Order
Stage 1: Lint (15 sec) ← Cheapest check first
Stage 2: Type Check (20 sec) ← Static analysis
Stage 3: Unit Tests (45 sec) ← Fast tests
Stage 4: Build (90 sec) ← Compile/package
Stage 5: Integration (3 min) ← Heavier tests
Stage 6: Security Scan (2 min) ← Can run in parallel with Stage 5
Stage 7: E2E Tests (5 min) ← Slowest, most expensive
Stage 8: Deploy (1 min) ← Only if everything passes
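The ordering above amounts to running the cheapest command first and stopping at the first failure. A minimal shell sketch of that idea, assuming each stage is wrapped in a function or command whose exit status signals success (the name `run_stages` is hypothetical):

```shell
# Run the given stage commands in order and stop at the first failure,
# so that later, more expensive stages never start.
run_stages() {
  local stage
  for stage in "$@"; do
    echo "--> $stage"
    if ! "$stage"; then
      echo "FAILED at stage: $stage" >&2
      return 1
    fi
  done
  echo "all stages passed"
}

# Example usage, cheapest checks first (stage functions are placeholders):
#   lint()       { npm run lint; }
#   unit_tests() { npm test; }
#   build()      { npm run build; }
#   run_stages lint unit_tests build
```

In a hosted CI system the same effect usually comes from job dependencies, but the script form is handy for reproducing the pipeline order locally.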
Loud Notifications on Failure
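# Always notify on failure
- name: Notify failure
  if: failure()
  env:
    SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }}
    # Untrusted text is passed through env so it is never interpolated into the script
    COMMIT_MESSAGE: ${{ github.event.head_commit.message }}
    RUN_URL: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}
  run: |
    TEXT=$(printf '❌ *Pipeline Failed*\n*Repo:* %s\n*Branch:* %s\n*Commit:* `%s`\n*Author:* %s\n*Message:* %s\n<%s|View Pipeline>' \
      "$GITHUB_REPOSITORY" "$GITHUB_REF_NAME" "$GITHUB_SHA" "$GITHUB_ACTOR" "$COMMIT_MESSAGE" "$RUN_URL")
    # jq (assumed to be installed on the runner) escapes quotes and newlines in the message
    jq -n --arg text "$TEXT" \
      '{text: "❌ Pipeline FAILED", blocks: [{type: "section", text: {type: "mrkdwn", text: $text}}]}' \
      | curl -X POST "$SLACK_WEBHOOK" -H 'Content-Type: application/json' -d @-

Set Timeouts on Every Job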
jobs:
  build:
    runs-on: ubuntu-latest
    timeout-minutes: 15   # Kill if stuck for 15 min
  test:
    runs-on: ubuntu-latest
    timeout-minutes: 10
  deploy:
    runs-on: ubuntu-latest
    timeout-minutes: 5

5. Never Skip Security
Every pipeline must include security scanning. At minimum, scan your dependencies. Ideally, scan your code and containers too.
Minimum Security Pipeline
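security:
  runs-on: ubuntu-latest
  permissions:
    contents: read
    security-events: write   # Needed for CodeQL to upload results
  steps:
    - uses: actions/checkout@v4
      with:
        fetch-depth: 0       # Full history so secret detection can scan past commits
    # 1. Dependency vulnerabilities (MUST HAVE)
    - run: npm audit --audit-level=high
    # 2. Secret detection (MUST HAVE)
    # Branch refs such as @main and @master are mutable; pin to a release tag or commit SHA in real pipelines
    - uses: trufflesecurity/trufflehog@main
      with:
        extra_args: --only-verified
    # 3. Static code analysis (RECOMMENDED)
    - uses: github/codeql-action/init@v3
      with:
        languages: javascript-typescript
    - uses: github/codeql-action/analyze@v3
    # 4. Container scanning (IF using Docker)
    - uses: aquasecurity/trivy-action@master
      with:
        image-ref: myapp:${{ github.sha }}
        severity: 'CRITICAL,HIGH'
        exit-code: '1'

Security Scanning Schedule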
| Scan Type | Frequency | Block Deploy? |
|---|---|---|
| npm audit | Every pipeline | Yes (high/critical) |
| Secret detection | Every pipeline | Yes |
| SAST (CodeQL) | Every PR | Yes (high severity) |
| Container scan | Every build | Yes (critical) |
| Full DAST scan | Nightly | No (creates tickets) |
| SBOM generation | Every release | No (compliance) |
6. Immutable Artifacts
Once an artifact is built and tagged, never overwrite it. If you need a fix, build a new artifact with a new tag.
❌ Wrong: Mutable Tags
# DON'T DO THIS
docker build -t myapp:latest .
docker push myapp:latest
# Next day...
docker build -t myapp:latest .   # ← Overwrites yesterday's image!
docker push myapp:latest

✅ Correct: Immutable Tags
# DO THIS
docker build -t myapp:abc123f .   # Tagged with git SHA
docker push myapp:abc123f
# Next day...
docker build -t myapp:def456a .   # New tag, old image preserved
docker push myapp:def456a

Why Immutability Matters
| Mutable | Immutable |
|---|---|
| myapp:latest could be anything | myapp:abc123f is always the same |
| Can't reproduce a deployment | Can always redeploy exact version |
| Rollback might not work | Rollback always works |
| "What version is running?" → "¯\(ツ)/¯" | "What version is running?" → "abc123f" |
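One way to hold the line on immutability is to reject mutable tags before anything is pushed. The following sketch assumes tags are git SHAs (7 to 40 lowercase hex characters), as in the examples above; `is_immutable_tag` and `image_ref` are hypothetical helper names.

```shell
# Succeed only for tags that look like a git SHA (7 to 40 lowercase hex characters).
is_immutable_tag() {
  printf '%s\n' "$1" | grep -Eq '^[0-9a-f]{7,40}$'
}

# Print "repo:tag" for an acceptable tag; refuse mutable tags such as "latest".
image_ref() {
  local repo="$1" tag="$2"
  if ! is_immutable_tag "$tag"; then
    echo "refusing mutable tag: $tag" >&2
    return 1
  fi
  echo "$repo:$tag"
}

# Example usage:
#   docker build -t "$(image_ref myapp "$(git rev-parse --short HEAD)")" .
```

A registry-side setting that forbids tag overwrites, where the registry offers one, would be a stronger guarantee than a client-side check like this.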
7. Automate Everything (Except Approval)
The only human step in your pipeline should be the approval gate before production deployment. Everything else must be automated.
What to Automate
✅ Code linting
✅ Building artifacts
✅ Running all tests
✅ Security scanning
✅ Deploying to staging
✅ Smoke testing staging
✅ Creating release notes
✅ Tagging releases
✅ Notifying team
✅ Monitoring deployment health
✅ Rolling back on failure
What to Keep Manual
👤 Approval to deploy to production
👤 Decision to roll back (unless auto-rollback is configured)
👤 Major version releases (marketing coordination)
Automated Release Notes
release:
  runs-on: ubuntu-latest
  if: github.ref == 'refs/heads/main'
  permissions:
    contents: write   # Needed to create the release and its tag
  steps:
    - uses: actions/checkout@v4
      with:
        fetch-depth: 0
    - name: Generate changelog
      id: changelog
      run: |
        PREVIOUS_TAG=$(git describe --tags --abbrev=0 HEAD^ 2>/dev/null || echo "")
        if [ -z "$PREVIOUS_TAG" ]; then
          CHANGES=$(git log --pretty=format:"- %s (%h)" -20)
        else
          CHANGES=$(git log "$PREVIOUS_TAG"..HEAD --pretty=format:"- %s (%h)")
        fi
        echo "changes<<EOF" >> "$GITHUB_OUTPUT"
        echo "$CHANGES" >> "$GITHUB_OUTPUT"
        echo "EOF" >> "$GITHUB_OUTPUT"
    - name: Create GitHub Release
      # Note: actions/create-release is archived and no longer maintained
      uses: actions/create-release@v1
      env:
        GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
      with:
        tag_name: v${{ github.run_number }}
        release_name: Release v${{ github.run_number }}
        body: |
          ## Changes
          ${{ steps.changelog.outputs.changes }}
          ## Deployment Info
          - **Commit:** ${{ github.sha }}
          - **Date:** ${{ github.event.head_commit.timestamp }}
          - **Author:** ${{ github.actor }}

8. Monitor Deployments Actively
Deploying is not the end — it's the beginning of monitoring. Your pipeline should include post-deployment health checks and alerting.
Post-Deployment Health Checks
post-deploy-monitor:
  name: Post-Deployment Monitoring
  runs-on: ubuntu-latest
  needs: deploy-production
  steps:
    - name: Wait for app to stabilize
      run: sleep 60
    - name: Check application health
      run: |
        for i in $(seq 1 5); do
          STATUS=$(curl -s -o /dev/null -w "%{http_code}" https://myapp.com/health)
          if [ "$STATUS" != "200" ]; then
            echo "❌ Health check failed (attempt $i/5): HTTP $STATUS"
            if [ "$i" = "5" ]; then exit 1; fi
            sleep 10
          else
            echo "✅ Health check passed: HTTP $STATUS"
            break
          fi
        done
    - name: Check error rate
      run: |
        # Query Prometheus/Datadog for error rate
        ERROR_RATE=$(curl -s "https://monitoring.myapp.com/api/metrics" \
          -H "Authorization: Bearer ${{ secrets.MONITORING_TOKEN }}" \
          | jq -r '.error_rate')
        echo "Error rate: ${ERROR_RATE}%"
        if (( $(echo "$ERROR_RATE > 5" | bc -l) )); then
          echo "❌ Error rate too high! Triggering rollback..."
          exit 1
        fi
    - name: Check response latency
      run: |
        LATENCY=$(curl -s -o /dev/null -w "%{time_total}" https://myapp.com/api/status)
        echo "Response time: ${LATENCY}s"
        if (( $(echo "$LATENCY > 2.0" | bc -l) )); then
          echo "⚠️ Latency elevated: ${LATENCY}s (threshold: 2.0s)"
        fi
    - name: Auto-rollback on failure
      if: failure()
      run: |
        echo "🔄 Auto-rolling back production deployment..."
        kubectl rollout undo deployment/myapp -n production
        kubectl rollout status deployment/myapp -n production --timeout=5m
        curl -X POST ${{ secrets.SLACK_WEBHOOK }} \
          -d '{"text":"🚨 PRODUCTION AUTO-ROLLBACK triggered after deployment of ${{ github.sha }}"}'

Deployment Dashboard Metrics
| Metric | What to Watch | Alert Threshold |
|---|---|---|
| HTTP 5xx rate | Server errors | > 1% of requests |
| P99 latency | Slowest 1% of responses | > 2 seconds |
| Error count | Total errors per minute | > 10/min |
| CPU usage | Application load | > 80% for 5 min |
| Memory usage | Memory leaks | > 85% |
| Pod restarts | Crash loops | > 0 after deploy |
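Thresholds like these are usually fractional, and plain shell arithmetic only handles integers. The health-check example earlier leans on `bc`; a sketch of an alternative that uses `awk`, which is commonly available even where `bc` is not, might look like this (the helper name `exceeds_threshold` is hypothetical):

```shell
# Succeed (exit 0) when the numeric value is strictly greater than the limit.
# Adding 0 forces awk to compare the operands as numbers, not strings.
exceeds_threshold() {
  awk -v value="$1" -v limit="$2" 'BEGIN { exit !((value + 0) > (limit + 0)) }'
}

# Example usage with the 1% 5xx-rate threshold from the table:
#   if exceeds_threshold "$ERROR_RATE" 1; then
#     echo "5xx rate too high"
#     exit 1
#   fi
```

The function does not validate its input, so a non-numeric value is treated as 0; a real check would probably want to fail loudly in that case.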
9. Practice Rollbacks Regularly
A rollback you've never tested is a rollback that won't work. Practice rolling back regularly so your team is prepared.
Rollback Methods by Strategy
| Strategy | Rollback Method | Time |
|---|---|---|
| Rolling Update | kubectl rollout undo | 30-60 sec |
| Blue-Green | Switch service selector | 5 sec |
| Canary | Scale canary to 0 | 10 sec |
| Feature Flag | Toggle flag OFF | Instant |
| Recreate | Redeploy old version | 2-5 min |
Rollback Pipeline
name: Emergency Rollback
# Manual trigger with version selection.
# `on:` is a workflow-level key, so it sits above `jobs:` rather than inside the job.
on:
  workflow_dispatch:
    inputs:
      target_version:
        description: 'Version to roll back to (git SHA or tag)'
        required: true
      reason:
        description: 'Reason for rollback'
        required: true
jobs:
  rollback:
    name: Emergency Rollback
    runs-on: ubuntu-latest
    env:
      # Inputs are passed through env so they are never interpolated into shell scripts
      TARGET_VERSION: ${{ github.event.inputs.target_version }}
      REASON: ${{ github.event.inputs.reason }}
    steps:
      - name: Log rollback
        run: |
          echo "🚨 ROLLBACK INITIATED"
          echo "Target: $TARGET_VERSION"
          echo "Reason: $REASON"
          echo "By: $GITHUB_ACTOR"
          echo "Time: $(date -u)"
      - name: Rollback deployment
        run: |
          kubectl set image deployment/myapp \
            "myapp=myregistry.io/myapp:$TARGET_VERSION" \
            -n production
          kubectl rollout status deployment/myapp -n production --timeout=5m
      - name: Verify rollback
        run: |
          curl -f https://myapp.com/health
          echo "✅ Rollback successful"
      - name: Notify team
        env:
          SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }}
        run: |
          TEXT=$(printf '🚨 ROLLBACK to %s\nReason: %s\nBy: %s' "$TARGET_VERSION" "$REASON" "$GITHUB_ACTOR")
          # jq (assumed to be installed on the runner) escapes quotes in the reason text
          jq -n --arg text "$TEXT" '{text: $text}' \
            | curl -X POST "$SLACK_WEBHOOK" -H 'Content-Type: application/json' -d @-
      - name: Create incident ticket
        env:
          PAGERDUTY_TOKEN: ${{ secrets.PAGERDUTY_TOKEN }}
        run: |
          # Abbreviated payload: a real PagerDuty request typically also needs
          # a service reference and a From header
          jq -n --arg title "Production Rollback: $REASON" \
            '{incident: {title: $title, urgency: "high"}}' \
            | curl -X POST https://api.pagerduty.com/incidents \
                -H "Authorization: Token token=$PAGERDUTY_TOKEN" \
                -H "Content-Type: application/json" \
                -d @-

Rollback Drill Schedule
Run a rollback drill monthly to verify your process works:
Monthly Rollback Drill Checklist:
☐ Deploy a known-good older version to production
☐ Verify monitoring detects the version change
☐ Verify alerts fire correctly
☐ Verify the application works after rollback
☐ Measure rollback time (target: under 5 minutes)
☐ Document any issues found
☐ Update runbook if process changed
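The "measure rollback time" item can be scripted so every drill produces a comparable number. A minimal sketch, assuming the rollback is a single command and that whole-second resolution is enough; `timed_rollback` is a hypothetical name:

```shell
# Run a rollback command and compare its duration with a target in seconds.
# Exit status: 0 = within target, 1 = too slow, 2 = the command itself failed.
timed_rollback() {
  local target_seconds="$1"
  shift
  local start end elapsed
  start=$(date +%s)
  "$@" || { echo "rollback command failed" >&2; return 2; }
  end=$(date +%s)
  elapsed=$((end - start))
  echo "rollback took ${elapsed}s (target: ${target_seconds}s)"
  [ "$elapsed" -le "$target_seconds" ]
}

# Example usage with the 5-minute target from the checklist:
#   timed_rollback 300 kubectl rollout undo deployment/myapp -n production
```

Recording the printed duration after each drill gives a trend line, which may reveal a slowly degrading rollback path before it matters in an incident.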
10. Secure the Pipeline Itself
Your CI/CD pipeline has privileged access to production, registries, and cloud accounts. It's a prime attack target.
Pipeline Security Checklist
Secrets Management
# ✅ GOOD: Use GitHub Secrets
env:
  API_KEY: ${{ secrets.API_KEY }}
# ❌ BAD: Hardcoded secrets
env:
  API_KEY: "sk-1234567890abcdef"
# ❌ BAD: Secrets in logs
- run: echo "Key is ${{ secrets.API_KEY }}"   # Visible in logs!

Dependency Pinning
# ✅ GOOD: Pin actions to SHA (prevents supply chain attacks)
- uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11   # v4.1.1
# ⚠️ OK: Pin to major version (gets security patches)
- uses: actions/checkout@v4
# ❌ BAD: No version pinning
- uses: actions/checkout@main   # Could change at any time!

Least Privilege Permissions
# Set minimum permissions for the workflow
permissions:
  contents: read    # Only read repo contents
  packages: write   # Write to container registry
  id-token: write   # For OIDC authentication to cloud
# Even more restrictive per job
jobs:
  build:
    permissions:
      contents: read
  deploy:
    permissions:
      contents: read
      id-token: write   # Only deploy job needs cloud access

Branch Protection
Repository Settings → Branch Protection Rules:
✅ Require pull request before merging
✅ Require status checks to pass (CI pipeline)
✅ Require code review approval (1-2 reviewers)
✅ Dismiss stale reviews on new pushes
✅ Require signed commits
✅ Do not allow bypassing above settings
✅ Restrict force pushes
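The SHA-pinning rule from the Dependency Pinning examples can be checked automatically. This is a rough sketch, not a full workflow parser: it assumes one `uses:` per line, and the hypothetical `find_unpinned_actions` will also flag local actions (`uses: ./...`) and Docker references, which may be acceptable in a given repository.

```shell
# Print every `uses:` line in a workflow file that is not pinned to a full
# 40-character commit SHA. Exits 0 if any unpinned reference was printed.
find_unpinned_actions() {
  grep -E '^[[:space:]]*(-[[:space:]]*)?uses:' "$1" \
    | grep -Ev '@[0-9a-f]{40}([[:space:]]|$)'
}

# Example usage as a pipeline gate:
#   if find_unpinned_actions .github/workflows/ci.yml; then
#     echo "unpinned actions found" >&2
#     exit 1
#   fi
```

Running this as a required status check would pair naturally with the branch protection rules listed above.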
Self-Hosted Runner Security
# If using self-hosted runners:
jobs:
  build:
    runs-on: [self-hosted, linux, x64]
    # Security measures:
    # ✅ Run in ephemeral containers (clean state per job)
    # ✅ Network isolation (no access to other internal systems)
    # ✅ Regularly update runner software
    # ✅ Monitor runner activity
    # ❌ Never run untrusted code on self-hosted runners

Pipeline Anti-Patterns
Anti-Pattern 1: The Monolith Pipeline
# ❌ BAD: One massive job that does everything
jobs:
  everything:
    steps:
      - run: npm ci
      - run: npm run lint
      - run: npm test
      - run: npm run build
      - run: docker build -t myapp .
      - run: docker push myapp
      - run: kubectl apply -f k8s/
      # If ANY step fails, you restart EVERYTHING
# ✅ GOOD: Separate jobs with dependencies
jobs:
  lint: { ... }
  build: { needs: lint, ... }
  test: { needs: build, ... }
  deploy: { needs: test, ... }
  # If test fails, only re-run from test

Anti-Pattern 2: Manual Steps Disguised as Automation
# ❌ BAD: "Run this script manually after pipeline completes"
# This defeats the purpose of CI/CD
# ✅ GOOD: Every step is in the pipeline
deploy:
  steps:
    - run: kubectl apply -f k8s/
    - run: kubectl rollout status deployment/myapp
    - run: bash scripts/smoke-test.sh
    - run: bash scripts/notify-team.sh

Anti-Pattern 3: Ignoring Failures
# ❌ BAD: Ignoring security scan failures
- run: npm audit
  continue-on-error: true   # ← Why even scan?
# ✅ GOOD: Block on critical/high
- run: npm audit --audit-level=high
  # Pipeline fails if high/critical vulnerabilities found

Anti-Pattern 4: No Cleanup
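# ❌ BAD: Artifacts accumulate forever
# → $500/month in registry storage
# ✅ GOOD: Automated cleanup
# `schedule` is a workflow trigger, so it goes under `on:` rather than inside the job
on:
  schedule:
    - cron: '0 3 * * SUN'
jobs:
  cleanup:
    runs-on: ubuntu-latest
    permissions:
      packages: write
    steps:
      - uses: actions/delete-package-versions@v5
        with:
          package-name: myapp        # Assumed package name; adjust to your registry
          package-type: container
          min-versions-to-keep: 10
          delete-only-untagged-versions: true

Pipeline Maturity Model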
Level 1: Basic (Getting Started)
☐ Code in version control
☐ Pipeline runs on push
☐ Basic lint + unit tests
☐ Manual deployment
Level 2: Standard (Most Teams)
☐ All Level 1 items
☐ Integration tests
☐ Automated staging deployment
☐ Manual production approval
☐ Slack notifications
☐ Test coverage tracking
Level 3: Advanced (Strong DevOps Culture)
☐ All Level 2 items
☐ Security scanning (SAST, SCA, secrets)
☐ Container image scanning
☐ Canary or blue-green deployments
☐ Post-deploy monitoring
☐ Auto-rollback on failure
☐ SBOM generation
Level 4: Elite (Top 10% of Organizations)
☐ All Level 3 items
☐ Feature flags for gradual rollout
☐ Chaos engineering in pipeline
☐ Performance regression testing
☐ Automated rollback drills
☐ Full GitOps workflow
☐ < 15% change failure rate
☐ < 1 hour lead time for changes
Quick Reference: Pipeline Template
# The "Golden Pipeline": copy and customize
name: CI/CD Pipeline
on:
  push:
    branches: [main, develop]
    paths-ignore: ['**.md']
  pull_request:
    branches: [main]
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true
permissions:
  contents: read
  packages: write
jobs:
  lint:
    runs-on: ubuntu-latest
    timeout-minutes: 5
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: '20', cache: 'npm' }
      - run: npm ci
      - run: npm run lint
      - run: npx tsc --noEmit
  test:
    needs: lint
    runs-on: ubuntu-latest
    timeout-minutes: 10
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: '20', cache: 'npm' }
      - run: npm ci
      - run: npm test -- --coverage --ci
      - uses: codecov/codecov-action@v4
  security:
    needs: lint
    runs-on: ubuntu-latest
    timeout-minutes: 10
    permissions:
      contents: read
      security-events: write   # Needed for CodeQL to upload results
    steps:
      - uses: actions/checkout@v4
      - run: npm audit --audit-level=high
      - uses: github/codeql-action/init@v3
        with: { languages: javascript-typescript }
      - uses: github/codeql-action/analyze@v3
  build:
    needs: [test, security]
    runs-on: ubuntu-latest
    timeout-minutes: 15
    if: github.ref == 'refs/heads/main' || github.ref == 'refs/heads/develop'
    steps:
      - uses: actions/checkout@v4
      - uses: docker/setup-buildx-action@v3
      - uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - uses: docker/build-push-action@v5
        with:
          push: true
          tags: ghcr.io/${{ github.repository }}:${{ github.sha }}
          cache-from: type=gha
          cache-to: type=gha,mode=max
  deploy-staging:
    needs: build
    if: github.ref == 'refs/heads/develop'
    runs-on: ubuntu-latest
    timeout-minutes: 5
    environment: staging
    steps:
      - run: echo "Deploy to staging"
      # kubectl set image ...
  deploy-production:
    needs: build
    if: github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    timeout-minutes: 5
    environment: production
    steps:
      - run: echo "Deploy to production"
      # kubectl set image ...
  notify:
    needs: [deploy-staging, deploy-production]
    if: always()
    runs-on: ubuntu-latest
    steps:
      # A skipped job reports the non-empty string "skipped", so `a || b`
      # would never fall through to b; both results are reported instead.
      - run: |
          curl -X POST ${{ secrets.SLACK_WEBHOOK }} \
            -H 'Content-Type: application/json' \
            -d '{"text":"Pipeline complete: staging=${{ needs.deploy-staging.result }}, production=${{ needs.deploy-production.result }}"}'