Lesson 9 of 11

Deployment Strategies

Part of the CI/CD Pipelines tutorial series.

Why Deployment Strategy Matters

How you deploy is as important as what you deploy. The wrong strategy can take your entire application offline. The right one lets you ship with zero downtime and instant rollback.

Bad Deployment:
  Friday 5 PM → Push new version → CRASH → All users affected
  → 3 hours to rollback → Lost revenue → Angry customers

Good Deployment:
  Tuesday 10 AM → Canary to 5% → Monitor 30 min → All green
  → Roll to 25% → Monitor → Roll to 100% → Zero downtime ✅

Strategy 1: Recreate (Big Bang)

Stop everything, deploy new version, start everything. The simplest but most dangerous approach.

How It Works

Time 0:  v1 ████████████ (serving 100% traffic)
         ↓ STOP ALL
Time 1:  ______________ (DOWNTIME - no traffic served)
         ↓ START NEW
Time 2:  v2 ████████████ (serving 100% traffic)

Kubernetes Implementation

# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  replicas: 3
  strategy:
    type: Recreate    # ← Kill all old pods, then create new ones
  template:
    spec:
      containers:
        - name: myapp
          image: myapp:v2

Pipeline Step

deploy-recreate:
  name: Recreate Deployment
  runs-on: ubuntu-latest
  steps:
    - name: Deploy new version
      run: |
        echo "⚠️ WARNING: This will cause downtime!"
 
        # Scale down to 0
        kubectl scale deployment/myapp --replicas=0 -n production
        kubectl rollout status deployment/myapp -n production
 
        # Update image
        kubectl set image deployment/myapp \
          myapp=myregistry.io/myapp:${{ github.sha }} \
          -n production
 
        # Scale back up
        kubectl scale deployment/myapp --replicas=3 -n production
        kubectl rollout status deployment/myapp -n production

When to Use Recreate

Scenario                              | Why
✅ Development/staging environments   | Downtime doesn't matter
✅ Database schema breaking changes   | Can't run v1 and v2 simultaneously
✅ Licensing constraints              | Only 1 version can run at a time
❌ Production with users              | Causes visible downtime
❌ Any SLA-bound service              | Violates uptime guarantees

Pros & Cons

Pros                            | Cons
Simple to implement             | Causes downtime
Clean state (no version mixing) | No gradual rollout
Works with any app              | No safety net
Easiest to understand           | Users see errors during deploy

Strategy 2: Rolling Update

Gradually replace old instances with new ones, one at a time. No downtime, but both versions run simultaneously during the rollout.

How It Works

Time 0:  v1 v1 v1 v1    (4 pods running v1)
Time 1:  v1 v1 v1 v2    (1 pod updated to v2)
Time 2:  v1 v1 v2 v2    (2 pods updated)
Time 3:  v1 v2 v2 v2    (3 pods updated)
Time 4:  v2 v2 v2 v2    (rollout complete ✅)

Traffic: ═══════════════  (no interruption)

Kubernetes Implementation

apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  replicas: 4
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1          # Create 1 extra pod during rollout
      maxUnavailable: 0    # Never have fewer than 4 pods available
  template:
    spec:
      containers:
        - name: myapp
          image: myapp:v2
          readinessProbe:   # ← Critical: K8s only sends traffic to ready pods
            httpGet:
              path: /health
              port: 3000
            initialDelaySeconds: 10
            periodSeconds: 5
          livenessProbe:
            httpGet:
              path: /health
              port: 3000
            initialDelaySeconds: 30
            periodSeconds: 10

Pipeline Step

deploy-rolling:
  name: Rolling Update
  runs-on: ubuntu-latest
  steps:
    - name: Deploy with rolling update
      run: |
        kubectl set image deployment/myapp \
          myapp=myregistry.io/myapp:${{ github.sha }} \
          -n production
 
    - name: Monitor rollout
      run: |
        kubectl rollout status deployment/myapp \
          -n production \
          --timeout=10m
 
    - name: Verify all pods healthy
      run: |
        READY=$(kubectl get deployment myapp -n production \
          -o jsonpath='{.status.readyReplicas}')
        DESIRED=$(kubectl get deployment myapp -n production \
          -o jsonpath='{.spec.replicas}')
 
        if [ "$READY" != "$DESIRED" ]; then
          echo "❌ Not all pods are ready: $READY/$DESIRED"
          kubectl rollout undo deployment/myapp -n production
          exit 1
        fi
        echo "✅ All $READY/$DESIRED pods ready"
 
    - name: Rollback on failure
      if: failure()
      run: |
        echo "❌ Rolling back to previous version..."
        kubectl rollout undo deployment/myapp -n production
        kubectl rollout status deployment/myapp -n production

Rolling Update Parameters Explained

maxSurge: 1, maxUnavailable: 0
→ Conservative: Always maintain full capacity
→ Slower rollout, zero risk of capacity loss
→ Best for: Production services with strict SLAs

maxSurge: 2, maxUnavailable: 1
→ Balanced: Slightly faster, brief capacity dip
→ Good for: Most production workloads

maxSurge: 50%, maxUnavailable: 50%
→ Aggressive: Fast rollout, significant capacity change
→ Best for: Non-critical services, development
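The effect of these parameters is easy to compute: maxSurge bounds how many pods exist above the desired count, maxUnavailable bounds how far ready pods may dip below it. A sketch of that arithmetic, following Kubernetes' rounding rules (percentage maxSurge rounds up, percentage maxUnavailable rounds down):

```javascript
// Pod-count bounds during a RollingUpdate for a given replica count.
// Percentage values follow Kubernetes' rounding: surge rounds up,
// unavailable rounds down.
function rolloutBounds(replicas, maxSurge, maxUnavailable) {
  const resolve = (value, roundUp) =>
    typeof value === 'string' && value.endsWith('%')
      ? (roundUp ? Math.ceil : Math.floor)((parseInt(value, 10) / 100) * replicas)
      : value;

  const surge = resolve(maxSurge, true);
  const unavailable = resolve(maxUnavailable, false);
  return {
    maxPods: replicas + surge,            // upper bound during rollout
    minAvailable: replicas - unavailable, // lower bound on ready pods
  };
}

// The three presets above, for replicas: 4
console.log(rolloutBounds(4, 1, 0));         // { maxPods: 5, minAvailable: 4 }
console.log(rolloutBounds(4, 2, 1));         // { maxPods: 6, minAvailable: 3 }
console.log(rolloutBounds(4, '50%', '50%')); // { maxPods: 6, minAvailable: 2 }
```

The conservative preset never drops below full capacity, at the cost of replacing one pod at a time; the aggressive one halves available capacity mid-rollout.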

When to Use Rolling Updates

Scenario                    | Why
✅ Default for most apps    | Zero downtime, simple config
✅ Stateless applications   | Pods are interchangeable
✅ API servers              | Individual requests are short-lived
⚠️ Apps with session state  | Need sticky sessions or shared state
❌ Breaking API changes     | Old/new versions must be compatible

Strategy 3: Blue-Green Deployment

Run two identical environments. Blue is live (v1), Green gets the new version (v2). When Green is ready, switch all traffic instantly.

How It Works

Before:
  Blue  (v1) ████████ ← 100% traffic flows here
  Green (v1) ████████    (idle, waiting)

Step 1: Deploy v2 to Green
  Blue  (v1) ████████ ← Still serving traffic
  Green (v2) ████████    (deploying, testing...)

Step 2: Test Green environment
  Blue  (v1) ████████ ← Still serving traffic
  Green (v2) ████████ ✅ Tests pass!

Step 3: Switch traffic
  Blue  (v1) ████████    (now idle — keep for rollback)
  Green (v2) ████████ ← 100% traffic flows here now

Rollback: Just switch back to Blue!
  Blue  (v1) ████████ ← Instant rollback (seconds)
  Green (v2) ████████    (investigate issues)

Kubernetes Implementation

# blue-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-blue
  labels:
    app: myapp
    version: blue
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
      version: blue
  template:
    metadata:
      labels:
        app: myapp
        version: blue
    spec:
      containers:
        - name: myapp
          image: myapp:v1.0.0
 
---
# green-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-green
  labels:
    app: myapp
    version: green
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
      version: green
  template:
    metadata:
      labels:
        app: myapp
        version: green
    spec:
      containers:
        - name: myapp
          image: myapp:v2.0.0
 
---
# service.yaml — Switch by changing selector
apiVersion: v1
kind: Service
metadata:
  name: myapp
spec:
  selector:
    app: myapp
    version: blue    # ← Change to "green" to switch
  ports:
    - port: 80
      targetPort: 3000

Pipeline Step

deploy-blue-green:
  name: Blue-Green Deployment
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
 
    - name: Determine current and target color
      id: colors
      run: |
        CURRENT=$(kubectl get service myapp -n production \
          -o jsonpath='{.spec.selector.version}')
        TARGET=$([ "$CURRENT" = "blue" ] && echo "green" || echo "blue")
        echo "current=$CURRENT" >> $GITHUB_OUTPUT
        echo "target=$TARGET" >> $GITHUB_OUTPUT
        echo "Current: $CURRENT → Target: $TARGET"
 
    - name: Deploy to target environment
      run: |
        TARGET=${{ steps.colors.outputs.target }}
 
        kubectl set image deployment/myapp-$TARGET \
          myapp=myregistry.io/myapp:${{ github.sha }} \
          -n production
 
        kubectl rollout status deployment/myapp-$TARGET \
          -n production --timeout=5m
 
    - name: Test target environment
      run: |
        TARGET=${{ steps.colors.outputs.target }}
 
        # Hit the target color's internal Service for testing (this
        # assumes per-color Services myapp-blue/myapp-green exist,
        # exposing the pods on port 3000)
        kubectl run smoke-test --rm -i --restart=Never \
          --image=curlimages/curl:latest \
          -n production \
          -- curl -f http://myapp-$TARGET:3000/health
 
        echo "✅ Health check passed on $TARGET"
 
    - name: Switch traffic to target
      run: |
        TARGET=${{ steps.colors.outputs.target }}
 
        kubectl patch service myapp -n production \
          -p "{\"spec\":{\"selector\":{\"version\":\"$TARGET\"}}}"
 
        echo "✅ Traffic switched to $TARGET"
 
    - name: Verify switch
      run: |
        sleep 10
        curl -f https://myapp.com/health
        echo "✅ Production is healthy"
 
    - name: Rollback on failure
      if: failure()
      run: |
        CURRENT=${{ steps.colors.outputs.current }}
        echo "❌ Rolling back to $CURRENT..."
 
        kubectl patch service myapp -n production \
          -p "{\"spec\":{\"selector\":{\"version\":\"$CURRENT\"}}}"
 
        echo "✅ Rolled back to $CURRENT"

When to Use Blue-Green

Scenario                     | Why
✅ Zero-downtime required    | Instant switch, no gaps
✅ Need instant rollback     | Just switch the service selector
✅ Full environment testing  | Test everything before switching
⚠️ Database schema changes   | Both environments share the DB
❌ Cost-sensitive            | Requires 2x the infrastructure

Pros & Cons

Pros                        | Cons
Instant rollback (seconds)  | 2x infrastructure cost
Zero downtime               | Database migrations are complex
Full testing before switch  | Need to maintain two environments
Simple mental model         | All-or-nothing switch

Strategy 4: Canary Deployment

Deploy the new version to a small percentage of users first. Monitor closely. If everything looks good, gradually increase to 100%.

How It Works

Step 1: Deploy canary (5% of traffic)
  v1 ████████████████████████████████████████████████ (95%)
  v2 ███                                              (5%)
  Monitor: error rate, latency, memory, CPU...

Step 2: Increase to 25%
  v1 ████████████████████████████████████             (75%)
  v2 █████████████                                    (25%)
  Monitor for 30 minutes...

Step 3: Increase to 50%
  v1 ████████████████████████                         (50%)
  v2 ████████████████████████                         (50%)
  Monitor for 30 minutes...

Step 4: Full rollout (100%)
  v2 ████████████████████████████████████████████████ (100%)
  ✅ Canary successful!

Kubernetes Canary (Basic - Replica Ratio)

# Stable deployment (90% traffic)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-stable
spec:
  replicas: 9    # 9 out of 10 pods = 90%
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
        - name: myapp
          image: myapp:v1.0.0
 
---
# Canary deployment (10% traffic)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-canary
spec:
  replicas: 1    # 1 out of 10 pods = 10%
  selector:
    matchLabels:
      app: myapp    # Same label = same service
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
        - name: myapp
          image: myapp:v2.0.0
 
---
# Service routes to both (by label)
apiVersion: v1
kind: Service
metadata:
  name: myapp
spec:
  selector:
    app: myapp    # Matches BOTH deployments
  ports:
    - port: 80
      targetPort: 3000
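With the replica-ratio approach, kube-proxy balances roughly evenly across all pods matching the Service selector, so the canary's traffic share is simply its fraction of the total pod count. That also means the granularity is limited by how many pods you run, as this hypothetical helper illustrates:

```javascript
// Approximate canary traffic share under the replica-ratio approach:
// canary pods divided by all matching pods.
function canaryShare(stableReplicas, canaryReplicas) {
  return canaryReplicas / (stableReplicas + canaryReplicas);
}

// Split `total` pods to approximate a target canary percentage
// (always keeping at least one canary pod).
function replicaSplit(total, percent) {
  const canary = Math.max(1, Math.round((percent / 100) * total));
  return { stable: total - canary, canary };
}

console.log(canaryShare(9, 1));    // 0.1 → the 90/10 split above
console.log(replicaSplit(10, 10)); // { stable: 9, canary: 1 }
console.log(replicaSplit(10, 5));  // { stable: 9, canary: 1 } — 5% is unreachable with 10 pods
```

To get a genuine 5% or 1% split without running dozens of pods, you need traffic-level routing — which is exactly what the Istio setup below provides.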

Canary with Istio (Fine-Grained Traffic Control)

# Istio VirtualService for precise traffic splitting
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: myapp
spec:
  hosts:
    - myapp.production.svc.cluster.local
  http:
    - route:
        - destination:
            host: myapp
            subset: stable
          weight: 95          # 95% to stable
        - destination:
            host: myapp
            subset: canary
          weight: 5           # 5% to canary
 
---
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: myapp
spec:
  host: myapp
  subsets:
    - name: stable
      labels:
        version: v1
    - name: canary
      labels:
        version: v2

Automated Canary Pipeline

deploy-canary:
  name: Canary Deployment
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
 
    # Step 1: Deploy canary (5%)
    - name: Deploy canary (5%)
      run: |
        kubectl set image deployment/myapp-canary \
          myapp=myregistry.io/myapp:${{ github.sha }} \
          -n production
        kubectl rollout status deployment/myapp-canary -n production
 
    # Step 2: Monitor canary for 5 minutes
    - name: Monitor canary (5 min)
      run: |
        echo "Monitoring canary for 5 minutes..."
        for i in $(seq 1 10); do
          sleep 30
 
          # Check error rate (assumes Prometheus is reachable from the
          # runner, e.g. via a self-hosted runner or port-forward)
          ERROR_RATE=$(curl -s "http://prometheus:9090/api/v1/query" \
            --data-urlencode 'query=rate(http_requests_total{status=~"5..",version="canary"}[1m]) / rate(http_requests_total{version="canary"}[1m]) * 100' \
            | jq -r '.data.result[0].value[1] // "0"')
 
          # Check latency
          LATENCY=$(curl -s "http://prometheus:9090/api/v1/query" \
            --data-urlencode 'query=histogram_quantile(0.99, rate(http_request_duration_seconds_bucket{version="canary"}[1m]))' \
            | jq -r '.data.result[0].value[1] // "0"')
 
          echo "Check $i/10: Error rate: ${ERROR_RATE}%, P99 latency: ${LATENCY}s"
 
          # Abort if error rate > 5% or latency > 2s
          if (( $(echo "$ERROR_RATE > 5" | bc -l) )); then
            echo "❌ Error rate too high! Aborting canary."
            kubectl rollout undo deployment/myapp-canary -n production
            exit 1
          fi
 
          if (( $(echo "$LATENCY > 2.0" | bc -l) )); then
            echo "❌ Latency too high! Aborting canary."
            kubectl rollout undo deployment/myapp-canary -n production
            exit 1
          fi
        done
        echo "✅ Canary monitoring passed"
 
    # Step 3: Promote canary to stable
    - name: Promote to 100%
      run: |
        kubectl set image deployment/myapp-stable \
          myapp=myregistry.io/myapp:${{ github.sha }} \
          -n production
        kubectl rollout status deployment/myapp-stable -n production
 
        # Scale down canary
        kubectl scale deployment/myapp-canary --replicas=0 -n production
 
        echo "✅ Canary promoted to stable!"
 
    - name: Rollback on failure
      if: failure()
      run: |
        kubectl rollout undo deployment/myapp-canary -n production
        echo "✅ Canary rolled back"

When to Use Canary

Scenario                          | Why
✅ High-traffic applications      | Minimize blast radius
✅ Risk-averse deployments        | Test with real users safely
✅ Performance-sensitive changes  | Monitor real metrics
⚠️ Small user base                | Need enough traffic for meaningful data
❌ Database schema changes        | Hard to route queries by version

Strategy 5: A/B Testing Deployment

Route specific users to specific versions based on criteria like cookies, headers, geographic location, or user segments.

How It Works

User hits load balancer
        ↓
┌───────────────────────┐
│  Routing Decision     │
│                       │
│  Header: x-beta=true  │──→ v2 (beta users)
│  Cookie: ab_group=B   │──→ v2 (test group)
│  IP: US-WEST          │──→ v2 (regional rollout)
│  Default              │──→ v1 (everyone else)
└───────────────────────┘

Istio A/B Routing

apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: myapp
spec:
  hosts:
    - myapp.com
  http:
    # Route beta users to v2
    - match:
        - headers:
            x-beta-user:
              exact: "true"
      route:
        - destination:
            host: myapp
            subset: v2
 
    # Route users with test cookie to v2
    - match:
        - headers:
            cookie:
              regex: ".*ab_group=B.*"
      route:
        - destination:
            host: myapp
            subset: v2
 
    # Everyone else gets v1
    - route:
        - destination:
            host: myapp
            subset: v1

Strategy 6: Shadow Deployment (Dark Launch)

Deploy the new version alongside production and send a copy of real traffic to it — but don't return the new version's responses to users. Only compare results.

How It Works

User Request
     ↓
Load Balancer
     ↓
┌────────────────┐
│   v1 (Live)    │──→ Response to user ✅
│                │
│   v2 (Shadow)  │──→ Response discarded (only logged)
│                │    Compare: Did v2 give same result?
└────────────────┘

When to Use Shadow Deployments

✅ Testing performance of a rewritten service
✅ Validating a database migration with real queries
✅ Testing a new ML model with real data
❌ Write operations (would double side effects!)
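The core of the pattern is that the shadow call can fail, time out, or disagree without the user ever noticing. A sketch of that contract in application code, where `primary` and `shadow` stand in for calls to v1 and v2:

```javascript
// Shadow pattern: serve from the live handler, fire the same request
// at the shadow handler, and only log the comparison — the shadow
// result never reaches the user. A sketch; `primary` and `shadow`
// are placeholders for the v1 and v2 calls.
async function shadowCall(request, primary, shadow, log = console.log) {
  const liveResult = await primary(request);

  // Fire-and-forget: shadow errors and mismatches are logged, never surfaced.
  Promise.resolve()
    .then(() => shadow(request))
    .then((shadowResult) => {
      if (JSON.stringify(shadowResult) !== JSON.stringify(liveResult)) {
        log(`shadow mismatch for ${request.path}`);
      }
    })
    .catch((err) => log(`shadow error: ${err.message}`));

  return liveResult; // the user only ever sees v1's answer
}
```

Note the comparison doubles load on shared dependencies, which is why the write-operations caveat above matters: mirroring a POST would create the order twice.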

Istio Shadow/Mirror Traffic

apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: myapp
spec:
  hosts:
    - myapp
  http:
    - route:
        - destination:
            host: myapp
            subset: v1
      mirror:
        host: myapp
        subset: v2
      mirrorPercentage:
        value: 100.0  # Mirror 100% of traffic

Feature Flags: Decouple Deploy from Release

Feature flags let you deploy code to production but control whether users see it. This is the most flexible approach.

How Feature Flags Work

// Code deployed to production — but feature hidden
if (featureFlags.isEnabled('new-checkout')) {
  renderNewCheckout();  // Only for enabled users
} else {
  renderOldCheckout();  // Everyone else
}
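Percentage rollouts need the flag decision to be deterministic per user, so nobody flips between old and new checkout on every request. Flag SDKs typically hash the user ID into a stable bucket; a hypothetical sketch of that mechanism (the FNV-1a hash here is an illustrative choice, not any particular SDK's algorithm):

```javascript
// Deterministic percentage rollout: hash "flag:user" into a stable
// bucket 0-99 and enable the flag if the bucket is below the rollout
// percentage. A sketch of what flag SDKs do internally.
function bucketFor(flagKey, userId) {
  // FNV-1a 32-bit hash
  let hash = 0x811c9dc5;
  for (const ch of `${flagKey}:${userId}`) {
    hash ^= ch.charCodeAt(0);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash % 100;
}

function isEnabledFor(flagKey, userId, rolloutPercent) {
  return bucketFor(flagKey, userId) < rolloutPercent;
}

// Same user always lands in the same bucket:
console.log(bucketFor('new-checkout', 'user-42')
         === bucketFor('new-checkout', 'user-42'));       // true
console.log(isEnabledFor('new-checkout', 'user-42', 100)); // true
console.log(isEnabledFor('new-checkout', 'user-42', 0));   // false
```

Keying the hash on both flag and user means different flags get independent 1% populations, so the same unlucky users aren't always the guinea pigs.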

Rollout Timeline

Day 1:  Deploy code with flag OFF (0% of users see it)
Day 1:  Internal testing — enable for employees only
Day 2:  Enable for 1% of users → monitor
Day 3:  Enable for 10% of users → monitor
Day 5:  Enable for 50% of users → monitor
Day 7:  Enable for 100% → monitor
Day 14: Remove flag, clean up old code

Feature Flag Tools

Tool         | Type        | Free Tier
LaunchDarkly | SaaS        | Limited
Split.io     | SaaS        | Free for dev
Flagsmith    | Open-source | Self-hosted free
Unleash      | Open-source | Self-hosted free
PostHog      | Open-source | Generous free tier
ConfigCat    | SaaS        | Free tier

Combining Feature Flags with Canary

Step 1: Canary deploy (code deploys to 5% of pods)
Step 2: Feature flag OFF (new feature invisible to all users)
Step 3: Flag ON for internal users (QA testing in production)
Step 4: Flag ON for 5% of external users (A/B test)
Step 5: Flag ON for 100% (full rollout)

Rollback options:
  - Turn flag OFF (instant, no deploy needed)
  - Rollback canary (revert code)

Comparison Matrix

Strategy      | Downtime | Rollback Speed  | Risk     | Cost      | Complexity
Recreate      | Yes ❌   | Slow (redeploy) | High     | Low       | Very Low
Rolling       | No ✅    | Fast (undo)     | Medium   | Low       | Low
Blue-Green    | No ✅    | Instant ⚡      | Low      | High (2x) | Medium
Canary        | No ✅    | Fast            | Very Low | Medium    | High
A/B Testing   | No ✅    | Instant ⚡      | Very Low | Medium    | High
Shadow        | No ✅    | N/A             | None     | High (2x) | Very High
Feature Flags | No ✅    | Instant ⚡      | Very Low | Low       | Medium

Decision Tree: Which Strategy Should You Use?

Start here
   │
   ├─ Can you afford downtime?
   │  └─ Yes → Recreate (simplest)
   │
   ├─ Need zero downtime, low complexity?
   │  └─ Rolling Update (K8s default)
   │
   ├─ Need instant rollback?
   │  └─ Blue-Green (2x infra cost)
   │
   ├─ High-traffic, risk-averse?
   │  └─ Canary (gradual rollout)
   │
   ├─ Need user-specific targeting?
   │  └─ A/B Testing (header/cookie routing)
   │
   ├─ Testing a complete rewrite?
   │  └─ Shadow (compare real traffic)
   │
   └─ Want maximum flexibility?
      └─ Feature Flags + Rolling/Canary

Recommended by Company Stage

Stage                  | Recommended Strategy               | Why
Startup (< 50 users)   | Rolling Update                     | Simple, good enough
Growing (< 10K users)  | Rolling + Feature Flags            | Balance speed and safety
Scaling (< 1M users)   | Canary + Feature Flags             | Gradual rollout, metrics-driven
Enterprise (1M+ users) | Blue-Green + Canary + Feature Flags | Maximum control