Why Deployment Strategy Matters
How you deploy is as important as what you deploy. The wrong strategy can take your entire application offline. The right one lets you ship with zero downtime and instant rollback.
Bad Deployment:

```
Friday 5 PM → Push new version → CRASH → All users affected
→ 3 hours to rollback → Lost revenue → Angry customers
```

Good Deployment:

```
Tuesday 10 AM → Canary to 5% → Monitor 30 min → All green
→ Roll to 25% → Monitor → Roll to 100% → Zero downtime ✅
```
Strategy 1: Recreate (Big Bang)
Stop everything, deploy new version, start everything. The simplest but most dangerous approach.
How It Works
```
Time 0:  v1 ████████████  (serving 100% traffic)
         ↓ STOP ALL
Time 1:  ______________   (DOWNTIME - no traffic served)
         ↓ START NEW
Time 2:  v2 ████████████  (serving 100% traffic)
```
Kubernetes Implementation
```yaml
# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  replicas: 3
  selector:            # required by apps/v1
    matchLabels:
      app: myapp
  strategy:
    type: Recreate     # ← Kill all old pods, then create new ones
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
        - name: myapp
          image: myapp:v2
```

Pipeline Step
```yaml
deploy-recreate:
  name: Recreate Deployment
  runs-on: ubuntu-latest
  steps:
    - name: Deploy new version
      run: |
        echo "⚠️ WARNING: This will cause downtime!"
        # Scale down to 0
        kubectl scale deployment/myapp --replicas=0 -n production
        kubectl rollout status deployment/myapp -n production
        # Update image
        kubectl set image deployment/myapp \
          myapp=myregistry.io/myapp:${{ github.sha }} \
          -n production
        # Scale back up
        kubectl scale deployment/myapp --replicas=3 -n production
        kubectl rollout status deployment/myapp -n production
```

When to Use Recreate
| Scenario | Why |
|---|---|
| ✅ Development/staging environments | Downtime doesn't matter |
| ✅ Database schema breaking changes | Can't run v1 and v2 simultaneously |
| ✅ Licensing constraints | Only 1 version can run at a time |
| ❌ Production with users | Causes visible downtime |
| ❌ Any SLA-bound service | Violates uptime guarantees |
Pros & Cons
| Pros | Cons |
|---|---|
| Simple to implement | Causes downtime |
| Clean state (no version mixing) | No gradual rollout |
| Works with any app | No safety net |
| Easiest to understand | Users see errors during deploy |
Strategy 2: Rolling Update
Gradually replace old instances with new ones, one at a time. No downtime, but both versions run simultaneously during the rollout.
How It Works
```
Time 0:  v1 v1 v1 v1   (4 pods running v1)
Time 1:  v1 v1 v1 v2   (1 pod updated to v2)
Time 2:  v1 v1 v2 v2   (2 pods updated)
Time 3:  v1 v2 v2 v2   (3 pods updated)
Time 4:  v2 v2 v2 v2   (rollout complete ✅)

Traffic: ═══════════════ (no interruption)
```
Kubernetes Implementation
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  replicas: 4
  selector:              # required by apps/v1
    matchLabels:
      app: myapp
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1        # Create 1 extra pod during rollout
      maxUnavailable: 0  # Never have fewer than 4 pods available
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
        - name: myapp
          image: myapp:v2
          readinessProbe:  # ← Critical: K8s only sends traffic to ready pods
            httpGet:
              path: /health
              port: 3000
            initialDelaySeconds: 10
            periodSeconds: 5
          livenessProbe:
            httpGet:
              path: /health
              port: 3000
            initialDelaySeconds: 30
            periodSeconds: 10
```

Pipeline Step
```yaml
deploy-rolling:
  name: Rolling Update
  runs-on: ubuntu-latest
  steps:
    - name: Deploy with rolling update
      run: |
        kubectl set image deployment/myapp \
          myapp=myregistry.io/myapp:${{ github.sha }} \
          -n production
    - name: Monitor rollout
      run: |
        kubectl rollout status deployment/myapp \
          -n production \
          --timeout=10m
    - name: Verify all pods healthy
      run: |
        READY=$(kubectl get deployment myapp -n production \
          -o jsonpath='{.status.readyReplicas}')
        DESIRED=$(kubectl get deployment myapp -n production \
          -o jsonpath='{.spec.replicas}')
        if [ "$READY" != "$DESIRED" ]; then
          echo "❌ Not all pods are ready: $READY/$DESIRED"
          kubectl rollout undo deployment/myapp -n production
          exit 1
        fi
        echo "✅ All $READY/$DESIRED pods ready"
    - name: Rollback on failure
      if: failure()
      run: |
        echo "❌ Rolling back to previous version..."
        kubectl rollout undo deployment/myapp -n production
        kubectl rollout status deployment/myapp -n production
```

Rolling Update Parameters Explained
```
maxSurge: 1, maxUnavailable: 0
  → Conservative: always maintain full capacity
  → Slower rollout, zero risk of capacity loss
  → Best for: production services with strict SLAs

maxSurge: 2, maxUnavailable: 1
  → Balanced: slightly faster, brief capacity dip
  → Good for: most production workloads

maxSurge: 50%, maxUnavailable: 50%
  → Aggressive: fast rollout, significant capacity change
  → Best for: non-critical services, development
```
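The two knobs combine into hard bounds that Kubernetes enforces at every moment of the rollout: never fewer than `replicas - maxUnavailable` ready pods, never more than `replicas + maxSurge` total pods. A quick sketch of that arithmetic (hypothetical helper, absolute values only):

```shell
# Print the pod-count bounds a rolling update must respect.
# Usage: rollout_bounds <replicas> <maxSurge> <maxUnavailable>
rollout_bounds() {
  local replicas=$1 surge=$2 unavailable=$3
  echo "min ready pods: $(( replicas - unavailable ))"
  echo "max total pods: $(( replicas + surge ))"
}

rollout_bounds 4 1 0   # the conservative setting above
```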
When to Use Rolling Updates
| Scenario | Why |
|---|---|
| ✅ Default for most apps | Zero downtime, simple config |
| ✅ Stateless applications | Pods are interchangeable |
| ✅ API servers | Individual requests are short-lived |
| ⚠️ Apps with session state | Need sticky sessions or shared state |
| ❌ Breaking API changes | Old/new versions must be compatible |
Strategy 3: Blue-Green Deployment
Run two identical environments. Blue is live (v1), Green gets the new version (v2). When Green is ready, switch all traffic instantly.
How It Works
```
Before:
  Blue  (v1) ████████ ← 100% traffic flows here
  Green (v1) ████████ (idle, waiting)

Step 1: Deploy v2 to Green
  Blue  (v1) ████████ ← Still serving traffic
  Green (v2) ████████ (deploying, testing...)

Step 2: Test Green environment
  Blue  (v1) ████████ ← Still serving traffic
  Green (v2) ████████ ✅ Tests pass!

Step 3: Switch traffic
  Blue  (v1) ████████ (now idle — keep for rollback)
  Green (v2) ████████ ← 100% traffic flows here now

Rollback: Just switch back to Blue!
  Blue  (v1) ████████ ← Instant rollback (seconds)
  Green (v2) ████████ (investigate issues)
```
Kubernetes Implementation
```yaml
# blue-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-blue
  labels:
    app: myapp
    version: blue
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
      version: blue
  template:
    metadata:
      labels:
        app: myapp
        version: blue
    spec:
      containers:
        - name: myapp
          image: myapp:v1.0.0
---
# green-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-green
  labels:
    app: myapp
    version: green
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
      version: green
  template:
    metadata:
      labels:
        app: myapp
        version: green
    spec:
      containers:
        - name: myapp
          image: myapp:v2.0.0
---
# service.yaml — Switch by changing selector
apiVersion: v1
kind: Service
metadata:
  name: myapp
spec:
  selector:
    app: myapp
    version: blue  # ← Change to "green" to switch
  ports:
    - port: 80
      targetPort: 3000
```

Pipeline Step
```yaml
deploy-blue-green:
  name: Blue-Green Deployment
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
    - name: Determine current and target color
      id: colors
      run: |
        CURRENT=$(kubectl get service myapp -n production \
          -o jsonpath='{.spec.selector.version}')
        TARGET=$([ "$CURRENT" = "blue" ] && echo "green" || echo "blue")
        echo "current=$CURRENT" >> $GITHUB_OUTPUT
        echo "target=$TARGET" >> $GITHUB_OUTPUT
        echo "Current: $CURRENT → Target: $TARGET"
    - name: Deploy to target environment
      run: |
        TARGET=${{ steps.colors.outputs.target }}
        kubectl set image deployment/myapp-$TARGET \
          myapp=myregistry.io/myapp:${{ github.sha }} \
          -n production
        kubectl rollout status deployment/myapp-$TARGET \
          -n production --timeout=5m
    - name: Test target environment
      run: |
        TARGET=${{ steps.colors.outputs.target }}
        # Hit the target pods directly (assumes per-color Services
        # named myapp-blue / myapp-green exist alongside the main Service)
        kubectl run smoke-test --rm -i --restart=Never \
          --image=curlimages/curl:latest \
          -n production \
          -- curl -f http://myapp-$TARGET:3000/health
        echo "✅ Health check passed on $TARGET"
    - name: Switch traffic to target
      run: |
        TARGET=${{ steps.colors.outputs.target }}
        kubectl patch service myapp -n production \
          -p "{\"spec\":{\"selector\":{\"version\":\"$TARGET\"}}}"
        echo "✅ Traffic switched to $TARGET"
    - name: Verify switch
      run: |
        sleep 10
        curl -f https://myapp.com/health
        echo "✅ Production is healthy"
    - name: Rollback on failure
      if: failure()
      run: |
        CURRENT=${{ steps.colors.outputs.current }}
        echo "❌ Rolling back to $CURRENT..."
        kubectl patch service myapp -n production \
          -p "{\"spec\":{\"selector\":{\"version\":\"$CURRENT\"}}}"
        echo "✅ Rolled back to $CURRENT"
```

When to Use Blue-Green
| Scenario | Why |
|---|---|
| ✅ Zero-downtime required | Instant switch, no gaps |
| ✅ Need instant rollback | Just switch the service selector |
| ✅ Full environment testing | Test everything before switching |
| ⚠️ Database schema changes | Both environments share the DB |
| ❌ Cost-sensitive | Requires 2x the infrastructure |
Pros & Cons
| Pros | Cons |
|---|---|
| Instant rollback (seconds) | 2x infrastructure cost |
| Zero downtime | Database migrations are complex |
| Full testing before switch | Need to maintain two environments |
| Simple mental model | All-or-nothing switch |
Strategy 4: Canary Deployment
Deploy the new version to a small percentage of users first. Monitor closely. If everything looks good, gradually increase to 100%.
How It Works
```
Step 1: Deploy canary (5% of traffic)
  v1 ████████████████████████████████████████████████ (95%)
  v2 ███ (5%)
  Monitor: error rate, latency, memory, CPU...

Step 2: Increase to 25%
  v1 ████████████████████████████████████ (75%)
  v2 █████████████ (25%)
  Monitor for 30 minutes...

Step 3: Increase to 50%
  v1 ████████████████████████ (50%)
  v2 ████████████████████████ (50%)
  Monitor for 30 minutes...

Step 4: Full rollout (100%)
  v2 ████████████████████████████████████████████████ (100%)
  ✅ Canary successful!
```
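How fine-grained those steps can be depends on the mechanism. With the basic replica-ratio approach shown next, the traffic split can only be as fine as 1/N of the total pod count, so a true 5% canary needs at least 20 pods. A sketch of that arithmetic (hypothetical helper):

```shell
# Round a desired canary percentage to whole pods.
# Usage: canary_split <total-pods> <canary-percent>
canary_split() {
  local total=$1 percent=$2
  local canary=$(( (total * percent + 50) / 100 ))  # round to nearest pod
  echo "canary pods: $canary, stable pods: $(( total - canary ))"
}

canary_split 10 10   # 10% of 10 pods is exactly 1 pod
canary_split 10 5    # 5% also rounds to 1 pod, i.e. an actual 10% split
```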
Kubernetes Canary (Basic - Replica Ratio)
```yaml
# Stable deployment (90% traffic)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-stable
spec:
  replicas: 9  # 9 out of 10 pods = 90%
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
        - name: myapp
          image: myapp:v1.0.0
---
# Canary deployment (10% traffic)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-canary
spec:
  replicas: 1  # 1 out of 10 pods = 10%
  selector:
    matchLabels:
      app: myapp  # Same label = same service
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
        - name: myapp
          image: myapp:v2.0.0
---
# Service routes to both (by label)
apiVersion: v1
kind: Service
metadata:
  name: myapp
spec:
  selector:
    app: myapp  # Matches BOTH deployments
  ports:
    - port: 80
      targetPort: 3000
```

Canary with Istio (Fine-Grained Traffic Control)
```yaml
# Istio VirtualService for precise traffic splitting
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: myapp
spec:
  hosts:
    - myapp.production.svc.cluster.local
  http:
    - route:
        - destination:
            host: myapp
            subset: stable
          weight: 95  # 95% to stable
        - destination:
            host: myapp
            subset: canary
          weight: 5   # 5% to canary
---
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: myapp
spec:
  host: myapp
  subsets:
    - name: stable
      labels:
        version: v1
    - name: canary
      labels:
        version: v2
```

Automated Canary Pipeline
```yaml
deploy-canary:
  name: Canary Deployment
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4

    # Step 1: Deploy canary (5%)
    - name: Deploy canary (5%)
      run: |
        kubectl set image deployment/myapp-canary \
          myapp=myregistry.io/myapp:${{ github.sha }} \
          -n production
        kubectl rollout status deployment/myapp-canary -n production

    # Step 2: Monitor canary for 5 minutes
    - name: Monitor canary (5 min)
      run: |
        echo "Monitoring canary for 5 minutes..."
        for i in $(seq 1 10); do
          sleep 30
          # Check error rate
          ERROR_RATE=$(curl -s "http://prometheus:9090/api/v1/query" \
            --data-urlencode 'query=rate(http_requests_total{status=~"5..",version="canary"}[1m]) / rate(http_requests_total{version="canary"}[1m]) * 100' \
            | jq -r '.data.result[0].value[1] // "0"')
          # Check latency
          LATENCY=$(curl -s "http://prometheus:9090/api/v1/query" \
            --data-urlencode 'query=histogram_quantile(0.99, rate(http_request_duration_seconds_bucket{version="canary"}[1m]))' \
            | jq -r '.data.result[0].value[1] // "0"')
          echo "Check $i/10: Error rate: ${ERROR_RATE}%, P99 latency: ${LATENCY}s"
          # Abort if error rate > 5% or latency > 2s
          if (( $(echo "$ERROR_RATE > 5" | bc -l) )); then
            echo "❌ Error rate too high! Aborting canary."
            kubectl rollout undo deployment/myapp-canary -n production
            exit 1
          fi
          if (( $(echo "$LATENCY > 2.0" | bc -l) )); then
            echo "❌ Latency too high! Aborting canary."
            kubectl rollout undo deployment/myapp-canary -n production
            exit 1
          fi
        done
        echo "✅ Canary monitoring passed"

    # Step 3: Promote canary to stable
    - name: Promote to 100%
      run: |
        kubectl set image deployment/myapp-stable \
          myapp=myregistry.io/myapp:${{ github.sha }} \
          -n production
        kubectl rollout status deployment/myapp-stable -n production
        # Scale down canary
        kubectl scale deployment/myapp-canary --replicas=0 -n production
        echo "✅ Canary promoted to stable!"

    - name: Rollback on failure
      if: failure()
      run: |
        kubectl rollout undo deployment/myapp-canary -n production
        echo "✅ Canary rolled back"
```

When to Use Canary
| Scenario | Why |
|---|---|
| ✅ High-traffic applications | Minimize blast radius |
| ✅ Risk-averse deployments | Test with real users safely |
| ✅ Performance-sensitive changes | Monitor real metrics |
| ⚠️ Small user base | Need enough traffic for meaningful data |
| ❌ Database schema changes | Hard to route queries by version |
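With the Istio setup above, stepping through 5% → 25% → 50% → 100% does not require editing YAML each time; a pipeline can patch the VirtualService weights directly. A hedged sketch (host and subset names assumed from the earlier examples; apply the output with `kubectl patch --type merge`):

```shell
# Build a merge patch that sends <percent>% of traffic to the canary subset.
# Usage: weight_patch <canary-percent>
weight_patch() {
  local canary=$1
  printf '{"spec":{"http":[{"route":[{"destination":{"host":"myapp","subset":"stable"},"weight":%d},{"destination":{"host":"myapp","subset":"canary"},"weight":%d}]}]}}' \
    $(( 100 - canary )) "$canary"
}

# e.g. kubectl patch virtualservice myapp -n production --type merge -p "$(weight_patch 25)"
weight_patch 25
```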
Strategy 5: A/B Testing Deployment
Route specific users to specific versions based on criteria like cookies, headers, geographic location, or user segments.
How It Works
```
User hits load balancer
          ↓
┌───────────────────────┐
│   Routing Decision    │
│                       │
│ Header: x-beta=true   │──→ v2 (beta users)
│ Cookie: ab_group=B    │──→ v2 (test group)
│ IP: US-WEST           │──→ v2 (regional rollout)
│ Default               │──→ v1 (everyone else)
└───────────────────────┘
```
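One subtlety the diagram hides: rules are evaluated top to bottom and the first match wins, so a beta user who also carries the test cookie gets the beta rule, not the cookie rule. The decision logic, sketched in shell (IP-based rule omitted):

```shell
# Mirror the routing table above: the first matching rule decides the version.
# Usage: route <x-beta-user header value> <cookie string>
route() {
  if [ "$1" = "true" ]; then
    echo v2   # beta users
  elif printf '%s' "$2" | grep -q 'ab_group=B'; then
    echo v2   # test group
  else
    echo v1   # everyone else
  fi
}

route true ""           # beta header wins
route "" "ab_group=B"   # cookie match
route "" ""             # default
```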
Istio A/B Routing
```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: myapp
spec:
  hosts:
    - myapp.com
  http:
    # Route beta users to v2
    - match:
        - headers:
            x-beta-user:
              exact: "true"
      route:
        - destination:
            host: myapp
            subset: v2
    # Route users with test cookie to v2
    - match:
        - headers:
            cookie:
              regex: ".*ab_group=B.*"
      route:
        - destination:
            host: myapp
            subset: v2
    # Everyone else gets v1
    - route:
        - destination:
            host: myapp
            subset: v1
```

Strategy 6: Shadow Deployment (Dark Launch)
Deploy the new version alongside production and send a copy of real traffic to it — but don't return the new version's responses to users. Only compare results.
How It Works
```
User Request
     ↓
Load Balancer
     ↓
┌────────────────┐
│  v1 (Live)     │──→ Response to user ✅
│                │
│  v2 (Shadow)   │──→ Response discarded (only logged)
│                │    Compare: Did v2 give same result?
└────────────────┘
```
When to Use Shadow Deployments
✅ Testing performance of a rewritten service
✅ Validating a database migration with real queries
✅ Testing a new ML model with real data
❌ Write operations (would double side effects!)
Istio Shadow/Mirror Traffic
```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: myapp
spec:
  hosts:
    - myapp
  http:
    - route:
        - destination:
            host: myapp
            subset: v1
      mirror:
        host: myapp
        subset: v2
      mirrorPercentage:
        value: 100.0  # Mirror 100% of traffic
```

Feature Flags: Decouple Deploy from Release
Feature flags let you deploy code to production but control whether users see it. This is the most flexible approach.
How Feature Flags Work
```javascript
// Code deployed to production — but feature hidden
if (featureFlags.isEnabled('new-checkout')) {
  renderNewCheckout(); // Only for enabled users
} else {
  renderOldCheckout(); // Everyone else
}
```

Rollout Timeline
```
Day 1:  Deploy code with flag OFF (0% of users see it)
Day 1:  Internal testing — enable for employees only
Day 2:  Enable for 1% of users → monitor
Day 3:  Enable for 10% of users → monitor
Day 5:  Enable for 50% of users → monitor
Day 7:  Enable for 100% → monitor
Day 14: Remove flag, clean up old code
```
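The percentage steps above only work if each user lands in the same group on every request. Flag tools do this by hashing a stable user ID into a bucket rather than rolling a die per request; a minimal sketch (hashing with `cksum` here, real tools use stronger hash functions):

```shell
# Map a user ID to a stable bucket in 0-99.
bucket() {
  printf '%s' "$1" | cksum | awk '{ print $1 % 100 }'
}

# A flag is on for a user when their bucket falls below the rollout percent.
# Usage: is_enabled <user-id> <rollout-percent>
is_enabled() {
  [ "$(bucket "$1")" -lt "$2" ] && echo on || echo off
}

is_enabled user-42 100   # 100% rollout: on for everyone
is_enabled user-42 0     # 0% rollout: off for everyone
```

Because the bucket is deterministic, raising the percent from 10 to 50 keeps the original 10% enabled and only adds users, which keeps the experience stable during a ramp-up.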
Feature Flag Tools
| Tool | Type | Free Tier |
|---|---|---|
| LaunchDarkly | SaaS | Limited |
| Split.io | SaaS | Free for dev |
| Flagsmith | Open-source | Self-hosted free |
| Unleash | Open-source | Self-hosted free |
| PostHog | Open-source | Generous free |
| ConfigCat | SaaS | Free tier |
Combining Feature Flags with Canary
```
Step 1: Canary deploy (code deploys to 5% of pods)
Step 2: Feature flag OFF (new feature invisible to all users)
Step 3: Flag ON for internal users (QA testing in production)
Step 4: Flag ON for 5% of external users (A/B test)
Step 5: Flag ON for 100% (full rollout)

Rollback options:
- Turn flag OFF (instant, no deploy needed)
- Rollback canary (revert code)
```
Comparison Matrix
| Strategy | Downtime | Rollback Speed | Risk | Cost | Complexity |
|---|---|---|---|---|---|
| Recreate | Yes ❌ | Slow (redeploy) | High | Low | Very Low |
| Rolling | No ✅ | Fast (undo) | Medium | Low | Low |
| Blue-Green | No ✅ | Instant ⚡ | Low | High (2x) | Medium |
| Canary | No ✅ | Fast | Very Low | Medium | High |
| A/B Testing | No ✅ | Instant ⚡ | Very Low | Medium | High |
| Shadow | No ✅ | N/A | None | High (2x) | Very High |
| Feature Flags | No ✅ | Instant ⚡ | Very Low | Low | Medium |
Decision Tree: Which Strategy Should You Use?
```
Start here
│
├─ Can you afford downtime?
│    └─ Yes → Recreate (simplest)
│
├─ Need zero downtime, low complexity?
│    └─ Rolling Update (K8s default)
│
├─ Need instant rollback?
│    └─ Blue-Green (2x infra cost)
│
├─ High-traffic, risk-averse?
│    └─ Canary (gradual rollout)
│
├─ Need user-specific targeting?
│    └─ A/B Testing (header/cookie routing)
│
├─ Testing a complete rewrite?
│    └─ Shadow (compare real traffic)
│
└─ Want maximum flexibility?
     └─ Feature Flags + Rolling/Canary
```
Recommended by Company Stage
| Stage | Recommended Strategy | Why |
|---|---|---|
| Startup (< 50 users) | Rolling Update | Simple, good enough |
| Growing (< 10K users) | Rolling + Feature Flags | Balance speed and safety |
| Scaling (< 1M users) | Canary + Feature Flags | Gradual rollout, metrics-driven |
| Enterprise (1M+ users) | Blue-Green + Canary + Feature Flags | Maximum control |