Why Deployment Strategy Matters
How you deploy is as important as what you deploy. The wrong strategy can take your entire application offline. The right one lets you ship with zero downtime and instant rollback.
Bad Deployment:

```
Friday 5 PM → Push new version → CRASH → All users affected
→ 3 hours to rollback → Lost revenue → Angry customers
```

Good Deployment:

```
Tuesday 10 AM → Canary to 5% → Monitor 30 min → All green
→ Roll to 25% → Monitor → Roll to 100% → Zero downtime ✅
```
Strategy 1: Recreate (Big Bang)
Stop everything, deploy new version, start everything. The simplest but most dangerous approach.
How It Works
```
Time 0:  v1 ████████████  (serving 100% traffic)
         ↓ STOP ALL
Time 1:  ______________   (DOWNTIME - no traffic served)
         ↓ START NEW
Time 2:  v2 ████████████  (serving 100% traffic)
```
Kubernetes Implementation
```yaml
# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  replicas: 3
  selector:            # required by apps/v1
    matchLabels:
      app: myapp
  strategy:
    type: Recreate     # ← Kill all old pods, then create new ones
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
        - name: myapp
          image: myapp:v2
```

Pipeline Step
```yaml
deploy-recreate:
  name: Recreate Deployment
  runs-on: ubuntu-latest
  steps:
    - name: Deploy new version
      run: |
        echo "⚠️ WARNING: This will cause downtime!"
        # Scale down to 0
        kubectl scale deployment/myapp --replicas=0 -n production
        kubectl rollout status deployment/myapp -n production
        # Update image
        kubectl set image deployment/myapp \
          myapp=myregistry.io/myapp:${{ github.sha }} \
          -n production
        # Scale back up
        kubectl scale deployment/myapp --replicas=3 -n production
        kubectl rollout status deployment/myapp -n production
```

When to Use Recreate
| Scenario | Why |
|---|---|
| ✅ Development/staging environments | Downtime doesn't matter |
| ✅ Database schema breaking changes | Can't run v1 and v2 simultaneously |
| ✅ Licensing constraints | Only 1 version can run at a time |
| ❌ Production with users | Causes visible downtime |
| ❌ Any SLA-bound service | Violates uptime guarantees |
Pros & Cons
| Pros | Cons |
|---|---|
| Simple to implement | Causes downtime |
| Clean state (no version mixing) | No gradual rollout |
| Works with any app | No safety net |
| Easiest to understand | Users see errors during deploy |
Strategy 2: Rolling Update
Gradually replace old instances with new ones, one at a time. No downtime, but both versions run simultaneously during the rollout.
How It Works
```
Time 0:  v1 v1 v1 v1   (4 pods running v1)
Time 1:  v1 v1 v1 v2   (1 pod updated to v2)
Time 2:  v1 v1 v2 v2   (2 pods updated)
Time 3:  v1 v2 v2 v2   (3 pods updated)
Time 4:  v2 v2 v2 v2   (rollout complete ✅)

Traffic: ═══════════════ (no interruption)
```
Kubernetes Implementation
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  replicas: 4
  selector:              # required by apps/v1
    matchLabels:
      app: myapp
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1        # Create 1 extra pod during rollout
      maxUnavailable: 0  # Never have fewer than 4 pods available
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
        - name: myapp
          image: myapp:v2
          readinessProbe:  # ← Critical: K8s only sends traffic to ready pods
            httpGet:
              path: /health
              port: 3000
            initialDelaySeconds: 10
            periodSeconds: 5
          livenessProbe:
            httpGet:
              path: /health
              port: 3000
            initialDelaySeconds: 30
            periodSeconds: 10
```

Pipeline Step
```yaml
deploy-rolling:
  name: Rolling Update
  runs-on: ubuntu-latest
  steps:
    - name: Deploy with rolling update
      run: |
        kubectl set image deployment/myapp \
          myapp=myregistry.io/myapp:${{ github.sha }} \
          -n production
    - name: Monitor rollout
      run: |
        kubectl rollout status deployment/myapp \
          -n production \
          --timeout=10m
    - name: Verify all pods healthy
      run: |
        READY=$(kubectl get deployment myapp -n production \
          -o jsonpath='{.status.readyReplicas}')
        DESIRED=$(kubectl get deployment myapp -n production \
          -o jsonpath='{.spec.replicas}')
        if [ "$READY" != "$DESIRED" ]; then
          echo "❌ Not all pods are ready: $READY/$DESIRED"
          kubectl rollout undo deployment/myapp -n production
          exit 1
        fi
        echo "✅ All $READY/$DESIRED pods ready"
    - name: Rollback on failure
      if: failure()
      run: |
        echo "❌ Rolling back to previous version..."
        kubectl rollout undo deployment/myapp -n production
        kubectl rollout status deployment/myapp -n production
```

Rolling Update Parameters Explained
```
maxSurge: 1, maxUnavailable: 0
  → Conservative: always maintain full capacity
  → Slower rollout, zero risk of capacity loss
  → Best for: production services with strict SLAs

maxSurge: 2, maxUnavailable: 1
  → Balanced: slightly faster, brief capacity dip
  → Good for: most production workloads

maxSurge: 50%, maxUnavailable: 50%
  → Aggressive: fast rollout, significant capacity change
  → Best for: non-critical services, development
```
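The two knobs combine into hard bounds that Kubernetes enforces at every moment of the rollout: never fewer than `replicas - maxUnavailable` ready pods, never more than `replicas + maxSurge` total pods. A quick sketch of that arithmetic (hypothetical helper, absolute values only):

```shell
# Print the pod-count bounds a rolling update must respect.
# Usage: rollout_bounds <replicas> <maxSurge> <maxUnavailable>
rollout_bounds() {
  local replicas=$1 surge=$2 unavailable=$3
  echo "min ready pods: $(( replicas - unavailable ))"
  echo "max total pods: $(( replicas + surge ))"
}

rollout_bounds 4 1 0   # the conservative setting above
```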
When to Use Rolling Updates
| Scenario | Why |
|---|---|
| ✅ Default for most apps | Zero downtime, simple config |
| ✅ Stateless applications | Pods are interchangeable |
| ✅ API servers | Individual requests are short-lived |
| ⚠️ Apps with session state | Need sticky sessions or shared state |
| ❌ Breaking API changes | Old/new versions must be compatible |
Strategy 3: Blue-Green Deployment
Run two identical environments. Blue is live (v1), Green gets the new version (v2). When Green is ready, switch all traffic instantly.
How It Works
```
Before:
  Blue  (v1) ████████ ← 100% traffic flows here
  Green (v1) ████████ (idle, waiting)

Step 1: Deploy v2 to Green
  Blue  (v1) ████████ ← Still serving traffic
  Green (v2) ████████ (deploying, testing...)

Step 2: Test Green environment
  Blue  (v1) ████████ ← Still serving traffic
  Green (v2) ████████ ✅ Tests pass!

Step 3: Switch traffic
  Blue  (v1) ████████ (now idle — keep for rollback)
  Green (v2) ████████ ← 100% traffic flows here now

Rollback: Just switch back to Blue!
  Blue  (v1) ████████ ← Instant rollback (seconds)
  Green (v2) ████████ (investigate issues)
```
Kubernetes Implementation
```yaml
# blue-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-blue
  labels:
    app: myapp
    version: blue
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
      version: blue
  template:
    metadata:
      labels:
        app: myapp
        version: blue
    spec:
      containers:
        - name: myapp
          image: myapp:v1.0.0
---
# green-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-green
  labels:
    app: myapp
    version: green
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
      version: green
  template:
    metadata:
      labels:
        app: myapp
        version: green
    spec:
      containers:
        - name: myapp
          image: myapp:v2.0.0
---
# service.yaml — Switch by changing selector
apiVersion: v1
kind: Service
metadata:
  name: myapp
spec:
  selector:
    app: myapp
    version: blue  # ← Change to "green" to switch
  ports:
    - port: 80
      targetPort: 3000
```

Pipeline Step
```yaml
deploy-blue-green:
  name: Blue-Green Deployment
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
    - name: Determine current and target color
      id: colors
      run: |
        CURRENT=$(kubectl get service myapp -n production \
          -o jsonpath='{.spec.selector.version}')
        TARGET=$([ "$CURRENT" = "blue" ] && echo "green" || echo "blue")
        echo "current=$CURRENT" >> $GITHUB_OUTPUT
        echo "target=$TARGET" >> $GITHUB_OUTPUT
        echo "Current: $CURRENT → Target: $TARGET"
    - name: Deploy to target environment
      run: |
        TARGET=${{ steps.colors.outputs.target }}
        kubectl set image deployment/myapp-$TARGET \
          myapp=myregistry.io/myapp:${{ github.sha }} \
          -n production
        kubectl rollout status deployment/myapp-$TARGET \
          -n production --timeout=5m
    - name: Test target environment
      run: |
        TARGET=${{ steps.colors.outputs.target }}
        # Hit the target pods directly (assumes per-color Services
        # named myapp-blue / myapp-green exist alongside the main Service)
        kubectl run smoke-test --rm -i --restart=Never \
          --image=curlimages/curl:latest \
          -n production \
          -- curl -f http://myapp-$TARGET:3000/health
        echo "✅ Health check passed on $TARGET"
    - name: Switch traffic to target
      run: |
        TARGET=${{ steps.colors.outputs.target }}
        kubectl patch service myapp -n production \
          -p "{\"spec\":{\"selector\":{\"version\":\"$TARGET\"}}}"
        echo "✅ Traffic switched to $TARGET"
    - name: Verify switch
      run: |
        sleep 10
        curl -f https://myapp.com/health
        echo "✅ Production is healthy"
    - name: Rollback on failure
      if: failure()
      run: |
        CURRENT=${{ steps.colors.outputs.current }}
        echo "❌ Rolling back to $CURRENT..."
        kubectl patch service myapp -n production \
          -p "{\"spec\":{\"selector\":{\"version\":\"$CURRENT\"}}}"
        echo "✅ Rolled back to $CURRENT"
```

When to Use Blue-Green
| Scenario | Why |
|---|---|
| ✅ Zero-downtime required | Instant switch, no gaps |
| ✅ Need instant rollback | Just switch the service selector |
| ✅ Full environment testing | Test everything before switching |
| ⚠️ Database schema changes | Both environments share the DB |
| ❌ Cost-sensitive | Requires 2x the infrastructure |
Pros & Cons
| Pros | Cons |
|---|---|
| Instant rollback (seconds) | 2x infrastructure cost |
| Zero downtime | Database migrations are complex |
| Full testing before switch | Need to maintain two environments |
| Simple mental model | All-or-nothing switch |
Strategy 4: Canary Deployment
Deploy the new version to a small percentage of users first. Monitor closely. If everything looks good, gradually increase to 100%.
How It Works
```
Step 1: Deploy canary (5% of traffic)
  v1 ████████████████████████████████████████████████ (95%)
  v2 ███ (5%)
  Monitor: error rate, latency, memory, CPU...

Step 2: Increase to 25%
  v1 ████████████████████████████████████ (75%)
  v2 █████████████ (25%)
  Monitor for 30 minutes...

Step 3: Increase to 50%
  v1 ████████████████████████ (50%)
  v2 ████████████████████████ (50%)
  Monitor for 30 minutes...

Step 4: Full rollout (100%)
  v2 ████████████████████████████████████████████████ (100%)
  ✅ Canary successful!
```
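How fine-grained those steps can be depends on the mechanism. With the basic replica-ratio approach shown next, the traffic split can only be as fine as 1/N of the total pod count, so a true 5% canary needs at least 20 pods. A sketch of that arithmetic (hypothetical helper):

```shell
# Round a desired canary percentage to whole pods.
# Usage: canary_split <total-pods> <canary-percent>
canary_split() {
  local total=$1 percent=$2
  local canary=$(( (total * percent + 50) / 100 ))  # round to nearest pod
  echo "canary pods: $canary, stable pods: $(( total - canary ))"
}

canary_split 10 10   # 10% of 10 pods is exactly 1 pod
canary_split 10 5    # 5% also rounds to 1 pod, i.e. an actual 10% split
```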
Kubernetes Canary (Basic - Replica Ratio)
```yaml
# Stable deployment (90% traffic)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-stable
spec:
  replicas: 9  # 9 out of 10 pods = 90%
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
        - name: myapp
          image: myapp:v1.0.0
---
# Canary deployment (10% traffic)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-canary
spec:
  replicas: 1  # 1 out of 10 pods = 10%
  selector:
    matchLabels:
      app: myapp  # Same label = same service
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
        - name: myapp
          image: myapp:v2.0.0
---
# Service routes to both (by label)
apiVersion: v1
kind: Service
metadata:
  name: myapp
spec:
  selector:
    app: myapp  # Matches BOTH deployments
  ports:
    - port: 80
      targetPort: 3000
```

Canary with Istio (Fine-Grained Traffic Control)
```yaml
# Istio VirtualService for precise traffic splitting
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: myapp
spec:
  hosts:
    - myapp.production.svc.cluster.local
  http:
    - route:
        - destination:
            host: myapp
            subset: stable
          weight: 95  # 95% to stable
        - destination:
            host: myapp
            subset: canary
          weight: 5   # 5% to canary
---
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: myapp
spec:
  host: myapp
  subsets:
    - name: stable
      labels:
        version: v1
    - name: canary
      labels:
        version: v2
```

Automated Canary Pipeline
```yaml
deploy-canary:
  name: Canary Deployment
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4

    # Step 1: Deploy canary (5%)
    - name: Deploy canary (5%)
      run: |
        kubectl set image deployment/myapp-canary \
          myapp=myregistry.io/myapp:${{ github.sha }} \
          -n production
        kubectl rollout status deployment/myapp-canary -n production

    # Step 2: Monitor canary for 5 minutes
    - name: Monitor canary (5 min)
      run: |
        echo "Monitoring canary for 5 minutes..."
        for i in $(seq 1 10); do
          sleep 30
          # Check error rate
          ERROR_RATE=$(curl -s "http://prometheus:9090/api/v1/query" \
            --data-urlencode 'query=rate(http_requests_total{status=~"5..",version="canary"}[1m]) / rate(http_requests_total{version="canary"}[1m]) * 100' \
            | jq -r '.data.result[0].value[1] // "0"')
          # Check latency
          LATENCY=$(curl -s "http://prometheus:9090/api/v1/query" \
            --data-urlencode 'query=histogram_quantile(0.99, rate(http_request_duration_seconds_bucket{version="canary"}[1m]))' \
            | jq -r '.data.result[0].value[1] // "0"')
          echo "Check $i/10: Error rate: ${ERROR_RATE}%, P99 latency: ${LATENCY}s"
          # Abort if error rate > 5% or latency > 2s
          if (( $(echo "$ERROR_RATE > 5" | bc -l) )); then
            echo "❌ Error rate too high! Aborting canary."
            kubectl rollout undo deployment/myapp-canary -n production
            exit 1
          fi
          if (( $(echo "$LATENCY > 2.0" | bc -l) )); then
            echo "❌ Latency too high! Aborting canary."
            kubectl rollout undo deployment/myapp-canary -n production
            exit 1
          fi
        done
        echo "✅ Canary monitoring passed"

    # Step 3: Promote canary to stable
    - name: Promote to 100%
      run: |
        kubectl set image deployment/myapp-stable \
          myapp=myregistry.io/myapp:${{ github.sha }} \
          -n production
        kubectl rollout status deployment/myapp-stable -n production
        # Scale down canary
        kubectl scale deployment/myapp-canary --replicas=0 -n production
        echo "✅ Canary promoted to stable!"

    - name: Rollback on failure
      if: failure()
      run: |
        kubectl rollout undo deployment/myapp-canary -n production
        echo "✅ Canary rolled back"
```

When to Use Canary
| Scenario | Why |
|---|---|
| ✅ High-traffic applications | Minimize blast radius |
| ✅ Risk-averse deployments | Test with real users safely |
| ✅ Performance-sensitive changes | Monitor real metrics |
| ⚠️ Small user base | Need enough traffic for meaningful data |
| ❌ Database schema changes | Hard to route queries by version |
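With the Istio setup above, stepping through 5% → 25% → 50% → 100% does not require editing YAML each time; a pipeline can patch the VirtualService weights directly. A hedged sketch (host and subset names assumed from the earlier examples; apply the output with `kubectl patch --type merge`):

```shell
# Build a merge patch that sends <percent>% of traffic to the canary subset.
# Usage: weight_patch <canary-percent>
weight_patch() {
  local canary=$1
  printf '{"spec":{"http":[{"route":[{"destination":{"host":"myapp","subset":"stable"},"weight":%d},{"destination":{"host":"myapp","subset":"canary"},"weight":%d}]}]}}' \
    $(( 100 - canary )) "$canary"
}

# e.g. kubectl patch virtualservice myapp -n production --type merge -p "$(weight_patch 25)"
weight_patch 25
```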
Strategy 5: A/B Testing Deployment
Route specific users to specific versions based on criteria like cookies, headers, geographic location, or user segments.
How It Works
```
User hits load balancer
          ↓
┌───────────────────────┐
│   Routing Decision    │
│                       │
│ Header: x-beta=true   │──→ v2 (beta users)
│ Cookie: ab_group=B    │──→ v2 (test group)
│ IP: US-WEST           │──→ v2 (regional rollout)
│ Default               │──→ v1 (everyone else)
└───────────────────────┘
```
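One subtlety the diagram hides: rules are evaluated top to bottom and the first match wins, so a beta user who also carries the test cookie gets the beta rule, not the cookie rule. The decision logic, sketched in shell (IP-based rule omitted):

```shell
# Mirror the routing table above: the first matching rule decides the version.
# Usage: route <x-beta-user header value> <cookie string>
route() {
  if [ "$1" = "true" ]; then
    echo v2   # beta users
  elif printf '%s' "$2" | grep -q 'ab_group=B'; then
    echo v2   # test group
  else
    echo v1   # everyone else
  fi
}

route true ""           # beta header wins
route "" "ab_group=B"   # cookie match
route "" ""             # default
```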
Istio A/B Routing
```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: myapp
spec:
  hosts:
    - myapp.com
  http:
    # Route beta users to v2
    - match:
        - headers:
            x-beta-user:
              exact: "true"
      route:
        - destination:
            host: myapp
            subset: v2
    # Route users with test cookie to v2
    - match:
        - headers:
            cookie:
              regex: ".*ab_group=B.*"
      route:
        - destination:
            host: myapp
            subset: v2
    # Everyone else gets v1
    - route:
        - destination:
            host: myapp
            subset: v1
```

Strategy 6: Shadow Deployment (Dark Launch)
Deploy the new version alongside production and send a copy of real traffic to it — but don't return the new version's responses to users. Only compare results.
How It Works
```
User Request
     ↓
Load Balancer
     ↓
┌────────────────┐
│  v1 (Live)     │──→ Response to user ✅
│                │
│  v2 (Shadow)   │──→ Response discarded (only logged)
│                │    Compare: Did v2 give same result?
└────────────────┘
```
When to Use Shadow Deployments
✅ Testing performance of a rewritten service
✅ Validating a database migration with real queries
✅ Testing a new ML model with real data
❌ Write operations (would double side effects!)
Istio Shadow/Mirror Traffic
```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: myapp
spec:
  hosts:
    - myapp
  http:
    - route:
        - destination:
            host: myapp
            subset: v1
      mirror:
        host: myapp
        subset: v2
      mirrorPercentage:
        value: 100.0  # Mirror 100% of traffic
```

Feature Flags: Decouple Deploy from Release
Feature flags let you deploy code to production but control whether users see it. This is the most flexible approach.
How Feature Flags Work
```javascript
// Code deployed to production — but feature hidden
if (featureFlags.isEnabled('new-checkout')) {
  renderNewCheckout(); // Only for enabled users
} else {
  renderOldCheckout(); // Everyone else
}
```

Rollout Timeline
```
Day 1:  Deploy code with flag OFF (0% of users see it)
Day 1:  Internal testing — enable for employees only
Day 2:  Enable for 1% of users → monitor
Day 3:  Enable for 10% of users → monitor
Day 5:  Enable for 50% of users → monitor
Day 7:  Enable for 100% → monitor
Day 14: Remove flag, clean up old code
```
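The percentage steps above only work if each user lands in the same group on every request. Flag tools do this by hashing a stable user ID into a bucket rather than rolling a die per request; a minimal sketch (hashing with `cksum` here, real tools use stronger hash functions):

```shell
# Map a user ID to a stable bucket in 0-99.
bucket() {
  printf '%s' "$1" | cksum | awk '{ print $1 % 100 }'
}

# A flag is on for a user when their bucket falls below the rollout percent.
# Usage: is_enabled <user-id> <rollout-percent>
is_enabled() {
  [ "$(bucket "$1")" -lt "$2" ] && echo on || echo off
}

is_enabled user-42 100   # 100% rollout: on for everyone
is_enabled user-42 0     # 0% rollout: off for everyone
```

Because the bucket is deterministic, raising the percent from 10 to 50 keeps the original 10% enabled and only adds users, which keeps the experience stable during a ramp-up.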
Feature Flag Tools
| Tool | Type | Free Tier |
|---|---|---|
| LaunchDarkly | SaaS | Limited |
| Split.io | SaaS | Free for dev |
| Flagsmith | Open-source | Self-hosted free |
| Unleash | Open-source | Self-hosted free |
| PostHog | Open-source | Generous free |
| ConfigCat | SaaS | Free tier |
Combining Feature Flags with Canary
```
Step 1: Canary deploy (code deploys to 5% of pods)
Step 2: Feature flag OFF (new feature invisible to all users)
Step 3: Flag ON for internal users (QA testing in production)
Step 4: Flag ON for 5% of external users (A/B test)
Step 5: Flag ON for 100% (full rollout)

Rollback options:
- Turn flag OFF (instant, no deploy needed)
- Rollback canary (revert code)
```
Comparison Matrix
| Strategy | Downtime | Rollback Speed | Risk | Cost | Complexity |
|---|---|---|---|---|---|
| Recreate | Yes ❌ | Slow (redeploy) | High | Low | Very Low |
| Rolling | No ✅ | Fast (undo) | Medium | Low | Low |
| Blue-Green | No ✅ | Instant ⚡ | Low | High (2x) | Medium |
| Canary | No ✅ | Fast | Very Low | Medium | High |
| A/B Testing | No ✅ | Instant ⚡ | Very Low | Medium | High |
| Shadow | No ✅ | N/A | None | High (2x) | Very High |
| Feature Flags | No ✅ | Instant ⚡ | Very Low | Low | Medium |
Decision Tree: Which Strategy Should You Use?
```
Start here
│
├─ Can you afford downtime?
│    └─ Yes → Recreate (simplest)
│
├─ Need zero downtime, low complexity?
│    └─ Rolling Update (K8s default)
│
├─ Need instant rollback?
│    └─ Blue-Green (2x infra cost)
│
├─ High-traffic, risk-averse?
│    └─ Canary (gradual rollout)
│
├─ Need user-specific targeting?
│    └─ A/B Testing (header/cookie routing)
│
├─ Testing a complete rewrite?
│    └─ Shadow (compare real traffic)
│
└─ Want maximum flexibility?
     └─ Feature Flags + Rolling/Canary
```
Recommended by Company Stage
| Stage | Recommended Strategy | Why |
|---|---|---|
| Startup (< 50 users) | Rolling Update | Simple, good enough |
| Growing (< 10K users) | Rolling + Feature Flags | Balance speed and safety |
| Scaling (< 1M users) | Canary + Feature Flags | Gradual rollout, metrics-driven |
| Enterprise (1M+ users) | Blue-Green + Canary + Feature Flags | Maximum control |