## Security

### 🔒 Run Containers as Non-Root

Bad:

```yaml
containers:
- name: app
  image: myapp:latest
  # Runs as root by default!
```

Good:

```yaml
containers:
- name: app
  image: myapp:latest
  securityContext:
    runAsNonRoot: true
    runAsUser: 1000
    allowPrivilegeEscalation: false
    capabilities:
      drop:
      - ALL
```

### 🔒 Use Read-Only Root Filesystem
```yaml
securityContext:
  readOnlyRootFilesystem: true
volumeMounts:
- name: tmp
  mountPath: /tmp
volumes:
- name: tmp
  emptyDir: {}
```

### 🔒 Network Policies
Default deny all traffic, then explicitly allow:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  - Egress
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend
spec:
  podSelector:
    matchLabels:
      tier: backend
  ingress:
  - from:
    - podSelector:
        matchLabels:
          tier: frontend
```

### 🔒 Pod Security Policy
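Note: PodSecurityPolicy was deprecated in Kubernetes v1.21 and removed in v1.25. On current clusters, prefer Pod Security Admission, which enforces the built-in profiles via namespace labels. A minimal sketch (namespace name illustrative):

```yaml
# Pod Security Admission (v1.25+): enforce the "restricted" profile per namespace
apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/warn: restricted
```

The legacy PodSecurityPolicy manifest below applies only to clusters older than v1.25.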
```yaml
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: restricted
spec:
  privileged: false
  allowPrivilegeEscalation: false
  requiredDropCapabilities:
  - ALL
  volumes:
  - 'configMap'
  - 'emptyDir'
  - 'projected'
  - 'secret'
  - 'downwardAPI'
  - 'persistentVolumeClaim'
  hostNetwork: false
  hostIPC: false
  hostPID: false
  runAsUser:
    rule: 'MustRunAsNonRoot'
  seLinux:
    rule: 'MustRunAs'
    seLinuxOptions:
      level: "s0:c1,c2"
  readOnlyRootFilesystem: false
```

## Resource Management
### 📊 Always Set Resource Requests and Limits

Bad:

```yaml
containers:
- name: app
  image: myapp:latest
  # No limits - can consume all node resources!
```

Good:

```yaml
containers:
- name: app
  image: myapp:latest
  resources:
    requests:
      cpu: 250m
      memory: 256Mi
    limits:
      cpu: 500m
      memory: 512Mi
```

### 📊 Use Resource Quotas
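Per-container requests and limits can also be defaulted at the namespace level with a LimitRange, so containers that omit them still get sane values (numbers below are illustrative):

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: production
spec:
  limits:
  - type: Container
    defaultRequest:    # applied when a container sets no requests
      cpu: 250m
      memory: 256Mi
    default:           # applied when a container sets no limits
      cpu: 500m
      memory: 512Mi
```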
```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-quota
  namespace: production
spec:
  hard:
    requests.cpu: "10"
    requests.memory: "20Gi"
    limits.cpu: "20"
    limits.memory: "40Gi"
    pods: "100"
    services.loadbalancers: "2"
```

### 📊 Monitor Resource Usage
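The resource metrics come from metrics-server (installed below); the same metrics also drive the HorizontalPodAutoscaler. A hypothetical HPA targeting 70% CPU (Deployment name illustrative):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp          # hypothetical Deployment name
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
```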
```bash
# Install metrics-server
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

# Check usage
kubectl top nodes
kubectl top pods
kubectl top pods --all-namespaces
```

## Health and Reliability
### 💚 Use Health Probes

Bad:

```yaml
containers:
- name: app
  image: myapp:latest
  # Kubernetes assumes app is ready immediately
```

Good:

```yaml
containers:
- name: app
  image: myapp:latest
  livenessProbe:
    httpGet:
      path: /health
      port: 8080
    initialDelaySeconds: 30
    periodSeconds: 10
    timeoutSeconds: 5
    failureThreshold: 3
  readinessProbe:
    httpGet:
      path: /ready
      port: 8080
    initialDelaySeconds: 10
    periodSeconds: 5
    timeoutSeconds: 3
    failureThreshold: 2
```

### 💚 Graceful Shutdown
```yaml
spec:
  terminationGracePeriodSeconds: 30
  containers:
  - name: app
    lifecycle:
      preStop:
        exec:
          command: ["/app/graceful-shutdown.sh"]
```

### 💚 Pod Disruption Budgets
Ensure availability during maintenance:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: app-pdb
spec:
  minAvailable: 2  # Always keep at least 2 running
  selector:
    matchLabels:
      app: myapp
```

## Container Images
### 📦 Use Small Base Images

Bad:

```dockerfile
FROM ubuntu:22.04
RUN apt-get update && apt-get install -y python3
COPY app.py .
CMD ["python3", "app.py"]
# Image size: 500MB+
```

Good:

```dockerfile
FROM python:3.11-slim
COPY app.py .
CMD ["python3", "app.py"]
# Image size: 150MB
```

Best (distroless):

```dockerfile
FROM python:3.11 AS builder
COPY app.py .
RUN python -m py_compile app.py
FROM gcr.io/distroless/python3-debian12:nonroot
COPY --from=builder app.py .
CMD ["app.py"]
# Image size: ~50MB
```

### 📦 Use Specific Tags
Bad:

```yaml
image: myapp:latest  # Ambiguous, hard to track
```

Good:

```yaml
image: myapp:v1.2.3  # Specific, reproducible
image: myapp@sha256:abc123...  # Even better: immutable digest
```

### 📦 Image Pull Policy
```yaml
containers:
- name: app
  image: myapp:latest
  imagePullPolicy: Always  # Always pull on pod start
  # or:
  imagePullPolicy: IfNotPresent  # Use cached image if available
```

## Configuration Management
### 🔧 Use ConfigMaps for Configuration

Bad:

```yaml
env:
- name: DATABASE_HOST
  value: "db.example.com"
- name: LOG_LEVEL
  value: "info"
# Hardcoded in the manifest - every change means editing and redeploying it
```

Good:

```yaml
envFrom:
- configMapRef:
    name: app-config
```

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
data:
  DATABASE_HOST: "db.example.com"
  LOG_LEVEL: "info"
```

### 🔧 Use Secrets for Sensitive Data
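A Secret looks like a ConfigMap, but its values are base64-encoded at rest (and should additionally be encrypted at the etcd level). An illustrative manifest, with name and value as placeholders:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: db-secret
type: Opaque
stringData:            # stringData accepts plain text; the API server encodes it
  password: changeme   # placeholder value for illustration
```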
```yaml
env:
- name: DATABASE_PASSWORD
  valueFrom:
    secretKeyRef:
      name: db-secret
      key: password
```

## Labeling and Organization
### 🏷️ Use Consistent Labels

```yaml
labels:
  app: myapp         # Application name
  version: v1.2.3    # Application version
  environment: prod  # Environment
  team: platform     # Owning team
  tier: backend      # Service tier
```

### 🏷️ Use Label Selectors
```bash
# Find all production backend pods
kubectl get pods -l environment=prod,tier=backend

# Delete all dev resources
kubectl delete pods -l environment=dev
```

### 🏷️ Use Annotations for Metadata
```yaml
annotations:
  description: "My awesome application"
  documentation: "https://docs.example.com"
  repository: "https://github.com/myorg/myapp"
  contact: "platform@example.com"
  last-updated: "2024-01-15T10:30:00Z"
```

## Deployment Patterns
### 🚀 Use Declarative Deployments

Bad (imperative):

```bash
kubectl create deployment myapp --image=myapp:1.0
kubectl scale deployment myapp --replicas=5
kubectl set image deployment/myapp myapp=myapp:2.0
```

Good (declarative):

```bash
kubectl apply -f deployment.yaml
# Edit deployment.yaml, then:
kubectl apply -f deployment.yaml
```

### 🚀 Use GitOps
Store all manifests in Git and deploy via a GitOps tool:

```
Repository
├── base/
├── overlays/
├── sealed-secrets/
└── .gitignore

ArgoCD / Flux watches the repository:
- auto-syncs changes
- audit trail in Git
```
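With Argo CD, for example, even the watcher itself is declared as a manifest. A sketch, with repository URL and paths as placeholders:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: myapp
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/myorg/deploy-repo  # placeholder repo
    targetRevision: main
    path: overlays/production                      # placeholder path
  destination:
    server: https://kubernetes.default.svc
    namespace: production
  syncPolicy:
    automated:
      prune: true     # delete resources removed from Git
      selfHeal: true  # revert manual drift
```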
### 🚀 Implement Progressive Delivery

Canary:

```bash
# 90% traffic to stable, 10% to canary
kubectl apply -f stable-deployment.yaml
kubectl apply -f canary-deployment.yaml
# Both Deployments match the same Service selector labels
```

Blue-Green:

```bash
# Run both versions, switch when ready
kubectl apply -f blue-deployment.yaml
kubectl apply -f green-deployment.yaml
# Switch the Service selector to green
kubectl patch svc myapp -p '{"spec":{"selector":{"version":"green"}}}'
```

## Monitoring and Observability
### 📈 Instrument Your Application

```python
from flask import Flask
from prometheus_client import Counter, Histogram

app = Flask(__name__)

request_count = Counter('app_requests_total', 'Total requests')
request_duration = Histogram('app_request_duration_seconds', 'Request duration')

@app.route('/')
@request_duration.time()
def index():
    request_count.inc()
    return 'Hello'
```

### 📈 Use Structured Logging
```python
import json
import logging
from datetime import datetime

logger = logging.getLogger()

log_data = {
    'level': 'info',
    'timestamp': datetime.now().isoformat(),
    'user_id': 123,
    'action': 'login',
    'duration_ms': 145,
}
logger.info(json.dumps(log_data))
```

### 📈 Enable Auditing
```yaml
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
- level: Metadata
  omitStages:
  - RequestReceived
```

## Common Mistakes to Avoid
- ❌ Don't skip resource limits: one pod can take down an entire node
- ❌ Don't use the `latest` tag in production: use specific versions for reproducibility
- ❌ Don't hardcode configuration: use ConfigMaps and Secrets
- ❌ Don't run as root: it's a security risk
- ❌ Don't skip health checks: Kubernetes won't know when to restart a pod
- ❌ Don't forget namespace isolation: use RBAC, Network Policies, and Resource Quotas
- ❌ Don't store secrets in Git: use sealed-secrets or external secret management
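On the last point: with sealed-secrets, `kubeseal` encrypts a Secret against the cluster controller's public key, so only ciphertext ever lands in Git. A sketch (the `encryptedData` blob is generated by `kubeseal`, never hand-written; the value here is a truncated placeholder):

```yaml
apiVersion: bitnami.com/v1alpha1
kind: SealedSecret
metadata:
  name: db-secret
  namespace: production
spec:
  encryptedData:
    password: AgBy8h...   # ciphertext produced by kubeseal (truncated placeholder)
```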