GuideDevOps
Lesson 17 of 17

Kubernetes Best Practices

Part of the Kubernetes tutorial series.

Security

🔒 Run Containers as Non-Root

Bad:

containers:
- name: app
  image: myapp:latest
  # No securityContext: runs as root unless the image sets a non-root USER

Good:

containers:
- name: app
  image: myapp:latest
  securityContext:
    runAsNonRoot: true
    runAsUser: 1000
    allowPrivilegeEscalation: false
    capabilities:
      drop:
      - ALL
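
To keep these settings from regressing, a small check can run in CI against parsed manifests. The sketch below is one possible shape for that check, not an existing tool: `audit_container` is a hypothetical helper, it takes a container spec that has already been loaded into a dict, and it looks only at the container-level `securityContext` (pod-level settings are ignored for brevity).

```python
def audit_container(container: dict) -> list[str]:
    """Return hardening problems for one container spec (hypothetical helper)."""
    ctx = container.get("securityContext", {})
    problems = []
    if ctx.get("runAsNonRoot") is not True:
        problems.append("runAsNonRoot is not true")
    if ctx.get("allowPrivilegeEscalation") is not False:
        problems.append("allowPrivilegeEscalation is not false")
    dropped = ctx.get("capabilities", {}).get("drop", [])
    if "ALL" not in dropped:
        problems.append("capabilities.drop does not include ALL")
    return problems


# The "Bad" and "Good" containers from above, as dicts
bad = {"name": "app", "image": "myapp:latest"}
good = {
    "name": "app",
    "image": "myapp:latest",
    "securityContext": {
        "runAsNonRoot": True,
        "runAsUser": 1000,
        "allowPrivilegeEscalation": False,
        "capabilities": {"drop": ["ALL"]},
    },
}

print(audit_container(bad))   # three problems
print(audit_container(good))  # []
```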

🔒 Use Read-Only Root Filesystem

spec:
  containers:
  - name: app
    securityContext:
      readOnlyRootFilesystem: true
    # Mount a writable scratch volume for paths the app must write to
    volumeMounts:
    - name: tmp
      mountPath: /tmp
  volumes:
  - name: tmp
    emptyDir: {}

🔒 Network Policies

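Deny all traffic by default, then explicitly allow what each workload needs. Network policies are enforced only if the cluster's network plugin supports them: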

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  - Egress
 
---
 
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend
spec:
  podSelector:
    matchLabels:
      tier: backend
  ingress:
  - from:
    - podSelector:
        matchLabels:
          tier: frontend
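
The way these two policies combine can be sketched in a few lines of Python. This is a simplified model for illustration, not how any network plugin is implemented: it covers only the ingress side, treats policies as plain dicts, and uses a made-up `ingress_from` key in place of the real `ingress[].from[].podSelector.matchLabels` structure. Because default-deny also lists Egress, the frontend pods would additionally need an egress rule before a real connection succeeds.

```python
def matches(selector: dict, labels: dict) -> bool:
    """An empty selector matches every pod; otherwise every key/value must match."""
    return all(labels.get(k) == v for k, v in selector.items())


def ingress_allowed(policies: list[dict], src: dict, dst: dict) -> bool:
    """Ingress-only model: policies are additive; a pod selected by none is open."""
    selecting = [p for p in policies if matches(p["podSelector"], dst)]
    if not selecting:
        return True  # no policy selects the destination, so it is not isolated
    return any(
        matches(peer, src)
        for p in selecting
        for peer in p.get("ingress_from", [])
    )


policies = [
    {"name": "default-deny", "podSelector": {}},
    {
        "name": "allow-frontend",
        "podSelector": {"tier": "backend"},
        "ingress_from": [{"tier": "frontend"}],
    },
]

frontend = {"tier": "frontend"}
backend = {"tier": "backend"}
other = {"tier": "batch"}

print(ingress_allowed(policies, frontend, backend))  # True
print(ingress_allowed(policies, other, backend))     # False
print(ingress_allowed(policies, backend, frontend))  # False
```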

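🔒 Pod Security Policy

Note: PodSecurityPolicy was deprecated in Kubernetes 1.21 and removed in 1.25. The manifest below applies only to older clusters; on current versions, use Pod Security Admission with the Pod Security Standards instead.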

apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: restricted
spec:
  privileged: false
  allowPrivilegeEscalation: false
  requiredDropCapabilities:
  - ALL
  volumes:
  - 'configMap'
  - 'emptyDir'
  - 'projected'
  - 'secret'
  - 'downwardAPI'
  - 'persistentVolumeClaim'
  hostNetwork: false
  hostIPC: false
  hostPID: false
  runAsUser:
    rule: 'MustRunAsNonRoot'
  seLinux:
    rule: 'MustRunAs'
    seLinuxOptions:
      level: "s0:c1,c2"
  readOnlyRootFilesystem: false

Resource Management

📊 Always Set Resource Requests and Limits

Bad:

containers:
- name: app
  image: myapp:latest
  # No limits - can consume all node resources!

Good:

containers:
- name: app
  image: myapp:latest
  resources:
    requests:
      cpu: 250m
      memory: 256Mi
    limits:
      cpu: 500m
      memory: 512Mi
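
The gap between requests and limits also determines the pod's QoS class, which influences eviction order under node pressure. The sketch below shows the rule for a single-container pod; `parse_quantity` and `qos_class` are illustrative helpers written for this example, and the parser handles only plain numbers and the `m`, `Ki`, `Mi`, and `Gi` suffixes.

```python
from decimal import Decimal

# Subset of Kubernetes quantity suffixes (assumption: enough for this example)
_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "m": Decimal("0.001")}


def parse_quantity(q: str) -> Decimal:
    """Convert '250m' to 0.25 cores, '256Mi' to bytes, and so on."""
    for suffix, factor in _SUFFIXES.items():
        if q.endswith(suffix):
            return Decimal(q[: -len(suffix)]) * factor
    return Decimal(q)


def qos_class(resources: dict) -> str:
    """QoS class for a single-container pod (multi-container pods check every container)."""
    requests = resources.get("requests", {})
    limits = resources.get("limits", {})
    if not requests and not limits:
        return "BestEffort"
    # Guaranteed needs a limit for both resources, with requests equal to limits.
    # A missing request defaults to the limit.
    guaranteed = all(
        name in limits
        and parse_quantity(requests.get(name, limits[name])) == parse_quantity(limits[name])
        for name in ("cpu", "memory")
    )
    return "Guaranteed" if guaranteed else "Burstable"


good = {
    "requests": {"cpu": "250m", "memory": "256Mi"},
    "limits": {"cpu": "500m", "memory": "512Mi"},
}
print(qos_class(good))  # Burstable
print(qos_class({}))    # BestEffort
```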

📊 Use Resource Quotas

apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-quota
  namespace: production
spec:
  hard:
    requests.cpu: "10"
    requests.memory: "20Gi"
    limits.cpu: "20"
    limits.memory: "40Gi"
    pods: "100"
    services.loadbalancers: "2"
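
As a quick worked example, assume every pod in the namespace uses the requests and limits from the Good manifest earlier (250m / 256Mi requested, 500m / 512Mi limit). The arithmetic below shows which quota entry runs out first; the dict layout and the `max_pods` helper are just for illustration, with CPU in millicores and memory in Mi.

```python
# Quota from the manifest above, in millicores and Mi
quota = {
    "requests.cpu": 10_000,
    "requests.memory": 20 * 1024,
    "limits.cpu": 20_000,
    "limits.memory": 40 * 1024,
    "pods": 100,
}

# What one pod consumes (assumption: the Good container spec shown earlier)
per_pod = {
    "requests.cpu": 250,
    "requests.memory": 256,
    "limits.cpu": 500,
    "limits.memory": 512,
    "pods": 1,
}


def max_pods(quota: dict, per_pod: dict) -> int:
    """The tightest quota entry decides how many pods fit."""
    return min(quota[key] // per_pod[key] for key in per_pod)


print(max_pods(quota, per_pod))  # 40
```

CPU is the binding constraint here: the namespace stops admitting pods at 40, well before the `pods: "100"` entry matters.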

📊 Monitor Resource Usage

# Install metrics-server
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
 
# Check usage
kubectl top nodes
kubectl top pods
kubectl top pods --all-namespaces

Health and Reliability

💚 Use Health Probes

Bad:

containers:
- name: app
  image: myapp:latest
  # Kubernetes assumes app is ready immediately

Good:

containers:
- name: app
  image: myapp:latest
  livenessProbe:
    httpGet:
      path: /health
      port: 8080
    initialDelaySeconds: 30
    periodSeconds: 10
    timeoutSeconds: 5
    failureThreshold: 3
  
  readinessProbe:
    httpGet:
      path: /ready
      port: 8080
    initialDelaySeconds: 10
    periodSeconds: 5
    timeoutSeconds: 3
    failureThreshold: 2
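
On the application side, the two probes need two endpoints with different meanings. The following is a minimal sketch using only the Python standard library; the `ready` flag, `ProbeHandler`, and `make_server` are names invented for this example, and a real service would usually add these routes to its existing web framework instead.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Set once startup work (cache warm-up, database connection) has finished
ready = threading.Event()


class ProbeHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/health":
            status = 200  # liveness: the process is up and can answer
        elif self.path == "/ready":
            status = 200 if ready.is_set() else 503  # readiness: safe to receive traffic
        else:
            status = 404
        self.send_response(status)
        self.end_headers()

    def log_message(self, *args):
        pass  # keep probe requests out of the application logs


def make_server(port: int = 8080) -> ThreadingHTTPServer:
    """Build the probe server; call serve_forever() on the result to run it."""
    return ThreadingHTTPServer(("0.0.0.0", port), ProbeHandler)
```

Returning 503 from `/ready` removes the pod from Service endpoints without restarting it, while a failing `/health` leads to a container restart after `failureThreshold` consecutive failures.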

💚 Graceful Shutdown

spec:
  terminationGracePeriodSeconds: 30
  containers:
  - name: app
    lifecycle:
      preStop:
        exec:
          command: ["/app/graceful-shutdown.sh"]
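
The preStop hook runs first. Afterwards the kubelet sends SIGTERM to the container's main process and, if it is still running when terminationGracePeriodSeconds expires, SIGKILL. The application therefore has to handle SIGTERM itself. A minimal sketch in Python, where `install_handler` and `main_loop` are hypothetical names and `do_work` stands in for whatever the service does per iteration:

```python
import signal
import threading

shutting_down = threading.Event()


def handle_sigterm(signum, frame):
    """Only flag the shutdown; the main loop finishes in-flight work and exits."""
    shutting_down.set()


def install_handler() -> None:
    """Register the handler (must be called from the main thread)."""
    signal.signal(signal.SIGTERM, handle_sigterm)


def main_loop(do_work, poll_seconds: float = 1.0) -> None:
    """Run do_work repeatedly until SIGTERM arrives, then return cleanly."""
    while not shutting_down.is_set():
        do_work()
        # Sleeps up to poll_seconds but wakes immediately on shutdown
        shutting_down.wait(poll_seconds)
```

Cleanup after `main_loop` returns has to fit inside the grace period, 30 seconds in the manifest above, including any time the preStop hook used.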

💚 Pod Disruption Budgets

Ensure availability during maintenance:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: app-pdb
spec:
  minAvailable: 2  # Always keep at least 2 running
  selector:
    matchLabels:
      app: myapp
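
How many pods a node drain may evict at once follows directly from minAvailable. The helper below is an illustrative calculation, not a Kubernetes API:

```python
def allowed_disruptions(healthy_pods: int, min_available: int) -> int:
    """Voluntary evictions permitted right now under a minAvailable budget."""
    return max(0, healthy_pods - min_available)


print(allowed_disruptions(healthy_pods=3, min_available=2))  # 1
print(allowed_disruptions(healthy_pods=2, min_available=2))  # 0
```

The second case is worth noticing: with only 2 replicas and `minAvailable: 2`, no pod can ever be evicted, so node drains block until someone intervenes. Run at least one replica more than the budget requires.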

Container Images

📦 Use Small Base Images

Bad:

FROM ubuntu:22.04
RUN apt-get update && apt-get install -y python3
COPY app.py .
CMD ["python3", "app.py"]
# Image size: 500MB+

Good:

FROM python:3.11-slim
COPY app.py .
CMD ["python3", "app.py"]
# Image size: 150MB

Best (distroless):

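FROM python:3.11 AS builder
WORKDIR /app
COPY app.py .
# Fail the build early if app.py has a syntax error
RUN python -m py_compile app.py

FROM gcr.io/distroless/python3-debian12:nonroot
WORKDIR /app
COPY --from=builder /app/app.py .
# The distroless image already uses python3 as its entrypoint
CMD ["app.py"]
# Image size: 50MB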

📦 Use Specific Tags

Bad:

image: myapp:latest  # Ambiguous, hard to track

Good:

image: myapp:v1.2.3  # Specific, reproducible
image: myapp@sha256:abc123...  # Even better: pinned by digest
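
A rule like this is easy to enforce automatically. The function below is a rough sketch of such a check, assuming simple image references; `is_pinned` is a hypothetical helper and does not implement the full image reference grammar.

```python
def is_pinned(image: str) -> bool:
    """True if the image reference has a digest or an explicit tag other than latest."""
    if "@sha256:" in image:
        return True
    # Look for the tag only in the last path segment, so a registry port
    # such as localhost:5000/myapp is not mistaken for a tag
    last_segment = image.rsplit("/", 1)[-1]
    if ":" not in last_segment:
        return False  # no tag at all means :latest
    return last_segment.rsplit(":", 1)[1] != "latest"


for ref in ["myapp:latest", "myapp", "myapp:v1.2.3", "localhost:5000/myapp"]:
    print(ref, is_pinned(ref))
```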

📦 Image Pull Policy

containers:
- name: app
  image: myapp:latest
  imagePullPolicy: Always  # Check the registry every time the container starts
  # or (the key may appear only once, so pick one):
  # imagePullPolicy: IfNotPresent  # Use the node's cached image if available

Configuration Management

🔧 Use ConfigMaps for Configuration

Bad:

env:
- name: DATABASE_HOST
  value: "db.example.com"
- name: LOG_LEVEL
  value: "info"
# Hardcoded in the pod spec: every change means editing and redeploying each manifest that repeats the value

Good:

# In the container spec:
envFrom:
- configMapRef:
    name: app-config

# The ConfigMap it references:
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
data:
  DATABASE_HOST: "db.example.com"
  LOG_LEVEL: "info"
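
Inside the container, each ConfigMap key arrives as an ordinary environment variable. A small sketch of reading them at startup follows; `load_config` is a hypothetical helper, and treating DATABASE_HOST as required while defaulting LOG_LEVEL is an assumption made for this example.

```python
import os


def load_config(env=os.environ) -> dict:
    """Read settings injected via envFrom; fail fast if a required key is missing."""
    return {
        "database_host": env["DATABASE_HOST"],      # required: raises KeyError if absent
        "log_level": env.get("LOG_LEVEL", "info"),  # optional, with a default
    }


print(load_config({"DATABASE_HOST": "db.example.com", "LOG_LEVEL": "debug"}))
```

Environment variables are read when the process starts, so a running pod does not pick up later edits to the ConfigMap until it is restarted.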

🔧 Use Secrets for Sensitive Data

env:
- name: DATABASE_PASSWORD
  valueFrom:
    secretKeyRef:
      name: db-secret
      key: password
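
Values under `data` in a Secret manifest are base64-encoded, which is an encoding and not encryption. The sketch below makes that concrete; `secret_data` is an illustrative helper and `example-password` is a placeholder value.

```python
import base64


def secret_data(values: dict[str, str]) -> dict[str, str]:
    """Encode plain strings the way a Secret's data field expects them."""
    return {
        key: base64.b64encode(value.encode("utf-8")).decode("ascii")
        for key, value in values.items()
    }


encoded = secret_data({"password": "example-password"})
print(encoded["password"])

# Anyone who can read the manifest can reverse the encoding
print(base64.b64decode(encoded["password"]).decode("utf-8"))  # example-password
```

This is why a plain Secret manifest should be treated as sensitive as the password itself.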

Labeling and Organization

🏷️ Use Consistent Labels

labels:
  app: myapp          # Application name
  version: v1.2.3     # Application version
  environment: prod   # Environment
  team: platform      # Team
  tier: backend       # Service tier

🏷️ Use Label Selectors

# Find all production backend pods
kubectl get pods -l environment=prod,tier=backend
 
# Delete all dev pods
kubectl delete pods -l environment=dev

🏷️ Use Annotations for Metadata

annotations:
  description: "My awesome application"
  documentation: "https://docs.example.com"
  repository: "https://github.com/myorg/myapp"
  contact: "platform@example.com"
  last-updated: "2024-01-15T10:30:00Z"

Deployment Patterns

🚀 Use Declarative Deployments

Bad (imperative):

kubectl create deployment myapp --image=myapp:1.0
kubectl scale deployment myapp --replicas=5
kubectl set image deployment/myapp myapp=myapp:2.0

Good (declarative):

kubectl apply -f deployment.yaml
# Edit deployment.yaml
kubectl apply -f deployment.yaml

🚀 Use GitOps

Store all manifests in Git and deploy them through a GitOps tool:

Repository
  ├─ base/
  ├─ overlays/
  ├─ sealed-secrets/
  └─ .gitignore

ArgoCD / Flux
  ├─ Watches the repository
  ├─ Auto-syncs changes to the cluster
  └─ Keeps the audit trail in Git

🚀 Implement Progressive Delivery

Canary:

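# Traffic splits roughly by replica count,
# e.g. 9 stable replicas + 1 canary replica gives about 90% / 10%
kubectl apply -f stable-deployment.yaml
kubectl apply -f canary-deployment.yaml
# Pods from both Deployments carry the label the Service selects on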
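
With two plain Deployments behind one Service, the traffic split is not configured anywhere; it falls out of the replica counts. A small calculation, assuming the Service spreads connections evenly across ready pods (`canary_share` is a name chosen for this example):

```python
def canary_share(stable_replicas: int, canary_replicas: int) -> float:
    """Approximate fraction of traffic that reaches the canary pods."""
    total = stable_replicas + canary_replicas
    return canary_replicas / total if total else 0.0


print(canary_share(stable_replicas=9, canary_replicas=1))  # 0.1
print(canary_share(stable_replicas=3, canary_replicas=1))  # 0.25
```

Finer-grained or request-level splits need a service mesh or an ingress controller that supports weighted routing.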

Blue-Green:

# Run both versions, switch when ready
kubectl apply -f blue-deployment.yaml
kubectl apply -f green-deployment.yaml
# Switch Service selector to green
kubectl patch svc myapp -p '{"spec":{"selector":{"version":"green"}}}'

Monitoring and Observability

📈 Instrument Your Application

from flask import Flask
from prometheus_client import Counter, Histogram

app = Flask(__name__)

request_count = Counter('app_requests_total', 'Total requests')
request_duration = Histogram('app_request_duration_seconds', 'Request duration')

@app.route('/')
@request_duration.time()  # Record how long each request takes
def index():
    request_count.inc()
    return 'Hello'

📈 Use Structured Logging

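import json
import logging
from datetime import datetime, timezone

# Emit the JSON string as-is; the default level (WARNING) would drop info logs
logging.basicConfig(level=logging.INFO, format='%(message)s')
logger = logging.getLogger()

log_data = {
    'level': 'info',
    'timestamp': datetime.now(timezone.utc).isoformat(),
    'user_id': 123,
    'action': 'login',
    'duration_ms': 145
}
logger.info(json.dumps(log_data))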
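
Building the dict by hand at every call site gets repetitive. One common alternative is a formatter that produces the JSON for every record; the sketch below uses only the standard library, and `JsonFormatter`, `get_logger`, and the `fields` key passed through `extra` are conventions invented for this example.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname.lower(),
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "message": record.getMessage(),
        }
        # Structured fields passed as extra={"fields": {...}}
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload)


def get_logger(name: str = "app") -> logging.Logger:
    """Return a logger that writes JSON lines to stderr (configured once)."""
    logger = logging.getLogger(name)
    if not logger.handlers:
        handler = logging.StreamHandler()
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
    return logger


get_logger().info("login", extra={"fields": {"user_id": 123, "action": "login", "duration_ms": 145}})
```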

📈 Enable Auditing

apiVersion: audit.k8s.io/v1
kind: Policy
rules:
- level: Metadata
  omitStages:
  - RequestReceived

Common Mistakes to Avoid

Don't skip resource limits

  • One runaway pod can starve every other workload on its node

Don't use latest tag in production

  • Use specific versions for reproducibility

Don't hardcode configuration

  • Use ConfigMaps and Secrets

Don't run as root

  • A compromised process gets root inside the container, which makes escaping to the node easier

Don't skip health checks

  • Kubernetes won't know when to restart a container or when to stop sending it traffic

Don't forget namespace isolation

  • Use RBAC, Network Policies, Resource Quotas

Don't store secrets in Git

  • Use sealed-secrets or external management