Lesson 4 of 14

Prerequisites

Part of the Chaos Engineering tutorial series.

Before starting with Chaos Engineering, ensure you have foundational knowledge and tools in place.

Required Knowledge

1. Infrastructure Basics

  • Understand how servers, networks, and storage work
  • Be familiar with Linux/Unix concepts (processes, network interfaces, file systems)
  • Know what a load balancer does
  • Understand database replication

Quick Self-Check:

  • Can you explain what happens when a server crashes? ✓
  • Do you know how traffic is routed to healthy servers? ✓
  • Do you understand what latency and packet loss mean? ✓
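Latency is usually judged by percentiles rather than the average, because a few slow requests can hide behind a healthy-looking mean. The sketch below computes a percentile with the nearest-rank method. The `percentile` function and the sample values are made up for illustration and are not part of any tool in this series.

```shell
#!/bin/sh
# percentile P: reads one latency value (ms) per line on stdin and prints
# the P-th percentile using the nearest-rank method.
percentile() {
  sort -n | awk -v p="$1" '
    { v[NR] = $1 }
    END {
      if (NR == 0) exit 1
      x = p * NR / 100
      rank = int(x)
      if (rank < x) rank++      # round up, as nearest-rank requires
      if (rank < 1) rank = 1
      print v[rank]
    }'
}

# Made-up latency samples in milliseconds; two requests are very slow.
SAMPLES="12 15 11 14 13 250 12 16 13 900"

# Word splitting on $SAMPLES is intentional: one value per line.
echo "p50=$(printf '%s\n' $SAMPLES | percentile 50)ms p95=$(printf '%s\n' $SAMPLES | percentile 95)ms"
```

The median stays low while the 95th percentile exposes the slow tail, which is the kind of signal a latency-injection experiment is expected to move.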

2. Application Architecture

  • Know the difference between monoliths and microservices
  • Understand service-to-service communication (REST, gRPC, messaging)
  • Be familiar with health checks and load balancer integration
  • Know what circuit breakers and retries do

Quick Self-Check:

  • Can you draw your system's architecture? ✓
  • Can you explain how your services depend on each other? ✓
  • Do you know what happens when one service is slow? ✓
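If retries are a new idea, the following sketch shows one common form: retry with exponential backoff. The `retry` helper and the `flaky_service` stand-in are hypothetical names invented for this example; real services usually get this behavior from a client library or service mesh rather than a shell function.

```shell
#!/bin/sh
# retry MAX_ATTEMPTS BASE_DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or MAX_ATTEMPTS is reached,
# doubling the delay after each failure (exponential backoff).
retry() {
  max_attempts=$1
  delay=$2
  shift 2
  attempt=1
  while true; do
    if "$@"; then
      return 0
    fi
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $attempt attempts: $*" >&2
      return 1
    fi
    echo "attempt $attempt failed; retrying in ${delay}s" >&2
    sleep "$delay"
    delay=$((delay * 2))
    attempt=$((attempt + 1))
  done
}

# Hypothetical flaky dependency: fails twice, then succeeds.
# A counter file keeps the call count between invocations.
COUNTER_FILE=$(mktemp)
echo 0 > "$COUNTER_FILE"
flaky_service() {
  calls=$(cat "$COUNTER_FILE")
  calls=$((calls + 1))
  echo "$calls" > "$COUNTER_FILE"
  [ "$calls" -ge 3 ]
}

# A base delay of 0 keeps the demo instant; use 1 or more in practice.
retry 5 0 flaky_service && echo "succeeded after $(cat "$COUNTER_FILE") calls"
```

A circuit breaker goes one step further: after repeated failures it stops calling the dependency for a while instead of retrying, so a slow service is not hammered by its callers.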

3. Monitoring and Observability

  • Read and interpret metrics (latency, throughput, error rate)
  • Be familiar with logging and log parsing
  • Understand distributed tracing concepts
  • Query monitoring systems (Prometheus, Datadog, CloudWatch, etc.)

Quick Self-Check:

  • Can you find error spikes in your monitoring dashboard? ✓
  • Can you correlate metrics across services? ✓
  • Do you know where to look when something breaks? ✓
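As a small illustration of log parsing, the sketch below derives a request count, an error rate and a mean latency from access-log lines. The four-column log format, the `error_rate` function and the sample lines are all assumptions made for this example; a real log will need its own field positions.

```shell
#!/bin/sh
# error_rate: reads "METHOD PATH STATUS LATENCY_MS" lines on stdin and prints
# total requests, 5xx count, error rate and mean latency.
error_rate() {
  awk '
    { total++; latency += $4 }
    $3 >= 500 { errors++ }
    END {
      if (total == 0) { print "no requests"; exit }
      printf "requests=%d errors=%d error_rate=%.1f%% avg_latency_ms=%.1f\n",
             total, errors, 100 * errors / total, latency / total
    }'
}

# Made-up sample data for illustration only.
error_rate <<'EOF'
GET /api/orders 200 35
GET /api/orders 200 41
POST /api/orders 503 1200
GET /api/users 200 28
GET /api/users 500 950
EOF
```

Recording these numbers before an experiment gives you the steady-state baseline to compare against afterwards.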

4. Containerization and Kubernetes (if using K8s-based tools)

  • Understand how containers work
  • Be familiar with Kubernetes concepts (pods, services, deployments)
  • Know how to run kubectl commands
  • Understand persistent volumes and ConfigMaps

Quick Self-Check:

  • Can you deploy an application to Kubernetes? ✓
  • Can you kill a pod and see it restart? ✓
  • Do you understand what a NodePort service does? ✓

Required Tools Installation

1. Container Runtime

Docker (for local testing and accessing tools):

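# macOS (Docker Desktop; the plain `docker` formula installs only the CLI, not the engine)
brew install --cask docker
 
# Linux (Ubuntu/Debian)
sudo apt-get update && sudo apt-get install -y docker.io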
 
# Linux (RHEL/CentOS)
sudo yum install docker
 
# Verify
docker --version
docker run hello-world

2. Kubernetes (for labs)

Option A: Minikube (local, single-node cluster):

# Install
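brew install minikube  # macOS
# Linux (x86-64): minikube is not in the default apt repositories, so install the release binary
curl -LO https://storage.googleapis.com/minikube/releases/latest/minikube-linux-amd64
sudo install minikube-linux-amd64 /usr/local/bin/minikube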
 
# Start cluster
minikube start --cpus=4 --memory=8192
 
# Verify
kubectl cluster-info
kubectl get nodes

Option B: Kind (lightweight, Docker-based):

# Install
brew install kind  # macOS
 
# Create cluster
kind create cluster --name chaos-lab
 
# Verify
kubectl cluster-info --context kind-chaos-lab

Option C: Cloud Kubernetes (AWS EKS, Azure AKS, GCP GKE):

# AWS EKS with eksctl
eksctl create cluster --name chaos-lab --nodes=3
 
# Verify
kubectl get nodes
kubectl get services -n default

3. Kubernetes Tools

# kubectl - Kubernetes CLI
# Minikube bundles one (`minikube kubectl --`); Kind and cloud clusters need it installed separately
brew install kubectl  # macOS
 
# Helm - Package manager for Kubernetes
curl -fsSL https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash
 
# Verify
kubectl version
helm version

4. Monitoring Stack

Option A: Prometheus + Grafana (local):

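# Minimal Prometheus config; the compose file below mounts it, so it must exist first
cat > prometheus.yml << 'EOF'
global:
  scrape_interval: 15s
scrape_configs:
  - job_name: prometheus
    static_configs:
      - targets: ['localhost:9090']
EOF
 
# Using docker-compose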
cat > docker-compose.yml << 'EOF'
version: '3.8'
services:
  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
 
  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin
EOF
 
docker-compose up -d

Option B: Cloud Monitoring (Datadog, New Relic, CloudWatch):

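# Datadog agent example (the image tag pins agent major version 7;
# the read-only mounts let the agent see host and container metrics)
docker run -d --name dd-agent \
  -e DD_API_KEY=<YOUR_API_KEY> \
  -e DD_SITE=datadoghq.com \
  -v /var/run/docker.sock:/var/run/docker.sock:ro \
  -v /proc/:/host/proc/:ro \
  -v /sys/fs/cgroup/:/host/sys/fs/cgroup:ro \
  datadog/agent:7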

5. Text Editor / IDE

# VS Code
brew install --cask visual-studio-code  # macOS
 
# Or any editor you prefer (Vim, Neovim, Sublime, JetBrains, etc.)

Optional but Recommended

# htop - interactive process monitor
brew install htop  # macOS
sudo apt-get install htop  # Linux
 
# curl - for testing APIs
brew install curl
 
# jq - JSON parsing
brew install jq
 
# yq - YAML parsing
brew install yq
 
# git - for version control
git --version

Environment Setup

1. Create a Lab Project

# Create directory structure
mkdir -p ~/chaos-engineering-lab
cd ~/chaos-engineering-lab
 
# No spaces inside the braces: with spaces, Bash does not expand the list
mkdir -p {experiments,manifests,monitoring,scripts}
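Brace expansion is a Bash and Zsh feature, and it only works when the comma-separated list contains no spaces. For a setup script that might run under a plain POSIX `sh`, a loop is one portable alternative. In this sketch `LAB_DIR` is a hypothetical variable pointed at a temporary directory so the example does not touch your real lab folder.

```shell
#!/bin/sh
# Create the lab layout without relying on brace expansion.
# LAB_DIR is a hypothetical variable; set it to ~/chaos-engineering-lab for real use.
LAB_DIR=$(mktemp -d)

for dir in experiments manifests monitoring scripts; do
  mkdir -p "$LAB_DIR/$dir"
done

# List what was created, one directory per line.
ls -1 "$LAB_DIR"
```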

2. Initialize Git Repository

git init
git config user.name "Your Name"
git config user.email "your@email.com"
 
# Create basic README
cat > README.md << 'EOF'
# Chaos Engineering Lab
 
## Structure
- experiments/: Chaos experiment definitions
- manifests/: Kubernetes manifests for test application
- monitoring/: Monitoring configuration
- scripts/: Helper scripts
 
## Quick Start
1. Start minikube: `minikube start`
2. Deploy test app: `kubectl apply -f manifests/`
3. Run experiment: `./scripts/first-experiment.sh`
EOF
 
git add README.md
git commit -m "Initial commit"

3. Deploy a Test Application

Create a simple application to run chaos tests against:

# manifests/test-app.yaml
cat > manifests/test-app.yaml << 'EOF'
apiVersion: v1
kind: Namespace
metadata:
  name: chaos-testing
 
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
  namespace: chaos-testing
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
    spec:
      containers:
      - name: app
        image: nginx:latest
        ports:
        - containerPort: 80
        resources:
          requests:
            cpu: 100m
            memory: 128Mi
          limits:
            cpu: 500m
            memory: 256Mi
        livenessProbe:
          httpGet:
            path: /
            port: 80
          initialDelaySeconds: 10
          periodSeconds: 5
        readinessProbe:
          httpGet:
            path: /
            port: 80
          initialDelaySeconds: 5
          periodSeconds: 5
 
---
apiVersion: v1
kind: Service
metadata:
  name: web-app
  namespace: chaos-testing
spec:
  selector:
    app: web-app
  ports:
  - port: 80
    targetPort: 80
  type: LoadBalancer
EOF
 
# Deploy it
kubectl apply -f manifests/test-app.yaml
 
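# Verify. On Minikube the service's EXTERNAL-IP stays <pending> until `minikube tunnel`
# runs in another terminal; `kubectl port-forward -n chaos-testing service/web-app 8080:80`
# reaches the app either way.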
kubectl get pods -n chaos-testing
kubectl get service web-app -n chaos-testing
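The READY column of `kubectl get pods` can also be checked from a script. The sketch below counts pods whose containers are all ready from the table output. `count_ready` is a hypothetical helper and the sample table is made up in the same format as real output, so the block runs without a cluster.

```shell
#!/bin/sh
# count_ready: reads `kubectl get pods` table output on stdin and prints how many
# pods have all containers ready (READY column "n/n") and STATUS "Running".
count_ready() {
  awk 'NR > 1 {
         split($2, r, "/")
         if (r[1] == r[2] && $3 == "Running") ready++
       }
       END { print ready + 0 }'
}

# Made-up sample: the third pod is running but has not passed its readiness probe yet.
count_ready <<'EOF'
NAME                       READY   STATUS    RESTARTS   AGE
web-app-85d98d8c68-bx2jk   1/1     Running   0          5m
web-app-85d98d8c68-mnkl9   1/1     Running   0          5m
web-app-85d98d8c68-p9nml   0/1     Running   0          3s
EOF
```

Against a live cluster the same helper would be fed with `kubectl get pods -n chaos-testing -l app=web-app | count_ready`.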

4. Install a Chaos Engineering Tool

For Kubernetes: Install Litmus Chaos

# Add namespace label for pod security
kubectl label namespace chaos-testing pod-security.kubernetes.io/enforce=baseline
 
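# Install Litmus (if using Kubernetes 1.25+)
# Confirm the --set keys below against your chart version with `helm show values litmuschaos/litmus`
helm repo add litmuschaos https://litmuschaos.github.io/litmus-helm/
helm repo update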
helm install litmus litmuschaos/litmus \
  --namespace litmus \
  --create-namespace \
  --set adminUser.name=admin \
  --set adminUser.password=litmus
 
# Verify installation
kubectl get pods -n litmus

For VMs/Servers: Install Gremlin

# On Linux (RPM-based distributions such as RHEL/CentOS)
curl -O https://downloads.gremlin.com/gremlin/downloads/client/latest/linux/gremlin-latest.linux_amd64.rpm
sudo rpm -i gremlin-latest.linux_amd64.rpm
 
# Authenticate
sudo gremlin config set -c <TEAM_ID> -p <PRIVATE_KEY>
 
# Start
sudo systemctl start gremlin
gremlin check

First Experiment

Simple Test: Kill a Container

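#!/bin/bash
# scripts/first-experiment.sh
# Deletes one web-app pod and checks that the Deployment replaces it.
set -euo pipefail
 
NAMESPACE=chaos-testing
EXPECTED=3
 
echo "=== First Chaos Experiment: Pod Deletion ==="
 
# Baseline: list the pods that are running before the experiment
echo "Initial pod count:"
kubectl get pods -n "$NAMESPACE"
 
# Inject the failure: delete the first pod the label selector returns
echo "Deleting one pod..."
TARGET=$(kubectl get pods -n "$NAMESPACE" -l app=web-app -o jsonpath='{.items[0].metadata.name}')
kubectl delete pod -n "$NAMESPACE" "$TARGET"
 
# Observe recovery. The readiness probe needs at least 5 seconds, so after the
# pause also wait (up to 60s) for the Deployment to report all replicas available.
echo "Checking pod status after 5 seconds..."
sleep 5
kubectl rollout status deployment/web-app -n "$NAMESPACE" --timeout=60s > /dev/null || true
kubectl get pods -n "$NAMESPACE"
 
# Count ready pods: print each pod's Ready condition on its own line, then count "True"
READY=$(kubectl get pods -n "$NAMESPACE" -l app=web-app \
  -o jsonpath='{range .items[*]}{.status.conditions[?(@.type=="Ready")].status}{"\n"}{end}' \
  | grep -c True || true)
echo "Ready pods: $READY/$EXPECTED"
 
if [ "$READY" -eq "$EXPECTED" ]; then
  echo "✓ PASS: System recovered automatically"
else
  echo "✗ FAIL: Expected $EXPECTED pods, found $READY"
  exit 1
fi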

Run it:

chmod +x scripts/first-experiment.sh
./scripts/first-experiment.sh

Expected Output

=== First Chaos Experiment: Pod Deletion ===
Initial pod count:
NAME                       READY   STATUS    RESTARTS   AGE
web-app-85d98d8c68-4kqm5   1/1     Running   0          5m
web-app-85d98d8c68-bx2jk   1/1     Running   0          5m
web-app-85d98d8c68-mnkl9   1/1     Running   0          5m

Deleting one pod...
pod "web-app-85d98d8c68-4kqm5" deleted

Checking pod status after 5 seconds...
NAME                       READY   STATUS    RESTARTS   AGE
web-app-85d98d8c68-bx2jk   1/1     Running   0          5m
web-app-85d98d8c68-mnkl9   1/1     Running   0          5m
web-app-85d98d8c68-p9nml   1/1     Running   0          8s

Ready pods: 3/3
✓ PASS: System recovered automatically
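The experiment pauses for a fixed 5 seconds before checking, which is only a guess at recovery time. To measure how long recovery actually takes, one option is to poll until the system is healthy again. In the sketch below, `wait_until` and `all_replicas_available` are hypothetical helpers written for this lab. The demo call uses `true` as a stand-in probe so the block runs without a cluster.

```shell
#!/bin/sh
# wait_until TIMEOUT_SECONDS INTERVAL_SECONDS COMMAND [ARGS...]
# Polls COMMAND until it exits 0 or TIMEOUT_SECONDS elapse.
# Prints the elapsed whole seconds on success; returns 1 on timeout.
wait_until() {
  timeout=$1
  interval=$2
  shift 2
  start=$(date +%s)
  while true; do
    if "$@" > /dev/null 2>&1; then
      echo $(( $(date +%s) - start ))
      return 0
    fi
    now=$(date +%s)
    if [ $((now - start)) -ge "$timeout" ]; then
      return 1
    fi
    sleep "$interval"
  done
}

# Hypothetical probe for the lab: succeeds once all 3 replicas are available.
# It is defined here for the usage note and is not called by the demo.
all_replicas_available() {
  available=$(kubectl get deployment web-app -n chaos-testing \
    -o jsonpath='{.status.availableReplicas}' 2>/dev/null)
  [ "${available:-0}" -eq 3 ]
}

# Demo with a stand-in probe that succeeds immediately; prints the elapsed seconds.
wait_until 5 1 true
```

In the lab, calling `wait_until 60 1 all_replicas_available` right after the `kubectl delete pod` step would print the recovery time in seconds, which is a more useful result to record than a pass/fail after a fixed sleep.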

Verification Checklist

Before moving on to the main tutorials, verify that you can:

  • Start a Kubernetes cluster
  • Deploy an application
  • Check pod status with kubectl
  • Access monitoring dashboard
  • Trigger a simple chaos experiment
  • Observe system recovery
  • Read experiment results
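Part of this checklist can be automated. The following sketch only checks that the command-line tools from this lesson are on your `PATH`. `check_tool` and `preflight` are hypothetical helper names, and a tool being present does not prove that the cluster or monitoring stack is running.

```shell
#!/bin/sh
# check_tool NAME: reports whether NAME is on PATH; returns 1 if it is missing.
check_tool() {
  if command -v "$1" > /dev/null 2>&1; then
    echo "ok       $1"
  else
    echo "MISSING  $1"
    return 1
  fi
}

# preflight TOOL...: checks every tool, prints how many are missing,
# and returns non-zero if any are.
preflight() {
  missing=0
  for tool in "$@"; do
    check_tool "$tool" || missing=$((missing + 1))
  done
  echo "$missing tool(s) missing"
  [ "$missing" -eq 0 ]
}

# Tools installed in this lesson. "|| true" keeps the sketch from aborting
# a larger script when something is missing.
preflight docker kubectl helm git || true
```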

Troubleshooting

Minikube won't start

minikube delete
minikube start --cpus=4 --memory=8192

Docker not running

# macOS
open /Applications/Docker.app
 
# Linux
sudo systemctl start docker

kubectl connection refused

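# Check which cluster kubectl is pointing at
kubectl config current-context
 
# Reset kubeconfig (move it aside rather than deleting it; it may hold credentials for other clusters)
mv ~/.kube/config ~/.kube/config.bak
minikube start  # or re-authenticate with your cluster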

Metrics not showing

# Install metrics-server for Kubernetes
# (on Minikube, `minikube addons enable metrics-server` is the simpler route)
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
 
# Verify
kubectl top nodes

Next Steps

Now that your environment is set up:

  1. Read Foundations → "Introduction to Chaos Engineering"
  2. Learn Principles → "Principles of Chaos"
  3. Understand Benefits → "Why Chaos Engineering Matters"
  4. Run Your First Test → Follow tool-specific tutorials
  5. Design Experiments → Use the design methodology