High-Level Architecture
Every Kubernetes cluster consists of two types of resources:
┌─────────────────────────────────────────────┐
│ CONTROL PLANE (Master) │
│ (Manages cluster, makes scheduling decisions)
│ │
│ ┌──────────────┐ ┌──────────────────┐ │
│ │ API Server │ │ etcd │ │
│ │ (front-door) │ │ (cluster DB) │ │
│ └──────────────┘ └──────────────────┘ │
│ │
│ ┌──────────────┐ ┌──────────────────┐ │
│ │ Scheduler │ │ Controller │ │
│ │ (assign pods)│ │ Manager │ │
│ └──────────────┘ └──────────────────┘ │
└─────────────────────────────────────────────┘
│
│ (communicates with)
▼
┌─────────────────────────────────────────────┐
│ WORKER NODES (Machines) │
│ (Run your actual containers) │
│ │
│ ┌─────────────────┐ ┌─────────────────┐ │
│ │ Node 1 │ │ Node 2 │ │
│ │ │ │ │ │
│ │ ┌───┐ ┌───┐ │ │ ┌───┐ ┌───┐ │ │
│ │ │Pod│ │Pod│ │ │ │Pod│ │Pod│ │ │
│ │ └───┘ └───┘ │ │ └───┘ └───┘ │ │
│ │ │ │ │ │
│ │ kubelet │ │ kubelet │ │
│ │ kube-proxy │ │ kube-proxy │ │
│ │ container runtime│ │ container runtime│ │
│ └─────────────────┘ └─────────────────┘ │
└─────────────────────────────────────────────┘
The Control Plane (The Brain)
The Control Plane manages the cluster. It consists of:
1. API Server (kube-apiserver)
- Role: The front-door to Kubernetes. All communication goes through here.
- Function:
- Receives requests from kubectl and other clients
- Validates requests
- Stores desired state in etcd
- Returns responses with cluster information
- Access: Exposes a RESTful API (on port 6443 by default)
- Example: When you run kubectl apply -f deployment.yaml, you're talking to the API server
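For instance, a minimal manifest like the sketch below (names and image are illustrative) is exactly what kubectl serializes and sends to the API server:

```yaml
# pod.yaml — a minimal Pod manifest (illustrative names and image)
apiVersion: v1
kind: Pod
metadata:
  name: web
spec:
  containers:
    - name: web
      image: nginx:1.25
      ports:
        - containerPort: 80
```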
2. State Store (etcd)
- Role: The cluster's database. The single source of truth.
- Function:
- Stores all cluster data (desired state)
- Stores configuration, secrets, and runtime data
- Is the only stateful component in the control plane
- Uses Raft consensus algorithm for high availability
- Warning: Losing etcd data = losing the cluster state. Always back it up!
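On kubeadm-based clusters, for example, etcd itself runs as a static Pod on the control-plane node. An abridged sketch of such a manifest (paths and image tag reflect common kubeadm defaults and may differ on your cluster):

```yaml
# /etc/kubernetes/manifests/etcd.yaml (abridged, illustrative)
apiVersion: v1
kind: Pod
metadata:
  name: etcd
  namespace: kube-system
spec:
  containers:
    - name: etcd
      image: registry.k8s.io/etcd:3.5.9-0
      command:
        - etcd
        - --data-dir=/var/lib/etcd   # this directory is what you back up
        - --listen-client-urls=https://127.0.0.1:2379
  volumes:
    - name: etcd-data
      hostPath:
        path: /var/lib/etcd          # survives container restarts
```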
3. Scheduler (kube-scheduler)
- Role: Decides which node each Pod should run on.
- Decision Factors:
- Available CPU/memory on nodes
- Node selectors and affinity rules
- Pod resource requests
- Pod tolerations for node taints
- Custom scoring functions
- Example: When you deploy a Pod, the scheduler finds the best node to place it
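These factors come straight from the Pod spec. A sketch of a Pod that constrains the scheduler (the label key, taint, and values are hypothetical):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-job
spec:
  nodeSelector:
    disktype: ssd          # only nodes labeled disktype=ssd qualify
  tolerations:
    - key: gpu             # tolerate nodes tainted gpu=true:NoSchedule
      operator: Equal
      value: "true"
      effect: NoSchedule
  containers:
    - name: job
      image: busybox:1.36
      resources:
        requests:
          cpu: 500m        # only nodes with 0.5 CPU free are considered
          memory: 256Mi
```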
4. Controller Manager (kube-controller-manager)
- Role: Runs various "controllers" that handle background tasks.
- Key Controllers:
- ReplicaSet Controller: Ensures the desired number of Pod replicas are running
- Node Controller: Notices when a node goes down and evicts its Pods
- Service Account Controller: Creates default service accounts in new namespaces
- Endpoints Controller: Keeps track of which Pods back each Service
- Function: Constantly watches for mismatches between desired and actual state and corrects them
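For example, the ReplicaSet controller acts on the replicas field of a spec like this sketch (names are illustrative); delete one of the three Pods and the controller recreates it:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3              # desired state the controller reconciles toward
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: nginx:1.25
```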
5. Cloud Controller Manager (cloud-controller-manager)
- Role: Integrates with cloud providers (AWS, Google Cloud, Azure, etc.)
- Responsibilities:
- Create load balancers
- Manage storage volumes
- Handle node lifecycle
- Note: Only used in cloud deployments, not in on-premise or local clusters
Worker Nodes (The Muscles)
Worker nodes are the machines that actually run your applications.
1. Kubelet
- Role: The node's agent. Ensures containers are running in Pods.
- Responsibility:
- Registers the node with the API server
- Reports node resource availability (CPU, memory, disk)
- Watches for Pod specs assigned to this node
- Starts/stops containers via the container runtime
- Reports container health and status
- Important: The kubelet keeps already-running containers alive even if the control plane goes down
2. Kube-proxy
- Role: Manages networking on the node.
- Responsibility:
- Maintains network rules on the node
- Implements Services (load balancing across Pods)
- Routes traffic to the correct Pod IP
- Can use different modes: iptables, IPVS, or userspace
- Port Range: NodePort Services expose Pods on a port from the range 30000-32767 (the default, configurable) on every node
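The rules kube-proxy programs are derived from Service objects like the sketch below (names and ports are illustrative): traffic to the Service's cluster IP on port 80 is spread across all Pods matching the selector:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: web
spec:
  selector:
    app: web               # kube-proxy balances across Pods with this label
  ports:
    - port: 80             # Service port clients connect to
      targetPort: 8080     # container port on each backing Pod
```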
3. Container Runtime
- Role: Actually runs the containers.
- Options:
- containerd (most common; originally spun out of Docker)
- CRI-O (the default runtime in OpenShift)
- Docker Engine (dockershim was removed in Kubernetes 1.24; still usable via cri-dockerd)
- Kata (for enhanced isolation)
- Interface: Kubelet communicates via the Container Runtime Interface (CRI)
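The kubelet is pointed at a runtime via its CRI socket. A sketch of the relevant KubeletConfiguration fragment (the socket path assumes containerd's default location):

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# CRI socket the kubelet dials (containerd's default path; CRI-O uses a different one)
containerRuntimeEndpoint: unix:///run/containerd/containerd.sock
```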
4. Node Status
Each node broadcasts:
- CPU available: The amount of CPU the node can allocate
- Memory available: The amount of RAM the node can allocate
- Disk space: Space for logs, container image layers, etc.
- Network connectivity: Whether the node can reach other nodes
- Node conditions: Ready, MemoryPressure, DiskPressure, PIDPressure, NetworkUnavailable
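This information lives on the Node object itself. An abridged sketch of what kubectl get node -o yaml reports (the numbers are illustrative):

```yaml
status:
  capacity:                # raw machine resources
    cpu: "4"
    memory: 16384000Ki
    pods: "110"
  allocatable:             # capacity minus system reservations
    cpu: 3800m
    memory: 15360000Ki
  conditions:
    - type: Ready
      status: "True"
    - type: MemoryPressure
      status: "False"
```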
Control Plane High Availability
In production, the control plane should be replicated for redundancy:
┌────────────────────────────────────────────┐
│ LOAD BALANCER (443) │
└────────────────┬───────────────────────────┘
│
┌────────────┼────────────┐
▼ ▼ ▼
┌────────┐ ┌────────┐ ┌────────┐
│ Master │ │ Master │ │ Master │
│ API │ │ API │ │ API │
│ Server │ │ Server │ │ Server │
└────────┘ └────────┘ └────────┘
│ │ │
└────────────┼────────────┘
│
┌────▼─────┐
│ etcd │
│ (cluster) │
└───────────┘
- Multiple API Servers handle requests
- Scheduler and Controller Manager run in active-passive mode (one instance wins leader election; the others stand by)
- etcd is replicated across 3 or 5 nodes for fault tolerance
Communication Flow: Deploying a Pod
Here's what happens when you deploy a Pod:
1. kubectl apply -f pod.yaml
│
▼
2. API Server validates and stores in etcd
│
▼
3. Scheduler watches for unscheduled Pods
│
▼
4. Scheduler finds best node, updates Pod with nodeName
│
▼
5. API Server updates Pod spec in etcd
│
▼
6. Kubelet on target node notices new Pod
│
▼
7. Kubelet contacts container runtime
│
▼
8. Container runtime creates and starts container
│
▼
9. Kubelet updates Pod status back to API Server
│
▼
10. kubectl get pod returns "Running"
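Step 4 deserves a closer look: the scheduler doesn't edit the Pod directly but submits a Binding through the API server, roughly equivalent to this object (names are illustrative):

```yaml
apiVersion: v1
kind: Binding
metadata:
  name: my-pod             # must match the unscheduled Pod's name
target:
  apiVersion: v1
  kind: Node
  name: node-2             # the node the scheduler selected
```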
Network Architecture
Pod-to-Pod Communication
- Every Pod gets a unique IP address (across the whole cluster, not just the node)
- Pods can communicate with other Pods on other nodes without translation
- Uses a Container Network Interface (CNI) plugin (Flannel, Weave, Calico, etc.)
Pod-to-Service Communication
- Services provide a stable DNS name and IP
- Traffic is load-balanced across Pods
- Service discovery happens via DNS (e.g., http://my-service:8080)
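That URL maps onto a Service like this sketch: the metadata name becomes the DNS label, and the declared port is what clients dial (the selector label is hypothetical):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-service         # resolvable as my-service (or my-service.<namespace>.svc.cluster.local)
spec:
  selector:
    app: my-app            # hypothetical Pod label
  ports:
    - port: 8080           # clients reach http://my-service:8080
      targetPort: 80       # traffic lands on port 80 of each Pod
```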
External Communication
- Services with type LoadBalancer get an external IP
- Services with type NodePort are accessible on ports 30000-32767 on every node
- Ingress resources provide HTTP/HTTPS routing
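A sketch of the Ingress case (the host, backend Service name, and port are hypothetical):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web
spec:
  rules:
    - host: example.com    # hypothetical hostname
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: my-service   # routes to this Service
                port:
                  number: 8080
```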
Resource Management
Pod Resources
Every Pod consumes:
- CPU: Measured in millicores (1000m = 1 CPU core)
- Memory: Measured in bytes (1Gi = 2^30 bytes, a gibibyte)
- Requests: Minimum guaranteed resources
- Limits: Maximum resources a Pod can use
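Requests and limits are declared per container. A sketch with illustrative values:

```yaml
resources:
  requests:
    cpu: 250m        # 0.25 core guaranteed; what the scheduler counts
    memory: 64Mi
  limits:
    cpu: 500m        # CPU is throttled above 0.5 core
    memory: 128Mi    # container is OOM-killed above 128Mi
```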
Node Capacity
Every node has:
- Allocatable CPU: Total CPU - reserved for OS
- Allocatable Memory: Total Memory - reserved for OS
- Max Pods: Typically 110 (configurable)
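These reservations and the Pod cap are kubelet settings. A KubeletConfiguration sketch (values are illustrative):

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
maxPods: 110           # default Pod cap per node
systemReserved:        # carved out of capacity for the OS
  cpu: 500m
  memory: 1Gi
kubeReserved:          # carved out for the kubelet and runtime themselves
  cpu: 500m
  memory: 1Gi
```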
The Reconciliation Loop
Kubernetes operates on a constant reconciliation loop:
Desired State (in etcd)
│
▼
Compare with Current State
           │
           ▼
         Same? ──Yes──▶ Wait for next check
           │
           No
           │
           ▼
    Execute Actions
           │
           ▼
  Update Current State
This happens continuously, ensuring the cluster always matches your desired configuration.
Common Deployment Topologies
Single Master (Development)
┌─────────┐
│ Master │
└─────────┘
│
├───────┬───────┬───────┐
▼ ▼ ▼ ▼
Node1 Node2 Node3 Node4
Multi-Master (Production)
┌─────────────────────┐
│ Load Balancer │
└──────────┬──────────┘
│
┌──────────┼──────────┐
▼ ▼ ▼
Master1 Master2 Master3 (with etcd cluster)
│ │ │
└──────────┼──────────┘
│
┌──────────┴──────────┬───────────┐
▼ ▼ ▼ ▼
Node1 Node2 Node3 Node4+