High-Level Architecture
Every Kubernetes cluster consists of two types of resources:
┌─────────────────────────────────────────────┐
│ CONTROL PLANE (Master) │
│ (Manages cluster, makes scheduling decisions)
│ │
│ ┌──────────────┐ ┌──────────────────┐ │
│ │ API Server │ │ etcd │ │
│ │ (front-door) │ │ (cluster DB) │ │
│ └──────────────┘ └──────────────────┘ │
│ │
│ ┌──────────────┐ ┌──────────────────┐ │
│ │ Scheduler │ │ Controller │ │
│ │ (assign pods)│ │ Manager │ │
│ └──────────────┘ └──────────────────┘ │
└─────────────────────────────────────────────┘
│
│ (communicates with)
▼
┌─────────────────────────────────────────────┐
│ WORKER NODES (Machines) │
│ (Run your actual containers) │
│ │
│ ┌─────────────────┐ ┌─────────────────┐ │
│ │ Node 1 │ │ Node 2 │ │
│ │ │ │ │ │
│ │ ┌───┐ ┌───┐ │ │ ┌───┐ ┌───┐ │ │
│ │ │Pod│ │Pod│ │ │ │Pod│ │Pod│ │ │
│ │ └───┘ └───┘ │ │ └───┘ └───┘ │ │
│ │ │ │ │ │
│ │ kubelet │ │ kubelet │ │
│ │ kube-proxy │ │ kube-proxy │ │
│ │ container runtime│ │ container runtime│ │
│ └─────────────────┘ └─────────────────┘ │
└─────────────────────────────────────────────┘
The Control Plane (The Brain)
The Control Plane manages the cluster. It consists of:
1. API Server (kube-apiserver)
- Role: The front-door to Kubernetes. All communication goes through here.
- Function:
- Receives requests from kubectl and other clients
- Validates requests
- Stores desired state in etcd
- Returns responses with cluster information
- Access: Exposes a RESTful API (on port 6443 by default)
- Example: When you run kubectl apply -f deployment.yaml, you're talking to the API server
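For instance, a minimal manifest like the sketch below (names and image are illustrative) is exactly what kubectl serializes and sends to the API server:

```yaml
# pod.yaml — a minimal Pod manifest (illustrative names and image)
apiVersion: v1
kind: Pod
metadata:
  name: web
spec:
  containers:
    - name: web
      image: nginx:1.25
      ports:
        - containerPort: 80
```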
2. State Store (etcd)
- Role: The cluster's database. The single source of truth.
- Function:
- Stores all cluster data (desired state)
- Stores configuration, secrets, and runtime data
- Is the only stateful component in the control plane
- Uses Raft consensus algorithm for high availability
- Warning: Losing etcd data = losing the cluster state. Always back it up!
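On kubeadm-based clusters, for example, etcd itself runs as a static Pod on the control-plane node. An abridged sketch of such a manifest (paths and image tag reflect common kubeadm defaults and may differ on your cluster):

```yaml
# /etc/kubernetes/manifests/etcd.yaml (abridged, illustrative)
apiVersion: v1
kind: Pod
metadata:
  name: etcd
  namespace: kube-system
spec:
  containers:
    - name: etcd
      image: registry.k8s.io/etcd:3.5.9-0
      command:
        - etcd
        - --data-dir=/var/lib/etcd   # this directory is what you back up
        - --listen-client-urls=https://127.0.0.1:2379
  volumes:
    - name: etcd-data
      hostPath:
        path: /var/lib/etcd          # survives container restarts
```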
3. Scheduler (kube-scheduler)
- Role: Decides which node each Pod should run on.
- Decision Factors:
- Available CPU/memory on nodes
- Node selectors and affinity rules
- Pod resource requests
- Pod tolerations for node taints
- Custom scoring functions
- Example: When you deploy a Pod, the scheduler finds the best node to place it
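These factors come straight from the Pod spec. A sketch of a Pod that constrains the scheduler (the label key, taint, and values are hypothetical):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-job
spec:
  nodeSelector:
    disktype: ssd          # only nodes labeled disktype=ssd qualify
  tolerations:
    - key: gpu             # tolerate nodes tainted gpu=true:NoSchedule
      operator: Equal
      value: "true"
      effect: NoSchedule
  containers:
    - name: job
      image: busybox:1.36
      resources:
        requests:
          cpu: 500m        # only nodes with 0.5 CPU free are considered
          memory: 256Mi
```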
4. Controller Manager (kube-controller-manager)
- Role: Runs various "controllers" that handle background tasks.
- Key Controllers:
- ReplicaSet Controller: Ensures the desired number of Pod replicas are running
- Node Controller: Notices when a node goes down and evicts its Pods
- Service Account Controller: Creates default service accounts in new namespaces
- Endpoints Controller: Keeps track of which Pods back each Service
- Function: Constantly watches for mismatches between desired and actual state and corrects them
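For example, the ReplicaSet controller acts on the replicas field of a spec like this sketch (names are illustrative); delete one of the three Pods and the controller recreates it:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3              # desired state the controller reconciles toward
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: nginx:1.25
```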
5. Cloud Controller Manager (cloud-controller-manager)
- Role: Integrates with cloud providers (AWS, Google Cloud, Azure, etc.)
- Responsibilities:
- Create load balancers
- Manage storage volumes
- Handle node lifecycle
- Note: Only used in cloud deployments, not in on-premise or local clusters
Worker Nodes (The Muscles)
Worker nodes are the machines that actually run your applications.
1. Kubelet
- Role: The node's agent. Ensures containers are running in Pods.
- Responsibility:
- Registers the node with the API server
- Reports node resource availability (CPU, memory, disk)
- Watches for Pod specs assigned to this node
- Starts/stops containers via the container runtime
- Reports container health and status
- Important: The kubelet keeps already-running containers alive even if the control plane goes down
2. Kube-proxy
- Role: Manages networking on the node.
- Responsibility:
- Maintains network rules on the node
- Implements Services (load balancing across Pods)
- Routes traffic to the correct Pod IP
- Can use different modes: iptables, IPVS, or userspace
- Port Range: NodePort Services expose Pods on a port from the range 30000-32767 (the default, configurable) on every node
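The rules kube-proxy programs are derived from Service objects like the sketch below (names and ports are illustrative): traffic to the Service's cluster IP on port 80 is spread across all Pods matching the selector:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: web
spec:
  selector:
    app: web               # kube-proxy balances across Pods with this label
  ports:
    - port: 80             # Service port clients connect to
      targetPort: 8080     # container port on each backing Pod
```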
3. Container Runtime
- Role: Actually runs the containers.
- Options:
- containerd (most common; originally spun out of Docker)
- CRI-O (the default runtime in OpenShift)
- Docker Engine (dockershim was removed in Kubernetes 1.24; still usable via cri-dockerd)
- Kata (for enhanced isolation)
- Interface: Kubelet communicates via the Container Runtime Interface (CRI)
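The kubelet is pointed at a runtime via its CRI socket. A sketch of the relevant KubeletConfiguration fragment (the socket path assumes containerd's default location):

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# CRI socket the kubelet dials (containerd's default path; CRI-O uses a different one)
containerRuntimeEndpoint: unix:///run/containerd/containerd.sock
```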
4. Node Status
Each node broadcasts:
- CPU available: The amount of CPU the node can allocate
- Memory available: The amount of RAM the node can allocate
- Disk space: Space for logs, container image layers, etc.
- Network connectivity: Whether the node can reach other nodes
- Node conditions: Ready, MemoryPressure, DiskPressure, PIDPressure, NetworkUnavailable
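This information lives on the Node object itself. An abridged sketch of what kubectl get node -o yaml reports (the numbers are illustrative):

```yaml
status:
  capacity:                # raw machine resources
    cpu: "4"
    memory: 16384000Ki
    pods: "110"
  allocatable:             # capacity minus system reservations
    cpu: 3800m
    memory: 15360000Ki
  conditions:
    - type: Ready
      status: "True"
    - type: MemoryPressure
      status: "False"
```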
Control Plane High Availability
In production, the control plane should be replicated for redundancy:
┌────────────────────────────────────────────┐
│ LOAD BALANCER (443) │
└────────────────┬───────────────────────────┘
│
┌────────────┼────────────┐
▼ ▼ ▼
┌────────┐ ┌────────┐ ┌────────┐
│ Master │ │ Master │ │ Master │
│ API │ │ API │ │ API │
│ Server │ │ Server │ │ Server │
└────────┘ └────────┘ └────────┘
│ │ │
└────────────┼────────────┘
│
┌────▼─────┐
│ etcd │
│ (cluster) │
└───────────┘
- Multiple API Servers handle requests
- Scheduler and Controller Manager run in active-passive mode (one instance wins leader election; the others stand by)
- etcd is replicated across 3 or 5 nodes for fault tolerance
Communication Flow: Deploying a Pod
Here's what happens when you deploy a Pod:
1. kubectl apply -f pod.yaml
│
▼
2. API Server validates and stores in etcd
│
▼
3. Scheduler watches for unscheduled Pods
│
▼
4. Scheduler finds best node, updates Pod with nodeName
│
▼
5. API Server updates Pod spec in etcd
│
▼
6. Kubelet on target node notices new Pod
│
▼
7. Kubelet contacts container runtime
│
▼
8. Container runtime creates and starts container
│
▼
9. Kubelet updates Pod status back to API Server
│
▼
10. kubectl get pod returns "Running"
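Step 4 deserves a closer look: the scheduler doesn't edit the Pod directly but submits a Binding through the API server, roughly equivalent to this object (names are illustrative):

```yaml
apiVersion: v1
kind: Binding
metadata:
  name: my-pod             # must match the unscheduled Pod's name
target:
  apiVersion: v1
  kind: Node
  name: node-2             # the node the scheduler selected
```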
Network Architecture
Pod-to-Pod Communication
- Every Pod gets a unique IP address (across the whole cluster, not just the node)
- Pods can communicate with other Pods on other nodes without translation
- Uses a Container Network Interface (CNI) plugin (Flannel, Weave, Calico, etc.)
Pod-to-Service Communication
- Services provide a stable DNS name and IP
- Traffic is load-balanced across Pods
- Service discovery happens via DNS (e.g., http://my-service:8080)
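That URL maps onto a Service like this sketch: the metadata name becomes the DNS label, and the declared port is what clients dial (the selector label is hypothetical):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-service         # resolvable as my-service (or my-service.<namespace>.svc.cluster.local)
spec:
  selector:
    app: my-app            # hypothetical Pod label
  ports:
    - port: 8080           # clients reach http://my-service:8080
      targetPort: 80       # traffic lands on port 80 of each Pod
```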
External Communication
- Services with type LoadBalancer get an external IP
- Services with type NodePort are accessible on ports 30000-32767 on every node
- Ingress resources provide HTTP/HTTPS routing
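A sketch of the Ingress case (the host, backend Service name, and port are hypothetical):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web
spec:
  rules:
    - host: example.com    # hypothetical hostname
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: my-service   # routes to this Service
                port:
                  number: 8080
```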
Resource Management
Pod Resources
Every Pod consumes:
- CPU: Measured in millicores (1000m = 1 CPU core)
- Memory: Measured in bytes (1Gi = 2^30 bytes, a gibibyte)
- Requests: Minimum guaranteed resources
- Limits: Maximum resources a Pod can use
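Requests and limits are declared per container. A sketch with illustrative values:

```yaml
resources:
  requests:
    cpu: 250m        # 0.25 core guaranteed; what the scheduler counts
    memory: 64Mi
  limits:
    cpu: 500m        # CPU is throttled above 0.5 core
    memory: 128Mi    # container is OOM-killed above 128Mi
```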
Node Capacity
Every node has:
- Allocatable CPU: Total CPU - reserved for OS
- Allocatable Memory: Total Memory - reserved for OS
- Max Pods: Typically 110 (configurable)
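These reservations and the Pod cap are kubelet settings. A KubeletConfiguration sketch (values are illustrative):

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
maxPods: 110           # default Pod cap per node
systemReserved:        # carved out of capacity for the OS
  cpu: 500m
  memory: 1Gi
kubeReserved:          # carved out for the kubelet and runtime themselves
  cpu: 500m
  memory: 1Gi
```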
The Reconciliation Loop
Kubernetes operates on a constant reconciliation loop:
Desired State (in etcd)
│
▼
Compare with Current State
           │
           ▼
         Same? ──Yes──▶ Wait for next check
           │
           No
           │
           ▼
    Execute Actions
           │
           ▼
  Update Current State
This happens continuously, ensuring the cluster always matches your desired configuration.
Common Deployment Topologies
Single Master (Development)
┌─────────┐
│ Master │
└─────────┘
│
├───────┬───────┬───────┐
▼ ▼ ▼ ▼
Node1 Node2 Node3 Node4
Multi-Master (Production)
┌─────────────────────┐
│ Load Balancer │
└──────────┬──────────┘
│
┌──────────┼──────────┐
▼ ▼ ▼
Master1 Master2 Master3 (with etcd cluster)
│ │ │
└──────────┼──────────┘
│
┌──────────┴──────────┬───────────┐
▼ ▼ ▼ ▼
Node1 Node2 Node3 Node4+