Lesson 15 of 15

Practical DevOps Scripts

Part of the Python for DevOps tutorial series.

This section combines everything you've learned into production-ready scripts for common DevOps tasks.

1. System Health Monitor

This script checks CPU, memory, and disk usage and prints an alert when any threshold is exceeded. It uses the third-party psutil library (install it with pip install psutil).

Action:

import psutil
 
def check_system_health(cpu_threshold=80, mem_threshold=80, disk_threshold=90):
    cpu_usage = psutil.cpu_percent(interval=1)
    mem_usage = psutil.virtual_memory().percent
    disk_usage = psutil.disk_usage('/').percent
 
    print("--- System Status ---")
    print(f"CPU:    {cpu_usage}%")
    print(f"Memory: {mem_usage}%")
    print(f"Disk:   {disk_usage}%")
 
    if cpu_usage > cpu_threshold:
        print("ALERT: High CPU Usage!")
    if mem_usage > mem_threshold:
        print("ALERT: High Memory Usage!")
    if disk_usage > disk_threshold:
        print("ALERT: High Disk Usage!")
 
if __name__ == "__main__":
    check_system_health()

Result:

--- System Status ---
CPU:    15.4%
Memory: 62.1%
Disk:   44.8%
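If installing third-party packages isn't an option, the standard library can cover the disk portion of the check. Here is a minimal sketch using shutil.disk_usage (disk_percent is a hypothetical helper, not part of psutil):

```python
import shutil

def disk_percent(path="/"):
    # Percentage of the filesystem containing `path` that is in use
    usage = shutil.disk_usage(path)
    return usage.used / usage.total * 100
```

This only replaces the disk check; CPU and memory sampling still need psutil or reading from /proc directly.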

2. Automated S3 Backup

This script compresses a directory and uploads it to an AWS S3 bucket using the boto3 library. It assumes AWS credentials are already configured (for example via environment variables or ~/.aws/credentials).

Action:

import boto3
import tarfile
import os
from datetime import datetime
 
def backup_to_s3(source_dir, bucket_name):
    s3 = boto3.client('s3')
    date = datetime.now().strftime('%Y-%m-%d')
    archive_name = f"backup-{date}.tar.gz"
 
    # Create compressed archive
    with tarfile.open(archive_name, "w:gz") as tar:
        tar.add(source_dir, arcname=os.path.basename(source_dir))
    
    # Upload to S3
    print(f"Uploading {archive_name} to {bucket_name}...")
    s3.upload_file(archive_name, bucket_name, archive_name)
    print("Backup successful!")
 
# backup_to_s3('/var/www/html', 'my-devops-backups')

Result:

Uploading backup-2026-04-10.tar.gz to my-devops-backups...
Backup successful!
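Separating archive creation from the upload makes the backup testable without AWS credentials. A sketch of that split (make_archive is a hypothetical helper using the same backup-YYYY-MM-DD.tar.gz naming as above):

```python
import os
import tarfile
from datetime import datetime

def make_archive(source_dir, dest_dir):
    # Build a dated backup-YYYY-MM-DD.tar.gz archive of source_dir in dest_dir
    date = datetime.now().strftime('%Y-%m-%d')
    archive_path = os.path.join(dest_dir, f"backup-{date}.tar.gz")
    with tarfile.open(archive_path, "w:gz") as tar:
        tar.add(source_dir, arcname=os.path.basename(source_dir))
    return archive_path
```

backup_to_s3 can then call make_archive and pass the result to s3.upload_file, while unit tests exercise the archiving step alone.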

3. Log Error Extractor

This script parses a log file using Regular Expressions to find all lines containing "ERROR" or "CRITICAL".

Action:

import re
 
def extract_errors(log_file):
    error_pattern = re.compile(r"ERROR|CRITICAL", re.IGNORECASE)
    
    with open(log_file, 'r') as f:
        errors = [line.strip() for line in f if error_pattern.search(line)]
    
    print(f"Found {len(errors)} critical issues:")
    for err in errors[:3]: # Show first 3
        print(f"  - {err}")
 
# extract_errors('app.log')

Result:

Found 12 critical issues:
  - 2026-04-10 12:01:04 ERROR: Database connection failed
  - 2026-04-10 12:05:12 CRITICAL: Out of memory on node-01
  - 2026-04-10 12:10:00 ERROR: Authentication service unreachable
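A natural extension is tallying issues by severity rather than just listing them. A sketch using collections.Counter (count_by_level is a hypothetical helper, not from the script above):

```python
import re
from collections import Counter

def count_by_level(lines):
    # Tally how many lines mention each severity level
    pattern = re.compile(r"\b(ERROR|CRITICAL)\b")
    counts = Counter()
    for line in lines:
        match = pattern.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Feeding it the open file object instead of a list works too, since it only iterates over lines.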

4. Kubernetes Pod Audit

A script using the official kubernetes Python client to find pods whose containers have restarted more than 5 times.

Action:

from kubernetes import client, config
 
def audit_pods(namespace="default"):
    config.load_kube_config()
    v1 = client.CoreV1Api()
    
    pods = v1.list_namespaced_pod(namespace)
    print(f"Auditing pods in {namespace}...")
    
    for pod in pods.items:
        # container_statuses can be None for pods that haven't started yet
        for status in pod.status.container_statuses or []:
            if status.restart_count > 5:
                print(f"ALERT: Pod {pod.metadata.name} has {status.restart_count} restarts!")
 
# audit_pods("production")

Result:

Auditing pods in production...
ALERT: Pod payment-processor-6789fb has 14 restarts!
ALERT: Pod redis-master-0 has 8 restarts!
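The filtering logic itself doesn't need a cluster to be tested. If the SDK isn't available, the same audit can run over the JSON that kubectl get pods -o json emits; a sketch assuming that standard output shape (high_restart_pods is a hypothetical helper):

```python
import json

def high_restart_pods(pods_json, threshold=5):
    # pods_json: a string in the shape of `kubectl get pods -o json` output
    data = json.loads(pods_json)
    flagged = []
    for pod in data.get("items", []):
        name = pod["metadata"]["name"]
        for status in pod.get("status", {}).get("containerStatuses", []):
            if status.get("restartCount", 0) > threshold:
                flagged.append((name, status["restartCount"]))
    return flagged
```

Keeping the threshold check in a pure function like this also makes it easy to unit test against canned JSON fixtures.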

Summary

  • Real-world scripts should always include error handling.
  • Use psutil for system monitoring.
  • Use boto3 and kubernetes SDKs instead of parsing CLI output when possible.
  • Keep your scripts modular so they can be easily tested.
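The error-handling point above can be as simple as a shared wrapper so one failing task doesn't kill a whole maintenance run. A minimal sketch (safe_run is a hypothetical helper, not from the scripts above):

```python
import logging

def safe_run(task, *args, **kwargs):
    # Run a task, log any exception with traceback, and keep going
    try:
        return task(*args, **kwargs)
    except Exception:
        logging.exception("Task %s failed", getattr(task, "__name__", repr(task)))
        return None
```

Each script's main function can then be invoked as safe_run(check_system_health), safe_run(backup_to_s3, ...), and so on.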