Text processing is at the heart of Linux and DevOps. Whether you're parsing log files, updating configuration files, or extracting data from APIs, these tools are essential.
## awk: The Field Processor
awk is a complete programming language designed for text processing. It's best used when your data is organized into columns or fields.
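Because awk is a full language, a script can carry state across lines. A minimal sketch (with made-up input) using a BEGIN block, a running variable, and an END block:

```bash
# Count lines and accumulate a running total of field 2
printf 'a 10\nb 20\nc 5\n' | awk '
BEGIN { count = 0 }          # runs once, before any input
{ count++; total += $2 }     # runs for every input line
END { print count, total }   # runs once, after the last line
'
# → 3 35
```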
### Basic Syntax

```bash
awk 'pattern { action }' file
```

### Examples with Results

#### 1. Print Specific Fields
By default, awk uses whitespace as a delimiter. $1, $2, etc., represent fields.
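Two related built-ins are worth knowing: $0 is the entire line, and NF is the number of fields, so $NF is always the last field. A quick sketch:

```bash
# NF = number of fields on the line; $NF = the last field
echo "Alice 25 Engineer" | awk '{ print NF, $NF }'
# → 3 Engineer
```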
Command:

```bash
# Get the first and third columns of a file
echo "Alice 25 Engineer
Bob 30 Designer
Charlie 22 Developer" | awk '{ print $1, $3 }'
```

Result:

```
Alice Engineer
Bob Designer
Charlie Developer
```

#### 2. Using a Custom Delimiter
Use -F to specify a delimiter (e.g., : for /etc/passwd).
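The flip side of -F is OFS, the output field separator, which controls how printed fields are joined; it can be set with -v. A sketch that re-joins colon-separated fields with commas:

```bash
# Read colon-separated fields, print them comma-separated
echo "root:x:0:0" | awk -F: -v OFS=, '{ print $1, $2, $3 }'
# → root,x,0
```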
Command:

```bash
echo "root:x:0:0:root:/root:/bin/bash" | awk -F: '{ print "User: " $1 ", Shell: " $7 }'
```

Result:

```
User: root, Shell: /bin/bash
```

#### 3. Filtering by Condition
Command:

```bash
# Print users with UID (field 3) greater than 500
echo "user1:x:100:
user2:x:1000:
user3:x:450:" | awk -F: '$3 > 500 { print $1 }'
```

Result:

```
user2
```

#### 4. Advanced: Calculating Sums
Command:

```bash
# Calculate the total file size in the current directory
ls -l | awk '{ sum += $5 } END { print "Total size: " sum " bytes" }'
```

Result (depends on your directory):

```
Total size: 45023 bytes
```

## sed: The Stream Editor
sed is primarily used for finding and replacing text in files or streams.
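Substitutions are not limited to fixed strings: with -E (supported by modern GNU and BSD sed), you can use extended regular expressions and capture groups. A sketch that swaps two words:

```bash
# \1 and \2 refer back to the parenthesized capture groups
echo "world hello" | sed -E 's/([a-z]+) ([a-z]+)/\2 \1/'
# → hello world
```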
### Basic Syntax

```bash
sed 's/old/new/g' file
```

### Examples with Results

#### 1. Basic Substitution
Command:

```bash
echo "The server is DOWN" | sed 's/DOWN/UP/'
```

Result:

```
The server is UP
```

#### 2. In-place File Editing
Use -i to save changes directly to the file.
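One caveat: -i behaves differently across implementations (BSD/macOS sed requires a suffix argument, GNU sed does not). Passing a suffix such as .bak works on both and keeps a backup copy, which is a sensible habit for config files:

```bash
# Edit in place, keeping the original as config.yaml.bak
sed -i.bak 's/localhost/127.0.0.1/g' config.yaml
```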
```bash
sed -i 's/localhost/127.0.0.1/g' config.yaml
```

#### 3. Delete Lines
Command:

```bash
# Delete the second line
echo -e "Line 1\nLine 2\nLine 3" | sed '2d'
```

Result:

```
Line 1
Line 3
```

## jq: The JSON Processor
In modern DevOps, managing JSON (from APIs, AWS CLI, Kubernetes) is mandatory. jq is the industry standard.
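By default jq prints strings with quotes (as valid JSON). In shell pipelines you usually want the -r (raw output) flag, which strips them so the value can feed directly into other commands:

```bash
# -r emits the bare string instead of a quoted JSON string
echo '{"name": "production", "status": "running"}' | jq -r '.status'
# → running
```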
### Examples with Results

#### 1. Extract a Field
Command:

```bash
echo '{"name": "production", "status": "running"}' | jq '.status'
```

Result:

```
"running"
```

#### 2. Extract from an Array
Command:

```bash
echo '[{"id": 1, "name": "web"}, {"id": 2, "name": "db"}]' | jq '.[0].name'
```

Result:

```
"web"
```

#### 3. Filtering Arrays
Command:

```bash
echo '[{"name": "app1", "cpu": 80}, {"name": "app2", "cpu": 10}]' | jq '.[] | select(.cpu > 50) | .name'
```

Result:

```
"app1"
```

## DevOps Tool Comparison
| Tool | Best Use Case |
|---|---|
| grep | Searching for patterns in lines |
| sed | Modifying text (search and replace) |
| awk | Processing columnar data and reports |
| jq | Parsing and transforming JSON data |
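grep appears in the comparison but has no example of its own; its job is simply printing lines that match a pattern. A quick sketch with made-up log lines:

```bash
# -E enables extended regex; print lines containing ERROR or WARN
printf 'INFO ok\nERROR disk full\nWARN slow\n' | grep -E 'ERROR|WARN'
# → ERROR disk full
# → WARN slow
```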
## Practice Exercise
Try parsing the output of df -h to find filesystems that are more than 80% full:

```bash
df -h | awk 'NR>1 && +$5 > 80 { print $1 ": " $5 }'
```

(The unary + forces awk to treat the "Use%" column, e.g. "85%", as a number; string-to-number conversion stops at the first non-numeric character, so the % sign is ignored.)
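One variation worth trying: pass the threshold in with -v instead of hard-coding it, and use int() to drop the trailing % explicitly. A sketch (the canned input below stands in for real df -h output):

```bash
# Same check with a configurable threshold; int() drops the trailing %
df -h | awk -v limit=80 'NR > 1 && int($5) > limit { print $1 ": " $5 }'
```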