awk is a tiny language for scanning and transforming text, perfect for logs, CSV/TSV, and ad-hoc analytics. Think of it as a streaming spreadsheet: you filter rows, pick/reorder columns, compute aggregates, and print results—without leaving the shell.
The basic shape:

awk 'pattern { action }' file

Fields are $1, $2, …; the whole line is $0.

Built-in variables:
FS (input field separator), OFS (output field separator)
RS (record separator), ORS (output record separator)
NR (global line #), FNR (per-file line #), NF (number of fields)
FILENAME, ARGC, ARGV

Special blocks: BEGIN { … } runs before any input; END { … } runs after all input.

Common flags: -F (set FS), -v k=v (pass variables in), -f prog.awk (read the program from a file)

# CSV: print 1st & 3rd columns as TSV
awk -F, '{print $1, $3}' OFS='\t' data.csv
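The flags compose. A minimal sketch using -v plus BEGIN/END (the file, column layout, and the 100 threshold are assumptions for illustration):

# Pass a threshold in with -v; set FS/OFS in BEGIN; summarize in END
awk -v min=100 'BEGIN{FS=","; OFS="\t"} NR>1 && $3 > min {hits++; print $1, $3} END{print "rows over " min ":", hits+0}' data.csv

# Skip the header; count rows, total column 5, and print the average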
awk -F, 'NR>1{n++; sum+=$5} END{print "count="n,"sum="sum,"avg="sum/n}' data.csv
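# Apache-style access log: keep requests whose status code (field 9) is 500+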
awk '$9 >= 500' access.log
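# Per-user totals: group by column 1, summing column 3 (header skipped)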
awk -F, 'NR>1{sum[$1]+=$3} END{for (u in sum) print u,sum[u]}' OFS=, tx.csv
# Keep only the first file's header; print all data rows
awk 'FNR==1 && NR!=1{next} {print}' *.csv > merged.csv
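# Normalize log levels: rewrite every WARN as WARNING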
awk '{gsub(/WARN/, "WARNING"); print}' app.log
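# Keep only well-formed rows: at least 5 fields and a numeric-looking field 3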
awk -F, 'NF>=5 && $3 ~ /^[0-9.]+$/' data.csv
# name = cols 1–10, age = 12–14 (trim trailing spaces)
awk '{name=substr($0,1,10); gsub(/ +$/,"",name); age=substr($0,12,3); print name,age}' OFS=, fixed.txt
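# De-duplicate lines, preserving order (first occurrence wins)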
awk '!seen[$0]++' input.txt
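# Formatted report: name left-aligned in 20 chars, amount as a 2-decimal float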
awk -F, 'NR>1{printf "%-20s %8.2f\n", $1, $3}' data.csv
CSV is tricky (quotes, commas inside quotes). GNU awk (gawk) can define fields by their content with FPAT:
gawk -v FPAT='([^,]*)|("[^"]*")' '
NR==1{print "user,total"; next}
{ amt = $3; gsub(/"/,"",$1); sum[$1]+=amt }
END{for (u in sum) print u "," sum[u]}
' data.csv
For fully robust CSV/JSON, consider specialized tools (mlr, xsv, jq). Use awk when the format is predictable.
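For instance, if Miller is installed, the group-and-sum above is a single verb (a sketch; the user and amount header names are assumptions about data.csv):

# Miller parses quoted CSV natively
mlr --csv stats1 -a sum -f amount -g user data.csv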
# Input: 2025-08-20T09:15:32 → hour bucket
gawk -F'[T:]' '{hour=$2; cnt[hour]++} END{for(h in cnt) printf "%02d,%d\n",h,cnt[h]}' events.txt | sort -t, -k1,1n
# Parse epoch & print human time
gawk '{print strftime("%Y-%m-%d %H:%M:%S", $1)}' epochs.txt
Prefer one awk over multiple pipes: it can filter and format in a single pass (see the sketch after the next command). Pre-set the locale for speed on huge data:
LC_ALL=C awk '…' bigfile
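For example, a grep-then-cut pipeline collapses into one pass (a sketch; the space-separated layout with the level in field 3 is an assumption):

# Instead of: grep ERROR app.log | cut -d' ' -f1
awk '$3 == "ERROR" {print $1}' app.log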
mawk is very fast but lacks some gawk features. busybox awk is minimal. For FPAT, asort()/asorti(), or in-place editing, prefer gawk.

# Replace and write back (gawk extension)
gawk -i inplace '{gsub(/DEBUG/, "INFO"); print}' app.log
(Portable alternative: write to temp file, then mv.)
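A minimal sketch of that portable pattern, mirroring the substitution above:

# Write to a temp file; replace the original only if awk succeeds
awk '{gsub(/DEBUG/, "INFO"); print}' app.log > app.log.tmp && mv app.log.tmp app.log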
Top N users by occurrences
gawk '{c[$1]++} END{for(u in c) print c[u],u}' file | sort -nr | head
Join two files by line number (2-column report)
paste ids.txt amounts.txt | awk -F'\t' '{print $1 "," $2}'
Unique rows by key (first field)
awk -F, '!seen[$1]++' data.csv
Histogram of HTTP codes (9th field)
awk '{h[$9]++} END{for(k in h) print k,h[k]}' access.log | sort -k1,1n
Don't fight formats that need real parsing (reach for mlr/jq/a proper parser). For multi-line records, adjust RS/ORS. Bottom line: If you can say it in a sentence (“sum column 3 by user, skip header”), you can usually write it in one awk.