`awk` is a tiny language for scanning and transforming text, perfect for logs, CSV/TSV, and ad-hoc analytics. Think of it as a streaming spreadsheet: you filter rows, pick/reorder columns, compute aggregates, and print results, all without leaving the shell.
The basic shape is:

```sh
awk 'pattern { action }' file
```

Fields are `$1`, `$2`, …; the whole line is `$0`.
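A minimal instance of that shape (`sales.txt` is a hypothetical whitespace-separated file): the pattern selects rows, the action prints fields.

```sh
# Print name and amount for rows where column 3 exceeds 100
awk '$3 > 100 { print $1, $3 }' sales.txt
```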
Built-ins (a quick demo follows the list):

- `FS` (input field separator), `OFS` (output field separator)
- `RS` (record separator), `ORS` (output record separator)
- `NR` (global line number), `FNR` (per-file line number), `NF` (number of fields)
- `FILENAME`, `ARGC`, `ARGV`
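A quick sketch exercising several of these (file names are illustrative):

```sh
# Tag each line with its file, per-file line number, and field count
awk '{ print FILENAME, FNR, NF, $0 }' a.txt b.txt
```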
Structure and flags (example just below):

- `BEGIN { … }` runs before any input; `END { … }` runs after all of it
- `-F` sets `FS`, `-v k=v` passes variables in, `-f prog.awk` reads the program from a file
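Putting the flags and blocks together, a hedged sketch (`min` and `data.csv` are made up):

```sh
# -v passes a shell value in; BEGIN sets separators before any input is read
awk -v min=10 'BEGIN{FS=","; OFS="\t"} $2 >= min {print $1, $2; kept++} END{print "kept", kept+0}' data.csv
```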
```sh
# CSV: print 1st & 3rd columns as TSV
awk -F, '{print $1, $3}' OFS='\t' data.csv
```
```sh
# Skip the header, then count rows and sum/average column 5
awk -F, 'NR>1{n++; sum+=$5} END{print "count="n,"sum="sum,"avg="sum/n}' data.csv
```
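One caveat with the average: on a header-only file `n` stays 0 and awk aborts with a division-by-zero error. A guarded variant of the same one-liner:

```sh
# Same aggregate, but only divide when at least one data row was seen
awk -F, 'NR>1{n++; sum+=$5} END{if (n) print "count="n, "sum="sum, "avg="sum/n; else print "no data"}' data.csv
```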
```sh
# Print requests whose 9th field (HTTP status) is >= 500
awk '$9 >= 500' access.log
```
```sh
# Group-by: sum column 3 per user (column 1), skipping the header
awk -F, 'NR>1{sum[$1]+=$3} END{for (u in sum) print u,sum[u]}' OFS=, tx.csv
```
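Note that `for (u in sum)` visits keys in an unspecified order; pipe through `sort` if the report should be ranked:

```sh
# Same group-by, ranked by total (descending)
awk -F, 'NR>1{sum[$1]+=$3} END{for (u in sum) print u "," sum[u]}' tx.csv | sort -t, -k2,2nr
```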
```sh
# Keep only the first file's header; print all data rows
awk 'FNR==1 && NR!=1{next} {print}' *.csv > merged.csv
```
```sh
# Replace every WARN with WARNING on each line
awk '{gsub(/WARN/, "WARNING"); print}' app.log
```
```sh
# Keep rows with at least 5 fields and a numeric-looking column 3
awk -F, 'NF>=5 && $3 ~ /^[0-9.]+$/' data.csv
```
```sh
# Fixed-width input: name = cols 1–10, age = 12–14 (trim trailing spaces)
awk '{name=substr($0,1,10); gsub(/ +$/,"",name); age=substr($0,12,3); print name,age}' OFS=, fixed.txt
```
```sh
# De-duplicate lines, preserving order: seen[$0]++ is 0 (false) the first
# time a line appears, so !seen[$0]++ prints only first occurrences
awk '!seen[$0]++' input.txt
```
```sh
# Aligned report: name left-justified in 20 chars, amount as an 8.2 float
awk -F, 'NR>1{printf "%-20s %8.2f\n", $1, $3}' data.csv
```
CSV is tricky (quotes, commas inside quotes). GNU awk (gawk) supports token-level parsing with `FPAT`:
```sh
gawk -v FPAT='([^,]*)|("[^"]*")' '
NR==1{print "user,total"; next}
{ amt = $3; gsub(/"/,"",$1); sum[$1]+=amt }
END{for (u in sum) print u "," sum[u]}
' data.csv
```
For fully robust CSV/JSON, consider specialized tools (`mlr`, `xsv`, `jq`). Use `awk` when the format is predictable.
```sh
# Input: 2025-08-20T09:15:32 → hour bucket
gawk -F'[T:]' '{hour=$2; cnt[hour]++} END{for(h in cnt) printf "%02d,%d\n",h,cnt[h]}' events.txt | sort -t, -k1,1n
```
```sh
# Parse epoch & print human time (strftime is a gawk extension)
gawk '{print strftime("%Y-%m-%d %H:%M:%S", $1)}' epochs.txt
```
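The inverse direction also exists in gawk: `mktime()` takes a `"YYYY MM DD HH MM SS"` spec and returns epoch seconds. A sketch, assuming ISO timestamps like the `events.txt` example above:

```sh
# ISO timestamp → epoch seconds (gawk extension; splits on -, T, and :)
gawk -F'[-T:]' '{print mktime($1 " " $2 " " $3 " " $4 " " $5 " " $6)}' events.txt
```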
Prefer one `awk` over multiple pipes: it can filter and format in a single pass. Pre-set the locale for speed on huge data:

```sh
LC_ALL=C awk '…' bigfile
```
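As a concrete case, a `grep | cut | sort | uniq -c` chain collapses into one pass; the field positions here are illustrative (status in `$9`, URL in `$7`, as in common log format):

```sh
# One pass instead of: grep ' 404 ' access.log | cut -d' ' -f7 | sort | uniq -c
awk '$9 == 404 {c[$7]++} END {for (p in c) print c[p], p}' access.log
```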
Implementations differ: `mawk` is very fast but lacks some gawk features; `busybox awk` is minimal. For `FPAT`, `asort()`/`asorti()`, or in-place editing, prefer gawk.

```sh
# Replace and write back (gawk extension)
gawk -i inplace '{gsub(/DEBUG/, "INFO"); print}' app.log
```

(Portable alternative: write to a temp file, then `mv`; a sketch follows.)
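A minimal sketch of that portable route (assumes `mktemp`, available on any modern Linux/BSD):

```sh
# Edit "in place" without gawk: write to a temp file, then rename over the original
tmp=$(mktemp) &&
awk '{gsub(/DEBUG/, "INFO"); print}' app.log > "$tmp" &&
mv "$tmp" app.log
```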
```sh
# Top N users by occurrences
gawk '{c[$1]++} END{for(u in c) print c[u],u}' file | sort -nr | head

# Join two files by line number (2-column report)
paste ids.txt amounts.txt | awk -F'\t' '{print $1 "," $2}'

# Unique rows by key (first field)
awk -F, '!seen[$1]++' data.csv

# Histogram of HTTP codes (9th field)
awk '{h[$9]++} END{for(k in h) print k,h[k]}' access.log | sort -k1,1n
```
Don't force `awk` onto heavily quoted CSV or JSON (use `mlr`/`jq`/a proper parser). For multi-line records, tweak `RS`/`ORS`.

Bottom line: if you can say it in a sentence ("sum column 3 by user, skip header"), you can usually write it in one `awk` one-liner.
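For instance, that exact sentence, written out (same shape as the group-by example earlier; column positions assumed):

```sh
# "Sum column 3 by user (column 1), skip header"
awk -F, 'NR>1 {sum[$1] += $3} END {for (u in sum) print u, sum[u]}' data.csv
```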