Linux 'grep'

Preview

grep filters lines that match a pattern across files, directories, or streams, and it is usually the quickest tool to reach for. This guide focuses on GNU grep, with examples suited to production use.

TL;DR (Cheatsheet)

Basic: grep PATTERN file
Case-insensitive: -i
Line numbers: -n
Count matching lines only: -c
Filenames only: -l (inverse -L)
Matched text only: -o
Extended regex: -E
Fixed string (no regex, faster): -F
Invert match: -v
Recursive: -r (follows symlinks only when they are named on the command line), -R (follows all symlinks)
Context: -B N, -A N, -C N
Include/exclude globs: --include, --exclude, --exclude-dir
Color: --color=auto (force with --color=always)
Skip binaries: -I
NUL-safe piping: -Z ends each printed filename with NUL (pair with -l or -L, then xargs -0)
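
As a rough illustration of the -l, -L, and -Z entries above, the sketch below builds a throwaway directory (the directory and file names are hypothetical, made up for the demo) and shows why NUL-terminated names matter when a filename contains a space.

```shell
# Hypothetical scratch directory and file names, created only for this demo.
demo_dir=$(mktemp -d)
printf 'TODO: fix\n' > "$demo_dir/release notes.txt"
printf 'all done\n'  > "$demo_dir/clean.txt"

# -l prints only the names of files with a match; -Z ends each name with NUL,
# so the space in "release notes.txt" does not split it into two arguments.
todo_lines=$(grep -rlZ 'TODO' "$demo_dir" | xargs -0 wc -l)

# -L is the inverse: names of files with no matching line.
clean_files=$(grep -rL 'TODO' "$demo_dir")

echo "$todo_lines"
echo "$clean_files"
rm -rf "$demo_dir"
```

Without -Z and -0, xargs would hand wc two arguments ("release" and "notes.txt") and fail on both.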

Core Patterns You’ll Actually Use

# Recursive search with file filters
grep -Rni --include='*.log' --exclude-dir='.git' 'ERROR' .

# Count matching lines per file (quick signal for CI; files with no hits print :0)
grep -Rc --include='*.py' 'TODO' src/

# Show only the matched portion (e.g., HTTP status in journal)
journalctl -u nginx | grep -oE 'status=[0-9]{3}'

# Whole-word search (avoid over-matching); -w behaves much like wrapping the pattern in \< and \>
grep -rw --include='*.c' 'init' src/

# Literal text (fast, ignores regex metacharacters)
grep -F '[INFO]' app.log

# Safe with odd filenames (spaces/newlines); -H keeps the filename prefix even when xargs passes a single file
find . -name '*.md' -print0 | xargs -0 grep -Hn 'architecture'

Performance That Matters

Prefer -F for literal patterns (often much faster).
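Use LC_ALL=C for byte-wise matching and speed on huge inputs (take care with -i or non-ASCII patterns, which are then no longer treated as multibyte characters):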
```
LC_ALL=C grep -rF 'needle' /data/terabytes
```
Skip binaries with -I or --binary-files=without-match.
Stop early with -m N when you only need the first N hits.
In pipelines where latency matters (for example after tail -f), add --line-buffered. GNU grep block-buffers its output when stdout is not a terminal, so matches can otherwise arrive late; downstream tools may still buffer on their own.
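
A minimal sketch of the early-stop and buffering options above, run against a generated sample log (the file contents are invented for the demo). It only shows the flags in use; any real speedup depends on your data and hardware.

```shell
# Hypothetical sample log, generated only for this demo.
log_file=$(mktemp)
for i in 1 2 3 4 5; do printf 'ERROR event %s\n' "$i"; done > "$log_file"

# -m 2 makes grep stop reading after the second matching line;
# -F plus LC_ALL=C keeps the match a plain byte-wise string search.
first_hits=$(LC_ALL=C grep -F -m 2 'ERROR' "$log_file")

# --line-buffered asks grep to flush after every output line. It matters
# when the reader is a long-running process, for instance:
#   tail -f app.log | grep --line-buffered ERROR | some-consumer
# Here "cat" stands in for that consumer.
streamed=$(grep --line-buffered 'event 5' "$log_file" | cat)

echo "$first_hits"
echo "$streamed"
rm -f "$log_file"
```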

Regex Primer (Just Enough)

-E (ERE): foo|bar, a{2,5}, colou?r, (ba|be)z.
-P (PCRE) for lookarounds:
```
  grep -Po '(?<=user=)[^ &]+' access.log
```
Note: -P may be missing on stock BSD/macOS grep. Use GNU grep (e.g., Homebrew ggrep) or rewrite with awk/sed.
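
Where -P is unavailable, the lookbehind extraction above can usually be approximated with ERE plus a little post-processing. The sketch below uses an invented sample line; -o and sed -E are not strictly POSIX, but to my knowledge both GNU and BSD tools accept them.

```shell
# Hypothetical sample line in the shape the -P example expects.
sample='GET /login?user=alice&lang=en HTTP/1.1'

# Alternative 1: match with ERE, then strip the "user=" prefix with cut.
via_grep=$(printf '%s\n' "$sample" | grep -oE 'user=[^ &]+' | cut -d= -f2-)

# Alternative 2: sed capture group. -n plus the p flag prints only lines
# where the substitution succeeded. Because the leading .* is greedy, this
# captures the LAST user= value on a line, and only one value per line.
via_sed=$(printf '%s\n' "$sample" | sed -nE 's/.*user=([^ &]+).*/\1/p')

echo "$via_grep"
echo "$via_sed"
```

The grep and cut variant prints every occurrence on a line, which is closer to what grep -Po does; the sed variant is handier when each line carries a single value.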

Context & Reporting

# Show surrounding lines (2 before/after)
grep -nC2 'panic' kernel.log

# Only filenames that contain matches
grep -Rl 'TODO' src/

# Suppress filenames (multi-file, lines only)
grep -h 'pattern' file1 file2
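
If you plan to parse context output, it helps to know how grep marks lines: with -n, matching lines use a colon after the line number and context lines use a hyphen. A small sketch on an invented three-line log:

```shell
# Hypothetical three-line log, generated only for this demo.
log_file=$(mktemp)
printf 'boot\npanic: oops\nhalt\n' > "$log_file"

# One line of context on each side of the match.
context=$(grep -nC1 'panic' "$log_file")

# Expected shape: "1-boot", "2:panic: oops", "3-halt"
echo "$context"
rm -f "$log_file"
```

When several match groups are not adjacent, GNU grep also prints a line containing -- between them.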

Real-World Mini Cookbook

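# 1) Summarize error bursts with context around each match, one header per file
#    (-lZ with xargs -0 survives odd filenames; the name is passed as $1, never spliced into the script)
grep -RliZ --include='*.log' 'CRITICAL' logs/ \
 | xargs -0 -I{} sh -c 'echo "--- $1 ---"; grep -niC2 "CRITICAL" "$1"' sh {}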

# 2) Extract JSON values (simple cases)
grep -Po '"user_id"\s*:\s*"\K[^"]+' events.jsonl | sort | uniq -c | sort -nr | head

# 3) Find lines NOT matching (e.g., exclude health checks)
grep 'GET /' access.log | grep -v '/healthz'

Cross-Platform Notes

GNU vs BSD (macOS) differences:
- -P often unavailable on BSD grep.
- Some long options differ; check man grep on each platform.
- Install GNU grep if you need PCRE or consistent behavior.
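
For scripts that must run on both platforms, one option is to probe for -P support at runtime instead of assuming it. This is a sketch: pick_pcre_grep is a hypothetical helper name, and ggrep is assumed to be the command name Homebrew gives GNU grep on your machine.

```shell
# Print the name of the first grep that accepts -P, or return 1 if none does.
# "ggrep" is an assumption about how GNU grep is installed on macOS.
pick_pcre_grep() {
  for candidate in grep ggrep; do
    if command -v "$candidate" >/dev/null 2>&1 &&
       printf 'ab\n' | "$candidate" -qP 'a(?=b)' 2>/dev/null; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1
}

if pcre_grep=$(pick_pcre_grep); then
  echo "PCRE available via: $pcre_grep"
else
  echo "No PCRE-capable grep found; fall back to sed or awk" >&2
fi
```

The probe runs a real lookahead rather than parsing --version output, since GNU grep itself can be built without PCRE support.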

Common Pitfalls (and Fixes)

Regex when you wanted literal → use -F.
Locale surprises / poor performance on huge inputs → LC_ALL=C.
Recursive slowness → narrow with --include/--exclude and prefer -F.
Useless use of cat → grep pattern file (pipe only when needed).
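
The first pitfall is easy to reproduce. In the sketch below (the sample input is invented), the dots in a version string act as regex wildcards unless -F is given:

```shell
# Two hypothetical input lines: a real version string and a lookalike.
versions=$(printf '1.2.3\n1x2y3\n')

# As a regex, each "." matches any character, so both lines count.
as_regex=$(printf '%s\n' "$versions" | grep -c '1.2.3')

# With -F the pattern is a literal string, so only the real version counts.
as_literal=$(printf '%s\n' "$versions" | grep -cF '1.2.3')

echo "regex: $as_regex, literal: $as_literal"
```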