@delameter
Last active November 12, 2023 18:46

Linux Text Processing

Cheatsheet for typical use cases of sed, grep and other text utilities.

1.   General

1.1.   Viewers/editors options

                         less      nano      other
Show whitespaces                   Alt+P     cat -A
Eval ANSI control seqs   less -R
Soft-wrap lines          less -S   Alt+S
Line numbers             less -N   Alt+N     cat -n, grep -n, sed =
String search            /         Ctrl+W
String replace                     Ctrl+\    "${var//1/2}", sed s

1.2.   Delimiters

 
         Delimiter   Default delimiter   Escaped character        Example
         (in)        as a regexp         setup [1]
cut      -d          \t                  $'\n'                    cut -d$'\n'
sort     -t          \S\s [2]            $'\n'                    sort -t$'\t'
column   -s          \s+                 $'\n'                    column -t -s$'\t'
xargs    -d          \s+                 '\n' '\x0a'              xargs -d'\n'
grep     n/a                             $'\t'                    grep -Ee $'\t'
tr       n/a                             '\n' '\t', but $'\x1b'   tr '\t' ' '
[1] Results may vary depending on the implementation: e.g. '\x1b' works for sed but not for grep, whereas '\e' is recognized by grep and ignored by sed. $'\x1b' or $'\e' work everywhere because the shell expands them before the command runs (assuming bash is in use).
[2] The default delimiter of uniq is the same as that of sort, except that it cannot be changed (by an option, that is).
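
A minimal sketch of the escaped-delimiter setup from the table above. The sample data and variable names are made up for illustration, and printf is used instead of $'\t' only so that the snippet also runs under plain sh:

```shell
# Hypothetical sample data: two tab-separated columns.
tab=$(printf '\t')
data=$(printf 'b\t2\na\t10\nc\t1\n')

# Second field of every line, fields split on a literal tab.
second_col=$(printf '%s\n' "$data" | cut -d"$tab" -f2)

# Sort numerically by the second tab-separated field.
sorted=$(printf '%s\n' "$data" | sort -t"$tab" -k2,2n)

printf '%s\n' "$second_col" "$sorted"
```

In bash the same delimiter can be written inline as cut -d$'\t' or sort -t$'\t'.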

1.3.   Units

            bytes   chars    lines   words (fields)
cut         -b      -c               -f
wc          -c      -m       -l      -w
head tail   -c               -n
uniq                -s -w            -f
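
To illustrate how the unit options differ, here is a small sketch on a made-up two-line sample (the sample function is hypothetical, and the text is ASCII only, so bytes and characters coincide):

```shell
# Hypothetical two-line sample, 24 bytes in total.
sample() { printf 'one two\nthree four five\n'; }

bytes=$(sample | wc -c)               # count bytes
lines=$(sample | wc -l)               # count lines
words=$(sample | wc -w)               # count words
first_field=$(sample | cut -d' ' -f1) # first space-separated field of each line
first_chars=$(sample | cut -c1-3)     # first three characters of each line
head_bytes=$(sample | head -c 3)      # first three bytes of the whole input

printf '%s\n' "$bytes" "$lines" "$words" "$first_field" "$first_chars" "$head_bytes"
```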

1.4.   Minimize buffering

sed --unbuffered
grep --line-buffered

1.5.   Color control

ls --color[=always|never|auto]
grep --color[=always|never|auto]
git diff --color[=always|never|auto]

1.6.   less runtime

less options can be passed as command-line arguments or typed directly into the running program (e.g. -R followed by Enter).

1.7.   Extended regexp

Both grep and sed support extended regular expressions, which are enabled with the -E option.

2.   Removing

2.1.   Remove line(s) starting with

grep -v ^NANI
sed /^NANI/d

2.1.1.   Keep line(s) starting with

grep ^NANI

sed '/^NANI/!d'
Single quotes prevent the interactive shell from applying history expansion to !
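
A quick sketch comparing the four commands from 2.1 and 2.1.1 on a made-up input (the input function and its contents are hypothetical):

```shell
# Hypothetical three-line input; two lines start with NANI.
input() { printf 'NANI one\nkeep me\nNANI two\n'; }

removed_grep=$(input | grep -v '^NANI')   # drop matching lines
removed_sed=$(input | sed '/^NANI/d')     # same with sed
kept_grep=$(input | grep '^NANI')         # keep only matching lines
kept_sed=$(input | sed '/^NANI/!d')       # same with sed

printf '%s\n' "$removed_grep" "$kept_sed"
```

Both removal commands should agree with each other, and so should both keep commands.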

2.2.   Remove first N line(s)

sed [-e] 1,<N>d
# or
tail -n +<N+1>

Examples

  1. Remove first line from the top:

    sed 1d
    
    tail -n +2
  2. Remove first 3 lines:

    sed 1,3d
    
    tail -n +4
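
The relation between the two forms can be checked with a short sketch. N is a hypothetical variable holding the number of leading lines to drop; note that tail needs N+1 because +K means "start output at line K":

```shell
# N is a hypothetical variable: how many leading lines to drop.
N=3

via_sed=$(seq 1 6 | sed "1,${N}d")
via_tail=$(seq 1 6 | tail -n +$((N + 1)))

printf '%s\n' "$via_sed"
```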

2.3.   Remove last N line(s)

head -n -<N>

Examples

  1. Remove last line:

    head -n -1
    # or
    sed \$d

    $ in an address means the last line.

  2. Remove last 5 lines:

    head -n -5
  3. Remove lines from 5 to the last one:

    sed 5,\$d
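
A small sketch of the commands above on the numbers 1 to 6. N is a hypothetical variable, and the negative count for head is assumed to be available (it is a GNU coreutils feature):

```shell
# N is a hypothetical variable: how many trailing lines to drop.
N=2

via_head=$(seq 1 6 | head -n -"$N")    # GNU head: everything except the last N lines
last_removed=$(seq 1 6 | sed '$d')     # drop only the last line
from_5=$(seq 1 6 | sed '5,$d')         # drop line 5 through the last line

printf '%s\n' "$via_head"
```

Single quotes are used here instead of the backslash in \$d; both keep the shell from touching the $ address.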

3.   Advanced addressing

3.1.   Substitute all matches

sed s/./_/g
g means "global"

3.2.   Substitute N-th line only

sed [-e] '<N> s/$/upd/'

3.2.1.   Substitute every K-th line, starting from N (including N-th)

sed [-e] '<N>~<K> s/$/upd/'
This is a GNU extension, so it will not work with the sed shipped with macOS [➚].

Examples

  1. Append "upd" to first line only:

    sed "1 s/$/upd/"
  2. Append "upd" to 5th line only:

    sed "5 s/$/upd/"
  3. Append "upd" to last line only:

    sed "$ s/$/upd/"
  4. Append "upd" to every 2nd line from 5th, i.e. 5, 7, 9..:

    sed '5~2 s/$/upd/'
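
The address forms above can be tried on a short numbered input. This is only a sketch: the step address 2~2 is chosen for illustration and assumes GNU sed.

```shell
first=$(seq 1 6 | sed '1 s/$/upd/')      # first line only
last=$(seq 1 6 | sed '$ s/$/upd/')       # last line only
stepped=$(seq 1 6 | sed '2~2 s/$/upd/')  # GNU extension: lines 2, 4, 6

printf '%s\n' "$stepped"
```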

3.3.   Substitute M-th match separately for each line

sed s/./A/<M>
The POSIX standard does not specify what should happen when you mix the g and number modifiers [➚1].

Examples

  1. Replace the 3rd character of each line with "A":

    sed s/./A/3
  2. Can be combined with a line selector, e.g. the next command replaces the 2nd occurrence of 'r' with 'A' on every 4th line, starting from line 1 (lines 1, 5, 9, ...):

    sed 1~4s/r/A/2
  3. Replace the last match on each line (dirty hack, educational use only!):

    rev | sed 's/./A/' | rev

    I'm pretty sure there is a way to do it more delicately, but at the moment don't know how exactly.
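
One possible alternative to the rev hack, offered as a sketch rather than as the author's method: a greedy .* captured in a group swallows everything up to the last occurrence of the pattern, so only that occurrence is replaced. The sample strings are made up.

```shell
third_char=$(echo 'abcdef' | sed 's/./A/3')       # 3rd match of "any character"
second_r=$(echo 'error rate' | sed 's/r/A/2')     # 2nd occurrence of r

# Greedy prefix: \(.*\) takes as much as it can, so the r that follows it
# is the last r on the line.
last_r=$(echo 'error rate' | sed 's/\(.*\)r/\1A/')

printf '%s\n' "$third_char" "$second_r" "$last_r"
```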

3.4.   Substitute lines from N to N2

sed [-e] '<N>,<N2> s/./A/'

3.4.1.   Substitute lines from N to (N+K)

sed [-e] '<N>,+<K> s/./A/'

Examples

  1. Prepend "---" to lines 5, 6 and 7:

    sed 5,7s/^/---/
  2. Prepend "---" to lines 5-12:

    sed 5,+7s/^/---/
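
A short sketch of both range forms on a five-line input. The line numbers are picked for illustration, and the +K form assumes GNU sed:

```shell
absolute=$(seq 1 5 | sed '2,3 s/^/---/')    # lines 2 to 3
relative=$(seq 1 5 | sed '2,+2 s/^/---/')   # GNU: line 2 and the 2 lines after it

printf '%s\n' "$absolute" "$relative"
```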

3.5.   Conditional substitution

Apply expr2 to lines that match expr1:

sed "<expr1> <expr2>"

Example

  1. Replace all occurrences of "black" with "white" if line starts with "color":

    sed "/^color/ s/black/white/g"
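
The effect is easiest to see on two lines where only one matches the address. The input below is made up for illustration:

```shell
# Only the line starting with "color" is touched; the other keeps "black".
result=$(printf 'color: black\nshade: black\n' | sed '/^color/ s/black/white/g')

printf '%s\n' "$result"
```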

4.   Selective Output

4.1.   Print affected lines, hide unchanged

sed --quiet s/A/B/p

4.2.   Additionally print affected lines into a file

sed "s/A/B/w /dev/stdout"
# or
sed "s/A/B/w /tmp/temp"

4.3.   Print line number before each line

sed '=; s/./A/'
# or
sed -e = -e s/./A/
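
A sketch of the three output modes from this section. The input and the temporary file created with mktemp are assumptions for illustration:

```shell
# 4.1: print only the lines where the substitution happened.
changed_only=$(printf 'A1\nC2\nA3\n' | sed -n 's/A/B/p')

# 4.2: print everything, and additionally write changed lines to a file.
log=$(mktemp)
all_lines=$(printf 'A1\nC2\nA3\n' | sed "s/A/B/w $log")
logged=$(cat "$log")
rm -f "$log"

# 4.3: "=" prints the line number on its own line before each input line.
numbered=$(printf 'x\ny\n' | sed '=')

printf '%s\n' "$changed_only" "$logged" "$numbered"
```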

5.   Flow control

5.1.   Jump to label L if the last substitution made a replacement

sed ':<L> <command1>; t<L>;'

5.1.1.   Jump to label L if the last substitution DID NOT make a replacement

sed ':<L> <command1>; T<L>;'

5.2.   Make sed exit with code E

sed q<E>

Example

  1. Search for the 'b' character on every 5th line, starting from line 5; continue until it is found or EOF is reached; on the first match, replace it with 'w' and stop processing immediately; in that case the exit code will be 3:

    sed '5~5s/b/w/; tQ; T; :Q q3;'
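
A sketch of both branching commands, assuming GNU sed. The first part is a typical t loop that repeats a substitution until it stops matching; the second is the example above, split into separate -e expressions so that label parsing does not depend on the sed version. The inputs are made up.

```shell
# Loop: collapse runs of spaces by replacing two spaces with one until no pair is left.
squeezed=$(echo 'a    b' | sed -e ':a' -e 's/  / /' -e 'ta')

# Same logic as the example above: lines 5 and 10 are inspected, line 10 has a 'b'.
rc=0
out=$(printf 'b1\nx2\nx3\nx4\nx5\nx6\nx7\nx8\nx9\nb10\nb11\n' |
  sed -e '5~5s/b/w/' -e 'tQ' -e 'T' -e ':Q' -e 'q3') || rc=$?
last=$(printf '%s\n' "$out" | tail -n 1)

printf '%s\n' "$squeezed" "$last" "$rc"
```

Line 1 also contains a 'b' but is left alone because it is not selected by 5~5; processing stops at line 10, so line 11 is never printed.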

6.   Misc

6.1.   Remove line separators \n

tr -d '\n'
# or
sed -z 's/\n//g'

6.2.   Replace line separators \n with NUL-bytes

  1. tr '\n' '\0'
  2. sed -z 's/\n/\x00/g'
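Note that s/\n/\0/g does not work: in the replacement part GNU sed appears to treat \0 as a reference to the whole match (like &), so \x00 is needed instead. Likewise, \e does not work there (use \x1b instead).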
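
NUL-separated output is mostly useful as input for tools that accept it, such as xargs -0. A small sketch with made-up input, one item of which contains a space:

```shell
# Count the NUL bytes produced from three newline-terminated lines.
nul_count=$(printf 'a\nb\nc\n' | tr '\n' '\0' | tr -cd '\0' | wc -c)

# Feed NUL-separated items to xargs; "a b" stays a single argument.
items=$(printf 'a b\nc\n' | tr '\n' '\0' | xargs -0 -n1 echo)

# 6.1 for comparison: newlines removed entirely.
joined=$(printf 'a\nb\nc\n' | tr -d '\n')

printf '%s\n' "$nul_count" "$items" "$joined"
```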

6.3.   Remove SGRs (ANSI escape sequences for text formatting)

sed -Ee 's/\x1b\[[0-9:;]*m//g'

Example

  1. Remove SGRs in two steps, printing each line (preceded by its line number) three times: unchanged, with the escape characters stripped so that the ANSI sequence internals are visible, and with the sequences removed entirely:

    sed -nEe '=;p;s/\x1b(\[[0-9;]*)m/\1]/g' -e '=;p;s/\[[0-9;]*\]//g' -e '=;p'
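
A minimal check of the basic removal command from 6.3 on a made-up colored string. The \x1b escape in the regexp assumes GNU sed; printf with the octal escape \033 is used to build the input portably:

```shell
# Hypothetical input: "red" in bold red, followed by plain text.
colored=$(printf '\033[1;31mred\033[0m plain')

plain=$(printf '%s\n' "$colored" | sed -E 's/\x1b\[[0-9:;]*m//g')

printf '%s\n' "$plain"
```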

6.4.   Fast search of text files / filter binary files

find . -type f -exec grep -Iq . {} \; -print
The -I option to grep tells him to ignore binary files, and the "." option along with -q will make him match text files [...] [➚2].
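
The behaviour can be tried in a throwaway directory. This is a sketch with hypothetical file names; note that an empty file is not listed either, because "." needs at least one character to match:

```shell
# Temporary playground with one text file, one binary file and one empty file.
dir=$(mktemp -d)
printf 'hello\n' > "$dir/text.txt"
printf '\000\001\002' > "$dir/blob.bin"
: > "$dir/empty.txt"

found=$(cd "$dir" && find . -type f -exec grep -Iq . {} \; -print)
rm -rf "$dir"

printf '%s\n' "$found"
```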

6.5.   [ -n ... ] and [ -z ... ] aspects

Sometimes it is necessary to use the [ ... ] form of the condition check command, e.g. when bash is unavailable and only plain sh is present (a situation often encountered when working with Docker).

There is (at least) one subtle aspect regarding -n and -z modes:

$ VAR=
$ if [ -n $VAR ] ; then echo true/$?; else echo false/$? ; fi
true/0
$ if [ -z $VAR ] ; then echo true/$?; else echo false/$? ; fi
true/0
cmd \ result/exit code   -n $VAR   -n "$VAR"   -z $VAR   -z "$VAR"
sh -c '[ ... ]'          true/0    false/1     true/0    true/0
bash -c '[[ ... ]]'      false/1   false/1     true/0    true/0

The result for unquoted -n $VAR can be explained as follows:

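The [ $@ ] form is roughly equivalent to test $@, and the behaviour of test depends on the number of arguments. In this particular case VAR is defined but empty, so the unquoted form expands to nothing and the command becomes [ -n ]; with a single argument, test only checks that the argument itself is a non-empty string, and "-n" is one, so the result is true. The quoted form becomes "", which the shell passes as a separate (empty) argument, as it should, so -n is applied to it as intended.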
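
The difference can be reproduced in any POSIX shell with a short sketch (the result variable names are made up):

```shell
VAR=

# Unquoted: $VAR expands to nothing, so this runs `[ -n ]`, a one-argument
# test that only checks that the string "-n" is non-empty.
if [ -n $VAR ]; then unquoted=true; else unquoted=false; fi

# Quoted: the empty string is passed as a real argument to -n.
if [ -n "$VAR" ]; then quoted=true; else quoted=false; fi

printf '%s\n' "unquoted: $unquoted" "quoted: $quoted"
```

Quoting the variable is therefore the safe default inside [ ... ].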

6.6.   Exit-code-dependent conditions

https://user-images.githubusercontent.com/50381946/230963392-a1780d14-0a9d-4f58-a0db-1335701325cb.png

This happens because the negation operator ! inverts the exit status of the command that follows it, so after the ! call $? holds the inverted value (0 for any failing command) instead of the original exit code. However, there is one unclear aspect (where the hell is it stored in between the ! invocations?):

fn() { return 3; }
! fn ; echo $?
0
! ! fn ; echo $?
3

UPD: the original exit code is stored in $PIPESTATUS. Apparently ! invocations are handled by the same pipeline mechanism.
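
When the original exit code matters, a portable option is to capture it before any negation takes place. A sketch that stays within POSIX sh (the variable names are made up, and $PIPESTATUS is deliberately not used because it is bash-specific):

```shell
fn() { return 3; }    # same hypothetical function as above

rc=0
fn || rc=$?           # the right-hand side runs only on failure: rc is the raw status

negated=0
! fn || negated=$?    # `! fn` succeeds, so the assignment is skipped

neg_true=0
! true || neg_true=$? # `! true` fails with status 1

printf '%s\n' "$rc" "$negated" "$neg_true"
```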


Author: Alexandr Shavykin
Contact: 0.delameter@gmail.com
Date: 24-Jul-24 08:45:11 PDT