@knbknb
Last active August 12, 2022 10:46
bash-fragments. Notes on EdX course "Unix Tools: Data, Software and Production Engineering" by D. Spinellis

Bash Course on edX

Shell Command Language

Which option of the uniq command allows you to specify the number of fields to ignore in its comparisons?

uniq -f

  -f, --skip-fields=N   avoid comparing the first N fields
      --group[=METHOD]  show all items, separating groups with an empty line;
                          METHOD={separate(default),prepend,append,both}
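
A small sketch of how -f changes the comparison. The sample lines are invented for illustration: the first field (a timestamp) differs on every line, so plain uniq sees no duplicates, while -f 1 skips that field.

```shell
# Hypothetical sample data: field 1 is a timestamp, the rest is the message.
sample='10:01 disk full
10:02 disk full
10:03 link down'

# Without -f every line is unique because the timestamps differ.
plain_count=$(printf '%s\n' "$sample" | uniq | wc -l)

# With -f 1 the first field is ignored, so the two "disk full" lines collapse.
skipped_count=$(printf '%s\n' "$sample" | uniq -f 1 | wc -l)

echo "plain: $plain_count, skipping field 1: $skipped_count"
```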

less command

b # inside less: scroll back one window (the -b command-line option sets the buffer space instead)

less can easily be configured to behave like more. The command to do that: export LESS=-XEmR

The main advantage of 'less', of course, is that it allows forward and backward scrolling and, as mentioned, its performance with large files, as it doesn't read in the entire file before operating on it.

There is also 'most', which can display multiple files and has left/right scrolling as well.

Calculator

bc
sqrt(3)
1
scale = 5  # set number of digits/decimal precision
sqrt(3)
1.73205

Save a calculation

echo '2 ^ 64 - 1' > chess-rice
bc < chess-rice

Wildcard expansion

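myfile.[^c] # matches any one-character suffix other than c (portable spelling: myfile.[!c])

[a-z]? matches exactly two characters: a lowercase letter followed by any single character. This differs from a regular expression, where [a-z]? stands for zero or one lowercase letter.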
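
To check both patterns, the following sketch creates a few empty files in a scratch directory and lets the shell expand the wildcards. The file names are made up for the demo, and it uses the portable [!c] spelling of the negated bracket.

```shell
# Work in a scratch directory so the demo does not touch real files.
demo_dir=$(mktemp -d)
cd "$demo_dir"
touch a ab abc myfile.c myfile.h myfile.o

# [a-z]? needs exactly two characters, so only "ab" matches.
two_chars=$(echo [a-z]?)

# myfile.[!c] matches one-character suffixes other than "c".
not_c=$(echo myfile.[!c])

echo "$two_chars"
echo "$not_c"
```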

HERE Documents

(the ability of the shell to specify the standard input for a command after its invocation. This standard input forms what is called a "Here document")

If the delimiter is preceded by a backslash, variables and command substitutions in the document are NOT expanded

cat <<\EOF
bla
bla
bl
EOF
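
For comparison, a minimal sketch that feeds the same text through an unquoted and a backslash-escaped delimiter. The variable name and text are placeholders chosen for the demo.

```shell
name=world

# Unquoted delimiter: $name is expanded inside the here document.
expanded=$(cat <<EOF
hello $name
EOF
)

# Backslash-escaped delimiter: the text is passed through literally.
literal=$(cat <<\EOF
hello $name
EOF
)

echo "$expanded"
echo "$literal"
```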

Command grouping (Group commands)

; -- use a simple semicolon to separate commands in a one-liner, and curly braces { and } to group them

{ echo -n 'Today is '; date; } # oneliner format

multiline format

{
  echo -n 'Today is '
  date -R; # format according to email standards : Wed, 15 Apr 2020 09:56:39 +0200
}

{ and } are reserved words, not metacharacters, so { must be followed by whitespace and } must be preceded by a ; or a newline. See Compound commands in fileman-bash.pdf

{ ls /tmp/x && rm /tmp/x ; } # The old file /tmp/x is deleted if it exists.

{ ls /tmp/x || touch /tmp/x ; } # /tmp/x is created if it does not exist yet, so it always exists afterwards.
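
One common reason to group commands is to redirect their combined output once. A minimal sketch, writing to a temporary file created just for the demo:

```shell
report=$(mktemp)

# Both commands share the single redirection attached to the group.
{
  echo 'line one'
  echo 'line two'
} > "$report"

line_count=$(wc -l < "$report")
echo "lines written: $line_count"
```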

Timezone Calculator

TZ=US/Pacific date

Scripting Tricks

if test sourcefile -nt testfile -a -r testfile # -nt is the "newer than" file-test operator, -a the and condition

Make find and xargs work together

find ... -print0 | xargs -0 ... # argument separator is Null-Character
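
A short sketch of why the NUL separator matters. The directory and file names are invented, and one name contains a space on purpose.

```shell
work_dir=$(mktemp -d)
touch "$work_dir/plain.txt" "$work_dir/with space.txt"

# NUL-separated names survive the space: each file reaches ls as one argument,
# so ls prints exactly one line per file.
safe_count=$(find "$work_dir" -type f -name '*.txt' -print0 | xargs -0 ls -1 | wc -l)

echo "files listed: $safe_count"
```

Without -print0 and -0, xargs would split "with space.txt" into two arguments and ls would complain about missing files.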

Format the output of stat: stat -c '%Y %n' /tmp/myfile # prints the modification time (Unix time) and the file name

Data processing Pipelines

git log parsing: Check out 1 GB of Unix: Commit History

git clone --mirror https://github.com/dspinellis/unix-history-repo.git
cd unix-history-repo.git

git log --pretty=format:%aD FreeBSD-release/10.0.0 |
cut -d, -f1 |
sort |
uniq -c |
sort -rn

### Pattern to generate a simple frequency count of something
    <something> |
    sort |
    uniq -c |
    sort -rn
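
The same pattern on a tiny made-up word list, as a sketch. The input words are placeholders, and the variables top_word and top_count exist only for the demo.

```shell
# Hypothetical input: one word per line.
freq=$(printf '%s\n' apple pear apple fig apple pear |
  sort |
  uniq -c |
  sort -rn)

echo "$freq"

# The most frequent item ends up on the first line.
top_word=$(echo "$freq" | head -n 1 | awk '{print $2}')
top_count=$(echo "$freq" | head -n 1 | awk '{print $1}')
```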

For loop: IMDB data

for f in title.{akas,basics,crew,episode,principals,ratings}.tsv.gz
do
  curl -L -O "https://datasets.imdbws.com/$f"  # -O saves under the remote file name
  gunzip "$f"
done

Other useful curl options: -s (silent), --compressed (request a compressed response)

git command

To process the output of git blame with Unix tools, the so-called porcelain format is the most appropriate.

git blame --line-porcelain mydir/myfile.txt

Find the *.vue files with the most commits (changed most often)

    find .  -type f -name "*.vue"  |  while read f ; do   echo -n "$f ";   git log --follow --oneline "$f" | wc -l; done | sort -k 2nr | more

Dates of the commits that last changed the lines of a specific file:

git blame --line-porcelain myFile.txt | grep "author-time" | sort -u | cut -c 13- | xargs -I{} date +%Y-%m-%d -d@{}

Date command

Get the Unix time: date +%s

Inverse operation: date -d @1582286183
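
A sketch of the inverse operation with a fixed time zone so the result is reproducible. This assumes GNU date; BSD and macOS date use -r instead of -d @.

```shell
# -u pins the output to UTC; @N means "N seconds since the epoch" (GNU date).
epoch_day=$(date -u -d @0 +%Y-%m-%d)
example_day=$(date -u -d @1582286183 +%Y-%m-%d)

echo "$epoch_day"
echo "$example_day"
```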

Tools to analyse compiled files:

  • nm - list symbols from object files
  • dumpbin - Windows tool for the same task
  • ldd - print shared object dependencies
  • strip - discard symbols from object files

fmt - simple optimal text formatter, expects a list of space-separated strings

dd

Create a bootable pen drive

curl -s -L ftp://site./some-bootable-img | sudo dd of=/dev/usbdrive bs=10240

Write a number of blocks of an arbitrary size

dd if=/dev/zero of=/tmp/nullfile bs=32k count=12
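
To confirm the arithmetic (12 blocks of 32 KiB each), a sketch that writes to a temporary file instead of /tmp/nullfile and checks the resulting size:

```shell
out_file=$(mktemp)

# dd reports its statistics on stderr; silence them for the demo.
dd if=/dev/zero of="$out_file" bs=32k count=12 2>/dev/null

# 12 * 32768 bytes are expected.
size_bytes=$(wc -c < "$out_file")
echo "bytes written: $size_bytes"
```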

awk command

General rule: an awk program consists of "pattern" + "action" pairs

  • the pattern is typically a regex, e.g. /foo/
  • the action is given in curly braces, e.g. { print $1 }
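
A minimal sketch of one such pair: the regex part (usually called the pattern) selects lines, and the action in braces prints the first field. The colon-separated records are invented for the demo.

```shell
# Hypothetical records: name:country:population
matches=$(printf '%s\n' 'Oslo:NO:700000' 'Bergen:NO:285000' 'Turku:FI:195000' |
  awk -F: '/:NO:/ { print $1 }')

echo "$matches"
```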

sed command

Print specific lines of a file: sed -n 1000,1005p myfile
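
The line-range form can be tried without a real file by generating numbered lines with seq. This is a sketch, and the variable picked is only there for the demo.

```shell
# seq prints 1..2000, one number per line; sed keeps only lines 1000-1005.
picked=$(seq 1 2000 | sed -n '1000,1005p' | paste -sd' ' -)

echo "$picked"
```

For very large files, adding a quit command (sed -n '1000,1005p;1005q') stops sed from reading the rest of the input.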

xmlstarlet command

xmlstarlet sel -t -c //xpath myfile.xml

sort command

sort -t : country-population -k 5 -n -r # -t sets the field separator (awk uses -F:, cut uses -d:)

sort -k 3,3M -k 2,2n # sort by the 3rd field as a month name, then by the 2nd field numerically

sort -C myfile # check if a file is sorted
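
A sketch of the month-then-number ordering on three invented records. LC_ALL=C is set so that the English month abbreviations are recognized regardless of the local language settings.

```shell
# Hypothetical records: label, day, month
sorted=$(printf '%s\n' 'x 2 Mar' 'y 1 Jan' 'z 1 Mar' |
  LC_ALL=C sort -k 3,3M -k 2,2n)

echo "$sorted"

first_line=$(echo "$sorted" | head -n 1)
last_line=$(echo "$sorted" | tail -n 1)
```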

data processing and reporting

differences

diff command # compare text files line by line

cmp command # compare binary files

create 256 bytes of random binary data

dd if=/dev/random of=random--binary-data bs=1 count=256

hexdump random--binary-data

shuf -n 1 # shuffle the input and print one randomly chosen line

pygmentize # syntax highlighter and code formatter

Other hacks not mentioned in course

but still useful

pstree -p -C age - process tree annotated with PIDs and colored by age (red/green)

journalctl _PID=7546 - show log entries that the process has created

ls -p | grep -v / | column - exclude process IDs (which are directories, not files) from ls /proc output

https://access.redhat.com/solutions/406773 - Interpreting /proc/meminfo and free output for Red Hat Enterprise Linux 5, 6 and 7

Network traffic

sudo iftop -i eno1 - high-level view of IP traffic going in and out of your machine

sudo nethogs eno1 - which process is creating network traffic on your machine?

https://cockpit-project.org - new browser based admin tool

apachetop -f /var/log/apache2/access.log - monitor access to web pages in real time

parallel

# Clean up a big tweets file fetched from Twitter's streaming API
# (files have 1 tweet per line, but sometimes a line holds only a fragment);
# thus, remove fragments without "created_at",
# processing the file in 100 MB blocks
parallel --pipepart -a stream__TrumpShutdown.json --block 100M grep created_at > /mnt/virtualbox2/data/stream__TrumpShutdown.json