
@Tabea-K
Last active July 9, 2020 08:14
BASH commands
author: Tabea Kischka
date: 2018-06-26
title: BASH Cheatsheet

BASH Cheatsheet

check for file on remote

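# For each local *sh file, report whether the matching BAM file already exists on the server.
# grepuuid is not a standard command: it is assumed to be a user-defined helper that prints the UUID(s) found in its input.
for F in *sh
 do
  echo "$F" | grepuuid | while read -r BAMID
   do
    # -n keeps ssh from reading (and swallowing) the loop's stdin.
    # The path is NOT quoted on the remote side, so the remote shell expands the wildcards;
    # ls exits non-zero when nothing matches.
    if ssh -n user@server "ls /PATH/WITH/VAR/AND/WILDCARDS/WXS/*/${BAMID}/*.bam > /dev/null 2>&1"
    then
     echo "File already exists"
    else
     echo "File does not exist"
    fi
   done
 done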

rsync

rsync -avz -e "ssh -p 11111" user@server:/filepath/ /target_path/

Time stamps

# UNIX epoch
date +%s
 1554794534
date "+%Y-%m-%d"
 2019-04-09
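Going the other way, from an epoch value back to a readable date, means feeding the number to date. A minimal sketch, assuming GNU date (standard on Linux); BSD/macOS date uses date -r SECONDS instead. The timestamp is the one printed above, shown in UTC:

```shell
# Convert an epoch timestamp back to a human-readable date (GNU date syntax).
TS=1554794534
date -u -d "@${TS}" "+%Y-%m-%d %H:%M:%S"   # prints 2019-04-09 07:22:14
```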

gzip from stdin

echo "test" | gzip > test.gz
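To check the result, decompress to stdout with gzip -dc (zcat does the same on most Linux systems); the original text comes back unchanged:

```shell
echo "test" | gzip > test.gz   # compress stdin into test.gz
gzip -dc test.gz               # decompress to stdout; prints: test
```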

Add line numbers to a tab-separated file

I have a file containing the header of a tsv-file, looking like this:

cat metadata/vcf_table_expected_header.txt
CHROM   POS     TYPE    ID      REF     ALT     FILTER  NORMAL.AD       NORMAL.AF       NORMAL.GT       NORMAL.REF_F2R1 NORMAL.ALT_F1R2 NORMAL.REF_F1R2 NORMAL.ALT_F2R1 TUMOR.AD        TUMOR.AF        TUMOR.GT    TUMOR.REF_F2R1  TUMOR.ALT_F1R2  TUMOR.REF_F1R2  TUMOR.ALT_F2R1

For my documentation I want line numbers, so I run:

cat metadata/vcf_table_expected_header.txt | tr "\t" "\n" | nl | sed "s/^ */# /"

# 1     CHROM
# 2     POS
# 3     TYPE
# 4     ID
# 5     REF
# 6     ALT
# 7     FILTER
# 8     NORMAL.AD
# 9     NORMAL.AF
# 10    NORMAL.GT
# 11    NORMAL.REF_F2R1
# 12    NORMAL.ALT_F1R2
# 13    NORMAL.REF_F1R2
# 14    NORMAL.ALT_F2R1
# 15    TUMOR.AD
# 16    TUMOR.AF
# 17    TUMOR.GT
# 18    TUMOR.REF_F2R1
# 19    TUMOR.ALT_F1R2
# 20    TUMOR.REF_F1R2
# 21    TUMOR.ALT_F2R1
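The same numbering can be done in a single awk call that splits the header on tabs and prints each field with its position. A sketch; header_demo.tsv is a made-up three-column stand-in for the real header file:

```shell
# Hypothetical demo file standing in for metadata/vcf_table_expected_header.txt
printf 'CHROM\tPOS\tTYPE\n' > header_demo.tsv

# -F'\t' splits on tabs; NF is the number of fields in the line
head -n 1 header_demo.tsv | awk -F'\t' '{ for (i = 1; i <= NF; i++) print "# " i "\t" $i }'
# prints "# 1<TAB>CHROM", "# 2<TAB>POS", "# 3<TAB>TYPE", one per line
```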

Comparison of lists

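Print the IDs in the first column of list1.txt that are missing from list2.txt. comm expects both inputs to be sorted; -23 hides the lines that appear only in the second input and the lines common to both.

comm -23 <(cut -f1 list1.txt | sort) <(sort list2.txt)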
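A small demo with made-up files in a scratch directory shows what comm -23 keeps: the IDs from column 1 of list1.txt that never appear in list2.txt.

```shell
cd "$(mktemp -d)"                               # scratch directory for the demo
printf 'id3\tx\nid1\ty\nid2\tz\n' > list1.txt   # tab-separated, IDs in column 1
printf 'id2\nid4\n' > list2.txt
comm -23 <(cut -f1 list1.txt | sort) <(sort list2.txt)
# prints:
# id1
# id3
```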

Scripts

Invoke a script with -x to run it in debug mode (bash prints each command, prefixed with +, to stderr before running it):

bash -x my_script.sh
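To trace only part of a script, switch tracing on and off inside it with set -x and set +x. Trace lines go to stderr, prefixed with "+ " (the default PS4). A minimal sketch:

```shell
#!/usr/bin/env bash
echo "not traced"
set -x               # start printing each command before it runs
NAME="world"
echo "hello ${NAME}"
set +x               # stop tracing (this command itself is still traced)
echo "not traced either"
```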

find command

Find files modified more than 60 minutes ago (-mmin counts minutes; -mtime counts days, so -mtime +60 would mean more than 60 days)

find . -type f -mmin +60

Find files with name *foo or *bar

find . -type f \( -name "*foo" -o -name "*bar" \)
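A quick check in a scratch directory shows which files find -mmin +60 picks up (minutes, unlike -mtime, which counts days). It assumes GNU touch for backdating a file:

```shell
cd "$(mktemp -d)"                 # scratch directory for the demo
touch -d '2 hours ago' old.txt    # GNU touch; on BSD/macOS use touch -t with an explicit timestamp
touch new.txt
find . -type f -mmin +60          # prints ./old.txt
```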

do math with variables

X=3
Y=4
Z=$((X + Y))
echo $Z
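$(( )) only does integer arithmetic, so division truncates. For decimals, one option is to hand the numbers to awk (bc works as well):

```shell
X=7
Y=2
echo $((X / Y))    # prints 3 (integer division)
echo $((X % Y))    # prints 1 (remainder)
awk -v x="$X" -v y="$Y" 'BEGIN { print x / y }'   # prints 3.5
```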

Handy stuff...

check whether a file is empty

# -s is true if the file exists and is larger than zero bytes
if [ -s diff.txt ]
then
        rm -f empty.txt
        touch full.txt
else
        rm -f full.txt
        touch empty.txt
fi
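To list every empty file below a directory rather than test a single one, find has an -empty test (supported by GNU and BSD find). Demo in a scratch directory:

```shell
cd "$(mktemp -d)"          # scratch directory for the demo
: > empty.txt              # create a zero-byte file
echo "data" > full.txt
find . -type f -empty      # prints ./empty.txt
```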

Get only filename from path

P=/long/path/file.txt
FILENAME=${P##*/}

Get only path, no filename, from path with filename (use a name other than PATH: assigning to PATH would replace the shell's command search path)

P=/long/path/file.txt
DIR=${P%/*}
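The same parameter expansions can also split off the file extension. A sketch using the example path; for simple paths like this one, ${P##*/} and ${P%/*} give the same result as basename and dirname, but they differ for edge cases such as a path without any slash. The directory goes into DIR because a variable named PATH would replace the shell's command search path.

```shell
P=/long/path/file.txt
FILENAME=${P##*/}      # file.txt   (remove the longest prefix ending in /)
DIR=${P%/*}            # /long/path (remove the shortest suffix starting with /)
STEM=${FILENAME%.*}    # file       (remove the shortest suffix starting with .)
EXT=${FILENAME##*.}    # txt        (remove the longest prefix ending in .)
echo "$DIR $FILENAME $STEM $EXT"
```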

Redirect stderr into Nirvana

ls -unknownoption 2>/dev/null 

Skip empty lines

grep -v '^$'
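'^$' only matches lines that are completely empty. To also skip lines containing nothing but spaces or tabs, allow any amount of whitespace between the anchors:

```shell
printf 'a\n\n  \t\nb\n' | grep -v '^[[:space:]]*$'   # prints a and b
```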

Convert between lower case and upper case

tr '[:upper:]' '[:lower:]'  # upper case to lower case
tr '[:lower:]' '[:upper:]'  # lower case to upper case
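In bash 4 or newer (macOS still ships bash 3.2), a variable can be case-converted without tr through parameter expansion:

```shell
SEQ="atgC"
echo "${SEQ^^}"   # ATGC (all upper case)
echo "${SEQ,,}"   # atgc (all lower case)
```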

Count letters in a file (e.g. nucleotides)

echo -ne "atggagaggttcgcg\ncgtaggtgatgatcgg" > file.txt
grep -o '.' file.txt | sort | uniq -c
      6 a
      4 c
     14 g
      7 t

Reuse arguments from previous commands

!^  # get the first argument from previous command
!:2 # get the second argument from previous command
!$  # get last argument from previous command
!*  # get all arguments from previous command
!!  # get the entire previous command

awk commands

Split a file based on a column

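# ".txt" must be a quoted string; parenthesizing the concatenation makes it the output file name
awk 'BEGIN { FS = "|" } { print > ($5 ".txt") }' raw.txt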
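When column 5 has many distinct values, some awk implementations may run out of open file handles. A variant that closes each file after writing, shown on a made-up pipe-separated demo file; note that >> appends, so remove old output files before rerunning:

```shell
cd "$(mktemp -d)"    # scratch directory for the demo
printf 'a|b|c|d|grp1\ne|f|g|h|grp2\ni|j|k|l|grp1\n' > raw.txt
# out holds the target name; close(out) releases the handle after each line
awk 'BEGIN { FS = "|" } { out = $5 ".txt"; print >> out; close(out) }' raw.txt
cat grp1.txt    # the two grp1 lines
cat grp2.txt    # the single grp2 line
```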

only keep rows whose columns match given values

CHR=chr1
POS=342455
# keep rows whose column 1 equals $CHR and column 2 equals $POS
awk -v CHR="$CHR" -v POS="$POS" '$1 == CHR && $2 == POS' FILE

Count nucleotides in a fasta file (slow)

# "head" limits the count to the first 10 sequence lines; drop it to count the whole file
grep -v ">" seqs.fasta | head | fold -w1 | sort | uniq -c
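A likely faster alternative on large files counts the characters in a single awk pass instead of sorting one line per base. A sketch on a made-up FASTA file; awk's for-in order is arbitrary, hence the final sort:

```shell
cd "$(mktemp -d)"    # scratch directory for the demo
printf '>seq1\nACGT\nAAC\n>seq2\nGGT\n' > seqs.fasta
# skip header lines, tally every character of the sequence lines
awk '!/^>/ { for (i = 1; i <= length($0); i++) n[substr($0, i, 1)]++ }
     END { for (b in n) print n[b], b }' seqs.fasta | sort -k2
# prints:
# 3 A
# 2 C
# 3 G
# 2 T
```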

Sorting, making files unique

make file unique based on two columns

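sort -u -k1,1 -k3,3 file.txt

makes the file unique based on columns 1 and 3! -k1,1 limits a key to column 1 alone (a bare -k1 runs from column 1 to the end of the line), and -u keeps one line per distinct key.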
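A demo with a made-up whitespace-separated file shows the effect: four lines, but only three distinct (column 1, column 3) pairs. For tab-separated files with spaces inside fields, add -t $'\t' so sort splits on tabs only.

```shell
cd "$(mktemp -d)"    # scratch directory for the demo
printf 'a x 1\na y 1\nb x 2\na z 2\n' > file.txt
# "a x 1" and "a y 1" share the key (a, 1), so only one of them is kept
sort -u -k1,1 -k3,3 file.txt
```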

IF condition

string comparison

FIRSTSTRING="Hey"
SECONDSTRING="Hello"
THIRDVAR=$(date)

# == works inside bash's [ ]; POSIX sh only accepts =
if [ "$FIRSTSTRING" == "$SECONDSTRING" ]
then
 echo "Strings are identical"
elif [ "$FIRSTSTRING" == "$THIRDVAR" ]
then
 echo "That's unlikely to ever be printed..."
else
 echo "Not identical"
fi
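Inside bash's [[ ]], the right-hand side of == is a glob pattern when left unquoted, and =~ matches an extended regular expression whose capture groups end up in BASH_REMATCH. A sketch with a hypothetical file name:

```shell
FILE="sample_R1.fastq.gz"
if [[ "$FILE" == *_R1* ]]    # unquoted pattern: glob match
then
 echo "read 1"
fi
if [[ "$FILE" =~ ^(.+)_R[12]\.fastq\.gz$ ]]
then
 echo "sample: ${BASH_REMATCH[1]}"   # prints sample: sample
fi
```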

Integer comparison

MYVALUE=150    # example value
if [[ "$MYVALUE" -gt 100 ]]
then
 echo "$MYVALUE is greater than 100"
fi
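Bash can also compare integers with the arithmetic command (( )), which accepts the usual operators and needs no $ in front of variable names:

```shell
MYVALUE=150    # example value
if (( MYVALUE > 100 ))
then
 echo "$MYVALUE is greater than 100"
fi
```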

Uncompress files

tar -xvjf fo.tar.bz2        # -x extract, -v verbose, -j bzip2, -f archive file (use -z for .tar.gz)
unzip gatk-4.0.11.0.zip
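The matching commands for creating, listing and extracting a gzip-compressed tarball, as a round trip in a scratch directory (-z selects gzip; -j selects bzip2 for .tar.bz2 files):

```shell
cd "$(mktemp -d)"                           # scratch directory for the demo
mkdir data && echo "hello" > data/a.txt
tar -czf data.tar.gz data                   # -c create, -z gzip, -f archive file
tar -tzf data.tar.gz                        # -t list the contents without extracting
mkdir out && tar -xzf data.tar.gz -C out    # -C extract into another directory
cat out/data/a.txt                          # prints hello
```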

Redirect stdout/stderr

command 2>&1 | tee outfile
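A demo where one line goes to stdout and one to stderr: with 2>&1 before the pipe, tee receives both. When redirecting to a file instead, order matters: > log.txt 2>&1 sends both streams to the file, whereas 2>&1 > log.txt leaves stderr on the terminal. Bash 4 also accepts |& as shorthand for 2>&1 |.

```shell
cd "$(mktemp -d)"    # scratch directory for the demo
{ echo "to stdout"; echo "to stderr" >&2; } 2>&1 | tee outfile
# both streams into a file: redirect stdout first, then point stderr at it
{ echo "to stdout"; echo "to stderr" >&2; } > log.txt 2>&1
```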