Kevin Lee cyklee
@cyklee
cyklee / count_fastq.sh
Last active March 11, 2019 21:24
Count the total number of reads in a collection of fastq files
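for i in *.fastq; do
    # a FASTQ record spans 4 lines; use zcat "$i" | wc -l for gzipped files
    echo $(( $(wc -l < "$i") / 4 ))
done > count.txt   # overwrite rather than append, so reruns do not double count
paste -sd+ count.txt | bc   # sum the per-file counts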
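The same count can be sketched as a single function that handles plain and gzipped files together and needs no intermediate file. The name count_fastq_reads is made up for this illustration, and the sketch assumes well-formed FASTQ with exactly four lines per record.

```shell
# count_fastq_reads FILE...: print the total number of reads in the given files.
# Hypothetical helper; assumes every record is exactly 4 lines long.
count_fastq_reads() (
    total=0
    for f in "$@"; do
        case "$f" in
            *.gz) lines=$(gzip -dc -- "$f" | wc -l) ;;   # compressed input
            *)    lines=$(wc -l < "$f") ;;               # plain-text input
        esac
        total=$(( total + lines / 4 ))                   # 4 lines per record
    done
    echo "$total"
)
```

Usage would be along the lines of `count_fastq_reads *.fastq *.fastq.gz`. The function body runs in a subshell, so its variables do not leak into the calling shell.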
@cyklee
cyklee / find.replace.sh
Created November 25, 2018 01:40
sed and shell script to find patterns from one column and replace them with entries from another column; handy for tidying up a Newick phylogenetic tree
paste Short.name.txt Proper.name.txt | while IFS=$'\t' read -r n k; do sed -i "s/$n/$k/g" tree.nwk; done
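A variant that leaves the tree file untouched and runs sed only once might look like the sketch below. rename_labels is a hypothetical name. The sketch assumes the names contain no tabs and no characters that are special to sed (such as `/`, `.`, `&` or `\`), and that no short name is a substring of another, since `s/A1/.../g` would also rewrite `A10`.

```shell
# rename_labels SHORT PROPER TREE: print TREE with each short name replaced
# by the proper name on the same line number. Hypothetical helper.
rename_labels() (
    script=$(mktemp) || exit 1
    # Build one sed substitution per name pair: s/short/proper/g
    paste "$1" "$2" | awk -F'\t' '{ printf "s/%s/%s/g\n", $1, $2 }' > "$script"
    sed -f "$script" "$3"      # result goes to stdout; the input is not modified
    rm -f "$script"
)
```

Something like `rename_labels Short.name.txt Proper.name.txt tree.nwk > tree.renamed.nwk` keeps the original tree around for comparison.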
@cyklee
cyklee / sequencial.grep.sh
Created September 29, 2018 23:17
Sequential grep that preserves the order of the list of patterns provided
# https://stackoverflow.com/questions/21908809/grep-f-file-to-print-in-order-as-a-file
# You can pipe patt.grep (pattern) to xargs, which will pass the patterns to grep one at a time.
# By default xargs appends arguments at the end of the command. But in this case, grep needs myfile.log to be the last argument. So use the -I{} option to tell xargs to replace {} with the arguments.
# foobar here is the placeholder
# This is useful because grep -f prints matches in the order they appear in the file, not in the order of the patterns.
cat patt.grep | xargs -Ifoobar grep foobar myfile.log
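The same idea can be sketched with a plain read loop instead of xargs. One reason to consider it: xargs gives quotes and backslashes in its input a special meaning, while `read -r` passes each pattern line through unchanged. grep_in_order is a made-up name for this illustration.

```shell
# grep_in_order PATTERNS FILE: run grep once per pattern, in pattern order.
# Hypothetical helper; each line of PATTERNS is used as a basic regex.
grep_in_order() (
    while IFS= read -r pattern; do
        [ -n "$pattern" ] || continue     # a blank pattern would match every line
        grep -e "$pattern" -- "$2"
    done < "$1"
)
```

As with the xargs version, a line that matches several patterns is printed once per matching pattern.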
@cyklee
cyklee / count.fastq.nucleotides
Created September 29, 2018 23:13
Count number of nucleotides in all the FASTQ files in the directory
# If, for some crazy reason, you want to know how many individual bases you have in your dataset.
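for i in *.fastq; do
    # line 2 of every 4-line record is the sequence; join the records and count characters
    cat "$i" | paste - - - - | cut -f 2 | tr -d '\n' | wc -c
done > char.txt   # overwrite rather than append
R -e 'sum(read.csv("char.txt", header=FALSE))'   # header=FALSE so the first count is not read as a column name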
# You can replace *.fastq with *.fastq.gz and cat with zcat if your data is compressed
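If R is not at hand, the total can also be sketched in a single awk pass that sums the length of every second line of each four-line record. count_fastq_bases is a hypothetical name, and the sketch assumes well-formed FASTQ files that end with a newline.

```shell
# count_fastq_bases FILE...: print the total number of bases in the given files.
# Hypothetical helper; assumes 4 lines per record and a trailing newline per file.
count_fastq_bases() (
    for f in "$@"; do
        case "$f" in
            *.gz) gzip -dc -- "$f" ;;    # compressed input
            *)    cat -- "$f" ;;         # plain-text input
        esac
    done | awk 'NR % 4 == 2 { n += length($0) } END { print n + 0 }'
)
```

Because every file contributes a multiple of four lines, the `NR % 4 == 2` test keeps pointing at sequence lines across file boundaries.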
@cyklee
cyklee / extract_fasta_awk.sh
Created July 28, 2018 00:12
awk one-liner to extract an entry from multifasta file with linebreaks
awk -v seq="header_id" -v RS='>' -v ORS='' '$1 == seq {print RS $0}' file
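An alternative that keeps awk's default line-based records and tracks whether the current line belongs to the wanted entry could be sketched as follows. extract_fasta_entry is a made-up name; the sketch assumes the ID directly follows `>` with no space in between.

```shell
# extract_fasta_entry ID FILE: print the entry whose header starts with ">ID".
# Hypothetical helper; works on sequences wrapped over several lines.
extract_fasta_entry() {
    awk -v id="$1" '
        /^>/ { keep = (substr($1, 2) == id) }   # header line: compare the ID after ">"
        keep                                     # print lines while inside the wanted entry
    ' "$2"
}
```

A call such as `extract_fasta_entry header_id file` prints the header and all of its sequence lines.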
@cyklee
cyklee / prokkagff2gtf.sh
Created April 12, 2018 05:35
Converts GFF file from Prokka to GTF for htseq-count
#!/bin/bash
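infile=$1
if [ -z "$infile" ] ; then
echo "Usage: prokkagff2gtf.sh <PROKKA gff file>" >&2
exit 1   # non-zero status signals the usage error
fi
# drop comment lines, keep features with an ID, reduce column 9 to the bare ID, then emit GTF columns
grep -v "^#" "$infile" | grep "ID=" | cut -f1 -d ';' | sed 's/ID=//g' | cut -f1,4,5,7,9 | awk -v OFS='\t' '{print $1,"PROKKA","CDS",$2,$3,".",$4,".","gene_id " $5}'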
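The column shuffling in that pipeline can also be sketched as a single awk program, which makes the mapping from GFF columns to GTF columns easier to read. prokka_gff_to_gtf is a hypothetical name, and the GFF line in the usage note is invented sample data, not real Prokka output.

```shell
# prokka_gff_to_gtf [FILE...]: convert GFF feature lines to the GTF layout
# used above. Hypothetical helper; reads stdin when no file is given.
prokka_gff_to_gtf() {
    awk -F'\t' -v OFS='\t' '
        /^#/ { next }                                 # skip comment and directive lines
        NF >= 9 && match($9, /ID=[^;]+/) {
            id = substr($9, RSTART + 3, RLENGTH - 3)  # text after "ID=" up to the next ";"
            print $1, "PROKKA", "CDS", $4, $5, ".", $7, ".", "gene_id " id
        }
    ' "$@"
}
```

For an invented input line with the tab-separated columns `contig1`, `Prodigal`, `CDS`, `100`, `400`, `.`, `+`, `0`, `ID=ABC_00001;product=hypothetical protein`, this prints `contig1`, `PROKKA`, `CDS`, `100`, `400`, `.`, `+`, `.`, `gene_id ABC_00001`, again separated by tabs. Sequence lines in an embedded FASTA section have fewer than nine columns and are skipped.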
@cyklee
cyklee / import_biom2.R
Created February 22, 2018 01:47 — forked from jnpaulson/import_biom2.R
This will convert a biom class object into a phyloseq object.
import_biom2 <- function(x,
treefilename=NULL, refseqfilename=NULL, refseqFunction=readDNAStringSet, refseqArgs=NULL,
parseFunction=parse_taxonomy_default, parallel=FALSE, version=1.0, ...){
# initialize the argument-list for phyloseq. Start empty.
argumentlist <- list()
x = read_biom(x)
b_data = biom_data(x)
b_data_mat = as(b_data, "matrix")
@cyklee
cyklee / port_forwarding.sh
Created May 26, 2017 03:22
Example of port forwarding using miniUPNP client
# Port forwarding using miniUPNP client
# Use crontab to automate upon reboot
# https://superuser.com/questions/634628/is-there-a-script-to-add-port-forwarding-rule-in-home-router
# Look up the wlan0 address once (this parsing expects the older "inet addr:" ifconfig output)
ip_addr=$(ifconfig wlan0 | grep "inet addr" | cut -d : -f 2 | cut -d " " -f 1)
for port in 22 5900 9091; do
upnpc -a "$ip_addr" "$port" "$port" TCP
done
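Newer ifconfig builds print `inet 192.168.x.x` rather than `inet addr:192.168.x.x`, so the grep above can come back empty. One possible workaround, sketched here under the assumption that the iproute2 `ip` command is installed, is to parse its one-line output instead. first_ipv4 is a made-up helper name.

```shell
# first_ipv4: read `ip -4 -o addr show` output on stdin and print the first
# IPv4 address without its prefix length. Hypothetical helper.
first_ipv4() {
    awk '$3 == "inet" { split($4, a, "/"); print a[1]; exit }'
}

# Usage (not run here, needs a wlan0 interface and upnpc):
#   addr=$(ip -4 -o addr show dev wlan0 | first_ipv4)
#   upnpc -a "$addr" 22 22 TCP
```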
@cyklee
cyklee / Subset_FASTA.sh
Last active May 2, 2017 05:17
To extract a subset of reads from a multi-FASTA file using a list of header names
# https://www.biostars.org/p/49820/
# https://github.com/mdshw5/pyfaidx can be used as a drop-in replacement
xargs samtools faidx test.fa < names.txt
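Where neither samtools nor pyfaidx is installed, a rough awk-only sketch of the same subsetting is shown below. subset_fasta is a hypothetical name. The sketch assumes a non-empty names file with one ID per line and no leading `>`, and it prints entries in the order they occur in the FASTA file.

```shell
# subset_fasta NAMES FASTA: print the FASTA entries whose ID is listed in NAMES.
# Hypothetical helper; the ID is the header text between ">" and the first space.
subset_fasta() {
    awk '
        NR == FNR { want[$1] = 1; next }              # first file: remember wanted IDs
        /^>/      { keep = (substr($1, 2) in want) }  # header: is this entry wanted?
        keep                                          # print lines of wanted entries
    ' "$1" "$2"
}
```

Unlike the faidx route, this reads the whole FASTA file once and needs no index, which is fine for small files but slower when only a few entries are wanted from a very large one.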
@cyklee
cyklee / manjaro_vboxsf.sh
Created March 30, 2017 22:06
Enable shared folders in /media in Manjaro Linux
sudo systemctl enable vboxservice
sudo systemctl start vboxservice
sudo groupadd -f vboxsf   # -f: succeed even if the group already exists
sudo usermod -aG vboxsf "$(whoami)"   # add the current user to vboxsf
# Log off & log back in
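After logging back in, a quick check that the session actually picked up the new group might look like this sketch. in_group is a made-up helper name. It inspects the groups of the current login session, which is why it only reports the change after a fresh login.

```shell
# in_group GROUP: succeed if the current session belongs to GROUP.
# Hypothetical helper built on `id -nG`, which lists the session's group names.
in_group() {
    id -nG | tr ' ' '\n' | grep -Fxq -- "$1"
}

# Usage (not run here):
#   in_group vboxsf && echo "vboxsf membership is active in this session"
```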