Matt Ravenhall mattravenhall

mattravenhall / uniqID.sh
Created May 14, 2018 10:58
Use the Unix time in seconds as a unique ID for file names; useful when multi-threading, but nothing especially novel.
uniqID=$(date +%s)
file="file_${uniqID}.txt"
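A timestamp in whole seconds repeats whenever two jobs start within the same second, so on its own it only separates runs that are spaced apart. Below is a minimal Python sketch of one way to harden the idea, assuming it is acceptable to append the process ID and a short random tag. The helper name `unique_filename` and its parameters are hypothetical and not part of the gist.

```python
import os
import time
import uuid


def unique_filename(prefix="file_", suffix=".txt"):
    """Build a file name from the Unix time, the process ID and a random tag."""
    stamp = int(time.time())    # same value as `date +%s`
    pid = os.getpid()           # separates concurrent processes
    tag = uuid.uuid4().hex[:8]  # separates threads within one process
    return f"{prefix}{stamp}_{pid}_{tag}{suffix}"


name_a = unique_filename()
name_b = unique_filename()
print(name_a)
print(name_b)
```

If the file only needs to exist temporarily, Python's standard `tempfile` module may be a simpler choice.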
mattravenhall / splitRegion.sh
Last active May 14, 2018 09:46
Often you'll need to pass a standard-form genomic location (chr:bpA-bpB) but actually use its components. Splitting in bash is a pain; this function does it for you, assigning the components to the variables chr, bpA, and bpB.
# Here the function assumes you're passing a standard form genomic position (chr:bpA-bpB) as the first argument.
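function splitRegion {
    # Split on ':' to separate the chromosome from the base-pair range
    IFS=':' read -ra REGION <<< "$1"
    chr="${REGION[0]}"
    # Split the range on '-' to get the start and end positions
    IFS='-' read -ra LOCATION <<< "${REGION[1]}"
    bpA="${LOCATION[0]}"
    bpB="${LOCATION[1]}"
}
splitRegion "$1"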
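For comparison, the same split can be sketched in Python. This is an illustrative equivalent, not part of the gist. The function name `split_region` is hypothetical, and it assumes the positions should come back as integers and that the chromosome name itself contains no ':'.

```python
def split_region(region):
    """Split 'chr:bpA-bpB' into (chr, bpA, bpB), with positions as integers."""
    chrom, _, span = region.partition(":")
    start, _, end = span.partition("-")
    if not (chrom and start and end):
        raise ValueError(f"Expected 'chr:bpA-bpB', got {region!r}")
    return chrom, int(start), int(end)


chrom, bp_a, bp_b = split_region("chr1:1000-2000")
print(chrom, bp_a, bp_b)
```

Unlike the bash version, which silently leaves bpA and bpB empty on malformed input, this sketch raises a `ValueError`.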
mattravenhall / getHaploFreqs.py
Last active May 8, 2018 13:58
Count the number of reads which contain a specific motif within a set of bam files.
import os
import datetime
def printT(message, flush=True):
    # Format only the timestamp, so any '%' in the message is printed literally
    print(datetime.datetime.now().strftime('\033[96m[%d-%b-%Y %H:%M:%S]\033[0m ') + message, flush=flush)
outFile = 'outFile'
if autoAll:
    samples = os.popen("ls -m ERR*.bam | sed -e 's/, /,/g' | tr -d '\n'").read().split(',')
else:
mattravenhall / fasta2genbank.py
Created March 21, 2018 11:32
Convert a fasta to a genbank file with SeqIO, accounting for issues with alphabets
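# Fasta to Genbank (dna)
# Note: Bio.Alphabet was removed in Biopython 1.78. On newer versions, drop that
# import and set seq.annotations['molecule_type'] = 'DNA' on each record instead.
filename = 'example_fasta'
from Bio import SeqIO
from Bio.Alphabet import generic_dna
seqs = list(SeqIO.parse(filename+'.fa','fasta'))
for seq in seqs:
    seq.seq.alphabet = generic_dna
SeqIO.write(seqs, filename+'.gbk', 'genbank')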
mattravenhall / writeFasta.py
Created February 16, 2018 16:44
Given a list of sequences and contig names, write out a fasta.
def writeFasta(titles=[], sequences=[], filename='tmp.fasta'):
    # Fall back to generated contig names if the titles do not match the sequences
    if len(titles) != len(sequences):
        titles = ['contig_{}'.format(i) for i in range(len(sequences))]
    # Initialise a new, empty file
    with open(filename, 'w') as fasta:
        fasta.write('')
    with open(filename, 'a') as fasta:
        for t, s in zip(titles,sequences):
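The preview above is cut off inside the loop, so the record layout is not shown. The following is a self-contained sketch of the same idea, assuming each record is written as a '>title' line followed by the sequence on a single line. The name `write_fasta`, the argument order, and the `None` default are choices made for this illustration, not the gist's.

```python
import os
import tempfile


def write_fasta(sequences, titles=None, filename="tmp.fasta"):
    """Write sequences to a FASTA file, one '>title' header per record."""
    # Generate names when titles are missing or do not match the sequences.
    if titles is None or len(titles) != len(sequences):
        titles = [f"contig_{i}" for i in range(len(sequences))]
    # Mode 'w' truncates an existing file, so no separate reset step is needed.
    with open(filename, "w") as fasta:
        for title, seq in zip(titles, sequences):
            fasta.write(f">{title}\n{seq}\n")


# Demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.fasta")
    write_fasta(["ACGT", "GG"], titles=["a", "b"], filename=demo_path)
    with open(demo_path) as handle:
        named = handle.read()
    write_fasta(["ACGT", "GG"], filename=demo_path)
    with open(demo_path) as handle:
        unnamed = handle.read()

print(named)
print(unnamed)
```

Using `None` rather than a list as the default avoids sharing one mutable default object between calls.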
mattravenhall / splitDF.py
Last active February 12, 2018 18:25
Split a pandas dataframe into 'splits' subsets of equal size (the final dataframe may be shorter).
# Split a dataframe into 'splits' subsets of equal size.
import pandas as pd
def splitDF(dataframe, splits):
    assert isinstance(dataframe, pd.DataFrame), "Supplied 'dataframe' must be a pandas dataframe."
    assert isinstance(splits, int), "Supplied 'splits' must be an integer."
    assert dataframe.shape[0] >= splits, "Supplied 'splits' must not exceed the number of rows in 'dataframe'."
    split_size = round(dataframe.shape[0] / int(splits))
    outputs = []
    for i in range(splits):
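The loop body is not visible in the preview, so how the remainder rows are handled is unknown. As an illustration of the chunk arithmetic only, here is a sketch that computes row ranges with the chunk size rounded up, which is one way to ensure that only the final chunk is short. The helper `split_bounds` is hypothetical, and it uses plain integers so it runs without pandas.

```python
import math


def split_bounds(n_rows, splits):
    """Return (start, stop) row ranges for up to `splits` equal-sized chunks."""
    if splits < 1 or n_rows < splits:
        raise ValueError("'splits' must be between 1 and the number of rows.")
    # Rounding up covers every row and leaves only the last chunk short.
    size = math.ceil(n_rows / splits)
    return [(start, min(start + size, n_rows)) for start in range(0, n_rows, size)]


bounds = split_bounds(10, 3)
print(bounds)
```

Each pair can then be applied as `dataframe.iloc[start:stop]`. Rounding up can yield fewer than `splits` chunks (10 rows into 6 splits gives 5 chunks of 2), which is the trade-off for keeping the sizes equal.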
mattravenhall / embl2fasta.py
Last active January 9, 2018 14:07
Converts an .embl file to a .fasta
# embl to fasta
import re
import sys
if len(sys.argv) != 3:
    print('Usage: embl2fasta.py <embl_input_file> <fasta_output_name>')
    sys.exit(1)  # non-zero exit status signals incorrect usage
IDset = False
inFile = sys.argv[1] # 'example.embl'
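The rest of the script is not shown in the preview. As a rough sketch of what such a conversion involves, the function below reads EMBL flat-file text, takes the first token of each ID line as the record name, and joins the sequence lines that follow the SQ line up to the '//' terminator. The function name `embl_to_fasta` and the toy record are made up for illustration, and the parsing covers only these basic line types.

```python
import re


def embl_to_fasta(embl_text):
    """Convert EMBL flat-file text to FASTA text (one record per ID line)."""
    records = []
    name, chunks, in_sequence = None, [], False
    for line in embl_text.splitlines():
        if line.startswith("ID"):
            # The first token after 'ID' is taken as the record name.
            name = line[2:].split()[0].rstrip(";")
        elif line.startswith("SQ"):
            in_sequence = True
        elif line.startswith("//"):
            # '//' terminates a record.
            if name is not None:
                records.append(f">{name}\n{''.join(chunks)}")
            name, chunks, in_sequence = None, [], False
        elif in_sequence:
            # Sequence lines hold spaced blocks of bases plus a running count.
            chunks.append(re.sub(r"[\s\d]", "", line))
    return "\n".join(records) + ("\n" if records else "")


# Toy record, invented for this example.
example = """ID   EXAMPLE1; SV 1; linear; genomic DNA; STD; UNC; 20 BP.
XX
SQ   Sequence 20 BP; 8 A; 5 C; 3 G; 4 T; 0 other;
     aaacaaacca tggattgctc                                                20
//
"""
fasta_text = embl_to_fasta(example)
print(fasta_text)
```

For real data, Biopython's `SeqIO.convert` with the 'embl' and 'fasta' formats is likely the more robust route.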
mattravenhall / grepWrapper.sh
Last active December 30, 2017 10:37
Running multiple on-demand grep searches can be tedious. This short script lets user input stand in for repeated grep commands against a specific file.
input=''
if [ "$1" = '' ]; then
    input='quit'
    echo "Please provide file to search as 'grepWrapper.sh <fileToSearch>'"
fi
while [ "${input}" != 'quit' ]; do
    echo "Please provide search sequence, or 'quit':"
    # -r keeps backslashes in the search term intact
    read -r input
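The preview stops before the search itself, so the exact grep call is not shown. The prompt-and-search loop can be sketched in Python as below, assuming each entry is treated as a regular expression and matching lines are printed. The function `grep_loop` and its injectable `read_input` and `write` parameters are hypothetical, added so the loop can be driven without a terminal. Python's `re` syntax also differs from grep's basic regular expressions.

```python
import os
import re
import tempfile


def grep_loop(path, read_input=input, write=print):
    """Prompt for patterns and print matching lines until 'quit' is entered."""
    with open(path) as handle:
        lines = handle.read().splitlines()
    while True:
        write("Please provide search sequence, or 'quit':")
        pattern = read_input()
        if pattern == "quit":
            break
        try:
            regex = re.compile(pattern)
        except re.error as err:
            write(f"Invalid pattern: {err}")
            continue
        for line in lines:
            if regex.search(line):
                write(line)


# Scripted demonstration: the answers replace keyboard input.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.txt")
    with open(demo_path, "w") as handle:
        handle.write("ACGT\nTTTT\nGACG\n")
    answers = iter(["ACG", "quit"])
    shown = []
    grep_loop(demo_path, read_input=lambda: next(answers), write=shown.append)

print(shown)
```

Called as `grep_loop("some_file.txt")`, with the defaults, it prompts on the terminal as the shell script does.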
mattravenhall / runlog.py
Last active November 17, 2017 11:18
Output current CPU and Memory usage to a run log
import os
import time
import psutil
# Memory logger
def memlog(info=''):
    # Write the CSV header the first time the log file is created
    if not os.path.isfile('usage.log'):
        with open('usage.log', 'a') as f:
            f.write('Time,CPU,Memory,Info\n')
    ctime = time.time()
mattravenhall / splitFasta.sh
Created November 15, 2017 14:21
Split a fasta into per-contig files
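# Requires GNU csplit. When the fasta starts with '>', the first output file
# (foo_00.fasta) is empty; add --elide-empty-files (-z) to suppress it.
csplit --suffix-format='%02d.fasta' --prefix='foo_' foo.fasta '/>/+0' "{*}"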
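Where GNU csplit is not available, the same per-contig split can be sketched in Python. This is an illustrative equivalent rather than the gist's method. The function `split_fasta` and its parameters are hypothetical. It treats only lines that start with '>' as record boundaries, and it numbers the first record 00 rather than emitting an empty leading file.

```python
import os
import tempfile


def split_fasta(path, prefix="foo_", out_dir="."):
    """Write each '>' record of a FASTA file to its own numbered file."""
    written = []
    out = None
    with open(path) as fasta:
        for line in fasta:
            if line.startswith(">"):
                # A header line closes the previous record and opens the next.
                if out is not None:
                    out.close()
                name = os.path.join(out_dir, f"{prefix}{len(written):02d}.fasta")
                out = open(name, "w")
                written.append(name)
            if out is not None:  # skip anything before the first header
                out.write(line)
    if out is not None:
        out.close()
    return written


# Demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    source = os.path.join(tmp, "foo.fasta")
    with open(source, "w") as handle:
        handle.write(">c1\nACGT\nAC\n>c2\nGG\n")
    paths = split_fasta(source, out_dir=tmp)
    names = [os.path.basename(p) for p in paths]
    contents = []
    for p in paths:
        with open(p) as handle:
            contents.append(handle.read())

print(names)
print(contents)
```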