


Ivan Krukov (ivan-krukov)

ivan-krukov / volume
Created August 2, 2012 23:18
Change the OS X sound volume from the command line
#!/usr/bin/osascript
-- Usage: volume <level>, where level uses the 0-7 scale of "set volume"
on run argv
    set volume ((item 1 of argv) as number)
end run
ivan-krukov / kyles_script.py
Created August 9, 2012 17:19
Header remover
import argparse
from Bio import SeqIO
parser = argparse.ArgumentParser()
parser.add_argument("inputFile", help="input fasta file")
parser.add_argument("outputFile", help="output file name")
parser.add_argument("sampleName", help="sample name to be removed")
args = parser.parse_args()
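The preview stops after argument parsing. As a hedged sketch only, assuming "Header remover" means deleting the sample name from every FASTA header line, the same idea in plain Python without Biopython could look like this (strip_sample_name is a hypothetical helper, not the gist's code):

```python
# Hypothetical stdlib sketch, not the gist's Biopython code.
# Assumption: "Header remover" deletes every occurrence of the sample name
# from FASTA header lines (those starting with ">") and leaves sequence lines alone.
import io
from typing import Iterable, Iterator


def strip_sample_name(lines: Iterable[str], sample_name: str) -> Iterator[str]:
    """Yield FASTA lines, removing sample_name from header lines only."""
    for line in lines:
        if line.startswith(">"):
            yield line.replace(sample_name, "")
        else:
            yield line


if __name__ == "__main__":
    fasta = io.StringIO(">sampleA_read1\nACGT\n>sampleA_read2\nGGCC\n")
    print("".join(strip_sample_name(fasta, "sampleA_")), end="")
```

Working line by line keeps memory flat for large FASTA files; the Biopython route in the gist would also handle multi-line records, but this version does not need to parse records at all.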
ivan-krukov / fastaparse.py
Created August 9, 2012 18:20
Another quick FASTA parser
# Read a FASTA file and keep only the sequences whose headers match id_pattern
import re
import sys

seq_pattern = re.compile(r">[^>]+\n", re.MULTILINE)
id_pattern = re.compile(r"protein_id:(?P<id>[.\w]+)")
with open(sys.argv[1]) as f:
    text = f.read()
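The preview ends right after the file is read. A minimal sketch of the filtering step, reusing the gist's two patterns and assuming a "correct header" means id_pattern matches the record's first line (filter_records is a hypothetical name):

```python
# Sketch of the filtering step, reusing the gist's two patterns.
# Assumption: a record has a "correct header" when id_pattern matches its first
# line; such records are kept whole and all others are dropped.
import re
import sys
from typing import List, Tuple

seq_pattern = re.compile(r">[^>]+\n")
id_pattern = re.compile(r"protein_id:(?P<id>[.\w]+)")


def filter_records(text: str) -> List[Tuple[str, str]]:
    """Return (protein_id, record) pairs for records whose header matches."""
    kept = []
    for record in seq_pattern.findall(text):
        header = record.split("\n", 1)[0]
        match = id_pattern.search(header)
        if match:
            kept.append((match.group("id"), record))
    return kept


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as f:
        for _, record in filter_records(f.read()):
            sys.stdout.write(record)
```

re.MULTILINE is dropped here because the pattern uses no ^ or $, so the flag has no effect. Note that seq_pattern only matches up to the last newline of each record, so if the file lacks a trailing newline, the final record loses its last, unterminated line.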
ivan-krukov / splitter.sh
Created August 17, 2012 18:28
This atrocious shell script prints the first 1/n-th (half, third, etc.) of a file
#!/bin/sh
# Get the command-line arguments
input_file=$1
divisor=$2
# Run wc on the file given as the first argument
size=$(wc -l "$input_file")
# Split the output on whitespace; the first word (the line count) is now in $1
set -- $size
# Get the integer division of the line count by the divisor
part=$(($1/$divisor))
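The preview ends after the division. For comparison, a hedged Python sketch of the same idea (first_part is a hypothetical name) reads the file once instead of shelling out to wc:

```python
# Hypothetical Python counterpart of splitter.sh: keep the first 1/n-th of a
# file's lines, using integer division just like $(($1/$divisor)).
import sys
from typing import List


def first_part(lines: List[str], divisor: int) -> List[str]:
    """Return the first len(lines) // divisor lines."""
    part = len(lines) // divisor
    return lines[:part]


if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1]) as f:
        sys.stdout.writelines(first_part(f.readlines(), int(sys.argv[2])))
```

One small difference: wc -l counts newline characters, so it ignores a final line with no trailing newline, while readlines() counts that line.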
ivan-krukov / oggle.sh
Created November 10, 2012 01:20
A little output logging tool: runs a command, logging its stdout to <PID>.out and its stderr to <PID>.err
#!/bin/sh
# Run the given command, logging stdout to <PID>.out and stderr to <PID>.err
cmd=$*
pid=$$
echo "$cmd @ $(pwd); Started at $(date)" > "$pid.out"
echo "[$pid] $cmd"
eval "$cmd" >> "$pid.out" 2>> "$pid.err"
echo "$cmd @ $(pwd); Finished at $(date)" >> "$pid.out"
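The same wrapper can be sketched in Python with subprocess. This is a hypothetical port, not part of the gist; the log_dir parameter and the returned paths are additions for convenience:

```python
# Hypothetical Python take on oggle.sh: run a command through /bin/sh, append
# its stdout to <pid>.out and its stderr to <pid>.err, with start/finish lines.
import datetime
import os
import subprocess
from typing import Tuple


def oggle(cmd: str, log_dir: str = ".") -> Tuple[int, str, str]:
    """Run cmd and log it; return (exit code, .out path, .err path)."""
    pid = os.getpid()
    out_path = os.path.join(log_dir, f"{pid}.out")
    err_path = os.path.join(log_dir, f"{pid}.err")
    here = os.getcwd()
    # ">" in the shell version: start a fresh .out file with a header line
    with open(out_path, "w") as out:
        out.write(f"{cmd} @ {here}; Started at {datetime.datetime.now()}\n")
    print(f"[{pid}] {cmd}")
    # ">>" and "2>>": append the command's output streams
    with open(out_path, "a") as out, open(err_path, "a") as err:
        result = subprocess.run(cmd, shell=True, stdout=out, stderr=err)
    with open(out_path, "a") as out:
        out.write(f"{cmd} @ {here}; Finished at {datetime.datetime.now()}\n")
    return result.returncode, out_path, err_path
```

Reopening the .out file in append mode for each stage mirrors the > / >> redirections and guarantees the footer lands after everything the command wrote.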
ivan-krukov / fastq_sample.py
Created November 21, 2012 20:04
Random sample of a FASTQ file
# Take a random fraction of the reads from a FASTQ file
from sh import wc
import argparse
import random

def first_word(string):
    return string.strip().split()[0]

# Read a file in chunks of deflines
def read_segments(filename, deflines):
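The preview cuts off at read_segments. A hedged sketch of the overall idea, assuming standard 4-line FASTQ records with no line wrapping (fastq_records and sample_fastq are hypothetical names, not the gist's implementation):

```python
# Hypothetical sketch: keep each FASTQ record with probability `fraction`.
# Assumes standard 4-line records (header, sequence, "+", quality) with no
# line wrapping; this is not the gist's read_segments implementation.
import random
import sys
from typing import Iterable, Iterator, List, Optional


def fastq_records(lines: Iterable[str]) -> Iterator[List[str]]:
    """Group lines into 4-line FASTQ records."""
    record: List[str] = []
    for line in lines:
        record.append(line)
        if len(record) == 4:
            yield record
            record = []
    if record:
        raise ValueError("truncated FASTQ record at end of input")


def sample_fastq(lines: Iterable[str], fraction: float,
                 seed: Optional[int] = None) -> Iterator[str]:
    """Yield the lines of a random ~fraction of the records."""
    rng = random.Random(seed)
    for record in fastq_records(lines):
        if rng.random() < fraction:
            yield from record


if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1]) as f:
        sys.stdout.writelines(sample_fastq(f, float(sys.argv[2])))
```

This is Bernoulli sampling: each read is kept independently, so the output size is only approximately fraction times the input. If an exact count is needed, one alternative is to count the records first and pick indices with random.sample.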
ivan-krukov / blosum62.txt
Last active December 10, 2015 04:48
A little utility to read scoring matrices such as BLOSUM62. It creates a dict of dicts for the scoring matrix and uses two implementations of the head-tail pattern.
# blosum62
# * column uses minimum score
# BLOSUM Clustered Scoring Matrix in 1/2 Bit Units
# Blocks Database = /data/blocks_5.0/blocks.dat
# Cluster Percentage: >= 62
# Entropy = 0.6979, Expected = -0.5209
   A  R  N  D  C  Q  E  G  H  I  L  K  M  F  P  S  T  W  Y  V  B  Z  X  *
A  4 -1 -2 -2  0 -1 -1  0 -2 -1 -1 -1 -1 -2 -1  1  0 -3 -2  0 -2 -1  0 -4
R -1  5  0 -2 -3  1  0 -2  0 -3 -2  2 -1 -3 -2 -1 -1 -3 -2 -3 -1  0 -1 -4
N -2  0  6  1 -3  0  0  0  1 -3 -3  0 -2 -3 -2  1  0 -4 -2 -3  3  0 -1 -4
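The reader itself is not shown in the listing. A hedged sketch of one way to parse this format into a dict of dicts, using Python's star-unpacking as one form of the head-tail pattern (read_matrix is a hypothetical name, not the gist's code):

```python
# Hypothetical reader for the matrix format above (not the gist's own code):
# skip "#" comment lines, use the first remaining row as column labels, and
# build a dict of dicts so that scores[a][b] is the score for residues a, b.
from typing import Dict, Iterable


def read_matrix(lines: Iterable[str]) -> Dict[str, Dict[str, int]]:
    tokens = (line.split() for line in lines)
    rows = (row for row in tokens if row and not row[0].startswith("#"))
    columns = next(rows)  # header row: A R N D C ...
    scores: Dict[str, Dict[str, int]] = {}
    for row in rows:
        head, *tail = row  # head-tail split: residue label, then its scores
        scores[head] = {col: int(value) for col, value in zip(columns, tail)}
    return scores
```

With the rows shown above, the entry for A against R would be -1. Splitting on whitespace means the column alignment in the file does not matter.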
ivan-krukov / gs.vim
Created January 6, 2013 14:27
Vim regex that matches a line consisting only of a "Genus species" name
^\<\u\l\{-}\> \<\l\{-}\>$
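Roughly the same check in Python, as a hedged translation rather than anything from the gist: \u becomes [A-Z], \l becomes [a-z], and the ^...$ anchors become re.fullmatch. The species part needs at least one letter because \<\> cannot enclose an empty word, while the genus already has its capital letter (GENUS_SPECIES and is_genus_species are hypothetical names):

```python
# Python translation of the Vim pattern (an assumption, not part of the gist):
# \u -> [A-Z], \l -> [a-z], and the \<...\> word boundaries are implied by the
# literal single space plus whole-line matching via re.fullmatch.
import re

GENUS_SPECIES = re.compile(r"[A-Z][a-z]* [a-z]+")


def is_genus_species(line: str) -> bool:
    """True if the whole line is 'Genus species' (trailing newline ignored)."""
    return GENUS_SPECIES.fullmatch(line.rstrip("\n")) is not None
```

For example, "Escherichia coli" matches, while "Escherichia Coli" and "Escherichia coli K-12" do not.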
ivan-krukov / mul_print.sh
Created January 6, 2013 18:08
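Print multiple tab-delimited files, skipping the first three lines of each (tail -n +4 starts at line 4, -q omits the file-name headers, less -S chops long lines instead of wrapping them)
tail -q -n +4 results/* | less -S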
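A hedged Python equivalent of the tail part, for when the lines need further processing instead of paging (skip_and_concat is a hypothetical name; sorted() only approximates the shell's glob ordering):

```python
# Hypothetical Python equivalent of `tail -q -n +4 results/*`: concatenate
# several files, dropping the first three lines of each (tail -n +4 starts
# output at line 4).
import glob
import itertools
import sys
from typing import Iterable, Iterator


def skip_and_concat(paths: Iterable[str], skip: int = 3) -> Iterator[str]:
    """Yield every line after the first `skip` lines of each file, in order."""
    for path in paths:
        with open(path) as f:
            yield from itertools.islice(f, skip, None)


if __name__ == "__main__":
    sys.stdout.writelines(skip_and_concat(sorted(glob.glob("results/*"))))
```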
ivan-krukov / descriptive_join.sh
Created January 14, 2013 17:54
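A nice snippet for a descriptive join: a full outer join of two files that puts NA in the proper column whenever a name is missing from one of them. Input: two files, A and B (ce and hc in the command), each with two tab-separated columns that give an attribute for a name (e.g. an object count): <obj_1> <count_1> in A and <obj_2> <count_2> in B. Output columns: name, value from A, value from B. As join requires, both files must be sorted on the name column.
join -a 1 -a 2 -e 'NA' -o '0,1.2,2.2' ce hc > join.count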
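The same full outer join can be sketched in Python with dictionaries. This is a hypothetical version, not part of the gist, that assumes each name appears at most once per file (join(1) would instead emit one line per matching pair):

```python
# Hypothetical Python version of the descriptive join (not part of the gist):
# a full outer join on the first column that writes "NA" for a missing value,
# mirroring join -a 1 -a 2 -e 'NA' -o '0,1.2,2.2'. Assumes each name appears
# at most once per file; unlike join(1), the inputs need not be sorted.
from typing import Dict, Iterable, List, Tuple


def read_pairs(lines: Iterable[str]) -> Dict[str, str]:
    """Map name -> value from two-column, whitespace-separated lines."""
    pairs: Dict[str, str] = {}
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:
            pairs[fields[0]] = fields[1]
    return pairs


def descriptive_join(a: Dict[str, str], b: Dict[str, str],
                     missing: str = "NA") -> List[Tuple[str, str, str]]:
    """Return (name, value in a, value in b) rows, sorted by name."""
    names = sorted(a.keys() | b.keys())
    return [(name, a.get(name, missing), b.get(name, missing)) for name in names]
```

Each row can then be written with "\t".join(row) to get tab-separated output; by default join(1) separates output fields with a single space unless -t is given.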