Ivan Krukov ivan-krukov

@ivan-krukov
ivan-krukov / blosum62.txt
Last active December 10, 2015 04:48
A little utility for reading scoring matrices such as BLOSUM62. It builds a dict of dicts for the scoring matrix and shows two implementations of the head-tail pattern.
# blosum62
# * column uses minimum score
# BLOSUM Clustered Scoring Matrix in 1/2 Bit Units
# Blocks Database = /data/blocks_5.0/blocks.dat
# Cluster Percentage: >= 62
# Entropy = 0.6979, Expected = -0.5209
   A  R  N  D  C  Q  E  G  H  I  L  K  M  F  P  S  T  W  Y  V  B  Z  X  *
A  4 -1 -2 -2  0 -1 -1  0 -2 -1 -1 -1 -1 -2 -1  1  0 -3 -2  0 -2 -1  0 -4
R -1  5  0 -2 -3  1  0 -2  0 -3 -2  2 -1 -3 -2 -1 -1 -3 -2 -3 -1  0 -1 -4
N -2  0  6  1 -3  0  0  0  1 -3 -3  0 -2 -3 -2  1  0 -4 -2 -3  3  0 -1 -4
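A minimal sketch of how a matrix file like this could be read into a dict of dicts using a head-tail split. The function name `read_scoring_matrix` and the three-column sample are assumptions for illustration, not the gist's own code.

```python
def read_scoring_matrix(lines):
    """Build a dict of dicts from whitespace-separated matrix lines."""
    # Drop blank lines and '#' comment lines, then tokenize each row.
    rows = [ln.split() for ln in lines
            if ln.strip() and not ln.lstrip().startswith("#")]
    header, body = rows[0], rows[1:]          # head-tail split on the rows
    matrix = {}
    for row in body:
        residue, scores = row[0], row[1:]     # head-tail split on one row
        matrix[residue] = {col: int(s) for col, s in zip(header, scores)}
    return matrix


# Hypothetical sample: the top-left 3x3 corner of the matrix above.
sample = """# blosum62 (corner only)
   A  R  N
A  4 -1 -2
R -1  5  0
N -2  0  6
""".splitlines()

scores = read_scoring_matrix(sample)
print(scores["A"]["R"])
```

A lookup is then `scores[row_residue][column_residue]`, so a pair of residues can be scored without tracking column indices.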
@ivan-krukov
ivan-krukov / gs.vim
Created January 6, 2013 14:27
Vim regex that matches a "Genus species" name on a line
^\<\u\l\{-}\> \<\l\{-}\>$
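For use outside Vim, the pattern could be approximated in Python roughly as below. This is a sketch that assumes ASCII letters only (`\u` becomes `[A-Z]`, `\l` becomes `[a-z]`); the helper name `is_genus_species` is made up for illustration.

```python
import re

# Approximate translation of the Vim pattern: one capitalized word,
# a single space, then one all-lowercase word, and nothing else on the line.
GENUS_SPECIES = re.compile(r"^[A-Z][a-z]* [a-z]+$")


def is_genus_species(line):
    """Return True if the line looks like a 'Genus species' name."""
    return GENUS_SPECIES.match(line) is not None


print(is_genus_species("Homo sapiens"))
print(is_genus_species("homo sapiens"))
```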
@ivan-krukov
ivan-krukov / mul_print.sh
Created January 6, 2013 18:08
Print multiple tab-delimited files while skipping the first three lines of each
tail -n+4 -q results/* | less -S
@ivan-krukov
ivan-krukov / descriptive_join.sh
Created January 14, 2013 17:54
A join snippet for a descriptive (full outer) join: a name that is missing from one file gets NA in that file's column. Both files must be sorted on the name column. Input: two files, A and B, each with two tab-separated columns that map a name to an attribute (e.g. an object count):
<obj_1> <count_1>
<obj_2> <count_2>
join -a 1 -a 2 -e 'NA' -o '0,1.2,2.2' ce hc > join.count
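The same idea can be sketched in Python with two dicts standing in for the two files. The function name `descriptive_join` and the sample counts are hypothetical; the real snippet works on sorted files, not dicts.

```python
def descriptive_join(a, b, missing="NA"):
    """Outer-join two name -> value mappings, filling gaps with `missing`."""
    keys = sorted(set(a) | set(b))   # every name seen in either input
    return [(k, a.get(k, missing), b.get(k, missing)) for k in keys]


# Hypothetical object counts for the two inputs.
ce = {"gene_a": "3", "gene_b": "7"}
hc = {"gene_b": "2", "gene_c": "5"}

for name, left, right in descriptive_join(ce, hc):
    print(name, left, right, sep="\t")
```

Each output row holds the name, the value from the first input, and the value from the second, which mirrors the `-o '0,1.2,2.2'` column order.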
@ivan-krukov
ivan-krukov / corr.pl
Created May 15, 2013 21:00
Solution to Rosalind CORR problem
#!/usr/bin/env perl
use v5.14;
use warnings;
sub hamming {
    my @a = split "", shift;
    my @b = split "", shift;
    my $distance = 0;
    for (my $n = 0; $n < scalar(@a); $n++) {
        unless (($a[$n]) eq ($b[$n])) {
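The preview is cut off here. For reference, a Hamming distance over two equal-length strings can be sketched in Python as below; this is an illustration of the general technique, not a port of the rest of the Perl script.

```python
def hamming(a, b):
    """Count the positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    # True counts as 1, so summing the comparisons gives the distance.
    return sum(x != y for x, y in zip(a, b))


print(hamming("GATTACA", "GACTATA"))
```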
@ivan-krukov
ivan-krukov / cons.py
Last active December 17, 2015 15:19
fancy consensus builder
#!/usr/bin/env python
#http://rosalind.info/problems/cons/
import fastaparse
letters = ["A","C","G","T"]
sequences = [i.seq for i in fastaparse.sequences("temp2")]
#create empty profile matrix
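The preview stops before the profile matrix is built. One possible way to fill the profile and read off a consensus is sketched below; the function names `profile` and `consensus` and the sample sequences are assumptions, and ties go to whichever letter comes first in `letters`.

```python
letters = ["A", "C", "G", "T"]


def profile(sequences):
    """Count each letter at each position across equal-length sequences."""
    length = len(sequences[0])
    counts = {letter: [0] * length for letter in letters}  # empty profile
    for seq in sequences:
        for i, base in enumerate(seq):
            counts[base][i] += 1
    return counts


def consensus(counts):
    """Pick the most frequent letter at every position."""
    length = len(counts[letters[0]])
    return "".join(max(letters, key=lambda letter: counts[letter][i])
                   for i in range(length))


# Hypothetical input sequences.
counts = profile(["ATGC", "ATGG", "ACGC"])
print(consensus(counts))
```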
@ivan-krukov
ivan-krukov / time-update.sh
Created May 28, 2013 22:05
time resync and hwclock update (arch)
sudo ntpd -gq
sudo hwclock --systohc --utc
@ivan-krukov
ivan-krukov / fastaparse.py
Created May 29, 2013 21:04
Different ways of parsing fasta files. Last one is my favorite - generators and named tuples.
#load everything in memory
#split
def split_fasta(input_file):
    with open(input_file) as fasta_file:
        text = fasta_file.read().split(">")[1:]
    data = []
    for entry in text:
        header,sequence = entry.split("\n",1)
        sequence = sequence.replace("\n","")
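The description mentions a generator and named-tuple version that the preview does not show. A sketch of what such a parser might look like follows; the name `parse_fasta_lines`, the `Record` tuple, and the choice to take an iterable of lines instead of a file name are all assumptions. The `seq` field name matches how the other gists access `i.seq`.

```python
from collections import namedtuple

# Hypothetical record type: header text (without '>') and joined sequence.
Record = namedtuple("Record", ["header", "seq"])


def parse_fasta_lines(lines):
    """Lazily yield one Record per FASTA entry from an iterable of lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:          # flush the previous entry
                yield Record(header, "".join(chunks))
            header, chunks = line[1:], []
        elif line and header is not None:   # sequence line of current entry
            chunks.append(line)
    if header is not None:                  # flush the last entry
        yield Record(header, "".join(chunks))


records = list(parse_fasta_lines([">seq1", "ACGT", "TTGA", ">seq2", "GGCC"]))
print(records[0].seq)
```

Because it is a generator, passing an open file object processes one entry at a time instead of loading the whole file into memory.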
@ivan-krukov
ivan-krukov / qw.py
Created May 31, 2013 20:23
A little perl-like qw function for interactive work
def qw(s,t=str):
    return list(map(t,s.split()))
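A short usage sketch: the function is repeated so the block runs on its own, and the sample strings are made up.

```python
def qw(s, t=str):
    # Split on whitespace and convert every token with t.
    return list(map(t, s.split()))


bases = qw("A C G T")        # list of strings
numbers = qw("1 2 3", int)   # tokens converted to int
print(bases)
print(numbers)
```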
@ivan-krukov
ivan-krukov / lcsm.py
Created June 5, 2013 19:04
lcsm problem on Rosalind
#!/usr/bin/env python
import fastaparse
import sys
sequences = [i.seq for i in fastaparse.parse_fasta(sys.argv[1])]
def substrings(string):
    n = len(string)
    for length in range(n,0,-1):
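The preview ends inside `substrings`. A brute-force sketch of how the search could continue is below: the inner loop and the `longest_common_substring` helper are assumptions, not the gist's actual code. Substrings are produced longest first, so the first one found in every sequence is a longest shared motif.

```python
def substrings(string):
    """Yield all substrings of `string`, longest first."""
    n = len(string)
    for length in range(n, 0, -1):
        for start in range(n - length + 1):   # assumed inner loop
            yield string[start:start + length]


def longest_common_substring(sequences):
    """Return the first longest substring present in every sequence."""
    shortest = min(sequences, key=len)        # the motif cannot be longer
    for candidate in substrings(shortest):
        if all(candidate in seq for seq in sequences):
            return candidate
    return ""


motif = longest_common_substring(["GATTACA", "TAGACCA", "ATACA"])
print(motif)
```

When several motifs share the maximum length, this returns whichever appears first in the shortest sequence.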