
#The longest common substring (LCS) problem is to find the longest string that is a substring of two or more strings.
#Unlike a subsequence, a substring must be contiguous.
#A k-mer is any substring of length k in a string; k-mers are widely used in sequence assembly.
#Comparing two sequences essentially means finding all the k-mers common to both.
#LCS does not extend well to multiple-sequence alignment, because the longest substring shared by two sequences might not exist in the next one.
#Instead, use DP to list all common substrings between two sequences, and then compare them with the other sequences.
#The data set is at the bottom.
from Bio import SeqIO
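The comments above describe listing every common substring of two sequences with DP and then checking those against further sequences. A minimal sketch of that idea follows; the function names `common_substrings` and `common_to_all` and the `min_len` parameter are hypothetical, and this is not necessarily how the gist itself implements it.

```python
def common_substrings(a, b, min_len=1):
    """Return the set of all substrings of length >= min_len shared by a and b."""
    found = set()
    # prev[j] holds the length of the longest common suffix of a[:i-1] and b[:j]
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1
                # every suffix of this match with length >= min_len is a common substring
                for length in range(min_len, curr[j] + 1):
                    found.add(a[i - length:i])
        prev = curr
    return found


def common_to_all(seqs, min_len=1):
    """Keep only the common substrings of the first two sequences that occur in all the others."""
    shared = common_substrings(seqs[0], seqs[1], min_len)
    for other in seqs[2:]:
        shared = {s for s in shared if s in other}
    return shared
```

For example, `common_to_all(["ABCD", "BCDE", "XCDY"], min_len=2)` keeps only `"CD"`: the longest substring shared by the first two strings, `"BCD"`, does not occur in the third, which is the situation the comments warn about.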
@manrysh
manrysh / Blast_blastp shell command muiti2multi comparison
Last active October 10, 2018 13:18
#Blast_blastp shell command muiti2multi comparison
#formatdb (-p T marks a protein database, which blastp requires; -p F is for nucleotides)
formatdb -i xx -p T
#blastp
for k in ./faa/*.faa; do blastall -p blastp -i xx -d "$k" -e 1e-3 -o "xx_${k##*/}.txt"; done
#.faa.txt_rename
rename -v 's/\.faa\.txt$/.txt/' *
#combined: all-vs-all comparison
for k in ../faa_prokka/*.faa; do m=${k##*/}; for j in ../faa_prokka/*.faa; do n=${j##*/}; if [ "$k" != "$j" ]; then blastall -p blastp -i "$k" -d "$j" -e 1e-5 -o "${m%.*}_${n%.*}.txt"; fi; done; done
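The nested loop above runs every `.faa` file against every other one, skipping self-comparisons. The same pairing can be sketched in Python as a list of command lines; `blastp_commands` is a hypothetical helper, the flags are the ones already used in the shell line, and the sketch only builds the commands without running them.

```python
from itertools import permutations
from pathlib import Path


def blastp_commands(faa_files, evalue="1e-5"):
    """Build one blastall command per ordered (query, database) pair, skipping self-comparisons."""
    commands = []
    # permutations(..., 2) yields every ordered pair of two different files
    for query, db in permutations(faa_files, 2):
        # output name mirrors ${m%.*}_${n%.*}.txt from the shell loop
        out = f"{Path(query).stem}_{Path(db).stem}.txt"
        commands.append(
            ["blastall", "-p", "blastp", "-i", str(query), "-d", str(db), "-e", evalue, "-o", out]
        )
    return commands
```

Each returned list could be handed to `subprocess.run`, assuming `blastall` is on the `PATH` and every database file has already been formatted.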
@manrysh
manrysh / SUM3
Last active December 6, 2016 12:39
k = 15
n = 9838
f0 = open('a.txt','r')
f = open('c.txt','w')
s = [line.strip('\n') for line in f0.readlines()]
def sum3(ss,nn):
    oo = [[ss[i]+ss[j] for j in range(i)] for i in range(nn)]
    for i in range(nn):
        probe = 0
@manrysh
manrysh / Blast result processing
Last active June 1, 2016 10:29
#Multiple-alignment results from BLAST
#The query, alignment, identity, positives and coverage are collected
#Only the best hit is kept
from Bio.Blast import NCBIStandalone
import os, sys
path='/.../.../...'
for i in os.listdir(path):
    # os.listdir returns bare file names, so join them with path before opening
    result_handle = open(os.path.join(path, i))
    blast_parser = NCBIStandalone.BlastParser()
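The header comments say that identity, positives and coverage are collected for the best hit. As a rough sketch of how those three figures can be derived from the raw counts of one alignment: `hit_metrics` and all its parameter names are hypothetical, and defining coverage as the aligned share of the query is an assumption, since the preview does not show the gist's own formula.

```python
def hit_metrics(identities, positives, align_length, query_start, query_end, query_length):
    """Return (identity %, positive %, query coverage %) for a single alignment."""
    # identity and positives are reported relative to the alignment length
    identity_pct = 100.0 * identities / align_length
    positive_pct = 100.0 * positives / align_length
    # assumption: coverage is the fraction of the query spanned by the alignment
    coverage_pct = 100.0 * (query_end - query_start + 1) / query_length
    return identity_pct, positive_pct, coverage_pct
```

The inputs correspond to the identity and positive counts, the alignment length and the query coordinates that a parsed BLAST record usually exposes for each alignment.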