Skip to content

Instantly share code, notes, and snippets.

View kescobo's full-sized avatar

Kevin Bonham kescobo

View GitHub Profile
@kescobo
kescobo / Rosalind_question.py
Created April 11, 2014 18:33
Problem from rosalind.info
input = open('rosalind_ini6.txt', 'r')
input = input.read()
words = []
list = {}
for word in input.split(' '):
words.append(word)
for i in xrange(0, len(words)):
import re
#open file to read
with open('test.fasta', 'r') as seqs:
seqFile = (open('throwaway.txt', 'a+'))
for line in seqs:
if re.search('^>[a-zA-Z][a-zA-Z][a-zA-Z][a-zA-Z]', line):
seqFile.close()
@kescobo
kescobo / pseudo_code
Last active August 29, 2015 14:17
HGT analysis pipeline
import annotated genomes
for genome in genomes:
write genome to genome_database
database{
species_name:
CDS{
annotation:
aa sequence:
nt sequence:
@kescobo
kescobo / arthrobacter_genome.gb
Last active January 11, 2016 18:24
Files and code for Biopython issue #747
LOCUS BW77_ACAGTG.R1_(paired)_contig_93 233 bp DNA linear UNK
DEFINITION Contig BW77_ACAGTG.R1_(paired)_contig_93 from Arthrobacter sp.
BW77
ACCESSION unknown
FEATURES Location/Qualifiers
source 1..233
/mol_type="genomic DNA"
/db_xref="taxon: 6666666"
/genome_md5=""
/project="bewolfe_6666666"
@kescobo
kescobo / gbk_to_faa.py
Last active January 14, 2016 21:14
Python definitions to take genbank file containing multiple genomes and dividing them into fasta protein files, one file per genome. For Stackoverflow question http://stackoverflow.com/posts/34777323/
from operator import itemgetter
from itertools import groupby
def gbk_to_faa(some_genbank):
source = None
for record in SeqIO.parse(some_genbank, 'gb'):
if source:
if record.annotations['source'] != source:
out_file.close()
source = sub(r'\W+', "_", sub(r'\W$', "", record.annotations['source']))
@kescobo
kescobo / newjd.jl
Last active March 16, 2017 16:36
Better Jaccard Distance
function jaccarddistance(sketch1::MinHashSketch, sketch2::MinHashSketch)
d = length(setdiff(sketch1.sketch, sketch2.sketch))
l = length(sketch1)
return (l-d) / (l+d)
end
function newjd(sketch1::MinHashSketch, sketch2::MinHashSketch)
matches = 0
sketchlen = length(sketch1)
i = 1
@kescobo
kescobo / newminhash.jl
Created March 18, 2017 16:14
Benchmarks for new implementation of kmerminhash
b1 = open(FASTAReader,
"/Users/ksb/computation/science/genomes/kv_input/fasta/Brachybacterium_alimentarium_738_10.fna")
seq1 = dna""
for s in b1
seq1 = seq1 * s.seq
end
length(seq1) # 4160958
_
_ _ _(_)_ | A fresh approach to technical computing
(_) | (_) (_) | Documentation: https://docs.julialang.org
_ _ _| |_ __ _ | Type "?help" for help.
| | | | | | |/ _` | |
| | |_| | | | (_| | | Version 0.6.0-rc1.0 (2017-05-07 00:00 UTC)
_/ |\__'_|_|_|\__'_| |
|__/ | x86_64-apple-darwin16.5.0
julia> using Plots
@kescobo
kescobo / jb418.sam
Created June 8, 2017 20:48
Sam contents
@HD VN:1.0 SO:unsorted
@SQ SN:tig00000001 LN:4311805
@SQ SN:tig00000002 LN:66004
@SQ SN:tig00000004 LN:1036550
@SQ SN:tig00000005 LN:256226
@SQ SN:tig00000007 LN:150530
@SQ SN:tig00000008 LN:138168
@PG ID:bowtie2 PN:bowtie2 VN:2.3.2 CL:"/usr/local/bin/../Cellar/bowtie2/2.3.2/bin/bowtie2-align-s --wrapper basic-0 -x jb418 -S jb418.sam -1 ../../raw_reads/r1.fastq -2 ../../raw_reads/r2.fastq"
HWI-D00742:45:H7W5YBCXX:1:1101:1219:2200 73 tig00000001 3359004 42 100M = 3359004 0 CCTGNGTGACGAAGACCACCTGGGCGACATGGACTTCAAGGTAGCCGGTACCGCCAAAGGTGTTACCGCGCTGCAGATGGACATCAAGATCNAGGGCATC DDDD#<<EHHHIIHIIIIIIIIIHHIIIIIIIIIIIIIIIIIHEHIIIIIIIIIIIIIIIHHIIIHIIIIIIIIHIEHIHIIIIIIIIGGH#<<EHFHIH AS:i:-2 XN:i:0 XM:i:2 XO:i:0 XG:i:0 NM:i:2 MD:Z:4G86A8 YT:Z:UP
HWI-D00742:45:H7W5YBCXX:1:1101:1219:2200 133 tig00000001 3359004 0 * = 3359004 0 NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN ####################################################################################################
@kescobo
kescobo / metadict_benchmarks.jl
Created October 16, 2017 16:31
Benchmarks for MetaGraph indexing schemes
using LightGraphs
using MetaGraphs
using BenchmarkTools
function name_lookup(mgraph::MetaGraph, name::String)
return collect(filter_vertices(mgraph, :name, name))[1]
end
function dict_assign!(mgraph::MetaGraph, spindex::Vector{Int}, meindex::Vector{Int}, weights::Vector{Float64})
for i in 1:10000