Jason Sahl jasonsahl

  • Northern Arizona University
#!/usr/bin/env python
"""Filter a NASP-formatted SNP matrix
to include only a given list of genomes.
The collections module requires Python 2.7+.
This script has not been tested with Python 3"""
from optparse import OptionParser
from collections import deque
import sys
jasonsahl / SNP_HP_density.py
Last active December 23, 2015 18:38
This script tries to identify regions of a reference genome that have undergone recombination. The required inputs are a NASP-formatted SNP matrix and the parsimony log from PAUP.
#!/usr/bin/env python
"""calculates the SNP and homoplasy density
using a NASP formatted SNP matrix"""
from __future__ import division
import optparse
import sys
import collections
jasonsahl / extract_PI_SNPs.py
Last active June 11, 2019 00:24
Count and extract parsimony-informative SNPs from a multi-FASTA file
#!/usr/bin/env python
"""retrieve only parsimony informative
sites from a nucleotide multiple sequence alignment"""
from optparse import OptionParser
import sys
try:
    from Bio import SeqIO
except:
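The preview above stops before the site-selection logic. As a rough sketch of the idea, not the gist's actual code: a column of an alignment is conventionally called parsimony-informative when at least two different character states each occur in at least two sequences. The function name and the choice to ignore everything except A, C, G and T are assumptions made for illustration, and the example takes plain strings so it runs without Biopython.

```python
from collections import Counter


def parsimony_informative_columns(seqs, valid="ACGT"):
    """Return 0-based indices of parsimony-informative alignment columns.

    seqs is a list of aligned sequences (equal-length strings).
    Gaps and ambiguity codes are ignored (an assumption of this sketch).
    """
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    informative = []
    for i in range(length):
        counts = Counter(s[i].upper() for s in seqs)
        # States that occur in at least two sequences at this column.
        shared = [base for base in valid if counts[base] >= 2]
        if len(shared) >= 2:
            informative.append(i)
    return informative
```

A column where one sequence differs from all the others (a singleton) is variable but not informative, so it is skipped.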
jasonsahl / find_outliers.r
Last active April 27, 2016 23:28
plot outliers in X and Y data
#The input is two columns: x and corresponding y values
require(MASS) ## for mvrnorm()
set.seed(1)
mine <- read.table("xy.txt")
mine <- data.frame(mine)
names(mine) <- c("X","Y")
plot(mine)
res <- resid(mod <- lm(Y ~ X, data = mine))
res.qt <- quantile(res, probs = c(0.001,0.999))
want <- which(res >= res.qt[1] & res <= res.qt[2])
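For readers more at home in Python, the residual-quantile filter in the R lines above can be approximated as follows. This is a sketch under assumptions: the function names are invented here, the fit is a plain least-squares line, and the quantile uses linear interpolation between order statistics (R's default method). Unlike `want`, which holds the points to keep, this function returns the indices flagged as outliers.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def quantile(values, p):
    """Quantile by linear interpolation between order statistics."""
    ordered = sorted(values)
    h = (len(ordered) - 1) * p
    lower = int(h)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (h - lower) * (ordered[upper] - ordered[lower])


def residual_outliers(xs, ys, probs=(0.001, 0.999)):
    """Indices of points whose residual lies outside the quantile band."""
    intercept, slope = fit_line(xs, ys)
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    low = quantile(residuals, probs[0])
    high = quantile(residuals, probs[1])
    return [i for i, r in enumerate(residuals) if r < low or r > high]
```

Because the cut-offs are quantiles of the residuals themselves, the most extreme points are flagged even in clean data; the probabilities only control how many.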
Chromosome  Start coord  End coord  # Parsimony-informative SNPs  # Homoplasious SNPs  Homoplasy density ratio
NC_011595   1            1001       3                             3                    1
NC_011595   1001         2001       2                             2                    1
NC_011595   2001         3001       5                             5                    1
NC_011595   3001         4001       2                             1                    0.5
NC_011595   4001         5001       0                             0                    0
NC_011595   5001         6001       0                             0                    0
NC_011595   6001         7001       0                             0                    0
NC_011595   7001         8001       0                             0                    0
NC_011595   8001         9001       1                             1                    1
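Judging from the rows above, the ratio column looks like the number of homoplasious SNPs divided by the number of parsimony-informative SNPs in each 1,000-base window, with empty windows reported as 0 (for example, 1/2 = 0.5 for window 3001 to 4001). A minimal sketch of that calculation follows. The function name, the half-open window boundaries, and the assumption that homoplasious SNPs are a subset of the parsimony-informative ones are all guesses made for illustration, not details taken from the script.

```python
def window_density(pi_positions, homoplasious_positions, genome_length, window=1000):
    """Tabulate SNP counts and the homoplasy density ratio per window.

    Windows are half-open [start, end), starting at coordinate 1
    (an assumption; the original boundary handling is not shown).
    Returns rows of (start, end, n_pi, n_homoplasious, ratio).
    """
    homoplasious = set(homoplasious_positions)
    rows = []
    for start in range(1, genome_length + 1, window):
        end = start + window
        in_window = [p for p in pi_positions if start <= p < end]
        n_pi = len(in_window)
        n_hp = sum(1 for p in in_window if p in homoplasious)
        # Report 0 for windows without informative SNPs, as in the table.
        ratio = n_hp / n_pi if n_pi else 0
        rows.append((start, end, n_pi, n_hp, ratio))
    return rows
```

Windows with a ratio near 1 are the ones a recombination screen would look at first, since most of their informative sites conflict with the tree.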
This file has been truncated.
>Burkholderia_ambifaria_2
ATGCTCATGCGGCGGCTTGAGGTGACGTTCCCTTCTGACGGGGACGATTGTGCAGCTTGGCTATACCTTCCGGACACCAGCAGGCCGGCACCGGTGATCGTGATGGCACATGGTCTGGGCGGCACGCGTGAAATGCGACTGGATGCGTTTGCTCACAGATTCTGCGAGGCTGGATTTGCCTGTCTGGTGTTTGATTATCGGCACTTCGGCAGCAGTGGCGGCGAGCCGCGGCAGTTGCTCGATGTAGGCAAGCAGCTACAAGACTGGAGGGCCGCGATAGCATTTGCTCGAACACGAACCGACGTAGACGCAGAGAGATTGATTGTCTGGGGATCGTCGTTTGGGGGAGGGCATGCGCTGACCATCGCGGCCGACAACGCTCACGTGTCCGCGGTCATTGCCCAGTGTCCGTTCACGGATGGGCTGGCTTCCGTTTGCGCTTTACCATTAGGATCGCTAATCAAGGTAACTGCCAGAGCGATCCGCGATCAATTCCGCGCATGGTTGGGAGGGCACCCGGTGACCATCCCGATAGCCGGGAAGCCAGGGGGGGTTGCATTAATGGTGGCTCCTGATGCCGAGCCCGGCTACATGAAGTTGGTGCCGAACGATATGTCAGCCGTCTTTCGTAACTACGTGGCCGCCCGGTTCGCTCTTCAAATTATTCGCTATTTTCCCGGTCGCAAGACTTCACGGATCGCCTGCCCGGTGCTGTTCTGTGTTTGTGATCCTGATACCGTCGCGCCGACGCGTACTACGTTGCGTCACGCAAAACGTGCACCCAAAGGATTGGTGAATATATATCCGTTCGGACATTTCGATATTTATGTCGGTTATGCGTTTGAGCGGGCAGTCAGCGATCAAATCACCTTCCTTCAAAGATTCATTGATTAA
>Burkholderia_ambifaria_3
ATGAAATTCGACAACGTCCTGCAGACTATTGGCAATACCCCGATCATCCGCATGAATCGCCTGTTTGGCGCAGACGC
jasonsahl / sum_seq_length.py
Created August 11, 2016 17:17
Calculates all bases in a multi-FASTA file
#!/usr/bin/python
#parses sequence lengths from a file and prints them to the screen
#usage python seqlength.py infasta
from __future__ import print_function
from sys import argv
import sys
try:
    from Bio import SeqIO
except ImportError:
    print("script requires BioPython to run..exiting")
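The preview ends before the counting step. As a hedged illustration of what "all bases in a multi-FASTA file" amounts to, the sketch below sums the lengths of every non-header line. It deliberately avoids Biopython so that it is self-contained; the function name is hypothetical and the input is assumed to be plain, uncompressed FASTA text.

```python
def sum_fasta_bases(lines):
    """Total number of sequence characters across all FASTA records.

    lines is any iterable of text lines, such as an open file handle.
    Header lines (starting with '>') and blank lines are skipped.
    """
    total = 0
    for line in lines:
        line = line.strip()
        if line and not line.startswith(">"):
            total += len(line)
    return total
```

With a file on disk this could be called as `with open(path) as handle: print(sum_fasta_bases(handle))`, where `path` stands in for the input FASTA.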
jasonsahl / transform_kallisto_bacseq.py
Created July 17, 2017 19:37
Transform the contents of a Kallisto matrix; used here for in silico ribotyping of C. difficile
#!/usr/bin/env python
"""Transform ribotype data. Input matrix
is a transposed output from bac_seq"""
from __future__ import print_function
from __future__ import division
import sys
import os
import optparse
ERR319438
ERR360775
ERR360792
ERR360848
ERR360746
ERR360782
ERR360788
ERR360789
ERR360783
ERR360770
jasonsahl / kallisto_wrapper.py
Created December 1, 2017 17:48
Kallisto read count wrapper
#!/usr/bin/env python
"""Read counts across a set of reference sequences.
Requires Python 2.7 to run"""
from __future__ import division
from __future__ import print_function
from optparse import OptionParser
import sys
import os