Jason Stajich hyphaltip

@hyphaltip
hyphaltip / README.md
Last active Sep 3, 2020
Hypergeometric domain enrichment
View README.md

Domain set enrichment. Code by Diego Martinez, used in this paper: https://mbio.asm.org/content/3/5/e00259-12

Comparative Genome Analysis of Trichophyton rubrum and Related Dermatophytes Reveals Candidate Genes Involved in Infection. Diego A. Martinez, Brian G. Oliver, Yvonne Gräser, Jonathan M. Goldberg, Wenjun Li, Nilce M. Martinez-Rossi, Michel Monod, Ekaterina Shelest, Richard C. Barton, Elizabeth Birch, Axel A. Brakhage, Zehua Chen, Sarah J. Gurr, David Heiman, Joseph Heitman, Idit Kosti, Antonio Rossi, Sakina Saif, Marketa Samalova, Charles W. Saunders, Terrance Shea, Richard C. Summerbell, Jun Xu, Sarah Young, Qiandong Zeng, Bruce W. Birren, Christina A. Cuomo, Theodore C. White. mBio Sep 2012, 3 (5) e00259-12; DOI: 10.1128/mBio.00259-12

Diego's instructions: Lump multiple species together when comparing a set. I just added them as if they were a pseudo-genome.

I think this is commented well enough for use, but let me know if it doesn't make sense.
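The enrichment script itself is not shown on this page, so what follows is only a minimal sketch of a hypergeometric domain-enrichment test, not Diego's code. It assumes each genome is summarized as a dict mapping a domain name to the number of genes carrying it, plus a total gene count. The function names (`pool_genomes`, `hypergeom_sf`, `domain_enrichment`) and the toy counts are hypothetical. Pooling simply sums counts across species, which is one way to read the pseudo-genome instruction above.

```python
from math import comb


def pool_genomes(genomes):
    """Lump several species into one pseudo-genome by summing their counts.

    genomes: list of (domain_counts, total_genes) tuples, where domain_counts
    maps a domain name to the number of genes carrying that domain.
    """
    pooled = {}
    total = 0
    for counts, n_genes in genomes:
        total += n_genes
        for domain, n in counts.items():
            pooled[domain] = pooled.get(domain, 0) + n
    return pooled, total


def hypergeom_sf(k, M, n, N):
    """P(X >= k) when drawing N genes from M, of which n carry the domain."""
    upper = min(n, N)
    hits = sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1))
    return hits / comb(M, N)


def domain_enrichment(foreground, background):
    """Return {domain: p-value} for over-representation in the foreground."""
    fg_counts, fg_total = foreground
    bg_counts, bg_total = background
    M = fg_total + bg_total  # population: all genes in both sets
    pvals = {}
    for domain, k in fg_counts.items():
        n = k + bg_counts.get(domain, 0)  # genes with the domain overall
        pvals[domain] = hypergeom_sf(k, M, n, fg_total)
    return pvals


# Toy counts, made up for illustration only
species_a = ({"Peptidase_S8": 12, "LysM": 5}, 800)
species_b = ({"Peptidase_S8": 10, "LysM": 1}, 700)
outgroup = ({"Peptidase_S8": 4, "LysM": 6}, 1500)

pseudo = pool_genomes([species_a, species_b])
pvalues = domain_enrichment(pseudo, outgroup)
```

The sketch uses exact integer binomials from the standard library to stay dependency-free; with many domains, a multiple-testing correction would normally follow.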

View ITSx_cross-ref.py
#!/usr/bin/env python
import os, csv, argparse, re
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord
from Bio import SeqIO
def get_acc(identifier):
    """Given this identifier string, return the unique part lookup for this.
View SRR420852.bbmap_covstats.txt
#ID Avg_fold Length Ref_GC Covered_percent Covered_bases Plus_reads Minus_reads Read_GC Median_fold Std_Dev
scaffold_1 415.2697 13820 0.5662 100.0000 13820 19003 19092 0.5794 142 1537.73
scaffold_2 188.4877 13022 0.5528 100.0000 13022 7902 7959 0.5390 148 165.47
scaffold_3 150.3550 12356 0.5389 100.0000 12356 6185 6153 0.5333 146 33.78
scaffold_4 11.8616 12161 0.5135 100.0000 12161 482 478 0.5212 10 10.11
scaffold_5 210.9362 11631 0.5654 100.0000 11631 8220 7784 0.5571 138 226.74
scaffold_6 110.3282 11581 0.6050 100.0000 11581 4291 4184 0.5991 113 30.05
scaffold_7 129.1569 11260 0.5620 100.0000 11260 4839 4809 0.5567 131 27.30
scaffold_8 124.0985 10464 0.5804 100.0000 10464 4300 4286 0.5709 126 29.72
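A table like this can be summarized with a few lines of Python. The sketch below is an illustration under assumptions, not part of the gist: it assumes the file starts with the `#ID ...` header line and is whitespace- or tab-delimited, and the helper names (`read_covstats`, `weighted_mean_fold`, `low_coverage`) and the 0.5 cutoff are hypothetical.

```python
from statistics import median


def read_covstats(lines):
    """Parse covstats rows into dicts keyed by the header column names."""
    header = None
    rows = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        if line.startswith("#"):
            # header line, e.g. "#ID Avg_fold Length ..."
            header = [f.lstrip("#") for f in fields]
            continue
        rows.append(dict(zip(header, fields)))
    return rows


def weighted_mean_fold(rows):
    """Length-weighted mean of Avg_fold across scaffolds."""
    total_len = sum(int(r["Length"]) for r in rows)
    return sum(float(r["Avg_fold"]) * int(r["Length"]) for r in rows) / total_len


def low_coverage(rows, fraction=0.5):
    """IDs of scaffolds whose Median_fold is below fraction * overall median."""
    cutoff = fraction * median(float(r["Median_fold"]) for r in rows)
    return [r["ID"] for r in rows if float(r["Median_fold"]) < cutoff]


# Three rows copied from the preview above
sample = """#ID Avg_fold Length Ref_GC Covered_percent Covered_bases Plus_reads Minus_reads Read_GC Median_fold Std_Dev
scaffold_1 415.2697 13820 0.5662 100.0000 13820 19003 19092 0.5794 142 1537.73
scaffold_3 150.3550 12356 0.5389 100.0000 12356 6185 6153 0.5333 146 33.78
scaffold_4 11.8616 12161 0.5135 100.0000 12161 482 478 0.5212 10 10.11
""".splitlines()

rows = read_covstats(sample)
low = low_coverage(rows)
```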
hyphaltip / download_PRJEB4350.sh
Created Apr 11, 2020
Download SRA for microbiome class
View download_PRJEB4350.sh
curl -o PRJEB4350.txt "https://www.ebi.ac.uk/ena/data/warehouse/filereport?accession=PRJEB4350&result=read_run&fields=study_accession,sample_accession,fastq_ftp&download=txt"
# skip the header row, then download the URL in the third column of each line
tail -n +2 PRJEB4350.txt | while read -r PROJ SAMPLE URL
do
    curl -O "$URL"
done
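The same report can be parsed in Python before downloading anything. This is a hedged sketch: it assumes the report is tab-delimited with a `fastq_ftp` column, that a run with several files lists them separated by `;`, and that paths may come without a URL scheme. The function name `fastq_urls` and the example hosts and accessions are hypothetical placeholders, not real ENA records.

```python
import csv
import io


def fastq_urls(report_text, scheme="ftp://"):
    """Return every FASTQ URL listed in a tab-delimited file report."""
    reader = csv.DictReader(io.StringIO(report_text), delimiter="\t")
    urls = []
    for row in reader:
        # assumption: several files for one run are separated by ';'
        for path in (row.get("fastq_ftp") or "").split(";"):
            path = path.strip()
            if not path:
                continue
            # prepend a scheme only when the path does not already have one
            urls.append(path if "://" in path else scheme + path)
    return urls


# Made-up report text for illustration; not real accessions or hosts
example = (
    "study_accession\tsample_accession\tfastq_ftp\n"
    "STUDY1\tSAMPLE1\tftp.example.org/fastq/RUN1_1.fastq.gz;"
    "ftp.example.org/fastq/RUN1_2.fastq.gz\n"
    "STUDY1\tSAMPLE2\tftp.example.org/fastq/RUN2.fastq.gz\n"
)

urls = fastq_urls(example)
```

Each returned URL could then be handed to `curl -O` or to `urllib.request.urlretrieve`.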
hyphaltip / Rice_ranges.R
Last active Mar 26, 2020
Use Genomic Ranges with Rice data
View Rice_ranges.R
library(GenomicFeatures)
if (file.exists("MSU_7.db")) {
  txdb <- loadDb("MSU_7.db")
} else {
  txdb <- makeTxDbFromGFF("/bigdata/wesslerlab/shared/Rice/GFF/MSU_r7/MSU_r7.all.gff3",
                          dataSource = "MSU7",
                          organism = "Oryza sativa")
  saveDb(txdb, "MSU_7.db")
}
ebg <- exonsBy(txdb, by="gene")
hyphaltip / compare.R
Last active Feb 9, 2020
Salary MF skew
View compare.R
library(ggplot2)
survey <- read.csv("data-cJLyN.csv",header=T)
survey = survey[-212,] # remove the aggregate number
pdf("summary_plots.pdf")
p <- ggplot(survey, aes(x = Men.s.salary, y = Women.s.salary)) + geom_point() +
  geom_smooth(method = "glm", se = FALSE,
View Mtub.summary_mean_table.tab
GENE Glycerol_5.7 Glycerol_7 Pyruvate_5.7 Pyruvate_7 Pfam GO
MT_RS00005 106.0542 314.5750 173.0920 237.8235 AAA,Bac_DnaA,Bac_DnaA_C,IstB_IS21 GO:0005524,GO:0006270,GO:0006275,GO:0043565
MT_RS00010 128.9815 243.9095 184.8480 208.7100 DNA_pol3_beta,DNA_pol3_beta_2,DNA_pol3_beta_3 GO:0003677,GO:0003887,GO:0006260,GO:0008408,GO:0009360
MT_RS00015 32.2150 42.6219 42.1174 33.6780 AAA_23,SMC_N
MT_RS00020 77.4342 95.4774 81.4021 79.0631 DciA
MT_RS00025 326.5180 309.6865 318.9515 274.1600 DNA_gyraseB,DNA_gyraseB_C,HATPase_c,Toprim GO:0003677,GO:0003918,GO:0005524,GO:0006265
MT_RS00030 263.7685 271.5500 278.9960 244.5060 DNA_gyraseA_C,DNA_topoisoIV GO:0003677,GO:0003916,GO:0003918,GO:0005524,GO:0006265
MT_RS00035 165.9535 200.4295 213.7435 182.7295 DUF3566
MT_RS00050 0.0000 0.0000 0.0000 0.0000
MT_RS00055 108.4240 188.3470 154.3880 211.6495 CwsA
hyphaltip / Duplications.csv
Last active May 6, 2019
Duplications_plot
View Duplications.csv
Orthogroup,Species Tree Node,Gene Tree Node,Support,,Genes 1,Genes 2
hyphaltip / data.txt
Last active Feb 22, 2019
GriffinEvo_question
View data.txt
2 zlm z2m
1 2
0 2 residual
0.1777 5.08123E-002
5.08123E-002 0.4513
1 2 line
0.8389 -6.64123E-002
-6.64123E-002 0.554
View ortho2pattern.py
#!/usr/bin/env python3
import csv
infile = 'Orthogroups.csv'  # renamed from `input` to avoid shadowing the built-in
outfile = 'phyletic_patterns.txt'
# open report file you will write to
patterns = dict()
with open(infile) as csvfile:
    # columns with gene info by species are tab delimited
    reader = csv.reader(csvfile, delimiter="\t")
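The rest of the script is cut off in this preview, so the following is only a guess at what a phyletic-pattern conversion might look like, not the author's code. It assumes the first row is a header of species names, the first column is the orthogroup ID, and a non-empty cell means the species has at least one gene in that group. The function name `phyletic_patterns` and the toy table are hypothetical.

```python
import csv
import io


def phyletic_patterns(handle):
    """Map each orthogroup ID to a presence/absence string such as '101'."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    species = header[1:]  # first header cell labels the orthogroup column
    patterns = {}
    for row in reader:
        if not row:
            continue
        cells = row[1:]
        # pad short rows so every species gets a character
        cells += [""] * (len(species) - len(cells))
        patterns[row[0]] = "".join("1" if c.strip() else "0" for c in cells)
    return species, patterns


# Toy table with made-up gene names, for illustration only
toy = io.StringIO(
    "\tspA\tspB\tspC\n"
    "OG0000000\ta1, a2\tb1\t\n"
    "OG0000001\t\tb2\tc1\n"
)

species, patterns = phyletic_patterns(toy)
```

Writing the result out is then one line per orthogroup, for example `f"{og}\t{pattern}"`.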