Jason Stajich hyphaltip

@hyphaltip
hyphaltip / .gitignore
Last active September 20, 2023 09:33
sharon_TE_tree.R
.Rproj.user
.Rhistory
.RData
.Ruserdata
hyphaltip / cyanodb_to_amptk.py
Last active August 10, 2023 04:30
CyanoDB convert to amptk input
#!/usr/bin/env python3
import os, re
import pandas as pd
import argparse
import urllib.request
import textwrap
# RDP looks like this
# AJ000684_S000004347;tax=d:Bacteria,p:"Actinobacteria",c:Actinobacteria,o:Actinobacteridae,f:Actinomycetales,g:Corynebacterineae;
#'Domain': 'd',
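The header layout described in the comments above can be sketched as follows. This is a minimal illustration, not the original script: the helper name `format_rdp_header`, the rank-to-prefix table beyond `'Domain': 'd'`, and the example lineage are all assumptions made for the example.

```python
# Hypothetical helper illustrating the "ID;tax=d:...,p:...;" header layout.
# Only the 'Domain' -> 'd' mapping appears in the source; the other
# prefixes are assumed to follow the same one-letter convention.
RANK_PREFIX = {
    "Domain": "d",
    "Phylum": "p",
    "Class": "c",
    "Order": "o",
    "Family": "f",
    "Genus": "g",
}


def format_rdp_header(seq_id, lineage):
    """Build an RDP-style header from a dict of rank name -> taxon name."""
    parts = []
    for rank, prefix in RANK_PREFIX.items():
        name = lineage.get(rank)
        if name:  # skip ranks that have no assignment
            parts.append(f"{prefix}:{name}")
    return f"{seq_id};tax={','.join(parts)};"


# Illustrative lineage only; not taken from CyanoDB.
header = format_rdp_header(
    "AJ000684_S000004347",
    {"Domain": "Bacteria", "Phylum": "Cyanobacteria", "Genus": "Nostoc"},
)
print(header)
```

Ranks missing from the lineage are simply left out, so partially classified records still produce a valid header.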
CLUSTAL W (1.81) multiple sequence alignment
Afu4g00530/1-3150 MPFMSSQKMCLEANVYLVHHVFLPPKLPQEDDYDPEYELVLLEKCIEALEQFKGYVSGPE
AFUB_101080/1-3086 MPFMSSQKMCLEANVYLVHHVFLPPKLPQEDDYDPEYELVLLEKCIEALEQFKGYVSGPE
************************************************************
Afu4g00530/1-3150 ADSIAAAALMITRLAQIFGPHGDVDEKKFRNALAQLYTEGGILPVYVKCQNAAVLMTRDD
AFUB_101080/1-3086 ADSIAAAALMITRLAQIFGPHGDVDEKKFRNALAQLYTEGGILPVYVKCQNAAVLMTRDD
hyphaltip / find_orthogroup_uniquegenes.R
Created January 23, 2023 20:12
species-specific orthogroup selection
library(tidyverse)
library(dplyr)
og = read_tsv("Orthogroups/Orthogroups.tsv",col_names=TRUE) %>% rename(CF165 = Fusarium_oxysporum_CF165.proteins,
CF159 = Fusarium_oxysporum_CF159.proteins,
CF132 = Fusarium_oxysporum_CF132.proteins)
og_counts = read_tsv("Orthogroups/Orthogroups.GeneCount.tsv",col_names=TRUE)
unassigned = read_tsv("Orthogroups/Orthogroups_UnassignedGenes.tsv",col_names=TRUE) %>%
rename(CF165 = Fusarium_oxysporum_CF165.proteins,
CF159 = Fusarium_oxysporum_CF159.proteins,
hyphaltip / README.md
Last active September 3, 2020 19:56
Hypergeometric domain enrichment

Domain set enrichment. Code by Diego Martinez, used in this paper: https://mbio.asm.org/content/3/5/e00259-12

Comparative Genome Analysis of Trichophyton rubrum and Related Dermatophytes Reveals Candidate Genes Involved in Infection. Diego A. Martinez, Brian G. Oliver, Yvonne Gräser, Jonathan M. Goldberg, Wenjun Li, Nilce M. Martinez-Rossi, Michel Monod, Ekaterina Shelest, Richard C. Barton, Elizabeth Birch, Axel A. Brakhage, Zehua Chen, Sarah J. Gurr, David Heiman, Joseph Heitman, Idit Kosti, Antonio Rossi, Sakina Saif, Marketa Samalova, Charles W. Saunders, Terrance Shea, Richard C. Summerbell, Jun Xu, Sarah Young, Qiandong Zeng, Bruce W. Birren, Christina A. Cuomo, Theodore C. White. mBio Sep 2012, 3 (5) e00259-12; DOI: 10.1128/mBio.00259-12

Diego's instructions: lump multiple species together when comparing a set. I just added them as if they were a pseudo-genome.

I think this is commented well enough for use, but let me know if it doesn't make sense.
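
The idea of lumping species into a pseudo-genome before testing can be sketched as below. This is not Diego's code; it is a minimal illustration of a one-sided hypergeometric test, assuming per-species counts are given as (genes with the domain, total genes). The function names `lump` and `hypergeom_enrichment_p` and all counts are hypothetical.

```python
from math import comb


def lump(species_counts):
    """Sum (genes_with_domain, total_genes) pairs into one pseudo-genome."""
    with_domain = sum(c for c, _ in species_counts)
    total = sum(t for _, t in species_counts)
    return with_domain, total


def hypergeom_enrichment_p(k, n, K, N):
    """P(X >= k): chance of seeing at least k domain-bearing genes in a
    set of n genes drawn from N genes, K of which carry the domain."""
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


# Made-up counts: two target species lumped together, two other species.
k, n = lump([(5, 10), (3, 10)])            # foreground pseudo-genome
other_k, other_n = lump([(2, 40), (0, 40)])
K, N = k + other_k, n + other_n            # background = all species pooled
p = hypergeom_enrichment_p(k, n, K, N)
print(f"domain in {k}/{n} foreground vs {K}/{N} overall, p = {p:.3g}")
```

The background here includes the foreground genes, which is what the hypergeometric model assumes: the foreground is treated as a draw from the pooled gene set.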

hyphaltip / ITSx_cross-ref.py
Last active July 26, 2020 22:38
ITSx_cross-ref
#!/usr/bin/env python
import os, csv, argparse, re
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord
from Bio import SeqIO
def get_acc(identifier):
    """Given this identifier string, return the unique part lookup for this.
#ID Avg_fold Length Ref_GC Covered_percent Covered_bases Plus_reads Minus_reads Read_GC Median_fold Std_Dev
scaffold_1 415.2697 13820 0.5662 100.0000 13820 19003 19092 0.5794 142 1537.73
scaffold_2 188.4877 13022 0.5528 100.0000 13022 7902 7959 0.5390 148 165.47
scaffold_3 150.3550 12356 0.5389 100.0000 12356 6185 6153 0.5333 146 33.78
scaffold_4 11.8616 12161 0.5135 100.0000 12161 482 478 0.5212 10 10.11
scaffold_5 210.9362 11631 0.5654 100.0000 11631 8220 7784 0.5571 138 226.74
scaffold_6 110.3282 11581 0.6050 100.0000 11581 4291 4184 0.5991 113 30.05
scaffold_7 129.1569 11260 0.5620 100.0000 11260 4839 4809 0.5567 131 27.30
scaffold_8 124.0985 10464 0.5804 100.0000 10464 4300 4286 0.5709 126 29.72
hyphaltip / download_PRJEB4350.sh
Created April 11, 2020 21:29
Download SRA for microbiome class
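curl -o PRJEB4350.txt "https://www.ebi.ac.uk/ena/data/warehouse/filereport?accession=PRJEB4350&result=read_run&fields=study_accession,sample_accession,fastq_ftp&download=txt"
# Skip the header row; the third column holds the FASTQ location(s)
tail -n +2 PRJEB4350.txt | while read -r PROJ SAMPLE URL
do
    # a run can list several files separated by ';'
    for FILE in ${URL//;/ }
    do
        curl -O "$FILE"
    done
done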
hyphaltip / Rice_ranges.R
Last active March 26, 2020 15:03
Use Genomic Ranges with Rice data
library(GenomicFeatures)
if( file.exists("MSU_7.db")){
txdb = loadDb("MSU_7.db")
} else {
txdb <- makeTxDbFromGFF("/bigdata/wesslerlab/shared/Rice/GFF/MSU_r7/MSU_r7.all.gff3",
dataSource="MSU7",
organism="Oryza sativa")
saveDb(txdb,"MSU_7.db")
}
ebg <- exonsBy(txdb, by="gene")
hyphaltip / compare.R
Last active February 9, 2020 01:09
Salary MF skew
library(ggplot2)
survey <- read.csv("data-cJLyN.csv",header=T)
survey = survey[-212,] # remove the aggregate number
pdf("summary_plots.pdf")
p <- ggplot(survey,aes(x=Men.s.salary,y=Women.s.salary)) + geom_point() +
  geom_smooth(method = "glm", se = FALSE,