Skip to content

Instantly share code, notes, and snippets.


Philipp Bayer philippbayer

View GitHub Profile
philippbayer /
Last active Dec 16, 2020
Running EDTA on Pawsey with Singularity

First, to download EDTA:

module load singularity
singularity pull EDTA.sif docker://

That'll make a new file called EDTA.sif containing everything in the EDTA v1.9.4 container.

Then we have a problem: Pawsey allows only 1 million files per user and running several EDTA runs for several genomes at once will hit that limit.

philippbayer / covid_vs_cash.Rmd
Created Jul 2, 2020
Plotting whether there's a correlation between I've Been Everywhere and Covid-19 cases
View covid_vs_cash.Rmd
```{r setup}
df <- readxl::read_xlsx('./Covid_vs_State.xlsx')
philippbayer /
Created Feb 21, 2020
GPU-enabled DeepVariant via Singularity, not Docker
module load singularity
# we have to transform the docker image into a singularity thingy. Be careful to use the GPU docker image 0.9.0-gpu, not 0.9.0
# the following command doesn’t work on copyQ?? I get `mksquashfs not found`, works fine on other systems
singularity pull $MYGROUP/singularity/myRepository/deepvariant.sif docker://google/deepvariant:0.9.0-gpu
# you may have to make the above folders
Then, using the example data described here:
philippbayer /
Last active Feb 22, 2020
Adding new restriction enzymes to Stacks

I had a ddRAD project with two restriction enzymes not supported by Stacks - Hinfl and HpyCH4IV. This is how I added them to Stacks:

  1. Download the code, open src/
  2. Check NEB for the restriction enzyme sequences: and
  3. Understand how Stacks encode restriction enzymes - Hinfl is G CUT ANTC, which in the comments of is encoded as G/ANTC
  4. Choose the longer piece (in this case, ANTC) and translate that into all four possibilities AGTC AATC ATTC ACTC, and the reverse complement GACT GATT GAAT GAGT. For others like A/CGT it's easier, take the longer piece and its reverse-complement, so it's just CGT and ACG
  5. Add that to the code, and don't forget your semicolons (I think the first letter of the RE name has to be lower case??)
philippbayer /
Created Dec 17, 2019
trims fastas of several sequences/genes using a single known sequence/gene
import subprocess
from Bio.Seq import Seq
from Bio import SeqIO
trimal = '/path/to/trimal'
muscle = '/path/to/muscle3.8.31_i86linux64'
# this is the fasta containing the single reference gene we use for trimming all of our seqeunces
ref_gene = '/path/to/single/ref.fasta'
all_out = open('file_to_fix.Trimmed.fasta', 'w')
philippbayer / plotTheBirds.R
Created Oct 24, 2019
Plots a ggridges plot of fruitstatus
View plotTheBirds.R
# fix typos
a <- read_csv('./biiiiirdsssss.csv')
a$date <- gsub("16/01/017", "16/01/2017", a$date)
a$date <- gsub("26/07/2026", "26/07/2016", a$date)
a$date <- dmy(a$date)
philippbayer /
Last active Apr 2, 2019
Two scripts to plot BLINK results using rMVP
# assuming that all results tables of BLINK are in the current folder, and assuming that all results tables were gzipped
# usage: python > Summary_Table.tsv
# this script will make one large summary table in the format rMVP requires, SNP, chrom, position, and then x columns for x phenotypes - each cell one p-value
# use grep -v to kick out regions of the genome you don't want to plot (unplaced contigs etc.)
import glob
import gzip
from collections import OrderedDict
all_phenos = ['SNP','chr','pos']
View date_range_merge.R
library(lubridate) # for the days() function
library(fuzzyjoin) # for the fuzzy_left_join() function
climate <- read_tsv('./timetable.csv')
# date temperature
#1 2018-11-21 100
#2 2018-11-22 80
philippbayer /
Created Jan 3, 2019
A stab at classifying NLR-Annotator output
This will print something like
assembly.fa_chopped_out.xml_txt complete 0 0 0 0 0 0 29 17
assembly.fa_chopped_out.xml_txt complete (pseudogene) 0 0 0 0 0 0 16 7
assembly.fa_chopped_out.xml_txt partial 3 0 1 0 4 1 0 0
assembly.fa_chopped_out.xml_txt partial (pseudogene) 1 0 0 5 1 2 2 2
philippbayer / My_data.tsv
Last active Jan 2, 2019
This is the data and the plot I used for my PAG poster on repeatmasking problems in R-gene prediction.
View My_data.tsv
Class Repeatmasking Assembly Count
CNL before B. napus 23
CNL after B. napus 13
CNL before B. rapa 20
CNL after B. rapa 10
CN before B. napus 47
CN after B. napus 25
CN before B. rapa 16
CN after B. rapa 5
N before B. napus 92
You can’t perform that action at this time.