Skip to content

Instantly share code, notes, and snippets.

View philippbayer's full-sized avatar

Philipp Bayer philippbayer

View GitHub Profile
@philippbayer
philippbayer / stacks.md
Last active February 22, 2020 13:07
Adding new restriction enzymes to Stacks

I had a ddRAD project with two restriction enzymes not supported by Stacks - Hinfl and HpyCH4IV. This is how I added them to Stacks:

  1. Download the code, open src/renz.cc
  2. Check NEB for the restriction enzyme sequences: https://www.nebiolabs.com.au/products/r0155-hinfi#Product%20Information and https://www.nebiolabs.com.au/products/r0619-hpych4iv#Product%20Information
  3. Understand how Stacks encode restriction enzymes - Hinfl is G CUT ANTC, which in the comments of renz.cc is encoded as G/ANTC
  4. Choose the longer piece (in this case, ANTC) and translate that into all four possibilities AGTC AATC ATTC ACTC, and the reverse complement GACT GATT GAAT GAGT. For others like A/CGT it's easier, take the longer piece and its reverse-complement, so it's just CGT and ACG
  5. Add that to the code, and don't forget your semicolons (I think the first letter of the RE name has to be lower case??)
@philippbayer
philippbayer / deepvariant.sh
Created February 21, 2020 15:56
GPU-enabled DeepVariant via Singularity, not Docker
module load singularity
# we have to transform the docker image into a singularity thingy. Be careful to use the GPU docker image 0.9.0-gpu, not 0.9.0
# the following command doesn’t work on copyQ?? I get `mksquashfs not found`, works fine on other systems
singularity pull $MYGROUP/singularity/myRepository/deepvariant.sif docker://google/deepvariant:0.9.0-gpu
# you may have to make the above folders
Then, using the example data described here: https://github.com/google/deepvariant/blob/r0.9/docs/deepvariant-quick-start.md
@philippbayer
philippbayer / allParser.py
Created January 17, 2012 13:34
openSNP-parser for all files in Python 3
#!/usr/bin/python3
# This Script parses a couple of genotyping-files downloaded from openSNP.
# It supports deCODEme, FamilyTreeDNA and 23andme-files.
# If you don't like the Object Oriented Magic just comment out the lines with allSnps.append
# and do your own magic.
# Example-Useage is name of the script followed by as many filenames as you like.
# ./megaparser.py 200.23andme.12 92.decodeme.1 [...]
# or:
@philippbayer
philippbayer / trimSequences.py
Created December 17, 2019 12:55
trims fastas of several sequences/genes using a single known sequence/gene
import subprocess
from Bio.Seq import Seq
from Bio import SeqIO
trimal = '/path/to/trimal'
muscle = '/path/to/muscle3.8.31_i86linux64'
# this is the fasta containing the single reference gene we use for trimming all of our seqeunces
ref_gene = '/path/to/single/ref.fasta'
all_out = open('file_to_fix.Trimmed.fasta', 'w')
@philippbayer
philippbayer / plotTheBirds.R
Created October 24, 2019 03:07
Plots a ggridges plot of fruitstatus
library(tidyverse)
library(lubridate)
library(viridis)
library(ggridges)
# fix typos
a <- read_csv('./biiiiirdsssss.csv')
a$date <- gsub("16/01/017", "16/01/2017", a$date)
a$date <- gsub("26/07/2026", "26/07/2016", a$date)
a$date <- dmy(a$date)
@philippbayer
philippbayer / natural_sort.go
Last active June 30, 2019 18:00
A perverted kind of natural sort in golang
package main
import (
"fmt"
"log"
"sort"
"strconv"
"strings"
"unicode"
)
@philippbayer
philippbayer / makeSummaryTable.py
Last active April 2, 2019 08:53
Two scripts to plot BLINK results using rMVP
# assuming that all results tables of BLINK are in the current folder, and assuming that all results tables were gzipped
# usage: python makeSummaryTable.py > Summary_Table.tsv
# this script will make one large summary table in the format rMVP requires, SNP, chrom, position, and then x columns for x phenotypes - each cell one p-value
# use grep -v to kick out regions of the genome you don't want to plot (unplaced contigs etc.)
import glob
import gzip
from collections import OrderedDict
all_phenos = ['SNP','chr','pos']
library(tidyverse)
library(lubridate) # for the days() function
library(fuzzyjoin) # for the fuzzy_left_join() function
climate <- read_tsv('./timetable.csv')
# date temperature
#1 2018-11-21 100
#2 2018-11-22 80
@philippbayer
philippbayer / annotateClasses.py
Created January 3, 2019 02:50
A stab at classifying NLR-Annotator output
'''
This will print something like
File Subset TN TCNL NL CN N TCN TNL CNL
assembly.fa_chopped_out.xml_txt complete 0 0 0 0 0 0 29 17
assembly.fa_chopped_out.xml_txt complete (pseudogene) 0 0 0 0 0 0 16 7
assembly.fa_chopped_out.xml_txt partial 3 0 1 0 4 1 0 0
assembly.fa_chopped_out.xml_txt partial (pseudogene) 1 0 0 5 1 2 2 2
@philippbayer
philippbayer / My_data.tsv
Last active January 2, 2019 10:14
This is the data and the plot I used for my PAG poster on repeatmasking problems in R-gene prediction.
Class Repeatmasking Assembly Count
CNL before B. napus 23
CNL after B. napus 13
CNL before B. rapa 20
CNL after B. rapa 10
CN before B. napus 47
CN after B. napus 25
CN before B. rapa 16
CN after B. rapa 5
N before B. napus 92