Rupert A. Collins boopsboops

boopsboops / ancistrus.R
Last active November 24, 2022 22:04
Bristlenose ID
#!/usr/bin/env Rscript
# load libs
library("ape")
library("tidyverse")
library("magrittr")
library("ips")
library("phangorn")
library("rentrez")
library("bold")
boopsboops / reflib-format.R
Created December 13, 2018 15:07
Format and subset a tabular reference library to fasta
#!/usr/bin/env Rscript
# script to subset and convert a reference library into fasta format
# load up libraries
library("tidyverse")
library("magrittr")
library("ape")
# load up the references using the `references-load.R` script
# loads objects: uk.species.table, uk.species.table.common, reflib.orig
boopsboops / sppDistMatrix2.R
Last active March 13, 2016 17:38
Modified function to generate interspecies distance matrices in R
# load libs
library("spider")
# source(file = "sppDistMatrix2.R") # uncomment to source the function
# load the example data
data(dolomedes)
doloDist <- dist.dna(dolomedes)
doloSpp <- substr(dimnames(dolomedes)[[1]], 1, 5)
# you have three options for dist, which are min, mean and max
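The preview stops before the function is called, so the following is only a rough base-R sketch of what "min, mean and max" between-species distances could look like. The helper `spp_dist_summary()` and the toy data are hypothetical and are not the gist's `sppDistMatrix2` function.

```r
# Hypothetical helper (NOT the gist's sppDistMatrix2): summarise a pairwise
# distance matrix by species with a chosen function such as min, mean or max.
spp_dist_summary <- function(dist_mat, spp, fun = min) {
    dist_mat <- as.matrix(dist_mat)
    taxa <- sort(unique(spp))
    out <- matrix(NA_real_, nrow = length(taxa), ncol = length(taxa),
        dimnames = list(taxa, taxa))
    for (i in taxa) {
        for (j in taxa) {
            if (i != j) {
                # all distances between individuals of species i and species j
                out[i, j] <- fun(dist_mat[spp == i, spp == j])
            }
        }
    }
    out
}

# toy data: four individuals from two made-up species
toy <- matrix(c(0.00, 0.01, 0.10, 0.12,
                0.01, 0.00, 0.11, 0.14,
                0.10, 0.11, 0.00, 0.02,
                0.12, 0.14, 0.02, 0.00), nrow = 4, byrow = TRUE)
toy_spp <- c("spA", "spA", "spB", "spB")

min_mat <- spp_dist_summary(toy, toy_spp, fun = min)
mean_mat <- spp_dist_summary(toy, toy_spp, fun = mean)
max_mat <- spp_dist_summary(toy, toy_spp, fun = max)
```

With the objects created above, a call such as `spp_dist_summary(doloDist, doloSpp, fun = mean)` should also work, because `as.matrix()` accepts a `dist` object. Within-species cells are left as `NA` in this sketch.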
boopsboops / cytb.gbf
Created April 24, 2015 15:46
GenBank flatfile for cytb
LOCUS       BB-002_CYTB              201 bp    DNA     linear   VRT 24-APR-2015
DEFINITION  Boops boops.
ACCESSION
VERSION
KEYWORDS    .
SOURCE      mitochondrion Boops boops
  ORGANISM  Boops boops
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Actinopterygii; Neopterygii; Teleostei; Neoteleostei;
            Acanthomorphata; Eupercaria; Spariformes; Sparidae; Boops.
boopsboops / sequences.fsa
Last active August 29, 2015 14:19
Sequences in GenBank fasta format
>BB-002_CYTB [organism=Boops boops] [Bio_material=BB-002] [Specimen-voucher=MNHN:1978-0632] [location=mitochondrion] [mgcode=2]
ATGGCTAGCCTTCGAAAAACGCACCCCCTATTAAAAATTGCTAATCACGCATTAGTTGATCTCCCTGCACCCTCCAATATTTCCGTCTGATGAAATTTTGGCTCCCTGCTTGGCCTCTGTCTTATTTCCCAGCTCCTTACAGGGCTATTCCTCGCCATACACTATACCTCCGATATCGCTACAGCCTTCTCTTCCGTTGCC
>BB-003_CYTB [organism=Boops boops] [Bio_material=BB-003] [Specimen-voucher=MNHN:1978-0632] [location=mitochondrion] [mgcode=2]
ATGGCTAGCCTTCGAAAAACGCACCCCCTATTAAAAATTGCTAATCACGCATTAGTTGATCTCCCTGCACCCTCCAATATTTCCGTCTGATGAAATTTTGGCTCCCTGCTTGGCCTCTGTCTTATTTCCCAGCTCCTTACAGGGCTATTCCTCGCCATACACTATACCTCCGATATCGCTACAGCCTTCTCTTCCGTTGCC
>BB-001_rRNA [organism=Boops boops] [Bio_material=BB-001] [Specimen-voucher=MNHN:1978-0632] [location=mitochondrion]
TATGGAGCTTAAGACGCCAGGGCAGCTCACGTTAAACGCCCCTAATAAAGGAATAAAACCTAGTGAATCCTGCTCTAATGTCTTTGGTTGGGGCGACCACGGGGAATCATAAAACCCCCACGTGGAATGGGAGCACCACACTCCTAAACCCAAGAGCTTCCGCTCTAATGAACAGAACTTCTGGCCATATTAGATCCGGT
>BB-003_rRNA [organism=Boops boops] [Bio_mater
boopsboops / features.tbl
Last active August 29, 2015 14:19
GenBank feature table for cytb and 16S
>Feature BB-002_CYTB
1	>201	gene
			gene	CYTB
1	>201	CDS
			product	cytochrome b
			codon_start	1
>Feature BB-003_CYTB
1	>201	gene
			gene	CYTB
1	>201	CDS
boopsboops / features_cytb.tbl
Last active August 29, 2015 14:19
Feature table for cytb
>Feature BB-002_CYTB
1	>201	gene
			gene	CYTB
1	>201	CDS
			product	cytochrome b
			codon_start	1
>Feature BB-003_CYTB
1	>201	gene
			gene	CYTB
1	>201	CDS
boopsboops / sequences_cytb.fsa
Last active August 29, 2015 14:19
GenBank format fasta file for cytb
>BB-002_CYTB [organism=Boops boops] [Bio_material=BB-002] [Specimen-voucher=MNHN:1978-0632] [location=mitochondrion] [mgcode=2]
ATGGCTAGCCTTCGAAAAACGCACCCCCTATTAAAAATTGCTAATCACGCATTAGTTGATCTCCCTGCACCCTCCAATATTTCCGTCTGATGAAATTTTGGCTCCCTGCTTGGCCTCTGTCTTATTTCCCAGCTCCTTACAGGGCTATTCCTCGCCATACACTATACCTCCGATATCGCTACAGCCTTCTCTTCCGTTGCC
>BB-003_CYTB [organism=Boops boops] [Bio_material=BB-003] [Specimen-voucher=MNHN:1978-0632] [location=mitochondrion] [mgcode=2]
ATGGCTAGCCTTCGAAAAACGCACCCCCTATTAAAAATTGCTAATCACGCATTAGTTGATCTCCCTGCACCCTCCAATATTTCCGTCTGATGAAATTTTGGCTCCCTGCTTGGCCTCTGTCTTATTTCCCAGCTCCTTACAGGGCTATTCCTCGCCATACACTATACCTCCGATATCGCTACAGCCTTCTCTTCCGTTGCC
boopsboops / tlb2asn.sh
Last active August 29, 2015 14:19
Running 'tbl2asn'
# Here we use the -a flag to specify the fasta file format type,
# the -V flag to request verification (v) and a GenBank flatfile as part of the output (b),
# and the -T flag to tell the program to generate the higher taxonomic classifications for our records.
# Use 'tbl2asn --help' for a full list of the options
# To run if tbl2asn is in your PATH
tbl2asn -t template.sbt -i sequences.fsa -f features.tbl -a s -V vb -T
# To run if local
./tbl2asn -t template.sbt -i sequences.fsa -f features.tbl -a s -V vb -T
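Since the files passed to `tbl2asn` are generated from R in the other gists, the same command could also be launched from an R session. The block below is a minimal sketch under that assumption: it reuses only the flags and file names shown above, and the existence checks are an added convenience rather than part of the original script.

```r
# arguments copied from the shell command above
tbl2asn_args <- c("-t", "template.sbt", "-i", "sequences.fsa",
    "-f", "features.tbl", "-a", "s", "-V", "vb", "-T")
inputs <- c("template.sbt", "sequences.fsa", "features.tbl")

# look for the executable on the PATH; Sys.which() returns "" if it is absent
exe <- unname(Sys.which("tbl2asn"))

if (nzchar(exe) && all(file.exists(inputs))) {
    # run tbl2asn and keep its exit status
    status <- system2(exe, args = tbl2asn_args)
} else {
    message("tbl2asn or one of its input files was not found; nothing was run")
}
```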
boopsboops / 16S.R
Created April 24, 2015 03:04
Table gen for 16S
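# drop rows lacking a 16S sequence; logical indexing also works when no values are NA,
# whereas tab[-which(...), ] would return zero rows in that case
reduced_table <- tab[!is.na(tab$nucleotides_16S), ]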
gene_name <- "rRNA"
prod_name <- "16S ribosomal RNA"
# note that we deleted the genetic code element, as this is not a coding sequence
# also note that 'append' is now set to 'TRUE' to add the data to previously written files
fasta_description <- paste0(
    ">", reduced_table$otherCatalogNumbers, "_", gene_name,
    " [organism=", reduced_table$genus, " ", reduced_table$specificEpithet, "]",
    " [Bio_material=", reduced_table$otherCatalogNumbers, "]",
    " [Specimen-voucher=", reduced_table$institutionCode, ":", reduced_table$catalogNumber, "]",
    " [location=mitochondrion]")
# join each header to its sequence, separated by a newline
fasta_complete <- paste(fasta_description, reduced_table$nucleotides_16S, sep = "\n")
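The preview ends before the write step that the `append` comment refers to. As a hedged illustration of that step, the sketch below appends 16S records to a file that already holds a cytb record. The toy table, its values and the temporary output file are made up for the example, and base R's `write()` is assumed rather than taken from the gist.

```r
# toy stand-in for the gist's `tab` (values invented for illustration)
tab <- data.frame(
    otherCatalogNumbers = c("BB-001", "BB-002"),
    nucleotides_16S = c("TATGGAGC", NA),
    stringsAsFactors = FALSE)

# keep only rows that have a 16S sequence
keep <- tab[!is.na(tab$nucleotides_16S), ]
records <- paste(paste0(">", keep$otherCatalogNumbers, "_rRNA"),
    keep$nucleotides_16S, sep = "\n")

# temporary file standing in for sequences.fsa
out_file <- tempfile(fileext = ".fsa")

# first write: stands in for the cytb records written earlier
write(">BB-002_CYTB\nATGGCTAGC", file = out_file)

# second write: append = TRUE adds the 16S records instead of overwriting
write(records, file = out_file, append = TRUE)

written <- readLines(out_file)
```

Leaving `append` at its default of `FALSE` in the second call would overwrite the cytb records, which is presumably why the comment draws attention to it.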