Rupert A. Collins boopsboops

@boopsboops
boopsboops / master_fake.tsv
Last active August 29, 2015 14:19
Example of master TSV, but with added tabs (for vis)
otherCatalogNumbers genus specificEpithet institutionCode catalogNumber country nucleotides_CYTB nucleotides_16S
BB-001 Boops boops MNHN 1978-0632 Spain NA TATGGAGCTTAA
BB-002 Boops boops MNHN 1978-0632 Spain ATGGCTAGCCT NA
BB-003 Boops boops MNHN 1978-0632 Spain ATGGCTAGCCT TATGGAGCTTAA
boopsboops / read_reduce.R
Created April 24, 2015 02:12
Read and reduce
tab <- read.table("master.tsv", header=TRUE, sep="\t", stringsAsFactors=FALSE)
# keep only rows that have a CYTB sequence; logical indexing also works when no values are missing
reduced_table <- tab[!is.na(tab$nucleotides_CYTB), ]
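The filtering step can be checked on a small in-memory copy of the example table. This is a minimal sketch, assuming the column names from master_fake.tsv; `example_tab` and `keep_sequenced` are hypothetical names introduced only for illustration. It also shows why logical indexing is a safer habit than `-which()`: when nothing is missing, `which()` returns an empty vector and negative indexing with it selects no rows at all.

```r
# Hypothetical in-memory copy of the example table (column names as in master_fake.tsv)
example_tab <- data.frame(
  otherCatalogNumbers = c("BB-001", "BB-002", "BB-003"),
  nucleotides_CYTB = c(NA, "ATGGCTAGCCT", "ATGGCTAGCCT"),
  nucleotides_16S = c("TATGGAGCTTAA", NA, "TATGGAGCTTAA"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep only rows that have a sequence in the given column
keep_sequenced <- function(tab, column) {
  tab[!is.na(tab[[column]]), , drop = FALSE]
}

cytb_rows <- keep_sequenced(example_tab, "nucleotides_CYTB")
print(cytb_rows$otherCatalogNumbers)

# A table with no missing CYTB values: -which() selects nothing, logical indexing keeps the row
complete_tab <- example_tab[3, ]
print(nrow(complete_tab[-which(is.na(complete_tab$nucleotides_CYTB)), ]))  # 0
print(nrow(keep_sequenced(complete_tab, "nucleotides_CYTB")))              # 1
```

Passing the column name as an argument means the same helper can be reused for the 16S column.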
boopsboops / write_fasta.R
Last active August 29, 2015 14:19
Write fasta
# gene_name must be defined before this runs (see prod_name.R)
# build one definition line per record, with the source modifiers in square brackets
fasta_description <- paste0(">", reduced_table$otherCatalogNumbers, "_", gene_name, " ",
    "[organism=", reduced_table$genus, " ", reduced_table$specificEpithet, "] ",
    "[Bio_material=", reduced_table$otherCatalogNumbers, "] ",
    "[Specimen-voucher=", reduced_table$institutionCode, ":", reduced_table$catalogNumber, "] ",
    "[location=mitochondrion] [mgcode=2]")
fasta_complete <- paste(fasta_description, reduced_table$nucleotides_CYTB, sep="\n") # pair each definition line with its sequence
write(fasta_complete, file="sequences.fsa", append=FALSE) # write out the fasta file, overwriting any existing one
boopsboops / prod_name.R
Created April 24, 2015 02:21
Product and gene names
gene_name <- "CYTB"
prod_name <- "cytochrome b"
boopsboops / feature_table.R
Created April 24, 2015 02:56
Feature table
feature_tab <- paste0(paste0(">Feature", " ", reduced_table$otherCatalogNumbers, "_", gene_name),"\n", #
"1", "\t", ">", nchar(reduced_table$nucleotides_CYTB), "\t", "gene", "\n", #
"\t", "\t", "\t", "gene", "\t", gene_name, "\n", #
"1", "\t", ">", nchar(reduced_table$nucleotides_CYTB), "\t", "CDS", "\t", "\t", "\n", #
"\t", "\t", "\t", "product", "\t", prod_name, "\n", #
"\t", "\t", "\t", "codon_start", "\t", "1")
write(feature_tab, file="features.tbl", append=FALSE)# write out
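The `paste0()` call relies on R's vectorisation: each argument is recycled across the rows of `reduced_table`, so one feature block is produced per sequence. The sketch below shows the same idea on toy values. `make_cds_feature` is a hypothetical helper, not part of the original script, and it drops the two trailing tabs on the CDS line on the assumption that they carry no meaning.

```r
# Hypothetical helper: build one feature-table block per sequence.
# Interval lines hold start, stop and feature key; qualifier lines hold
# three tabs followed by the qualifier key and its value.
make_cds_feature <- function(seq_id, seq_length, gene_name, prod_name) {
  paste0(
    ">Feature ", seq_id, "\n",
    "1\t>", seq_length, "\tgene\n",
    "\t\t\tgene\t", gene_name, "\n",
    "1\t>", seq_length, "\tCDS\n",
    "\t\t\tproduct\t", prod_name, "\n",
    "\t\t\tcodon_start\t1"
  )
}

# Toy input: two records of 201 bp, matching the lengths in the example output
blocks <- make_cds_feature(
  seq_id = c("BB-002_CYTB", "BB-003_CYTB"),
  seq_length = c(201, 201),
  gene_name = "CYTB",
  prod_name = "cytochrome b"
)
cat(blocks, sep = "\n")
```

Wrapping the template in a function keeps the gene and product names as arguments, so nothing depends on globals being set in the right order.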
boopsboops / 16S.R
Created April 24, 2015 03:04
Table gen for 16S
# keep only rows that have a 16S sequence; logical indexing also works when no values are missing
reduced_table <- tab[!is.na(tab$nucleotides_16S), ]
gene_name <- "rRNA"
prod_name <- "16S ribosomal RNA"
# note that we deleted the genetic code element, as this is not a coding sequence
# also note that 'append' is now set to 'TRUE' to add the data to previously written files
fasta_description <- paste0(">", paste0(reduced_table$otherCatalogNumbers, "_", gene_name), #
" ", "[organism=", reduced_table$genus, " ", reduced_table$specificEpithet, "]", " ", #
"[Bio_material=", reduced_table$otherCatalogNumbers, "]", " ", "[Specimen-voucher=", #
reduced_table$institutionCode, ":", reduced_table$catalogNumber, "]", " ", "[location=mitochondrion]")
fasta_complete <- paste(fasta_description, reduced_table$nucleotides_16S, sep="\n")# add data to fasta
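The snippet as shown stops before the write step and the 16S feature table. Here is a hedged sketch of how those remaining steps might look. It is not the author's code: it assumes an `rRNA` feature key with a single `product` qualifier and the same `1` to `>length` interval convention used for CYTB. The toy `reduced_table` stands in for the real one, and temporary files stand in for sequences.fsa and features.tbl so that nothing existing is touched.

```r
# Toy stand-in for the 16S rows of the master table
reduced_table <- data.frame(
  otherCatalogNumbers = c("BB-001", "BB-003"),
  nucleotides_16S = c("TATGGAGCTTAA", "TATGGAGCTTAA"),
  stringsAsFactors = FALSE
)
gene_name <- "rRNA"
prod_name <- "16S ribosomal RNA"

seq_ids <- paste0(reduced_table$otherCatalogNumbers, "_", gene_name)

# Assumed layout: one rRNA feature with a product qualifier, no CDS or codon_start
feature_tab <- paste0(
  ">Feature ", seq_ids, "\n",
  "1\t>", nchar(reduced_table$nucleotides_16S), "\trRNA\n",
  "\t\t\tproduct\t", prod_name
)

# Temporary files stand in for sequences.fsa and features.tbl
fasta_file <- tempfile(fileext = ".fsa")
feature_file <- tempfile(fileext = ".tbl")

# Pretend a CYTB record was written earlier, then append the 16S records after it
write(">BB-002_CYTB\nATGGCTAGCCT", file = fasta_file, append = FALSE)
fasta_complete <- paste(paste0(">", seq_ids), reduced_table$nucleotides_16S, sep = "\n")
write(fasta_complete, file = fasta_file, append = TRUE)
write(feature_tab, file = feature_file, append = TRUE)

print(readLines(fasta_file))
```

With `append = TRUE` the 16S records land after the CYTB ones in the same files, which is what the comment about `append` describes.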
boopsboops / tlb2asn.sh
Last active August 29, 2015 14:19
Running 'tbl2asn'
# Here we use the -a flag to specify the fasta file format type,
# the -V flag to request verification (v) and a GenBank flatfile as part of the output (b),
# and the -T flag to tell the program to generate the higher taxonomic classifications for our records.
# Use 'tbl2asn --help' for a full list of the options
# To run if in PATH
tbl2asn -t template.sbt -i sequences.fsa -f features.tbl -a s -V vb -T
# To run if local
./tbl2asn -t template.sbt -i sequences.fsa -f features.tbl -a s -V vb -T
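To keep the whole workflow in one R session, the same command could be launched from R. This is a sketch only, using the flags exactly as given in the shell command and base R's `Sys.which()` and `system2()`; the variable names are illustrative. It runs the program only if both `tbl2asn` and `template.sbt` are found, and otherwise just prints the command it would have run.

```r
# Arguments exactly as in the shell command
tbl2asn_args <- c("-t", "template.sbt", "-i", "sequences.fsa",
                  "-f", "features.tbl", "-a", "s", "-V", "vb", "-T")

# Sys.which() returns an empty string when the program is not on the PATH
tbl2asn_path <- Sys.which("tbl2asn")

if (nzchar(tbl2asn_path) && file.exists("template.sbt")) {
  # system2() returns the exit status of the command
  status <- system2(tbl2asn_path, args = tbl2asn_args)
  message("tbl2asn exit status: ", status)
} else {
  message("tbl2asn or template.sbt not found; would run: tbl2asn ",
          paste(tbl2asn_args, collapse = " "))
}
```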
boopsboops / sequences_cytb.fsa
Last active August 29, 2015 14:19
GenBank format fasta file for cytb
>BB-002_CYTB [organism=Boops boops] [Bio_material=BB-002] [Specimen-voucher=MNHN:1978-0632] [location=mitochondrion] [mgcode=2]
ATGGCTAGCCTTCGAAAAACGCACCCCCTATTAAAAATTGCTAATCACGCATTAGTTGATCTCCCTGCACCCTCCAATATTTCCGTCTGATGAAATTTTGGCTCCCTGCTTGGCCTCTGTCTTATTTCCCAGCTCCTTACAGGGCTATTCCTCGCCATACACTATACCTCCGATATCGCTACAGCCTTCTCTTCCGTTGCC
>BB-003_CYTB [organism=Boops boops] [Bio_material=BB-003] [Specimen-voucher=MNHN:1978-0632] [location=mitochondrion] [mgcode=2]
ATGGCTAGCCTTCGAAAAACGCACCCCCTATTAAAAATTGCTAATCACGCATTAGTTGATCTCCCTGCACCCTCCAATATTTCCGTCTGATGAAATTTTGGCTCCCTGCTTGGCCTCTGTCTTATTTCCCAGCTCCTTACAGGGCTATTCCTCGCCATACACTATACCTCCGATATCGCTACAGCCTTCTCTTCCGTTGCC
boopsboops / features_cytb.tbl
Last active August 29, 2015 14:19
Feature table for cytb
>Feature BB-002_CYTB
1	>201	gene
			gene	CYTB
1	>201	CDS
			product	cytochrome b
			codon_start	1
>Feature BB-003_CYTB
1	>201	gene
			gene	CYTB
1	>201	CDS
boopsboops / features.tbl
Last active August 29, 2015 14:19
GenBank feature table for cytb and 16S
>Feature BB-002_CYTB
1	>201	gene
			gene	CYTB
1	>201	CDS
			product	cytochrome b
			codon_start	1
>Feature BB-003_CYTB
1	>201	gene
			gene	CYTB
1	>201	CDS