Joseph Hughes josephhughes

@josephhughes
josephhughes / README
Last active September 26, 2015 22:17
Use this script to replace the stop codons with gaps in the nucleotide alignment
ReplaceStopsWithGaps.pl is a Perl script written by Joseph Hughes, University of Glasgow.
Use this to remove stop codons from an alignment.
Typically, this would be done to calculate dN/dS in HYPHY.
Usage:
perl ../Scripts/ReplaceStopWithGaps.pl -pep 104D5_pep.fasta -nuc 104D5.fasta -output 104D5_nostop.fasta
The script replaces stop codons in the nucleotide alignment with gaps.
Both the nucleotide and the peptide alignments are required.
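The idea described above can be sketched in Python. This is not the Perl script itself; it is a minimal illustration that assumes the peptide alignment marks stops with `*` and that each peptide column corresponds to exactly three columns of the nucleotide alignment. The function name `replace_stops_with_gaps` is hypothetical.

```python
def replace_stops_with_gaps(pep: str, nuc: str, stop: str = "*") -> str:
    """Replace every codon whose translated column is a stop with '---'.

    Assumption: `nuc` is codon-aligned to `pep`, i.e. three nucleotide
    columns per peptide column (gapped codons appear as '---').
    """
    if len(nuc) != 3 * len(pep):
        raise ValueError("nucleotide sequence must be three times the peptide length")
    # Cut the nucleotide sequence into consecutive codons.
    codons = [nuc[i:i + 3] for i in range(0, len(nuc), 3)]
    # Keep each codon unless its peptide column is a stop.
    return "".join("---" if aa == stop else codon for aa, codon in zip(pep, codons))
```

Applying the function to each pair of records with matching IDs in the two alignments keeps the alignment length unchanged, which is what tools that read codon alignments generally expect.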
@josephhughes
josephhughes / README
Created August 24, 2011 09:46
Perl script to split a file containing multiple fastq records into separate fastq files named according to the fastq ID
SplitFastq.pl is a Perl script written by Joseph Hughes, University of Glasgow.
Usage:
perl SplitFastq.pl -in ourmultifastqfile
This script will split a file containing multiple fastq records into separate fastq files named according to the fastq ID.
The script uses BioPerl.
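For readers without BioPerl, the same splitting step can be sketched in plain Python. This is an illustration, not a port of SplitFastq.pl: it assumes strict four-line fastq records (no wrapped sequence lines) and takes the ID to be the first whitespace-delimited token after `@`. The names `split_fastq` and `write_records` are hypothetical.

```python
from pathlib import Path


def split_fastq(text: str) -> dict[str, str]:
    """Map each read ID to its four-line fastq record."""
    lines = text.splitlines()
    if len(lines) % 4 != 0:
        raise ValueError("expected four lines per fastq record")
    records = {}
    for i in range(0, len(lines), 4):
        header = lines[i]
        if not header.startswith("@"):
            raise ValueError(f"record header does not start with '@': {header!r}")
        # The ID is everything after '@' up to the first whitespace.
        read_id = header[1:].split()[0]
        records[read_id] = "\n".join(lines[i:i + 4]) + "\n"
    return records


def write_records(records: dict[str, str], outdir: str) -> None:
    """Write one <ID>.fastq file per record into outdir."""
    out = Path(outdir)
    out.mkdir(parents=True, exist_ok=True)
    for read_id, record in records.items():
        (out / f"{read_id}.fastq").write_text(record)
```

IDs that contain characters such as `/` would need sanitising before being used as file names; the sketch leaves that out.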
@josephhughes
josephhughes / Clusters.txt
Created October 7, 2015 14:10
Generating a circular plot showing reassortment using the ETE toolkit
#Name Seg1 Seg2 Seg3 Seg4 Seg5 Seg6 Seg7 Seg8 Seg9 Seg10
1-8FRA2008-27 15 4 16 1 11 3 1 19 1 14
1-8FRA2008-28 15 4 16 1 11 3 1 19 1 14
1-8FRA2008-29 15 4 16 1 11 3 1 19 1 14
10RSArrrr-10 ? ?
11RSArrrr-11 15 6
12RSArrrr-12 ? ?
13RSArrrr-13 ? ?
14CAR1982-04 ? ? 1 1 1 ? ? 2 1 1
14POL2012-01 3 14 15 2 2 13 5 7 4 14
@josephhughes
josephhughes / parse_cdhit.pl
Created January 17, 2013 13:29
Use this script to get the number of reads in each cluster
# use this to get the number of reads in each cluster
use strict;
use Getopt::Long;
use Bio::SeqIO;
my ($clstr,$result,$long,%clusters,$infile);
&GetOptions(
'clstr:s' =>\$clstr, #a cd-hit generated cluster file
'out:s' => \$result, # a text file with the numbers of reads in each cluster
);
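The counting step the description refers to can be sketched in Python. This is a minimal illustration rather than the rest of parse_cdhit.pl: it assumes the usual CD-HIT `.clstr` layout, where each cluster starts with a `>Cluster N` line followed by one line per member. The function name `count_cluster_members` is hypothetical.

```python
def count_cluster_members(lines):
    """Return {cluster name: number of member lines} for a .clstr file."""
    counts = {}
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">Cluster"):
            # A new cluster begins; drop the leading '>' from its name.
            current = line[1:]
            counts[current] = 0
        elif current is not None:
            # Every other line inside a cluster is one member sequence.
            counts[current] += 1
    return counts
```

Passing an open file handle works directly, for example `count_cluster_members(open("reads.clstr"))`, where the file name is only a placeholder.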
@josephhughes
josephhughes / dna2FreqAndDistMat
Last active December 20, 2015 01:59
Generate a set of unique sequences from a fasta file and return the frequency of each unique sequence and a pairwise distance matrix of the sequences using the dist.dna function from ape. The default distance model is "raw", but any model available in dist.dna can be specified. Sequences that contain ? or - will be considered different.
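The behaviour described above can be sketched in Python for orientation. This is not a replacement for the R function that follows: the distance here is simply the proportion of mismatched columns, with `?` and `-` treated as ordinary characters, which may differ from how dist.dna handles gaps and ambiguity codes. The function name `unique_freq_and_raw_dist` is hypothetical.

```python
from collections import Counter
from itertools import combinations


def unique_freq_and_raw_dist(seqs):
    """Return (frequency of each unique sequence, pairwise raw distances).

    Assumption: all sequences are aligned and therefore the same length.
    """
    freqs = Counter(seqs)  # unique sequence -> number of occurrences
    dist = {}
    for a, b in combinations(list(freqs), 2):
        if len(a) != len(b):
            raise ValueError("sequences must be aligned to the same length")
        # Proportion of columns at which the two sequences differ.
        dist[(a, b)] = sum(x != y for x, y in zip(a, b)) / len(a)
    return freqs, dist
```

Because `?` and `-` are compared like any other character, two sequences that differ only at such a column are kept as separate unique sequences, matching the last sentence of the description.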
require(ape)
dna2FreqAndDistMat<-function(dna,model=NULL){
if(is.null(model)){ model <- c("raw")}
# model must be one of the allowed models listed below; "raw" is the default
allowed_models<-c("raw", "N", "TS", "TV", "JC69", "K80", "F81", "K81", "F84", "BH87", "T92", "TN93", "GG95", "logdet", "paralin", "indel", "indelblock")
if(!any(allowed_models==model)){
warning("You need to provide the correct model: raw, N, TS, TV, JC69, K80, F81, K81, F84, BH87, T92, TN93, GG95, logdet, paralin, indel, indelblock")
return(NULL)
}
@josephhughes
josephhughes / ReplaceStopWithRefCodonGaps.pl
Created January 17, 2013 09:34
Use this to remove stop codons from an alignment; typically, this would be done to calculate dN/dS in HYPHY. Usage: perl ../Scripts/ReplaceStopWithGaps.pl -pep 104D5_pep.fasta -nuc 104D5.fasta -output 104D5_nostop.fasta -ref 104D5S1. Use this to replace stop codons from the nucleotide alignment with the codon of the reference. The nucleotide and the…
#!/usr/bin/perl -w
#
# use this to remove stop codons from an alignment
# typically, this would be done to calculate dN/dS in HYPHY
# Usage: perl ../Scripts/ReplaceStopWithGaps.pl -pep 104D5_pep.fasta -nuc 104D5.fasta -output 104D5_nostop.fasta -ref 104D5S1
# use this to replace stop codons from the nucleotide alignment with the codon of the reference
# the nucleotide and the peptide alignments are necessary and the name of the reference sequence
# the reference sequence needs to be in the nucleotide alignment
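#!/bin/bash
# Usage: ./alignScript.sh ref pair1 pair2 name
# Positional arguments: reference, the two paired read files, output prefix
ref_name=$1
pair1=$2
pair2=$3
name=$4
# Index the reference, then map the read pairs to it (variables quoted so paths with spaces work)
bwa index "$ref_name"
bwa mem "$ref_name" "$pair1" "$pair2" > "${name}.sam"
# Sort the alignments into a BAM file and index it
samtools sort "${name}.sam" -o "${name}.bam"
samtools index "${name}.bam"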
@josephhughes
josephhughes / pango_designation2json.py
Created November 25, 2022 03:28
Converting the pangolin lineage information into json
import json
import argparse
import csv
import sys
# provide as input
# 1) the curation notes (tsv) (more extensive than lineage_notes.txt, which only has lineage and description)
# contains: Lineage Rough number of SNPs Example sequence Active/ Unobserved/ Inactive Designator Size (roughly) Description
# 2) full_alias_key.txt a file with the renames for the aliases (.txt): alias,lineage
# to do:
@josephhughes
josephhughes / RetrieveEmailFromPubmed
Created October 13, 2014 10:36
A Perl script to parse the email addresses from the affiliations in PubMed
#!/usr/bin/perl -w
# A Perl script written by Joseph Hughes, University of Glasgow
# use this Perl script to parse the email addresses from the affiliations in PubMed
use strict;
use LWP::Simple;
my ($query,@queries);
#Query the Journal of Virology from 2014 until the present (use 3000)
$query = 'journal+of+virology[journal]+AND+2014[Date+-+Publication]:3000[Date+-+Publication]';
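The extraction step named in the description, pulling email addresses out of affiliation strings, can be sketched in Python. This is an illustration only and does not reproduce the Perl script or its PubMed queries: the regular expression is a deliberately simple assumption that will miss unusual addresses, and the function name `extract_emails` is hypothetical.

```python
import re

# Simple pattern: local part, '@', domain labels, and a final label of 2+ letters.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(affiliation: str) -> list[str]:
    """Return all email-like substrings found in an affiliation string."""
    return EMAIL_RE.findall(affiliation)
```

Affiliation text often ends an address with a full stop; because the pattern requires the address to end in letters, that trailing punctuation is left out of the match.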