Joe Healey jrjhealey

# This script will calculate Shannon entropy from an MSA.
# Dependencies:
# Biopython, Matplotlib (optional), math
"""
Shannon's entropy equation (latex format):
H=-\sum_{i=1}^{M} P_i\,\log_2\,P_i
Entropy is a measure of the uncertainty of a probability distribution (p1, ..., pM)
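The formula above can be illustrated with a minimal, standard-library-only sketch. It is not the gist's own implementation (which depends on Biopython): the function names `column_entropy` and `alignment_entropy` are hypothetical, the alignment is assumed to be a list of equal-length strings, and gap characters are assumed to count as ordinary symbols.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (in bits) of one alignment column given as a string."""
    counts = Counter(column)
    total = len(column)
    # H = -sum(P_i * log2(P_i)) over the symbols observed in the column
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def alignment_entropy(sequences):
    """Per-column entropy for a list of equal-length aligned sequences."""
    # zip(*sequences) walks the alignment column by column
    return [column_entropy("".join(col)) for col in zip(*sequences)]


# Toy alignment (made-up data, for illustration only)
msa = ["ACGT", "ACGA", "ACTA", "AGTA"]
print(alignment_entropy(msa))
```

A fully conserved column scores 0 bits, and a column split evenly between two residues scores 1 bit.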
jrjhealey / Genbank_slicer.py
Last active September 20, 2017 19:11
Creating subsetted operons/gene genbank files from a 'parent' sequence!
#!/usr/bin/python
# This script is designed to take a genbank file and 'slice out'/'subset'
# regions (genes/operons etc.) and produce a separate file. This can be
# done explicitly by telling the script which base sites to use, or can
# 'decide' for itself by blasting a fasta of the sequence you're interested
# in against the Genbank you want to slice a record out of.
# Note, the script (obviously) does not preserve the index number of the
# bases from the original
jrjhealey / tabulateHHpred.py
Created July 5, 2017 21:06
Turn the output of HHsearch into tab-delimited text
# -*- coding: utf-8 -*-
"""
This script takes the .hhr files output by HHSuite and
turns the quite verbose file into a fully tabulated
version with all the fields separated, one line per
file. Thus, the file can be viewed simply in Excel etc.
It requires the non-standard pandas module.
"""
jrjhealey / fastafetcher.py
Created September 13, 2017 21:47
Pull out fastas from a multifasta based on keyword search as a string or keyfile
# Extract fasta files by their descriptors stored in a separate file.
# Requires biopython
from Bio import SeqIO
import sys
import argparse
def getKeys(args):
    """Turns the input key file into a list. May be memory intensive."""
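The preview stops at the docstring, so the following is only a rough sketch of the idea the description states: read keys into a list, then keep the records whose header contains any key. It uses plain Python rather than the gist's `Bio.SeqIO`, and the names `read_keys`, `parse_fasta`, and `fetch` are hypothetical, as is the choice of substring matching on the header.

```python
def read_keys(lines):
    """Turn an iterable of key-file lines into a list of non-empty keys."""
    return [line.strip() for line in lines if line.strip()]


def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line and header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def fetch(text, keys):
    """Return the records whose header contains any of the keys."""
    return [(h, s) for h, s in parse_fasta(text) if any(k in h for k in keys)]


# Made-up multifasta and key file contents, for illustration only
fasta = ">geneA some description\nACGT\nTTAA\n>geneB\nGGCC\n"
keys = read_keys(["geneB\n", "\n"])
for header, seq in fetch(fasta, keys):
    print(">" + header)
    print(seq)
```

Holding the keys in a list mirrors the docstring's note about memory use; for very large key files a set with exact-ID matching would be a lighter alternative.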
jrjhealey / MacMAC.sh
Last active September 20, 2017 10:55
Get wireless and ethernet MAC addresses.
#!/bin/bash
ethernet=$(ifconfig en0 | awk '/ether/{print $2}')
wifi=$(ifconfig en1 | awk '/ether/{print $2}')
echo "Ethernet MAC Address: $ethernet"
echo "Wifi MAC Address: $wifi"
jrjhealey / getPDB.sh
Last active April 26, 2018 10:59
Fetching PDB structures from the Protein Data Bank
#!/bin/bash
# Script to retrieve PDBs via the command line from the PDB HTTP/FTP
# Capture inputs
usage()
{
cat << EOF
usage: $0 options
# Getting python logging info output in a colour coded and customised manner.
# Stolen from M. Galardini and https://stackoverflow.com/questions/384076/how-can-i-color-python-logging-output/2532931#2532931
import logging
import sys
BLACK, RED, GREEN, YELLOW, BLUE, MAGENTA, CYAN, WHITE = range(8)
COLORS = {
'WARNING' : YELLOW,
jrjhealey / Fiddling_with_fastas.sh
Created November 27, 2017 14:13
A useful pure bash construct for dealing with FASTA files. Can be tweaked to perform all sorts of actions on the headers of sequences (e.g. rearrangement, regex, text matching)
#!/bin/bash
#### Print out all fastas ending in a certain string. ####
# Change *"$string" to *"$string"* to find containing,
# or "$string"* to find starts with.
file="$1"
string="$2"
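The rest of the bash construct is cut off in this preview, so here is a hedged Python sketch of the same idea: print every record whose header ends with, contains, or starts with a given string. The function name `match_headers` and the `mode` parameter are hypothetical, and the header is assumed to be compared without its leading `>`.

```python
def match_headers(text, string, mode="ends"):
    """Return the FASTA records whose header matches `string` in the given mode."""
    tests = {
        "ends": str.endswith,         # analogous to *"$string"
        "contains": str.__contains__,  # analogous to *"$string"*
        "starts": str.startswith,      # analogous to "$string"*
    }
    test = tests[mode]
    keep = False
    out = []
    for line in text.splitlines():
        if line.startswith(">"):
            # Decide once per record, based on the header without '>'
            keep = test(line[1:].rstrip(), string)
        if keep:
            out.append(line)
    return "\n".join(out)


# Made-up records, for illustration only
fasta = ">seq1_plasmid\nACGT\n>seq2_chromosome\nGGCC\nTTAA\n"
print(match_headers(fasta, "_plasmid"))
```

Switching `mode` plays the same role as moving the `*` around the pattern in the bash version.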
jrjhealey / structfit.py
Created December 4, 2017 14:32
Connecting HHSuite to Chimera for visualisations of how similar 2 HMM PDBs are.
"""
This script pulls in homologs of proteins from PDB
as determined by HHSuite. It then employs pychimera and
UCSF Chimera to structurally match them and get an
indication of how well they score (RMSD) in order
to pick the best simulation.
"""
import os
import subprocess
import sys
jrjhealey / Iterative_R_images.r
Created January 30, 2018 17:39
Making a video from an R loop
#!/usr/bin/env Rscript
# Plotting surface plots via ggplot2/plotly
# Usage:
# $ Rscript CDmeltplot.R -i data.csv -o filename
############################################################
# General purpose heatmap plotting script for consistency. #
# This script can be slow as it was designed to be pretty #