Simon Cockell sjcockell

library(tidyverse)
library(zoo)
# Download the latest UK case counts and compute a centred 7-day rolling mean per area
read_csv('https://coronavirus.data.gov.uk/downloads/csv/coronavirus-cases_latest.csv') %>%
  dplyr::filter(`Area type` == 'Upper tier local authority') %>%
  dplyr::arrange(desc(`Area name`)) %>%
  dplyr::group_by(`Area name`) %>%
  dplyr::mutate(`Cases - 7 day rolling average` = zoo::rollmean(`Daily lab-confirmed cases`, k = 7, fill = NA)) %>%
  dplyr::ungroup() %>%
  dplyr::filter(`Area name` %in% c('Gateshead', 'Newcastle upon Tyne')) %>%
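The rolling average in the pipeline above comes from `zoo::rollmean` with `k = 7` and `fill = NA`, which by default centres the window on each row. As a rough illustration of that calculation only, here is a minimal Python sketch; the function name `centred_rolling_mean` and the sample counts are made up for this example and are not part of the gist.

```python
from typing import List, Optional


def centred_rolling_mean(values: List[float], k: int = 7) -> List[Optional[float]]:
    """Centred rolling mean over k values; positions without a full window get None."""
    if k < 1 or k % 2 == 0:
        raise ValueError("this sketch only handles odd, positive window sizes")
    half = k // 2
    out: List[Optional[float]] = [None] * len(values)
    # Only positions with `half` neighbours on each side have a complete window.
    for i in range(half, len(values) - half):
        window = values[i - half:i + half + 1]
        out[i] = sum(window) / k
    return out


daily_cases = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # made-up counts for illustration
print(centred_rolling_mean(daily_cases))
# [None, None, None, 4.0, 5.0, 6.0, None, None, None]
```

The first and last three positions stay empty because a 7-value window centred there would run off the end of the series, which is the role `fill = NA` plays in the R call.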
sjcockell / get_sra_data.sh
Last active November 7, 2021 14:35
Script from Lockdown Learning #16 for downloading from SRA.
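#!/bin/bash
# Collect the run accessions: first comma-separated column of SraRunTable.txt, skipping the header row
VAR=$(tail -n +2 SraRunTable.txt | cut -d ',' -f 1)
# This is a loop for downloading the data
for i in ${VAR}
do
    if [ -f "${i}.fastq.gz" ]
    then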
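The first step of the script above (drop the header row of `SraRunTable.txt`, keep the first comma-separated column) can be sketched in Python as follows. This is only an illustration: the function name `run_accessions` and the sample table contents are invented here. Unlike `cut -d ','`, the `csv` module also copes with quoted fields that contain commas.

```python
import csv
import io
from typing import List


def run_accessions(run_table_text: str) -> List[str]:
    """Return the first column of a comma-separated run table, skipping the header row."""
    reader = csv.reader(io.StringIO(run_table_text))
    next(reader, None)  # discard the header row, if there is one
    return [row[0] for row in reader if row]


# Made-up table contents, for illustration only.
example = "Run,Assay Type\nSRR0000001,RNA-Seq\nSRR0000002,RNA-Seq\n"
print(run_accessions(example))
# ['SRR0000001', 'SRR0000002']
```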
sjcockell / build_alignments.py
Created October 25, 2010 15:51
Build multiple sequence alignments from subgroups of proteins within a list file. Uses http://gist.github.com/329730, http://gist.github.com/644765 and MUSCLE (http://www.drive5.com/muscle/downloads.htm).
import get_sequences
import uniprot_mapping
import urllib2
import shlex, subprocess

def main(file):
    with open(file) as f:
        data = f.read()
    groups = data.split('"')  # file has one protein name per line, with " delineating groups
    groups = organise_groups(groups)
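The comment in the snippet above describes the input layout: one protein name per line, with `"` separating groups. A minimal Python 3 sketch of that split is shown below. It is not the gist's `organise_groups`, whose body is not visible here; the name `split_groups` and the sample protein names are hypothetical, and the sketch assumes each `"` delimiter sits on a line of its own.

```python
from typing import List


def split_groups(text: str) -> List[List[str]]:
    """Split a protein list into groups on '"' and return the names in each group."""
    groups = []
    for chunk in text.split('"'):
        # One name per line; ignore blank lines left around the delimiter.
        names = [line.strip() for line in chunk.splitlines() if line.strip()]
        if names:
            groups.append(names)
    return groups


# Hypothetical file contents, for illustration only.
example = 'P53_HUMAN\nMDM2_HUMAN\n"\nBRCA1_HUMAN\nBRCA2_HUMAN\n'
print(split_groups(example))
# [['P53_HUMAN', 'MDM2_HUMAN'], ['BRCA1_HUMAN', 'BRCA2_HUMAN']]
```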
sjcockell / get_sequence_from_name.py
Created October 25, 2010 10:51
Get a protein sequence from UniProt when you only have that protein's name, not its accession. Uses http://gist.github.com/329730.
import uniprot_mapping
import urllib2

def main(file):
    fh = open(file, 'r')
    for line in fh.readlines():
        if not line.startswith('"'):  # ignore comment lines
            name = line.rstrip()
            id = uniprot_mapping.uniprot_mapping('ACC+ID', 'ACC', name)
            mapped = parse_return_string(id)
import urllib
import os, os.path
from optparse import OptionParser

def main(superfamily):
    # fetch the list of domains in the superfamily from the CathDomainList
    dom_lst = get_domain_list(superfamily)
    # for each domain, retrieve the PDB file from CATH
    get_domain_structures(dom_lst, superfamily)
sjcockell / signalp.py
Created May 13, 2010 15:30
A short script for running SignalP over a directory of sequence files.
import os
from optparse import OptionParser

def main(file, path):
    """runs SignalP for every sequence file in a directory"""
    for filename in os.listdir(file):
        # checks other than the file suffix would be more sophisticated
        if filename.endswith('.fa') or filename.endswith('.fasta'):
            sig = signalp(path, 'gram+', os.path.join(file, filename))
            length = len(sig)
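The directory walk above selects sequence files purely by suffix. The same selection step can be sketched in Python 3 as a standalone helper. The name `fasta_files` and the temporary files are assumptions made for this example, and SignalP itself is not invoked.

```python
import os
import tempfile
from typing import List

FASTA_SUFFIXES = ('.fa', '.fasta')


def fasta_files(directory: str) -> List[str]:
    """Return sorted full paths of files in `directory` whose names end in a FASTA suffix."""
    return sorted(
        os.path.join(directory, name)
        for name in os.listdir(directory)
        if name.endswith(FASTA_SUFFIXES)  # str.endswith accepts a tuple of suffixes
    )


# Build a throwaway directory with made-up file names to exercise the filter.
with tempfile.TemporaryDirectory() as tmp:
    for name in ('a.fa', 'b.fasta', 'notes.txt'):
        open(os.path.join(tmp, name), 'w').close()
    found = [os.path.basename(p) for p in fasta_files(tmp)]

print(found)
# ['a.fa', 'b.fasta']
```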
sjcockell / uniprot_mapping.py
Created March 11, 2010 22:11
A Python function to map protein IDs via the UniProt mapping service.
import urllib
import urllib2

def uniprot_mapping(fromtype, totype, identifier):
    """Takes an identifier, and types of identifier
    (to and from), and calls the UniProt mapping service"""
    base = 'http://www.uniprot.org'
    tool = 'mapping'
    params = {'from':fromtype,
              'to':totype,
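The snippet above is cut off partway through the `params` dictionary, so the remaining keys are not visible. As a hedged illustration of how such a dictionary becomes a query string, here is a Python 3 sketch using `urllib.parse.urlencode`. The `'query'` key and the accession `P12345` are assumptions for this example, not taken from the source. No request is sent, and the endpoint shown in the gist may no longer be available.

```python
from urllib.parse import urlencode


def build_mapping_query(fromtype: str, totype: str, identifier: str) -> str:
    """URL-encode mapping parameters into a query string (no request is made)."""
    params = {
        'from': fromtype,
        'to': totype,
        'query': identifier,  # assumption: this key is not visible in the truncated source
    }
    return urlencode(params)


print(build_mapping_query('ACC+ID', 'ACC', 'P12345'))
# from=ACC%2BID&to=ACC&query=P12345
```

Note that the `+` in `ACC+ID` is percent-encoded as `%2B`. Left unencoded, a server would typically read it as a space.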
def python_gist():
    "Does Posterous really support Gist drop-ins?"
    print("Testing, testing 1,2,3...")