Cathal Garvey cathalgarvey
cathalgarvey / RestrictionEnzymes.json
Created July 12, 2012 09:39
A somewhat exhaustive collection of restriction enzymes in JSON format.
This file has been truncated, but you can view the full file.
{
    "binsi": {
        "target_site": "CCWGG",
        "name": "BinSI",
        "suppliers": [],
        "source": "ATCC 15702",
        "references": [
            "Khosaka, T., Kiwaki, M., Rak, B., (1983) FEBS Lett., vol. 163, pp. 170-174."
        ],
        "prototype": "EcoRII",
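The target_site values in the restriction enzyme file use IUPAC ambiguity codes (the W in CCWGG stands for A or T). As a rough sketch of how such an entry might be used, the snippet below turns a site into a regular expression and scans a sequence for it. The names IUPAC, site_to_regex and find_sites are hypothetical helpers, not part of the gist, and the sketch assumes target_site holds only IUPAC letters and searches the given strand only.

```python
import re

# Standard IUPAC nucleotide ambiguity codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

def site_to_regex(target_site):
    """Translate an IUPAC recognition site such as "CCWGG" into a regex string."""
    return "".join(IUPAC[base] for base in target_site.upper())

def find_sites(sequence, target_site):
    """Return the 0-based start positions of every match, overlapping ones included."""
    # Wrapping the pattern in a lookahead lets overlapping sites be reported.
    pattern = re.compile("(?=" + site_to_regex(target_site) + ")")
    return [m.start() for m in pattern.finditer(sequence.upper())]

# Hypothetical entry shaped like the "binsi" record in the preview.
enzyme = {"name": "BinSI", "target_site": "CCWGG"}
print(find_sites("AACCAGGTTCCTGGAA", enzyme["target_site"]))
```

A site containing any character outside the IUPAC table raises a KeyError here, which seemed preferable to silently skipping it.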
cathalgarvey / EcoliK12_OptimalCodons
Created July 12, 2012 20:02
E.coli K-12 Optimal Table, a compilation of information from (Welch et al, Sept 2009) and K12 genomic codon frequencies.
{
    "End": {
        "TAG": {
            "frequency": 0.0,
            "relfreq": 0.0
        },
        "localfrequency": 2.74,
        "TGA": {
            "frequency": 0.98,
            "relfreq": 0.3576642335766423
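Going only by the preview, the codon table appears to nest codons under an amino acid (or "End") key, with a numeric "localfrequency" value sitting alongside the codon entries. A minimal sketch of picking the highest-scoring codon under that assumed layout could look like this; best_codon is a hypothetical helper, and the exact meaning of "frequency" versus "relfreq" is not stated in the preview.

```python
def best_codon(table, amino_acid, key="frequency"):
    """Return the codon with the highest `key` value for one amino acid entry."""
    entry = table[amino_acid]
    # Codon entries are dicts; siblings such as "localfrequency" are plain numbers.
    codons = {name: stats for name, stats in entry.items() if isinstance(stats, dict)}
    return max(codons, key=lambda name: codons[name][key])

# Fragment copied from the preview; the real file holds more entries.
table = {
    "End": {
        "TAG": {"frequency": 0.0, "relfreq": 0.0},
        "localfrequency": 2.74,
        "TGA": {"frequency": 0.98, "relfreq": 0.3576642335766423},
    }
}
print(best_codon(table, "End"))
```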
cathalgarvey / countfiles
Last active December 10, 2015 22:38 — forked from anonymous/countfiles
A little script I wrote to perform a quick census on my music library, to help me identify which artists/albums contain the most mp3s/m4as. This was intended to help me replace my music library with oggs, but could be used for all sorts of other handy things, too.
#!/usr/bin/env python3
import os
from sys import argv
# Walk through folders recursively, list the full path and number of (extension) files found in each.
basefolder = os.path.expanduser(argv[1])
filetype = str(argv[2]).lower()
output = []
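The preview stops before the folder walk itself. One plausible way to do the census described above is with os.walk, sketched below; count_files is a hypothetical function name, and the sorting by count is an assumption about what makes the output useful rather than something the preview shows.

```python
import os

def count_files(basefolder, filetype):
    """Return (folder, count) pairs for folders holding files that end in `filetype`."""
    suffix = "." + filetype.lower().lstrip(".")
    output = []
    for folder, _subfolders, filenames in os.walk(os.path.expanduser(basefolder)):
        count = sum(1 for name in filenames if name.lower().endswith(suffix))
        if count:
            output.append((folder, count))
    # Largest folders first, which is the handy order for a census.
    output.sort(key=lambda pair: pair[1], reverse=True)
    return output
```

Calling count_files("~/Music", "mp3") would then list each folder under a (hypothetical) ~/Music directory that contains mp3 files, biggest first.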
cathalgarvey / wordsoupfixer.py
Created January 20, 2013 18:39
Word soup fixer, for emails written in one long line full of ellipses. I get a surprising number of these.
#!/usr/bin/env python3
import sys
fixfile = sys.argv[1]
with open(fixfile) as InputFile:
    word_soup = InputFile.read()
# Strip surrounding whitespace, then any leading or trailing '.', '!' or '?'
# characters (such as a trailing ellipsis).
word_soup = word_soup.strip().strip(".!?")
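What happens after the initial strip is not shown in the preview. A minimal sketch of one way to finish the job, treating each run of dots as a sentence boundary, follows; fix_word_soup is a hypothetical name and the capitalisation rule is an assumption.

```python
import re

def fix_word_soup(word_soup):
    """Turn "some text... more text.. the end..." into separate sentences."""
    # Same first step as the preview: trim whitespace and end punctuation.
    word_soup = word_soup.strip().strip(".!?")
    # Treat any run of two or more dots as a sentence boundary.
    fragments = [f.strip() for f in re.split(r"\.{2,}", word_soup)]
    sentences = [f[0].upper() + f[1:] + "." for f in fragments if f]
    return " ".join(sentences)

print(fix_word_soup("hello there... how are you.... write back soon..."))
```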
cathalgarvey / cat_tweets
Created April 14, 2013 13:06
A script to concatenate the monthly JSON twitter files given in Twitter's tweets archive, and to add a UTC Unix timestamp to each tweet for easy parsing with other tools.
#!/usr/bin/env python3
import time
import datetime
import os
import json
timestamp_format = '%a %b %d %H:%M:%S %z %Y'
def twitter_timestamp_to_obj(time_string):
    'Returns a timezone-aware datetime object.'
    return datetime.datetime.strptime(time_string, timestamp_format)
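To illustrate the timestamp step from the description, the sketch below parses a Twitter-style date string with the same format and stores the result as a UTC Unix timestamp. The "created_at" input key and the "timestamp" output key are assumptions made for this example, as is the helper name add_unix_timestamp; the preview does not show which keys the script actually uses.

```python
import datetime

timestamp_format = '%a %b %d %H:%M:%S %z %Y'

def add_unix_timestamp(tweet):
    """Return a copy of `tweet` with a Unix timestamp under a "timestamp" key."""
    parsed = datetime.datetime.strptime(tweet["created_at"], timestamp_format)
    stamped = dict(tweet)
    # On a timezone-aware datetime, .timestamp() gives seconds since the Unix epoch.
    stamped["timestamp"] = int(parsed.timestamp())
    return stamped

# Hypothetical tweet record for illustration.
tweet = {"created_at": "Sun Apr 14 13:06:00 +0000 2013", "text": "example"}
print(add_unix_timestamp(tweet)["timestamp"])
```

Because the parsed datetime carries its own UTC offset, the result does not depend on the local timezone of the machine running the script.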
cathalgarvey / grep_tweets
Created April 14, 2013 20:06
grep_tweets, a companion script to cat_tweets that allows searching and filtering of Twitter tweet archive data by regex or a bunch of other useful parameters.
#!/usr/bin/env python3
import time
import datetime
import json
import re
timestamp_format = '%a %b %d %H:%M:%S %z %Y'
def twitter_timestamp_to_obj(time_string):
    'Returns a timezone-aware datetime object.'
    return datetime.datetime.strptime(time_string, timestamp_format)
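The filtering logic itself is cut off in the preview. As a hedged sketch of the general idea, the generator below keeps tweets whose text matches a regex and whose Unix timestamp falls inside an optional range. The function signature, the "text" and "timestamp" keys, and the sample tweets are all assumptions for illustration, not the script's real interface.

```python
import re

def grep_tweets(tweets, pattern, ignore_case=False, since=None, until=None):
    """Yield tweets whose text matches `pattern` and whose timestamp is in range."""
    regex = re.compile(pattern, re.IGNORECASE if ignore_case else 0)
    for tweet in tweets:
        stamp = tweet.get("timestamp")
        # A tweet without a timestamp cannot satisfy a time bound, so skip it.
        if since is not None and (stamp is None or stamp < since):
            continue
        if until is not None and (stamp is None or stamp > until):
            continue
        if regex.search(tweet.get("text", "")):
            yield tweet

# Hypothetical sample data.
tweets = [
    {"text": "Testing my new DIYbio centrifuge", "timestamp": 1365944760},
    {"text": "Lunch was good", "timestamp": 1365948000},
    {"text": "More diybio notes", "timestamp": 1366030000},
]
for match in grep_tweets(tweets, r"diybio", ignore_case=True, until=1366000000):
    print(match["text"])
```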
cathalgarvey / VersionedDict
Created September 21, 2013 23:10
A revision-enabled dict subclass, so your dicts don't forget their prior entries.
class VersionedDict(dict):
    '''A dictionary subclass that remembers all, or a defined number of, prior entries for a key.
    Allows reversion by number from "head" or by absolute reference in revision list.
    Allows retrieval of currently retained revision history for a key.
    Deletion deletes all revisions, not merely the most recent.
    If instantiated with the "revisions" keyword and an integer argument, only retains that many revisions per entry.'''
    def __init__(self, *args, **kwargs):
        revisions = kwargs.pop('revisions', None)
        # Without a "revisions" keyword, keep the limit as None (retain everything);
        # calling int(None) would raise a TypeError.
        self._allowed_revisions = None if revisions is None else abs(int(revisions))
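Since the rest of the class is not shown, here is an independent minimal sketch of the behaviour the docstring describes: per-key history, an optional cap on retained revisions, reversion from "head", and deletion that clears the history. It is named HistoryDict to make clear it is a hypothetical illustration and not the gist's implementation; the history and revert method names are likewise assumptions.

```python
from collections import deque

class HistoryDict(dict):
    """Hypothetical dict subclass that keeps prior values per key."""

    def __init__(self, *args, revisions=None, **kwargs):
        # None means an unbounded history; otherwise keep at most this many.
        self._maxlen = None if revisions is None else abs(int(revisions))
        self._history = {}
        super().__init__()
        # Route initial items through __setitem__ so they are tracked too.
        for key, value in dict(*args, **kwargs).items():
            self[key] = value

    def __setitem__(self, key, value):
        if key in self:
            # Push the value being replaced onto this key's history.
            past = self._history.setdefault(key, deque(maxlen=self._maxlen))
            past.append(self[key])
        super().__setitem__(key, value)

    def __delitem__(self, key):
        # Deleting a key discards all of its revisions as well.
        self._history.pop(key, None)
        super().__delitem__(key)

    def history(self, key):
        """Return the retained prior values for `key`, oldest first."""
        return list(self._history.get(key, ()))

    def revert(self, key, steps=1):
        """Restore the value from `steps` revisions back, discarding newer ones."""
        past = self._history.get(key, deque())
        if not 1 <= steps <= len(past):
            raise IndexError("not enough revisions for %r" % (key,))
        for _ in range(steps):
            value = past.pop()
        super().__setitem__(key, value)
```

One caveat of this approach: dict.update() and dict.setdefault() bypass __setitem__ on dict subclasses, so values written through them would not be recorded unless those methods are overridden too.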
cathalgarvey / seqio_answers.py
Created March 31, 2014 19:02
Suggested Solutions to SeqIO Exercises
from Bio import SeqIO
from Bio.Seq import Seq
sequence_generator = SeqIO.parse("br_sequences.fasta", "fasta")
all_sequences = list(sequence_generator)
# * How many records are in the file?
print("Number of records:", len(all_sequences))
# * How many records have a sequence of length 249?
cathalgarvey / bioinfo_funcs.py
Created March 31, 2014 19:11
Features missing from Python's string/list types that are handy for bio-informatics
"Functions missing from Python's string/list types that are handy for bio-informatics."
def codonise(seq):
    '''Returns a list of codons, not including any trailing 1 or 2 nucleotides.
    To get codons starting from letter X, pass seq[X:].'''
    mylist = []
    for i in range(0, len(seq), 3):
        this_codon = seq[i:i+3]
        # This bit ensures that only whole codons,
        # not trailing bits, are added:
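The preview ends before the whole-codon check. For reference, the behaviour the docstring describes can also be written compactly, as in this alternative sketch; it is not the gist's remaining code, just one way to get the same documented result.

```python
def codonise(seq):
    """Return a list of whole codons, dropping any trailing 1 or 2 letters."""
    # Round the length down to a multiple of three so no partial codon is sliced.
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]

print(codonise("ATGGCCTA"))
```

As the docstring suggests, a different reading frame is obtained by slicing first, for example codonise(seq[1:]).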
cathalgarvey / seq_searcher
Created March 31, 2014 19:14
Simplified method of searching for forward/reverse-complement sequence in a multi-sequence file
import sys
from Bio import SeqIO
from Bio.Seq import Seq
filename = sys.argv[1]
usersequence = Seq(sys.argv[2])
usersequence = usersequence.upper()
user_reverse = usersequence.reverse_complement()
records = SeqIO.parse(filename, "fasta")
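The loop over the records is not shown in the preview. To illustrate the forward/reverse-complement search idea without depending on Biopython, the sketch below swaps the Seq objects for plain strings and takes (name, sequence) pairs instead of SeqIO records. COMPLEMENT, reverse_complement and search_records are hypothetical names, and only unambiguous A/C/G/T letters are handled.

```python
# Complement table for unambiguous DNA letters only (an assumption of this sketch).
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a plain DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def search_records(records, query):
    """Yield (name, strand, position) for the first hit of `query` on each strand."""
    forward = query.upper()
    reverse = reverse_complement(forward)
    for name, sequence in records:
        sequence = sequence.upper()
        for strand, needle in (("+", forward), ("-", reverse)):
            position = sequence.find(needle)
            if position != -1:
                yield name, strand, position

# Hypothetical records standing in for the entries of a FASTA file.
records = [("r1", "AAAGGATCCAAA"), ("r2", "TTTGAGACCTTT"), ("r3", "CCCC")]
for hit in search_records(records, "ggtctc"):
    print(hit)
```

A palindromic query is its own reverse complement, so it would be reported once per strand for the same position.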