Avril Coghlan avrilcoghlan

avrilcoghlan / merge_optical_map_xml_files.py
Last active October 8, 2022 06:49
Python script for merging optical map xml files (for different scaffolds) into one large xml file
import sys
import os
from xml.etree import ElementTree as ET
import AvrilFileUtils
class Error (Exception): pass
#====================================================================#
# define a function to merge optical map xml files for different scaffolds.
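The preview stops before the merge function itself. Below is a minimal sketch of one way to merge several XML documents with `xml.etree.ElementTree`. It assumes that each input file's root element simply wraps one child element per scaffold, so merging means concatenating those children under a single new root. The helper name `merge_xml_strings` and the `root_tag` parameter are hypothetical, and the real optical map schema may need more care (for example, renumbering IDs).

```python
from xml.etree import ElementTree as ET


def merge_xml_strings(xml_strings, root_tag="merged"):
    """Merge the top-level children of several XML documents under one new root.

    Hypothetical helper: assumes each input is a complete XML document whose
    root's children (e.g. one element per scaffold) can simply be concatenated.
    """
    merged_root = ET.Element(root_tag)
    for text in xml_strings:
        root = ET.fromstring(text)
        # Append every child element of this document's root, preserving order.
        merged_root.extend(list(root))
    return ET.ElementTree(merged_root)
```

For files on disk, `ET.parse(path).getroot()` would take the place of `ET.fromstring(text)`, and `tree.write(out_path)` would save the merged document.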
avrilcoghlan / calc_dists_to_top_of_GO_using_bfs.py
Last active January 4, 2016 20:09
Python script to calculate the number of steps from a GO term to the top of the GO hierarchy, using breadth-first search
import sys
import os
from collections import defaultdict
class Error (Exception): pass
#====================================================================#
# define a function to read in the ancestors of each GO term in the GO hierarchy:
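The breadth-first idea can be sketched briefly. This is a sketch under stated assumptions, not the gist's code. It assumes that `parents` maps each GO term to a list of its direct parents, that a "top" term is one with no parents, and that "number of steps" means the shortest path. The function name `steps_to_top_bfs` is hypothetical.

```python
from collections import deque


def steps_to_top_bfs(term, parents):
    """Return the minimum number of parent links from `term` up to a root term.

    Assumed input: `parents` maps each term to a list of its direct parents;
    a term with no (or no listed) parents is treated as a root.
    Returns None if no root is reachable.
    """
    seen = {term}
    queue = deque([(term, 0)])
    while queue:
        node, dist = queue.popleft()
        if not parents.get(node):
            # BFS dequeues nodes in order of distance, so the first root is the closest.
            return dist
        for parent in parents[node]:
            if parent not in seen:
                seen.add(parent)
                queue.append((parent, dist + 1))
    return None
```

The `seen` set matters because the GO hierarchy is a directed acyclic graph rather than a tree: a term can be reached through several parents, and it should be queued only once.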
avrilcoghlan / calc_dist_to_top_of_GO.py
Created January 28, 2014 17:05
Python script to calculate the number of steps from a GO term to the top of the GO hierarchy, using Dijkstra's algorithm
import sys
import os
from collections import defaultdict
from scipy.sparse import lil_matrix # needed for Dijkstra's algorithm
from scipy.sparse.csgraph import dijkstra # needed for Dijkstra's algorithm
class Error (Exception): pass
#====================================================================#
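The gist imports `scipy.sparse.csgraph.dijkstra`. To show the algorithm itself, here is a swapped-in standard-library version that uses `heapq` instead of SciPy. It relies on the same assumptions as a plain BFS: `parents` maps each term to its direct parents, each parent link costs one step, and a term with no parents counts as the top. The name `steps_to_top_dijkstra` is hypothetical.

```python
import heapq


def steps_to_top_dijkstra(term, parents):
    """Dijkstra's algorithm over child -> parent edges, each with weight 1.

    Uses heapq rather than scipy.sparse.csgraph.dijkstra.
    Returns the smallest distance from `term` to any term with no parents,
    or None if no such term is reachable.
    """
    dist = {term: 0}
    heap = [(0, term)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale heap entry: a shorter path to this node was already found
        if not parents.get(node):
            return d  # the first root popped has the smallest distance
        for parent in parents[node]:
            new_dist = d + 1  # every parent link counts as one step
            if new_dist < dist.get(parent, float("inf")):
                dist[parent] = new_dist
                heapq.heappush(heap, (new_dist, parent))
    return None
```

With every edge weighted 1, Dijkstra's algorithm gives the same answer as breadth-first search. The priority queue only pays off when edges carry different weights.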
avrilcoghlan / tree_traversal.py
Last active April 4, 2018 07:24
Python script for depth-first search and breadth-first search of a simple tree
def DFS_dist_from_node(query_node, parents):
    """Return dictionary containing distances of parent GO nodes from the query"""
    result = {}
    stack = []
    stack.append( (query_node, 0) )
    while len(stack) > 0:
        print("stack=", stack)
        node, dist = stack.pop()
        result[node] = dist
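The preview cuts off inside the depth-first loop. Here is a self-contained sketch of both traversals, side by side. It assumes the same input as the preview: `parents` maps each node to a list of its parents. The lowercase function names are hypothetical, and this is not the gist's full code. The only structural difference between the two is the container: a stack (LIFO) for depth-first search and a queue (FIFO) for breadth-first search.

```python
from collections import deque


def dfs_dist_from_node(query_node, parents):
    """Depth-first: a stack (LIFO) decides which node is expanded next."""
    result = {}
    stack = [(query_node, 0)]
    while stack:
        node, dist = stack.pop()
        # In a tree each node is reached once; in a DAG keep the smallest distance
        # and re-expand a node only when a shorter path to it turns up.
        if node in result and result[node] <= dist:
            continue
        result[node] = dist
        for parent in parents.get(node, []):
            stack.append((parent, dist + 1))
    return result


def bfs_dist_from_node(query_node, parents):
    """Breadth-first: a queue (FIFO) expands nodes in order of distance."""
    result = {query_node: 0}
    queue = deque([query_node])
    while queue:
        node = queue.popleft()
        for parent in parents.get(node, []):
            if parent not in result:
                result[parent] = result[node] + 1
                queue.append(parent)
    return result
```

On a simple tree both functions return the same dictionary. When nodes can be reached by more than one path, breadth-first search finds each shortest distance the first time it reaches a node. The depth-first version may need to revisit nodes to get the same result.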
avrilcoghlan / AvrilHMMUtils.pm
Created January 16, 2014 16:51
AvrilHMMUtils.pm
package HelminthGenomeAnalysis::AvrilHMMUtils;
use strict;
use warnings;
use Bio::Seq;
use Bio::SeqIO;
use Moose;
use Math::Round; # HAS THE nearest() FUNCTION
use Carp::Assert; # HAS THE assert() FUNCTION
use Scalar::Util qw(looks_like_number);
avrilcoghlan / parse_hmmpfam_output.pl
Created January 16, 2014 16:49
Perl script that takes a hmmpfam output file, and counts the number of queries that have hits with evalue <= evalue_cutoff, and number of HMMs hit with evalue <= evalue_cutoff.
#!/usr/bin/env perl
=head1 NAME
parse_hmmpfam_output.pl
=head1 SYNOPSIS
parse_hmmpfam_output.pl hmmpfam protein_fasta evalue_cutoff total_num_hmms cegma_dir
where hmmpfam is the hmmpfam output file,
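The counting step described for this script can be sketched in Python. The sketch assumes the hmmpfam output has already been parsed into `(query_id, hmm_name, evalue)` tuples; parsing the raw hmmpfam text is not shown. The function name `count_hits` is hypothetical. A query or HMM is counted once, however many qualifying hits it has.

```python
def count_hits(hits, evalue_cutoff):
    """Count queries with at least one hit, and distinct HMMs hit, at evalue <= cutoff.

    `hits` is an assumed, already-parsed list of (query_id, hmm_name, evalue) tuples.
    Returns (number_of_queries_with_hits, number_of_hmms_hit).
    """
    queries, hmms = set(), set()
    for query_id, hmm_name, evalue in hits:
        if evalue <= evalue_cutoff:
            queries.add(query_id)
            hmms.add(hmm_name)
    return len(queries), len(hmms)
```

Using sets means that a protein matching the same HMM in several domains, or matching several HMMs, still adds only 1 to each count.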
avrilcoghlan / exercise8_compara.pl
Created December 18, 2013 11:49
Perl script that uses the Ensembl Compara Perl API to count the number of “one2one” orthologues between human and mouse
#!/usr/bin/env perl
# Count the number of “one2one” orthologues between human and mouse
# Note: this script gives a warning about some variable declaration within the Compara API.
use strict;
use warnings;
use Bio::EnsEMBL::Registry;
my $registry = 'Bio::EnsEMBL::Registry';
avrilcoghlan / exercise7_compara.pl
Created December 18, 2013 11:28
Perl script that uses the Ensembl Compara API to get all the homologues for the human gene ENSG00000229314
#!/usr/bin/env perl
# Get all the homologues for the human gene ENSG00000229314
# Note: running this script produces a warning about a variable declaration in the Compara API.
use strict;
use warnings;
use Bio::EnsEMBL::Registry;
my $registry = 'Bio::EnsEMBL::Registry';
avrilcoghlan / exercise6_compara.pl
Created December 18, 2013 11:20
Perl script that uses the Ensembl Compara Perl API to print all the members of the tree containing the human ncRNA gene ENSG00000238344
#!/usr/bin/env perl
# Print all the members of the tree containing the human ncRNA gene ENSG00000238344
use strict;
use warnings;
use Bio::EnsEMBL::Registry;
my $registry = 'Bio::EnsEMBL::Registry';
avrilcoghlan / exercise5_compara.pl
Last active December 31, 2015 14:59
Perl script that uses the Ensembl Compara Perl API to print the protein tree with the stable id ENSGT00390000003602
#!/usr/bin/env perl
# Print the protein tree with the stable id ENSGT00390000003602
use strict;
use warnings;
use Bio::EnsEMBL::Registry;
my $registry = 'Bio::EnsEMBL::Registry';