Avril Coghlan avrilcoghlan
@avrilcoghlan
avrilcoghlan / reformat_paralogs_file.pl
Created March 4, 2022 13:52
Perl script to reformat the file of within-species paralogs into the format that my pipeline expects
#!/usr/bin/perl
$file = $ARGV[0]; # input file of within-species paralogs from BioMart
open(FILE,"$file") || die "ERROR: cannot open $file\n";
while(<FILE>)
{
$line = $_;
chomp $line;
@temp = split(/\t+/,$line);
# columns: Genome project, Gene stable ID, Paralogue gene stable ID
avrilcoghlan / format_blastp_output_for_chembl_humanblast.py
Last active March 4, 2022 13:28
Python script to parse BLAST output from comparing ChEMBL proteins to human proteins
import os
import sys
from collections import defaultdict
import FiftyHG_Chembl
#====================================================================#
def main():
# find the blast output files:
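
The preview above stops before any parsing code, so the following is only a minimal sketch of how BLAST hits could be grouped per query with `defaultdict`. It assumes the BLAST output is in the standard 12-column tabular format (`-outfmt 6`), which the gist does not confirm. The function name `parse_blast_tabular` and the example identifiers are hypothetical.

```python
from collections import defaultdict


def parse_blast_tabular(lines):
    """Group BLAST hits by query ID.

    Assumes tab-separated lines in the default 12-column layout:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    hits_by_query = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        if len(fields) < 12:
            continue  # skip lines that do not have all 12 columns
        query, subject = fields[0], fields[1]
        evalue = float(fields[10])
        bitscore = float(fields[11])
        hits_by_query[query].append((subject, evalue, bitscore))
    return hits_by_query


# Hypothetical example lines (made-up identifiers and scores):
example_lines = [
    "q1\tCHEMBL_A\t98.5\t200\t3\t0\t1\t200\t1\t200\t1e-120\t410",
    "q1\tCHEMBL_B\t45.0\t180\t90\t4\t5\t184\t10\t189\t2e-30\t120",
    "q2\tCHEMBL_C\t60.0\t150\t60\t0\t1\t150\t1\t150\t5e-50\t190",
]
hits = parse_blast_tabular(example_lines)
```

In a real run, `lines` would be an open file handle for each BLAST output file rather than a list of strings.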
avrilcoghlan / format_blastp_output_for_chembl_singleproteintargetsonly.py
Created March 4, 2022 11:33
Python script to filter BLAST hits against ChEMBL, keeping only hits to single-protein targets
import os
import sys
from collections import defaultdict
import FiftyHG_Chembl
#====================================================================#
def main():
# find the blast output files:
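
The filtering step named in the description is not visible in the preview. As a rough sketch only, the idea could look like this, assuming each hit is a `(query, target_id, evalue)` tuple and that a dictionary mapping ChEMBL target IDs to their target type is already available. Both of these structures, the function name, and the example IDs are hypothetical. "SINGLE PROTEIN" is the label ChEMBL uses for that target type.

```python
def keep_single_protein_hits(hits, target_type_of):
    """Return only the hits whose ChEMBL target type is 'SINGLE PROTEIN'.

    hits           -- list of (query, target_id, evalue) tuples (assumed layout)
    target_type_of -- dict mapping target_id -> ChEMBL target type string
    """
    return [
        hit for hit in hits
        if target_type_of.get(hit[1]) == "SINGLE PROTEIN"
    ]


# Hypothetical example data (made-up identifiers):
target_type_of = {
    "TARGET_1": "SINGLE PROTEIN",
    "TARGET_2": "PROTEIN COMPLEX",
    "TARGET_3": "SINGLE PROTEIN",
}
hits = [
    ("q1", "TARGET_1", 1e-80),
    ("q1", "TARGET_2", 1e-60),
    ("q2", "TARGET_3", 1e-40),
    ("q2", "TARGET_4", 1e-30),  # target type unknown, so it is dropped
]
filtered = keep_single_protein_hits(hits, target_type_of)
```

Hits to targets missing from the dictionary are dropped here; whether the original script does the same is not known from the preview.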
avrilcoghlan / format_blastp_output_for_chembl_besthitonly.py
Created March 4, 2022 11:19
Python script to keep only the top ChEMBL hit for each query gene, plus any hits with E-values within 1e+5 of it. Only hits with E-value <= 1e-10 are kept.
import os
import sys
from collections import defaultdict
import FiftyHG_Chembl
#====================================================================#
def main():
# find the blast output files:
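
The selection rule in the description can be sketched as follows. This is one possible reading, not the script's actual code: "within 1e+5" is interpreted here as a ratio, so a hit is kept if its E-value is at most 1e+5 times the best E-value for that query. That interpretation, the function name, and the data layout are assumptions.

```python
def top_hits_within_window(hits_by_query, max_evalue=1e-10, window=1e5):
    """Keep, per query, the best hit and hits close to it in E-value.

    hits_by_query -- dict mapping query -> list of (target, evalue) tuples
    max_evalue    -- absolute cutoff: hits above this are discarded
    window        -- assumed ratio: keep hits with evalue <= best_evalue * window
    """
    kept = {}
    for query, hits in hits_by_query.items():
        passing = [(target, e) for target, e in hits if e <= max_evalue]
        if not passing:
            continue  # no hit for this query passes the absolute cutoff
        best = min(e for _, e in passing)
        # Note: if the best E-value is reported as 0.0, only 0.0 hits are kept.
        threshold = best * window
        close = [(target, e) for target, e in passing if e <= threshold]
        kept[query] = sorted(close, key=lambda hit: hit[1])
    return kept


# Hypothetical example data (made-up identifiers and E-values):
hits_by_query = {
    "q1": [("A", 1e-50), ("B", 1e-47), ("C", 1e-40), ("D", 1e-5)],
    "q2": [("E", 1e-8)],
}
kept = top_hits_within_window(hits_by_query)
```

In this example, `q1` keeps hits `A` and `B` (both within the 1e+5 window of 1e-50), `C` is outside the window, `D` fails the 1e-10 cutoff, and `q2` is dropped entirely.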
avrilcoghlan / format_blastp_output_for_chembl.py
Created March 4, 2022 10:54
Python script to parse BLAST output against ChEMBL and keep the top hits for each query protein
import os
import sys
from collections import defaultdict
import FiftyHG_Chembl
#====================================================================#
def main():
# find the blast output files:
avrilcoghlan / retrieve_phenotypeinfo_from_wormbase_for_genelist.py
Created June 28, 2019 10:50
Script to use the WormBase REST API to retrieve phenotypes (from RNAi, mutants) for an input list of C. elegans genes
import os
import sys
import requests # this is used to access json files
#====================================================================#
# use the wormbase REST API to retrieve the phenotypes (from mutants, RNAi) for a particular gene:
def retrieve_phenotypes_from_wormbase(gene):
avrilcoghlan / retrieve_phenotypeinfo_from_wormbase.py
Created June 28, 2019 10:34
Example script to retrieve phenotype data for a C. elegans gene using the WormBase REST API
# script to retrieve the phenotype info for a particular gene from WormBase
import requests, sys
server = "http://rest.wormbase.org"
ext = "/rest/field/gene/WBGene00000079/phenotype"
r = requests.get(server+ext, headers={ "Content-Type" : "application/json", "Accept" : ""})
if not r.ok:
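
The preview is cut off at the error check. A minimal sketch of the same request pattern, wrapped in functions, might look like the code below. The URL pattern and the `Content-Type` header are taken from the snippet above. The function names `phenotype_url` and `fetch_json`, the `get` parameter (which allows substituting a stand-in for `requests.get`), and the timeout value are assumptions added for illustration. The structure of the returned JSON is not shown in the gist, so the sketch returns it unparsed.

```python
def phenotype_url(gene_id, server="http://rest.wormbase.org"):
    """Build the phenotype-field URL for a WormBase gene ID."""
    return f"{server}/rest/field/gene/{gene_id}/phenotype"


def fetch_json(url, get=None, timeout=30):
    """Fetch a URL and return the decoded JSON body.

    get -- callable with the same interface as requests.get; defaults to
           requests.get (imported lazily so the module loads without it).
    """
    if get is None:
        import requests
        get = requests.get
    response = get(url, headers={"Content-Type": "application/json"}, timeout=timeout)
    # Raise an exception on HTTP errors instead of continuing with a bad response.
    response.raise_for_status()
    return response.json()
```

Usage would be along the lines of `data = fetch_json(phenotype_url("WBGene00000079"))`, followed by inspecting `data` to see which keys the service returns.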
avrilcoghlan / retrieve_predictedtarget_info_from_chembl_for_compoundlist.py
Created June 27, 2019 09:34
Script to retrieve predicted targets from ChEMBL for an input list of ChEMBL compounds
import os
import sys
import pandas as pd # uses pandas python module to view and analyse data
import requests # this is used to access json files
#====================================================================#
# call the 'target prediction' API to find the predicted targets of our list of compounds:
def find_predicted_targets_of_compounds(cmpd_chembl_ids):
avrilcoghlan / retrieve_genetrees_from_wormbase_parasite.py
Created June 20, 2019 09:35
Retrieve, and parse, all the gene trees from WormBase ParaSite for a list of Schistosoma mansoni genes
import os
import sys
import requests # this is used to access json files
from ete3 import Phyloxml
import datetime
# Note: this script must be run in Python2 because ete3 uses Python2
#====================================================================#
avrilcoghlan / retrieve_smansoni_genelist_from_wormbase_parasite.py
Created June 19, 2019 09:25
Script to get a list of all Schistosoma mansoni protein-coding genes from WormBase ParaSite
# script to retrieve a list of all protein-coding Schistosoma mansoni genes from wormbase parasite
# example script taken from https://parasite.wormbase.org/rest-13/documentation/info/lookup_genome
import requests, sys
server = "https://parasite.wormbase.org"
ext = "/rest-13/lookup/genome/schistosoma_mansoni_prjea36577?biotypes=protein_coding"
# took the PRJEA from https://parasite.wormbase.org/Schistosoma_mansoni_prjea36577/Info/Index/
r = requests.get(server+ext, headers={ "Content-Type" : "application/json", "Accept" : ""})