

Avril Coghlan avrilcoghlan

@avrilcoghlan
avrilcoghlan / pdb_rest_example_get_pbids_with_ligand.py
Created Jun 17, 2019
Use the PDB REST API to get a list of PDB entries that contain a particular ligand
#!/usr/bin/env python
# adapted example from https://github.com/PDBeurope/PDBe_Programming/blob/master/REST_API/snippets/basic_get_post.py
# edited to use the Python 'requests' module
import argparse
import sys
import requests # used to send HTTP requests to the REST API
PY3 = sys.version_info[0] >= 3
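The preview stops before any request is made. Below is a minimal sketch of the lookup, assuming the PDBe endpoint `https://www.ebi.ac.uk/pdbe/api/pdb/compound/in_pdb/{ligand}` and a JSON response of the form `{"ATP": ["1abc", ...]}` (both are assumptions, not taken from the gist). It uses the standard-library `urllib` so it runs without extra packages; with `requests`, `requests.get(url).json()` does the same job. `parse_pdb_ids` and `get_pdb_ids_with_ligand` are hypothetical names.

```python
import json
import urllib.request

# Assumed PDBe REST endpoint listing the entries that contain a given compound;
# check the current PDBe API documentation before relying on it.
PDBE_IN_PDB_URL = "https://www.ebi.ac.uk/pdbe/api/pdb/compound/in_pdb/{ligand}"


def parse_pdb_ids(payload, ligand):
    """Return a sorted, de-duplicated list of PDB IDs from a decoded JSON payload.

    Assumes the payload maps the upper-case ligand code to a list of PDB IDs.
    """
    return sorted(set(payload.get(ligand.upper(), [])))


def get_pdb_ids_with_ligand(ligand, timeout=30):
    """Fetch the PDB IDs of entries containing `ligand` (needs network access)."""
    url = PDBE_IN_PDB_URL.format(ligand=ligand.upper())
    with urllib.request.urlopen(url, timeout=timeout) as response:
        payload = json.load(response)  # json.load accepts the binary response stream
    return parse_pdb_ids(payload, ligand)
```

For example, `get_pdb_ids_with_ligand("ATP")` would return the IDs as a sorted list, if the endpoint behaves as assumed.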
avrilcoghlan / retrieve_bioactivity_info_from_chembl.py
Created May 30, 2019
Python script to query the ChEMBL database to retrieve a list of compounds with bioactivities for certain target proteins, and then retrieve information on the molecular properties of those compounds
import pandas as pd # uses pandas python module to view and analyse data
import requests # used to send HTTP requests to the ChEMBL web services
#====================================================================#
# using a list of known targets, find compounds that are active on these targets:
def find_bioactivities_for_targets(targets):
    targets = ",".join(targets) # join the targets into a comma-separated string, as required by the ChEMBL API search
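A hedged sketch of the kind of ChEMBL query that `find_bioactivities_for_targets` appears to build. It assumes the ChEMBL web services activity endpoint and its Django-style `target_chembl_id__in` filter, which takes a comma-separated list of target IDs; `build_activity_url` and `active_compound_ids` are hypothetical helpers. ChEMBL paginates its responses, and pages after the first are not fetched here.

```python
from urllib.parse import urlencode

# Assumed base URL of the ChEMBL web services activity endpoint (JSON format).
CHEMBL_ACTIVITY_URL = "https://www.ebi.ac.uk/chembl/api/data/activity.json"


def build_activity_url(targets, limit=1000):
    """Build a query URL for activities measured on any of the given target IDs."""
    params = {
        "target_chembl_id__in": ",".join(targets),  # comma-separated, as in the gist
        "limit": limit,  # records per page
    }
    return CHEMBL_ACTIVITY_URL + "?" + urlencode(params)


def active_compound_ids(activities):
    """Collect the distinct molecule IDs from a list of activity records (dicts)."""
    return sorted(
        {a["molecule_chembl_id"] for a in activities if a.get("molecule_chembl_id")}
    )
```

Note that `urlencode` percent-encodes the commas as `%2C`, which the server decodes back to commas.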
avrilcoghlan / Find_compounds_for_NTD_targets_and_filter_those_compounds.ipynb
Created May 29, 2019
Python notebook to query ChEMBL, to retrieve compounds with bioactivities for certain targets, and obtain properties of those compounds
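The notebook cannot be displayed on this page, so its filtering step is unknown. As an illustration only, the sketch below filters compounds on their molecular properties using Lipinski's rule-of-five upper bounds; both the thresholds and the ChEMBL property names (`full_mwt`, `alogp`, `hbd`, `hba`) are assumptions, not the notebook's actual criteria.

```python
# Hypothetical thresholds (Lipinski's rule of five); the notebook's own
# filtering criteria are not shown on this page.
THRESHOLDS = {"full_mwt": 500.0, "alogp": 5.0, "hbd": 5, "hba": 10}


def passes_filters(properties, thresholds=THRESHOLDS):
    """Return True if every listed property is present and within its upper bound."""
    for name, upper in thresholds.items():
        value = properties.get(name)
        if value is None:
            return False  # treat a missing value as a failure
        if float(value) > upper:  # ChEMBL often returns numbers as strings
            return False
    return True


def filter_compounds(compounds, thresholds=THRESHOLDS):
    """Keep the IDs of compounds (dict of ID -> properties) that pass all thresholds."""
    return [cid for cid, props in compounds.items() if passes_filters(props, thresholds)]
```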
avrilcoghlan / fix_embl_file_cds_error.py
Created Nov 28, 2018
Script to correct EMBL files so that CDSs with >50% internal Ns are marked as /pseudo
import sys
import os
from collections import defaultdict
#====================================================================#
# read in the line numbers of dodgy CDS:
def read_error_file(error_file):
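A minimal sketch of the ">50% Ns" test named in the description. It assumes that every N in the CDS nucleotide sequence counts as "internal"; how the gist defines internal Ns, and how it extracts the CDS sequences, is not shown. `fraction_n` and `should_mark_pseudo` are hypothetical names.

```python
def fraction_n(cds_seq):
    """Fraction of bases in a CDS nucleotide sequence that are N (case-insensitive)."""
    if not cds_seq:
        return 0.0
    return cds_seq.upper().count("N") / len(cds_seq)


def should_mark_pseudo(cds_seq, cutoff=0.5):
    """True if strictly more than `cutoff` of the CDS consists of Ns."""
    return fraction_n(cds_seq) > cutoff
```

Because the comparison is strict, a CDS that is exactly half Ns is not marked.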
avrilcoghlan / list_dogdy_genes_in_embl_file.py
Created Nov 28, 2018
Script to make a list of dodgy genes in an EMBL file, based on the VAL_ERROR.txt file produced by the ENA's file validator
import sys
import os
from collections import defaultdict
#====================================================================#
# read in the set of lines with dodgy genes on them in the embl file:
def read_lines_with_dodgy_genes(input_error_file):
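One way to turn validator line numbers into gene names is sketched below. The `line: 123` pattern is a hypothetical stand-in for the VAL_ERROR.txt format, which this page does not show. On the EMBL side, the sketch relies on feature keys starting in column 6 of `FT` lines, with qualifiers such as `/locus_tag="..."` on the lines that follow.

```python
import re

# Hypothetical pattern: assumes each validator message contains "line: <number>".
LINE_RE = re.compile(r"line:?\s*(\d+)", re.IGNORECASE)
TAG_RE = re.compile(r'/locus_tag="([^"]+)"')


def error_line_numbers(error_lines):
    """Collect the EMBL-file line numbers mentioned in validator messages."""
    numbers = set()
    for line in error_lines:
        match = LINE_RE.search(line)
        if match:
            numbers.add(int(match.group(1)))
    return numbers


def genes_on_lines(embl_lines, bad_line_numbers):
    """Return the locus tags of features spanning any of the given 1-based line numbers."""
    features = []  # each entry: [list of line numbers, locus_tag or None]
    for number, line in enumerate(embl_lines, start=1):
        if line.startswith("FT   ") and len(line) > 5 and line[5] != " ":
            # a feature key such as "FT   CDS" starts in column 6: begin a new feature
            features.append([[number], None])
        elif line.startswith("FT ") and features:
            # qualifier or location-continuation line of the current feature
            features[-1][0].append(number)
            tag = TAG_RE.search(line)
            if tag:
                features[-1][1] = tag.group(1)
    return {tag for lines, tag in features if tag and bad_line_numbers.intersection(lines)}
```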
avrilcoghlan / fix_embl_file.py
Created Nov 28, 2018
Script to mark some genes in an EMBL file as /pseudo
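A minimal sketch of marking genes as /pseudo. It assumes that genes are identified by their `/locus_tag` and that the new qualifier goes on its own `FT` line starting in column 22, where EMBL qualifiers begin. It adds /pseudo to every feature that carries a listed locus tag (for example, both the gene and its CDS); whether the gist does the same is not shown. `mark_pseudo` is a hypothetical name.

```python
import re

# EMBL qualifier lines start with "FT" and place the qualifier in column 22.
PSEUDO_LINE = "FT" + " " * 19 + "/pseudo"
TAG_RE = re.compile(r'/locus_tag="([^"]+)"')


def mark_pseudo(embl_lines, pseudo_tags):
    """Return a copy of the EMBL lines with a /pseudo line after each matching /locus_tag.

    Lines are expected without trailing newlines; existing /pseudo qualifiers are not checked.
    """
    output = []
    for line in embl_lines:
        output.append(line)
        match = TAG_RE.search(line) if line.startswith("FT ") else None
        if match and match.group(1) in pseudo_tags:
            output.append(PSEUDO_LINE)  # mark this feature as a pseudogene
    return output
```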
import sys
import os
from collections import defaultdict
#====================================================================#
# define a function to read in the genes that are in families, for our species of interest:
def find_genes_in_families(families_file, our_species_name, locus_tag):
    """read in the genes that are in families, for our species of interest """
avrilcoghlan / find_internal_stops.pl
Created Nov 28, 2018
Script to find protein sequences with internal stop codons
#!/usr/bin/env perl

=head1 NAME

find_internal_stops.pl

=head1 SYNOPSIS

find_internal_stops.pl input_fasta

where input_fasta is the input FASTA file of protein translations.
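The preview shows only the Perl script's POD header, not its logic. Below is a Python sketch of the same check. It assumes that stop codons appear as `*` in the translations and that only the final residue may legitimately be a stop; `read_fasta` and `proteins_with_internal_stops` are hypothetical names.

```python
def read_fasta(lines):
    """Yield (name, sequence) pairs from FASTA-formatted lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = (line[1:].split() or [""])[0], []  # first word of the header
        elif line:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def has_internal_stop(protein):
    """True if a stop ('*') occurs anywhere before the final residue."""
    return "*" in protein[:-1]


def proteins_with_internal_stops(lines):
    """Names of the proteins in FASTA lines that contain an internal stop."""
    return [name for name, seq in read_fasta(lines) if has_internal_stop(seq)]
```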
avrilcoghlan / submit_crispresso_jobs_for_subsetsoffastq.py
Created Oct 26, 2018
Script to run CRISPResso jobs on a compute farm, for many subsets of the data
import os
import sys
#====================================================================#
def submit_crispresso_jobs(sample_name, num_subsets):
    # need to submit a CRISPResso job for each subset of the data:
    for x in range(num_subsets):
        subset = x + 1 # e.g. if num_subsets is 17, 'subset' goes from 1 to 17
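A sketch of the per-subset submission loop, assuming an LSF farm (`bsub` with its `-q` queue, `-J` job-name and `-o`/`-e` output options). The queue name `normal` and the command template, for example a hypothetical `run_crispresso.sh {sample} {subset}` wrapper, are placeholders; the gist's real CRISPResso command line is not shown. `build_bsub_commands` is a hypothetical name.

```python
import shlex


def build_bsub_commands(sample_name, num_subsets, job_template, queue="normal"):
    """Return one LSF bsub command line per subset, with subsets numbered 1..num_subsets."""
    commands = []
    for subset in range(1, num_subsets + 1):  # same numbering as the gist
        job_name = f"{sample_name}_{subset}"
        job = job_template.format(sample=sample_name, subset=subset)
        # quote the job so that bsub receives it as a single argument
        commands.append(
            f"bsub -q {queue} -J {job_name} -o {job_name}.o -e {job_name}.e {shlex.quote(job)}"
        )
    return commands
```

Each returned string could then be submitted with `subprocess.run(shlex.split(cmd), check=True)` or `os.system(cmd)`.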
avrilcoghlan / filter_fastq_files_using_trimmomatic.py
Created Oct 26, 2018
Script to run Trimmomatic to discard read pairs that have low-quality bases
import os
import sys
#====================================================================#
def run_trimmomatic_for_subsets_of_data(sample_name, num_subsets):
    # need to run Trimmomatic for each subset of the data:
    for x in range(num_subsets):
        subset = x + 1 # e.g. if num_subsets is 17, 'subset' goes from 1 to 17
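The preview does not show which Trimmomatic steps the script uses (such as `SLIDINGWINDOW` or `MINLEN`). To make the stated criterion concrete, the plain-Python sketch below keeps a read pair only when no base in either read has a Phred score below a cut-off. It illustrates the idea and is not how Trimmomatic itself works; Phred+33 encoding and the cut-off of 20 are assumptions.

```python
def phred_scores(quality_string, offset=33):
    """Convert a FASTQ quality string to Phred scores (Phred+33 by default)."""
    return [ord(char) - offset for char in quality_string]


def keep_pair(qual1, qual2, min_quality=20, offset=33):
    """Keep a read pair only if every base in both reads scores at least min_quality."""
    scores = phred_scores(qual1, offset) + phred_scores(qual2, offset)
    return bool(scores) and min(scores) >= min_quality  # empty reads are discarded
```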
avrilcoghlan / split_up_fastq.py
Created Oct 26, 2018
Script to split up a gzipped fastq file into smaller gzipped fastq files of 1 million reads each
import sys
import os
import gzip
from collections import defaultdict
#====================================================================#
# now read in the input fastq and split it up:
def read_fastq_file_and_split(input_fastq_file, seqs_per_output_file, output_file_prefix):
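A minimal sketch of the split, assuming standard four-line FASTQ records and output files named `<prefix><n>.fastq.gz` (a hypothetical naming scheme). `itertools.islice` pulls up to `4 * seqs_per_output_file` lines at a time, so the input is streamed rather than loaded into memory. `chunk_fastq_lines` and `split_fastq` are hypothetical names, not the gist's own function.

```python
import gzip
import itertools


def chunk_fastq_lines(lines, seqs_per_chunk):
    """Yield lists of FASTQ lines, each holding at most seqs_per_chunk 4-line records."""
    iterator = iter(lines)
    while True:
        chunk = list(itertools.islice(iterator, 4 * seqs_per_chunk))
        if not chunk:
            return
        yield chunk


def split_fastq(input_fastq_file, output_file_prefix, seqs_per_output_file=1_000_000):
    """Split a gzipped FASTQ file into gzipped chunks; return the output file names."""
    output_files = []
    with gzip.open(input_fastq_file, "rt") as handle:
        for number, chunk in enumerate(chunk_fastq_lines(handle, seqs_per_output_file), start=1):
            out_name = f"{output_file_prefix}{number}.fastq.gz"
            with gzip.open(out_name, "wt") as out:
                out.writelines(chunk)
            output_files.append(out_name)
    return output_files
```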