Ricardo Avila ravila4

@ravila4
ravila4 / parse_drugbank_xml.py
Created March 8, 2019 04:03
Python script for parsing an xml database dump from DrugBank for extracting Log P values
import xmltodict
import pandas as pd

with open("full_database.xml") as db:
    doc = xmltodict.parse(db.read())

values = []
for item in doc['drugbank']['drug']:
    logp = None
    try:
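The preview stops inside the `try:` block. As a rough sketch of how a log P value might be pulled out of a single parsed drug record, the helper below works on the nested dicts that xmltodict produces. The key names `calculated-properties`, `property`, `kind`, and `value`, as well as the helper names `as_list` and `extract_logp`, are assumptions about the DrugBank layout and are not taken from the gist; the sample records are made up for illustration.

```python
def as_list(node):
    """xmltodict yields a dict for a single child and a list for several."""
    if node is None:
        return []
    return node if isinstance(node, list) else [node]


def extract_logp(drug):
    """Return the first logP value in a parsed drug record, or None.

    The key names used here are assumed, not confirmed by the gist.
    """
    calc = drug.get("calculated-properties") or {}
    for prop in as_list(calc.get("property")):
        if prop.get("kind") == "logP":
            try:
                return float(prop["value"])
            except (KeyError, TypeError, ValueError):
                return None
    return None


# Hypothetical records shaped like xmltodict output.
sample_drugs = [
    {"name": "drug-a", "calculated-properties": {
        "property": [{"kind": "logS", "value": "-3.1"},
                     {"kind": "logP", "value": "1.5"}]}},
    {"name": "drug-b", "calculated-properties": None},
]
values = [(d["name"], extract_logp(d)) for d in sample_drugs]
```

Normalising single children to a list keeps the loop the same whether a drug has one property or many, which is the main wrinkle when walking xmltodict output.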
@ravila4
ravila4 / align.py
Created January 11, 2020 20:14
Sequence alignment using PyMOL
#!/usr/bin/env python
# Sequence alignment using PyMOL
# The purpose of this script is to generate a sequence alignment between
# the original crystal structure of the apo and holo models, and the sequence
# of the finalised, ungapped Rosetta models. This allows us to get a 1 to 1
# correspondence between the residue numberings in both structures.
# USAGE: Run once from the project root.
# "pockets.csv" contains the information about apo holo pairs.
@ravila4
ravila4 / HTS_gaussian.ipynb
Created October 24, 2019 20:38
Fitting Gaussian curves to histograms
@ravila4
ravila4 / flatten_json.py
Created September 12, 2019 14:36
Recursive function for flattening JSON.
def flatten_json(y):
    out = {}
    def flatten(x, name=''):
        if type(x) is dict:
            for a in x:
                flatten(x[a], name + a + '_')
        elif type(x) is list:
            i = 0
            for a in x:
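The preview is cut off partway through the list branch. One plausible way to complete the recursion is sketched below; the handling of list indices and leaf values is an assumption, not the gist's confirmed code. Nested keys are joined with `_`, and list positions become numeric key segments.

```python
def flatten_json(y):
    """Flatten nested dicts/lists into a single-level dict (a sketch)."""
    out = {}

    def flatten(x, name=''):
        if isinstance(x, dict):
            for key in x:
                flatten(x[key], name + str(key) + '_')
        elif isinstance(x, list):
            # Assumed behaviour: use the list index as a key segment.
            for i, item in enumerate(x):
                flatten(item, name + str(i) + '_')
        else:
            # Leaf value: drop the trailing '_' from the accumulated name.
            out[name[:-1]] = x

    flatten(y)
    return out


flat = flatten_json({"a": {"b": 1}, "c": [2, {"d": 3}]})
# flat == {"a_b": 1, "c_0": 2, "c_1_d": 3}
```

Note that keys which already contain `_` can collide after flattening, so a different separator may be safer for such data.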
@ravila4
ravila4 / parallel.py
Last active September 11, 2019 18:40
Functions for parallelizing things
# Functions for parallelizing things
def init_spark(nproc=-1, appname="sparksession"):
    """Start a local Spark session."""
    from pyspark.sql import SparkSession
    if nproc == -1:
        # Use all CPUs
        spark = SparkSession.builder.master(
            "local[*]").appName(appname).getOrCreate()
    else:
@ravila4
ravila4 / pandas_snippets.py
Created August 31, 2019 15:56
Chris's useful pandas snippets.
# List unique values in a DataFrame column
df['column_name'].unique()
# Convert Series datatype to numeric, turning any non-numeric values into NaN
# (convert_objects was removed from pandas; pd.to_numeric replaces it)
df['col'] = pd.to_numeric(df['col'], errors='coerce')
# Grab DataFrame rows where column has certain values
valuelist = ['value1', 'value2', 'value3']
df = df[df.column.isin(valuelist)]
#!/usr/bin/env python
import pandas as pd
import click
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord
from Bio import SeqIO
@click.command()
#!/bin/bash
TYPE=${TYPE:-prot}
[[ -n "${1}" ]] && INFILE="${1}" || exit 1
shift
makeblastdb -in "${INFILE}" -dbtype "${TYPE}" -parse_seqids "${@}" -blastdb_version 5
blastp -db fasta.fa -query database.fa \
-outfmt "6 std stitle qcovs" -num_threads 10 -out out.blast
@ravila4
ravila4 / .tmux.conf
Created August 5, 2019 18:26
Tmux configuration
# $Id: vim-keys.conf,v 1.2 2010-09-18 09:36:15 nicm Exp $
#
# vim-keys.conf, v1.2 2010/09/12
#
# By Daniel Thau. Public domain.
#
# This configuration file binds many vi- and vim-like bindings to the
# appropriate tmux key bindings. Note that for many key bindings there is no
# tmux analogue. This is intended for tmux 1.3, which handles pane selection
# differently from the previous versions