Ricardo Avila ravila4

@ravila4
ravila4 / count_taxids.sh
Last active Feb 23, 2021
A demo of converting API responses to CSV format with jq.
View count_taxids.sh
#!/bin/bash
# Genesets aggregated by taxid
aggs=$(curl -s "https://mygeneset.info/v1/query?q=*&facets=taxid&facet_size=100")
# Quote the variables so the JSON reaches jq unchanged (no word splitting or globbing)
taxids=$(echo "$aggs" | jq -r '.facets.taxid.terms | map(.term) | @csv')
counts=$(echo "$aggs" | jq -r '.facets.taxid.terms | map(.count) | @csv')
# Query scientific name for each taxid
resp=$(curl -s -X POST -d "q=${taxids}" "http://t.biothings.io/v1/query")
species=$(echo "$resp" | jq -r 'map(.scientific_name) | @csv')
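The `map(.field) | @csv` step in the script above can be mirrored in Python for readers who want to check the transformation without calling the API. This is a minimal sketch, not part of the gist: the helper name `terms_to_csv_row` and the sample payload are hypothetical, and the payload only assumes the `.facets.taxid.terms` shape that the jq filters read.

```python
import csv
import io


def terms_to_csv_row(terms, key):
    """Return one CSV line with `key` taken from every facet term.

    Roughly mirrors jq's `map(.key) | @csv`: strings are quoted,
    numbers are written bare.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, quoting=csv.QUOTE_NONNUMERIC)
    writer.writerow([term[key] for term in terms])
    # csv.writer appends a line terminator; strip it to get a single line.
    return buffer.getvalue().rstrip("\r\n")


# Hypothetical sample payload (assumption): shaped like the facet response
# the shell script reads, with made-up counts.
sample = {
    "facets": {
        "taxid": {
            "terms": [
                {"term": 9606, "count": 120},
                {"term": 10090, "count": 45},
            ]
        }
    }
}

terms = sample["facets"]["taxid"]["terms"]
taxids = terms_to_csv_row(terms, "term")
counts = terms_to_csv_row(terms, "count")
print(taxids)
print(counts)
```

Building both rows from the same `terms` list keeps the taxid and count columns aligned, which is what the two jq calls in the script rely on.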
View gist:9aacae443c50a168b4267fca7448d88b
#!/usr/bin/env bash
set -Eeuo pipefail
cd "$(dirname "${BASH_SOURCE[0]}")" >/dev/null 2>&1
trap cleanup SIGINT SIGTERM ERR EXIT
usage() {
cat <<EOF
View id_mapping_mygene.ipynb
ravila4 / databricks-programming-guide.md
Created Jan 27, 2020
Databricks Programming Guidance
View databricks-programming-guide.md

Databricks Programming Guidance

This document collects lessons learned from Databricks programming, along with some best practices.

Mapping to an Azure Data Lake Generation 2

blobname = "miraw"  
storageaccount = "rdmidlgen2"  
mountname = "/rdmi"

configs = {"fs.azure.account.auth.type": "OAuth",
ravila4 / align.py
Created Jan 11, 2020
Sequence alignment using PyMOL
View align.py
#!/usr/bin/env python
# Sequence alignment using PyMOL
# The purpose of this script is to generate a sequence alignment between
# the original crystal structure of the apo and holo models, and the sequence
# of the finalised, ungapped Rosetta models. This allows us to get a one-to-one
# correspondence between the residue numberings in both structures.
# USAGE: Run once from the project root.
# "pockets.csv" contains the information about apo holo pairs.
ravila4 / HTS_gaussian.ipynb
Created Oct 24, 2019
Fitting Gaussian curves to histograms
View HTS_gaussian.ipynb
ravila4 / flatten_json.py
Created Sep 12, 2019
Recursive function for flattening JSON.
View flatten_json.py
def flatten_json(y):
    out = {}

    def flatten(x, name=''):
        if type(x) is dict:
            for a in x:
                flatten(x[a], name + a + '_')
        elif type(x) is list:
            i = 0
            for a in x:
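The preview above is cut off inside the list branch. As a minimal sketch of how a recursive flattener of this shape could be completed, the version below indexes list items by position and stores leaf values under the accumulated key. The list handling, the leaf handling, and the name `flatten_json_sketch` are assumptions for illustration, not the author's code.

```python
def flatten_json_sketch(nested):
    """Flatten nested dicts and lists into one dict with '_'-joined keys."""
    out = {}

    def flatten(value, name=""):
        if isinstance(value, dict):
            for key in value:
                flatten(value[key], name + key + "_")
        elif isinstance(value, list):
            # Assumption: list items are keyed by their position.
            for index, item in enumerate(value):
                flatten(item, name + str(index) + "_")
        else:
            # Leaf value: drop the trailing '_' from the accumulated name.
            out[name[:-1]] = value

    flatten(nested)
    return out


example = {
    "id": 1,
    "tags": ["a", "b"],
    "owner": {"name": "x", "langs": [{"n": "py"}]},
}
flat = flatten_json_sketch(example)
print(flat)
```

Because keys are joined with `_`, two different paths can collide when the original keys already contain underscores; a less common separator avoids that if it matters for the data at hand.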
ravila4 / pandas_snippets.py
Created Aug 31, 2019
Chris's useful pandas snippets.
View pandas_snippets.py
# List unique values in a DataFrame column
df['column_name'].unique()
# Convert Series datatype to numeric, turning any non-numeric values into NaN
# (convert_objects was removed from pandas; pd.to_numeric replaces it)
df['col'] = pd.to_numeric(df['col'], errors='coerce')
# Grab DataFrame rows where column has certain values
valuelist = ['value1', 'value2', 'value3']
df = df[df.column.isin(valuelist)]
View csv_to_fasta.py
#!/usr/bin/env python
import pandas as pd
import click
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord
from Bio import SeqIO
@click.command()
View make_blast_db.sh
#!/bin/bash
TYPE=${TYPE:-prot}
[[ ! -z ${1} ]] && INFILE=${1} || exit 1
shift
makeblastdb -in "${INFILE}" -dbtype "${TYPE}" -parse_seqids "${@}" -blastdb_version 5