👨‍💻
Open Targets Platform Coordinator

David Ochoa d0choa

d0choa / non_STY_sites
Created January 30, 2017 16:39
List of non-serine, non-threonine and non-tyrosine (non-STY) sites. Some sites are redundant across Ensembl isoforms. Peptide information, publication of origin and other technical details are available on request.
ensp,position,residue
ENSP00000367263,1441,M
ENSP00000367263,1572,M
ENSP00000367263,5030,M
ENSP00000367263,5550,M
ENSP00000350990,1428,M
ENSP00000437271,1428,M
ENSP00000350990,1443,C
ENSP00000437271,1443,C
ENSP00000263801,551,M
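To make the isoform redundancy mentioned in the description visible, here is a minimal Python sketch that tallies residues and flags (position, residue) pairs reported on more than one Ensembl protein. It uses only the preview rows shown above, not the full list. A shared position across ENSP IDs is a hint of redundancy rather than proof, because coordinates are not guaranteed to line up between isoforms.

```python
import csv
import io
from collections import Counter, defaultdict

# Rows copied from the non_STY_sites preview above (not the complete file).
SITES_CSV = """ensp,position,residue
ENSP00000367263,1441,M
ENSP00000367263,1572,M
ENSP00000367263,5030,M
ENSP00000367263,5550,M
ENSP00000350990,1428,M
ENSP00000437271,1428,M
ENSP00000350990,1443,C
ENSP00000437271,1443,C
ENSP00000263801,551,M
"""


def summarise_sites(text: str):
    """Count residues and group proteins that share a (position, residue) pair."""
    rows = list(csv.DictReader(io.StringIO(text)))
    residue_counts = Counter(row["residue"] for row in rows)
    proteins_by_site = defaultdict(set)
    for row in rows:
        proteins_by_site[(int(row["position"]), row["residue"])].add(row["ensp"])
    # Same position and residue on more than one ENSP: candidate isoform redundancy.
    shared = {site: sorted(p) for site, p in proteins_by_site.items() if len(p) > 1}
    return residue_counts, shared
```

On the preview rows, `summarise_sites(SITES_CSV)` reports seven methionine and two cysteine sites, with positions 1428 (M) and 1443 (C) each shared by ENSP00000350990 and ENSP00000437271.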
d0choa / getJournalsFreq.R
Created March 11, 2016 11:25
This R script displays the number of manuscripts per journal stored in the Mekentosj Papers database over the last n years
## getJournalsFreq.R
## author: David Ochoa <dogcaesar@gmail.com>
## The script displays the number of manuscripts per journal stored in the Mekentosj Papers database
topN <- 20 #Top journals to display
lastyears <- 5 #Last n years
journalsquery <- "osascript -ss -e 'tell application \"Papers\" to get bundle name of publication items'"
yearsquery <- "osascript -e 'tell application \"Papers\" to get publication year of publication items'"
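The counting step described above can be sketched in Python (the language used for the other added examples here, not the gist's R). This sketch assumes the two osascript outputs have already been parsed into parallel lists of journal names and publication years; that parsing is not shown. It also assumes the window for "the last n years" runs from the current year minus n plus one through the current year. `top_n` and `last_years` mirror the script's `topN` and `lastyears`.

```python
from collections import Counter
from datetime import date
from typing import Iterable, List, Optional, Tuple


def top_journals(
    journals: Iterable[str],
    years: Iterable[Optional[int]],
    top_n: int = 20,
    last_years: int = 5,
    current_year: Optional[int] = None,
) -> List[Tuple[str, int]]:
    """Return the top_n journals by manuscript count within the last `last_years` years."""
    if current_year is None:
        current_year = date.today().year
    # Assumed window: e.g. last_years=5 in 2021 covers 2017..2021 inclusive.
    first_year = current_year - last_years + 1
    counts = Counter(
        journal
        for journal, year in zip(journals, years)
        if journal and year is not None and first_year <= year <= current_year
    )
    return counts.most_common(top_n)
```

Items without a year are skipped rather than guessed. Passing `current_year` explicitly keeps the result reproducible.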
d0choa / efo_otar_mapping.py
Last active June 9, 2020 11:42
EFO-OTAR mappings prototype
import rdflib
import pandas as pd
## Path to the EFO OWL file (local or remote)
owlpath = "https://github.com/EBISPOT/efo/releases/download/v3.18.0/efo_otar_slim.owl"
## File downloaded from
## https://docs.google.com/spreadsheets/d/1CV_shXJy1ACM09HZBB_-3Nl6l_dfkrA26elMAF0ttHs/edit
mappingFile = "/Users/ochoa/Downloads/OTAR project EFO mappings for disease profile pages - Sheet1-2.csv"
intraneturl = "http://home.opentargets.org/"
outputFile = "/Users/ochoa/Desktop/output.json"
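Before the mappings can be joined to EFO terms, class IDs and labels have to be read from the OWL file. The gist uses rdflib for this. The sketch below swaps in the standard library's ElementTree so it runs without dependencies. It assumes the OWL file is serialised as RDF/XML, and the tiny embedded fragment is illustrative rather than taken from `efo_otar_slim.owl`.

```python
import xml.etree.ElementTree as ET
from typing import Dict

RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"
RDFS = "{http://www.w3.org/2000/01/rdf-schema#}"
OWL = "{http://www.w3.org/2002/07/owl#}"

# Illustrative RDF/XML fragment, not copied from the real EFO release.
SAMPLE_OWL = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Class rdf:about="http://www.ebi.ac.uk/efo/EFO_0000270">
    <rdfs:label>asthma</rdfs:label>
  </owl:Class>
</rdf:RDF>
"""


def class_labels(owl_xml: str) -> Dict[str, str]:
    """Map each named owl:Class short ID (last IRI segment) to its first rdfs:label."""
    root = ET.fromstring(owl_xml)
    labels = {}
    for cls in root.iter(f"{OWL}Class"):
        iri = cls.get(f"{RDF}about")  # anonymous classes have no rdf:about
        label = cls.find(f"{RDFS}label")
        if iri and label is not None and label.text:
            labels[iri.rsplit("/", 1)[-1]] = label.text.strip()
    return labels
```

With rdflib, the equivalent starting point would be loading `owlpath` into a `Graph` and iterating over `rdfs:label` triples.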
d0choa / etl_metrics.py
Last active March 15, 2021 11:00
Prototype of ETL metrics
import argparse
import array
import struct
import pyspark.sql.functions as F
from pyspark import SparkConf
from pyspark.sql import SparkSession
from pyspark.sql import DataFrame
from pyspark.sql.types import StructType, ArrayType, StringType
from typing import Iterable
from functools import reduce
d0choa / knownDrugsForTargetGQL.py
Last active June 4, 2021 16:09
Query the known drugs for a given target from the Open Targets Platform GraphQL API
#!/usr/bin/env python3
# Import relevant libraries for HTTP request and JSON formatting
import requests
import json
# Set gene_id variable
gene_id = "ENSG00000244734"
# Build query string
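The preview stops at the query string. Here is a minimal sketch of the remaining steps: build a GraphQL request body with the Ensembl gene ID as a variable, then POST it. The endpoint URL and field names reflect our understanding of the Platform's v4 schema and should be checked against the live schema. The gist uses `requests` (`requests.post(API_URL, json=payload)` would do the same job); the standard library's `urllib.request` is swapped in here so the sketch has no third-party dependency.

```python
import json
import urllib.request

# Assumed endpoint of the Open Targets Platform GraphQL API (v4).
API_URL = "https://api.platform.opentargets.org/api/v4/graphql"

# Field names follow our reading of the v4 schema; verify against the live schema.
QUERY = """
query knownDrugs($ensemblId: String!) {
  target(ensemblId: $ensemblId) {
    id
    approvedSymbol
    knownDrugs {
      count
      rows {
        drugId
        prefName
        phase
        mechanismOfAction
        disease {
          id
          name
        }
      }
    }
  }
}
"""


def build_payload(gene_id: str) -> dict:
    """Bundle the query text and its variables into a GraphQL request body."""
    return {"query": QUERY, "variables": {"ensemblId": gene_id}}


def fetch_known_drugs(gene_id: str) -> dict:
    """POST the query and return the decoded JSON response (needs network access)."""
    body = json.dumps(build_payload(gene_id)).encode("utf-8")
    request = urllib.request.Request(
        API_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

GraphQL responses wrap results in a `data` key, so with the gist's `gene_id` the rows would be under `fetch_known_drugs(gene_id)["data"]["target"]["knownDrugs"]["rows"]`.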
d0choa / suggestOutdatedUKBBTraitMappings.py
Last active June 4, 2021 17:50
Suggest UKBB trait mappings that could be updated. Dependency: an external mapping tool.
from pyspark.sql import SparkSession
from pyspark.sql import Window as W
from pyspark.sql import functions as F
# genetics portal studies
studiesPath = "/Users/ochoa/Datasets/study-index/"
failedPath = "/Users/ochoa/Datasets/failedEvidence"
evidenceFailedPath = "/Users/ochoa/Datasets/evidenceFailed"
diseasePath = "/Users/ochoa/Datasets/diseases"
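One plausible check behind such suggestions is sketched below; it is an assumption, not the gist's actual logic. The idea is to flag study trait IDs that no longer appear in the current disease index (the `diseasePath` dataset), so they can be passed to the external mapping tool for a replacement. Study and trait IDs in the usage example are hypothetical placeholders. In PySpark the same step would be an `explode` of the trait column followed by a `left_anti` join against the disease IDs.

```python
from typing import Dict, List, Set


def outdated_mappings(
    study_traits: Dict[str, List[str]], current_disease_ids: Set[str]
) -> Dict[str, List[str]]:
    """Return, per study, the mapped trait IDs that are absent from the disease index."""
    flagged = {}
    for study_id, trait_ids in study_traits.items():
        missing = sorted(t for t in trait_ids if t not in current_disease_ids)
        if missing:
            flagged[study_id] = missing
    return flagged
```

For example, with hypothetical studies `{"STUDY_A": ["EFO_1", "EFO_2"], "STUDY_B": ["EFO_3"]}` and a disease index `{"EFO_1", "EFO_3"}`, only `STUDY_A` is flagged, for `EFO_2`.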
d0choa / newFingenSignals.py
Last active June 21, 2021 15:48
Interesting L2G (locus-to-gene) signals, validated by drug development, found in FinnGen but not available in previous sources. Disclaimer: no ontology expansion.
import pyspark.sql.functions as F
from pyspark import SparkConf
from pyspark.sql import SparkSession
sparkConf = SparkConf()
spark = (
SparkSession.builder
.config(conf=sparkConf)
.master('local[*]')
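The selection in the description reduces to set operations on (target, disease) pairs: keep pairs supported by FinnGen L2G, drop those already present in previous sources, and keep only those also backed by drug development. Here is a minimal sketch in plain Python. It assumes each source has already been reduced to such pairs. Disease IDs are compared literally, matching the "no ontology expansion" disclaimer. In PySpark the equivalent would be a `left_anti` join against previous sources followed by an inner join with drug evidence.

```python
from typing import Iterable, Set, Tuple

Pair = Tuple[str, str]  # (target ID, disease ID)


def new_validated_signals(
    finngen_pairs: Iterable[Pair],
    previous_pairs: Iterable[Pair],
    drug_pairs: Iterable[Pair],
) -> Set[Pair]:
    """Pairs found in FinnGen L2G, absent from previous sources, and backed by drugs.

    Disease IDs are matched exactly; no ontology expansion is applied.
    """
    return (set(finngen_pairs) - set(previous_pairs)) & set(drug_pairs)
```

The IDs in a call such as `new_validated_signals({("T1", "D1"), ("T2", "D2")}, {("T1", "D1")}, {("T2", "D2")})` are placeholders; the call returns `{("T2", "D2")}`.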
d0choa / nonObviousTermSimilaritiesBasedOnW2V.py
Last active August 5, 2021 11:24
Find Word2Vec (W2V) similarities between terms in different EFO branches (e.g. disease vs. phenotype)
from pyspark.sql import SparkSession
from pyspark.context import SparkContext
from pyspark.sql import functions as F
from pyspark.ml.feature import Word2VecModel
from pyspark.sql.types import DoubleType
from pyspark.ml.feature import Normalizer
# establish spark connection
spark = (
SparkSession.builder
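The `Normalizer` import suggests the usual trick: L2-normalise each term vector so that a plain dot product equals cosine similarity. The sketch below applies it in plain Python to every pair of terms drawn from two branches. It assumes the vectors were already exported (for instance from `Word2VecModel.getVectors()`), and term names in the usage example are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def l2_normalise(vector: Sequence[float]) -> List[float]:
    """Scale a vector to unit length; zero vectors are returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm else list(vector)


def cross_branch_similarities(
    branch_a: Dict[str, Sequence[float]],
    branch_b: Dict[str, Sequence[float]],
    min_similarity: float = 0.0,
) -> List[Tuple[str, str, float]]:
    """Cosine similarity for every (term in A, term in B) pair, highest first."""
    norm_a = {t: l2_normalise(v) for t, v in branch_a.items()}
    norm_b = {t: l2_normalise(v) for t, v in branch_b.items()}
    pairs = []
    for term_a, va in norm_a.items():
        for term_b, vb in norm_b.items():
            sim = sum(x * y for x, y in zip(va, vb))  # dot of unit vectors = cosine
            if sim >= min_similarity:
                pairs.append((term_a, term_b, sim))
    return sorted(pairs, key=lambda p: p[2], reverse=True)
```

Comparing only across branches, rather than all terms against all terms, keeps the output focused on non-obvious links such as a phenotype that sits close to a disease term.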
d0choa / nod2AF.py
Last active August 11, 2021 10:41
Overlay variants on the NOD2 AlphaFold model
from pymol import cmd
cmd.load("/Users/ochoa/Downloads/AF-Q9HC29-F1-model_v1.cif")
cmd.bg_color("grey95")
cmd.set_color("veryHigh", [0, 83, 214])
cmd.set_color("confident", [101, 203, 243])
cmd.set_color("low", [255, 219, 19])
cmd.set_color("veryLow", [255, 125, 69])
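The four colours defined above match the AlphaFold DB confidence bands for pLDDT: very high (above 90), confident (70 to 90), low (50 to 70) and very low (below 50). AlphaFold model files store pLDDT in the B-factor column, so in PyMOL each band would be applied with a B-factor selection, e.g. `cmd.color("veryHigh", "b > 90")`. The plain-Python sketch below makes the binning explicit. How exact boundary values such as 90.0 are assigned is our choice, not taken from the gist.

```python
def plddt_colour(plddt: float) -> str:
    """Map a per-residue pLDDT score to the colour names defined in nod2AF.py."""
    if plddt > 90:
        return "veryHigh"
    if plddt > 70:
        return "confident"
    if plddt > 50:
        return "low"
    return "veryLow"
```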
d0choa / NOD2_variants.py
Last active August 27, 2021 15:32
Pathogenic or L2G-significant variants in NOD2
import pyspark.sql.functions as F
from pyspark import SparkConf
from pyspark.sql import SparkSession
sparkConf = SparkConf()
sparkConf = sparkConf.set('spark.hadoop.fs.gs.requester.pays.mode', 'AUTO')
sparkConf = sparkConf.set('spark.hadoop.fs.gs.requester.pays.project.id', 'open-targets-eu-dev')
spark = (
SparkSession.builder