David Ochoa (d0choa)
Open Targets Platform Coordinator
@d0choa
d0choa / targetMissenseVariants.py
Last active September 15, 2021 17:24
Target specific (NOD2) coding variants (GWAS + ClinVar pathogenic)
import pyspark.sql.functions as F
from pyspark import SparkConf
from pyspark.sql import SparkSession
sparkConf = SparkConf()
spark = (
    SparkSession.builder
    .config(conf=sparkConf)
    .getOrCreate()
)
@d0choa
d0choa / nod2AF.py
Last active August 11, 2021 10:41
Overlay variants in NOD2 alphafold model
from pymol import cmd
cmd.load("/Users/ochoa/Downloads/AF-Q9HC29-F1-model_v1.cif")
cmd.bg_color("grey95")
cmd.set_color("veryHigh", [0, 83, 214])
cmd.set_color("confident", [101, 203, 243])
cmd.set_color("low", [255, 219, 19])
cmd.set_color("veryLow", [255, 125, 69])
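AlphaFold models store the per-residue pLDDT confidence score in the B-factor field of the mmCIF file, so the four colours defined above correspond to the standard confidence bands (very high > 90, confident 70–90, low 50–70, very low < 50). A minimal sketch of that band mapping in plain Python (thresholds as published by the AlphaFold database; the PyMOL colouring calls themselves are omitted):

```python
def plddt_band(plddt: float) -> str:
    """Map an AlphaFold pLDDT score to its confidence band name."""
    if plddt > 90:
        return "veryHigh"
    if plddt > 70:
        return "confident"
    if plddt > 50:
        return "low"
    return "veryLow"

print([plddt_band(b) for b in (95.2, 80.0, 60.5, 30.1)])
# → ['veryHigh', 'confident', 'low', 'veryLow']
```

In PyMOL the same banding can be applied with selections on the `b` property, e.g. something like `cmd.color("veryHigh", "b > 90")`.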
@d0choa
d0choa / otAnalytics.r
Last active September 29, 2021 15:37
Open Targets web accesses retrieved from Google Analytics API
library("tidyverse")
library("googleAnalyticsR")
library("zoo")
library("lubridate")
date_range <- c("2019-01-01", as.character(Sys.Date() - 1))
# Authorisation
ga_auth()
@d0choa
d0choa / literatureFailedAndMatches.py
Last active August 27, 2021 19:45
Missing ontology terms (entities that failed grounding) and the most frequently occurring terms in the same publications
import pyspark.sql.functions as F
from pyspark import SparkConf
from pyspark.sql import SparkSession
from pyspark.sql.window import Window
sparkConf = SparkConf()
sparkConf = sparkConf.set('spark.hadoop.fs.gs.requester.pays.mode', 'AUTO')
sparkConf = sparkConf.set('spark.hadoop.fs.gs.requester.pays.project.id', 'open-targets-eu-dev')
@d0choa
d0choa / metrics_comparison.R
Created October 6, 2021 14:28
Compare AUCs and ORs for two pipeline runs
library(tidyverse)
library(cowplot)
library(ggrepel)
df <- bind_rows(
read_csv("~/Projects/ot-release-metrics/data/21.06.5.csv"),
read_csv("~/Projects/ot-release-metrics/data/21.09.2.csv")
)
df %>%
@d0choa
d0choa / metadata_SEO
Last active November 25, 2021 13:49
Some possible relevant fields for SEO
import pyspark.sql.functions as F
from pyspark import SparkConf
from pyspark.sql import SparkSession
sparkConf = SparkConf()
sparkConf = sparkConf.set('spark.hadoop.fs.gs.requester.pays.mode', 'AUTO')
sparkConf = sparkConf.set('spark.hadoop.fs.gs.requester.pays.project.id', 'open-targets-eu-dev')
# establish spark connection
spark = SparkSession.builder.config(conf=sparkConf).getOrCreate()
@d0choa
d0choa / all_variants_for_genelist.Rmd
Last active November 23, 2022 20:01
All platform variants associated with a list of genes in R
---
title: "Batch-query all platform evidence associated with a gene/target list (R)"
output:
md_document:
variant: markdown_github
---
How to batch-access information for a list of targets from the Open Targets Platform is a recurrent question. Here, I provide an example of how to access all target-disease evidence for a set of IFN-gamma-signalling-related proteins, and then reduce that evidence to the coding or non-coding variants clinically associated with the gene list of interest. I used R and sparklyr, but a Python implementation would be very similar. The Platform documentation and the Community space have very similar examples.
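The reduction step described above — restricting Platform evidence to a gene list and then to records that carry a variant consequence — can be sketched in plain Python. The records and field names below are illustrative stand-ins for the evidence dataset (a real run would filter the Parquet evidence files with Spark or sparklyr):

```python
# Minimal sketch: batch-filter Platform evidence down to a gene list.
# The records and the consequence field are illustrative stand-ins,
# not the full Platform evidence schema.
gene_list = {"ENSG00000111537", "ENSG00000232810"}  # e.g. IFNG, TNF

evidence = [
    {"targetId": "ENSG00000111537", "datasourceId": "eva",
     "variantFunctionalConsequenceId": "SO_0001583"},   # missense
    {"targetId": "ENSG00000111537", "datasourceId": "europepmc",
     "variantFunctionalConsequenceId": None},
    {"targetId": "ENSG00000157764", "datasourceId": "eva",
     "variantFunctionalConsequenceId": "SO_0001583"},
]

# Step 1: keep only evidence for targets in the gene list
on_list = [e for e in evidence if e["targetId"] in gene_list]

# Step 2: keep only records that report a variant consequence
variant_evidence = [e for e in on_list
                    if e["variantFunctionalConsequenceId"] is not None]

print(len(on_list), len(variant_evidence))  # → 2 1
```

With Spark, the two steps become a `filter` on `targetId` (joined against the gene list) followed by a `filter` on the consequence column being non-null.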
@d0choa
d0choa / 2021_approvals.R
Last active July 12, 2022 01:48
Supporting evidence on 2021 FDA approvals
library("tidyverse")
library("sparklyr")
library("sparklyr.nested")
library("cowplot")
library("ggsci")
# Spark config
config <- spark_config()
# Allow access to GCP datasets
@d0choa
d0choa / missingTopLoci.py
Last active March 3, 2022 15:17
Diagnostic script to find and explain missing top loci from the V2D dataset
import pyspark.sql.functions as F
from pyspark import SparkConf
from pyspark.sql import SparkSession
sparkConf = SparkConf()
sparkConf = sparkConf.set('spark.hadoop.fs.gs.requester.pays.mode', 'AUTO')
sparkConf = sparkConf.set('spark.hadoop.fs.gs.requester.pays.project.id', 'open-targets-eu-dev')
# establish spark connection
spark = SparkSession.builder.config(conf=sparkConf).getOrCreate()
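The diagnostic itself boils down to an anti-join: which (study, locus) pairs are expected but absent from the V2D dataset? A toy sketch with invented study accessions and variant identifiers (the real script does this with Spark joins over the V2D Parquet files):

```python
# Toy anti-join: expected top loci vs. loci actually present in V2D.
# Study accessions and variant IDs are invented for illustration.
v2d_loci = {
    ("GCST001", "1_12345_A_G"),
    ("GCST002", "2_67890_T_C"),
}
expected_loci = {
    ("GCST001", "1_12345_A_G"),
    ("GCST003", "3_13579_G_A"),  # present upstream, missing from V2D
}

missing = expected_loci - v2d_loci
print(sorted(missing))  # → [('GCST003', '3_13579_G_A')]
```

In Spark the equivalent is a `left_anti` join of the expected loci against V2D on the study and variant identifier columns.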
@d0choa
d0choa / potentialNewVariantsInIIndex.py
Last active March 16, 2022 16:01
List of potential new variants in variant index (derived from other datasets)
from os import sep
import pyspark.sql.functions as F
from pyspark import SparkConf
from pyspark.sql import SparkSession
sparkConf = SparkConf()
sparkConf = sparkConf.set('spark.hadoop.fs.gs.requester.pays.mode', 'AUTO')
sparkConf = sparkConf.set('spark.hadoop.fs.gs.requester.pays.project.id', 'open-targets-eu-dev')