Tiffany J. Callahan callahantiff

@callahantiff
callahantiff / PheKnowLator_Tutorial_EntitySearch.md
Last active October 5, 2022 20:54
PheKnowLator Tutorial -- Entity Search

Tutorial: Entity Search

In order to run the Entity Search you will need to download and run the

To work with the Notebook.

@callahantiff
callahantiff / derpyHatDrinkingDerby.md
Last active May 7, 2022 21:22
Derpy Hat Drinking Derby

Derpy Hat Drinking Derby

Derby Master Dan's Race Rules

  1. Everyone who plays must pick a figurine, aka your "thirsty jockey".
  2. Each drink you finish advances you one spot, but consuming a drink in a manner that impresses the Derby Master can earn you up to 3 spots.
  3. For each lap you complete, you can select a different jockey to be your riding buddy. Riding buddies must drink their own drinks in addition to each drink their buddy completes.
  4. If you get lapped, you must take a shot of the fastest jockey's choosing.
@callahantiff
callahantiff / _README.md
Last active January 28, 2023 22:30
A simple pipeline for characterizing patient representations

Characterizing Patient Representations


Purpose

Unlike other fields that perform comprehensive diagnostics or characterization prior to analysis, the evaluation of patient representation learning methods has largely been limited to the context of a specific downstream use case and is usually performed as part of model interpretation. While it cannot solve all of the aforementioned challenges, data-driven characterization of patient representations, independent of model development, may provide invaluable and unexpected insight and is an important first step towards understanding whether these methods can be used to help automate CP development. To this end, we sought to answer the following questions:

RSQ1: What combinations of data type and sampling window create the best patient representations and does performance differ by disease group?

RSQ2: How does data-driven characterization of patient representation impact the explainability of downstream tasks like clustering?

To address these que

@callahantiff
callahantiff / PheKnowLator_PE_Evaluation.py
Created January 29, 2022 20:48
PheKnowLator Preeclampsia Evaluation
# import needed libraries
import matplotlib.pyplot as plt
import matplotlib
matplotlib.style.use('ggplot')
import matplotlib.patches as mpatches
import numpy as np
import pandas as pd
import pickle
import random
from sklearn.manifold import TSNE
@callahantiff
callahantiff / repairing_pkt_metadata_files.py
Created October 13, 2021 18:47
PheKnowLator - repairing metadata files
# Script Purpose: Script was built to address https://github.com/callahantiff/PheKnowLator/issues/116
# import needed libraries
import json
import os
import re
import pandas as pd
import pickle
import shutil
@callahantiff
callahantiff / PatientSimilarity_ControlPatients.sql
Last active March 7, 2022 15:15
PatientSimilarity: Exploring the Impact of different Entities and Domains for Rare Disease Phenotyping.
-- RARE DISEASE - RANDOM PATIENTS: Query is designed to retrieve 10,000 random patients
-- The query searches only among patients having >9 visits
-- Last edited on: 05/04/2018
SELECT v.person_id, count(v.visit_occurrence_id) AS count, p_dat.cond, p_dat.cond_num
FROM CHCO_DeID_Apr2018.visit_occurrence v
RIGHT JOIN
(SELECT person_id, 'Rand' AS cond, 99 AS cond_num FROM CHCO_DeID_Apr2018.person
WHERE person_id NOT IN
(SELECT v.person_id
FROM CHCO_DeID_Apr2018.visit_occurrence v
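The preview above is cut off, so the full query is not shown here. As a rough illustration of the selection logic described in its header comments (random patients drawn only from those with more than 9 visits, labeled 'Rand' with cond_num 99), here is a minimal Python sketch. The function name `sample_random_patients`, the `excluded_ids` argument (standing in for the truncated NOT IN subquery, whose contents are unknown), and the `seed` parameter are all assumptions made for illustration; they do not come from the original gist.

```python
import random
from collections import Counter


def sample_random_patients(visits, excluded_ids, n=10_000, min_visits=10, seed=0):
    """Sketch of the sampling idea; not the original query.

    visits: iterable of (person_id, visit_occurrence_id) rows (hypothetical input).
    excluded_ids: person_ids to leave out (assumed stand-in for the NOT IN subquery).
    Returns rows shaped like the SELECT list: (person_id, count, cond, cond_num).
    """
    # Count visit rows per patient, mirroring count(v.visit_occurrence_id).
    counts = Counter(person_id for person_id, _ in visits)

    # Keep patients with more than 9 visits who are not excluded.
    eligible = sorted(
        pid for pid, c in counts.items()
        if c >= min_visits and pid not in excluded_ids
    )

    # Draw up to n patients at random; the seed makes the draw repeatable.
    rng = random.Random(seed)
    chosen = rng.sample(eligible, min(n, len(eligible)))

    return [(pid, counts[pid], "Rand", 99) for pid in chosen]
```

Called with a list of visit rows and a set of excluded patient ids, the function returns at most `n` rows. A patient with exactly 9 visits is never returned, which matches the ">9 visits" condition in the comment.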
@callahantiff
callahantiff / composite_patient_similarity.py
Last active August 15, 2022 07:53
Composite Patient Similarity Algorithm for Semi-Supervised Rare Disease Phenotyping. Additional details can be found here: https://mor.nlm.nih.gov/pubs/alum/2017-callahan.pdf
#########################################################################################################
# 2017 NLM Summer Medical Informatics Internship
# Purpose: queries a Google BigQuery Database and returns a vector of values for a set of patients
# version 1.1.0
# date: 08.15.2017
#########################################################################################################
# import and load needed scripts
import dawg
@callahantiff
callahantiff / sparse_node2vec_wrapper.py
Last active May 13, 2021 17:52
Sparse Node2Vec Wrapper for Large Networks
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# script created using: https://github.com/VHRanger/nodevectors
# import needed libraries
import argparse
import csrgraph as cg
import nodevectors
@callahantiff
callahantiff / OMOPCoverage_V1.sql
Last active September 25, 2020 15:39
OMOP2OBO - OMOP Coverage: queries sent to external OMOP shops, designed to generate coverage statistics
# PURPOSE: This query is designed to query an OMOP instance and return 6 columns.
# This query makes the assumption that the other shops would be willing to return
# some results to us, rather than calculating coverage statistics locally
WITH
condition_concepts
AS (SELECT
c.condition_concept_id AS CONCEPT_ID,
c1.concept_name AS CONCEPT_LABEL,
v.vocabulary_version AS VOCABULARY_VERSION,
@callahantiff
callahantiff / genemania_dataprocessingpipeline.ipynb
Last active July 4, 2020 21:37
GeneMania_DataProcessingPipeline.ipynb