Tiffany J. Callahan callahantiff

## PheKnowLator_Tutorial_EntitySearch.md

      
Last active October 5, 2022 20:54

PheKnowLator Tutorial -- Entity Search

Tutorial: Entity Search

To run the Entity Search, you will need to download and run the accompanying notebook. To work with the notebook:

Fork this library: https://github.com/callahantiff/PheKnowLator
If the notebook titled Entity_Search.ipynb and the accompanying script entity_search.py are not in PheKnowLator/notebooks/tutorials/entity_search, download them from the links below into PheKnowLator/notebooks/tutorials/entity_search:

Entity_Search.ipynb


entity_search.py
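
The check on the tutorial directory described above can be sketched in Python. This is only a minimal sketch: the function name `missing_tutorial_files` and the `repo_root` argument are hypothetical, and it assumes the repository was cloned to a local directory. It reports which of the two files are missing and does not download anything.

```python
from pathlib import Path

# File names taken from the tutorial text.
REQUIRED_FILES = ("Entity_Search.ipynb", "entity_search.py")


def missing_tutorial_files(repo_root):
    """Return the required tutorial files that are absent under repo_root.

    repo_root is a hypothetical path to a local clone of PheKnowLator.
    """
    tutorial_dir = Path(repo_root) / "notebooks" / "tutorials" / "entity_search"
    return [name for name in REQUIRED_FILES if not (tutorial_dir / name).is_file()]


if __name__ == "__main__":
    missing = missing_tutorial_files("PheKnowLator")
    if missing:
        print("Download these files into notebooks/tutorials/entity_search:")
        for name in missing:
            print("  " + name)
    else:
        print("Both tutorial files are already in place.")
```

If the list comes back empty, no download is needed and the notebook can be opened directly.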


## derpyHatDrinkingDerby.md

      
Last active May 7, 2022 21:22

Derpy Hat Drinking Derby

Derby Master Dan's Race Rules

Everyone who plays must pick a figurine, aka your "thirsty jockey".
Each drink you finish advances you one spot, but consuming a drink in a manner that impresses the Derby Master can earn you up to 3 spots.
For each lap you complete, you can select a different jockey to be your riding buddy. Riding buddies must drink their own drinks in addition to each drink their buddy completes.
If you get lapped, you must take a shot of the fastest jockey's choosing.


## _README.md

      
Last active January 28, 2023 22:30

A simple pipeline for characterizing patient representations

Characterizing Patient Representations


Purpose

Unlike other fields which perform comprehensive diagnostics or characterization prior to analysis, the evaluation of patient representation learning methods has largely been limited to the context of a specific downstream use case and is usually performed as part of model interpretation. While it cannot solve all of the aforementioned challenges, data-driven characterization of patient representations, independent of model development, may provide invaluable and unexpected insight and is an important first step towards understanding if these methods can be used to help automate CP development. To this end, we sought to answer the following questions:
RSQ1: What combinations of data type and sampling window create the best patient representations and does performance differ by disease group?
RSQ2: How does data-driven characterization of patient representation impact the explainability of downstream tasks like clustering?
To address these que

  
## PheKnowLator_PE_Evaluation.py
# import needed libraries
import matplotlib.pyplot as plt
import matplotlib
matplotlib.style.use('ggplot')
import matplotlib.patches as mpatches
import numpy as np
import pandas as pd
import pickle
import random
from sklearn.manifold import TSNE

## repairing_pkt_metadata_files.py
# Script Purpose: Script was built to address https://github.com/callahantiff/PheKnowLator/issues/116

# import needed libraries
import json
import os
import re
import pandas as pd
import pickle
import shutil

## PatientSimilarity_ControlPatients.sql
-- RARE DISEASE - RANDOM PATIENTS: Query is designed to retrieve 10,000 random patients
-- The query searches only among patients having >9 visits
-- Last edited on: 05/04/2018
SELECT v.person_id, count(v.visit_occurrence_id) AS count, p_dat.cond, p_dat.cond_num
FROM CHCO_DeID_Apr2018.visit_occurrence v
RIGHT JOIN
(SELECT person_id, 'Rand' AS cond, 99 AS cond_num FROM CHCO_DeID_Apr2018.person
WHERE person_id NOT IN
(SELECT v.person_id
FROM CHCO_DeID_Apr2018.visit_occurrence v
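
The query preview above is cut off, so the following is not the author's query. It is a minimal sketch of the idea stated in its header comments: draw a random sample of patients who have more than 9 visits. It assumes an in-memory SQLite database with a toy `visit_occurrence` table; the helper names `build_toy_db` and `sample_frequent_visitors` and the sample size are hypothetical.

```python
import sqlite3


def build_toy_db():
    """Create an in-memory database with made-up visit data (assumption)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE visit_occurrence ("
        "visit_occurrence_id INTEGER PRIMARY KEY, person_id INTEGER)"
    )
    rows = []
    for person_id in range(1, 21):
        # Even-numbered patients get 12 visits, odd-numbered patients get 3.
        n_visits = 12 if person_id % 2 == 0 else 3
        rows.extend((person_id,) for _ in range(n_visits))
    conn.executemany("INSERT INTO visit_occurrence (person_id) VALUES (?)", rows)
    return conn


def sample_frequent_visitors(conn, min_visits=10, sample_size=5):
    """Return (person_id, visit_count) for a random sample of patients
    with at least min_visits visits."""
    query = """
        SELECT v.person_id, COUNT(v.visit_occurrence_id) AS visit_count
        FROM visit_occurrence v
        GROUP BY v.person_id
        HAVING COUNT(v.visit_occurrence_id) >= ?
        ORDER BY RANDOM()
        LIMIT ?
    """
    return conn.execute(query, (min_visits, sample_size)).fetchall()


if __name__ == "__main__":
    conn = build_toy_db()
    for person_id, visit_count in sample_frequent_visitors(conn):
        print(person_id, visit_count)
```

`ORDER BY RANDOM()` with `LIMIT` is a simple way to sample in SQLite; other database engines use different functions for random ordering, so the sampling clause would need adapting there.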

## composite_patient_similarity.py
#########################################################################################################
# 2017 NLM Summer Medical Informatics Internship
# Purpose: queries a Google BigQuery Database and returns a vector of values for a set of patients
# version 1.1.0
# date: 08.15.2017
#########################################################################################################


# import and load needed scripts
import dawg

## sparse_node2vec_wrapper.py
#!/usr/bin/env python3
# -*- coding: utf-8 -*-

# script created using: https://github.com/VHRanger/nodevectors

# import needed libraries
import argparse
import csrgraph as cg
import nodevectors

## OMOPCoverage_V1.sql
# PURPOSE: This query is designed to query an OMOP instance and return 6 columns.
# This query makes the assumption that the other shops would be willing to return
# some results to us, rather than calculating coverage statistics locally

WITH
  condition_concepts
  AS (SELECT
        c.condition_concept_id AS CONCEPT_ID,
        c1.concept_name AS CONCEPT_LABEL,
        v.vocabulary_version AS VOCABULARY_VERSION,

## genemania_dataprocessingpipeline.ipynb

      
Last active July 4, 2020 21:37