Ana Trisovic atrisovic

 |-hospital
 | |-zip.rds
 | |-zip2.rds
 | |-SA_COPD2.Rmd
 | |-review.R
 | |-SA_MI-New.Rmd
 | |-SA_CHF2.Rmd
 | |-SA_LungCancer2.Rmd
 | |-SA_LungCancer-New.Rmd
@atrisovic
atrisovic / fasse_analytic_data_docs.md
Last active February 20, 2023 15:40
Form to document new analytic data on FASSE

Step 1: Check analytic data

Is the data you need already on FASSE? Check out the catalog here: https://nsaph.info/analytic.html#analytic-data

If it is not, see step 2.

Step 2: Fill in the form below and add it in the comments here.

The form has the following format:

@atrisovic
atrisovic / whanhee.py
Last active April 17, 2022 04:32
whanhee
import pandas as pd
import numpy as np
import json
from simplejson import loads
def get_outcomes():
    """Get and return ICD codes."""
    f = open('icd_codes.json')
    outcomes_ = json.load(f)
{
"aki": {
"icd10": [
"N17"
],
"icd9": [
"584"
]
},
"all_kidney": {
# Before running, activate env:
# export CONDA_ENVS_PATH=/nfs/projects/n/nsaph_common/conda/envs/
# export CONDA_PKGS_PATH=/nfs/projects/n/nsaph_common/conda/pkgs/
# source activate nsaph
## Code to ID hospitalizations
library(data.table)
@atrisovic
atrisovic / sample_file_summaries.md
Last active March 9, 2022 02:50
sample file summaries in R

Get data sample

To get the data sample, we take the first 25k rows and the last 25k rows of the 59-million-row file in bash:

>> tail -n25000 /2016/mbsf_abcd_summary_res000017155_req008183_2016.dat \
        > sample_mbsf_abcd_summary_res000017155_req008183_2016.dat
>> head -n25000 /2016/mbsf_abcd_summary_res000017155_req008183_2016.dat \
        >> sample_mbsf_abcd_summary_res000017155_req008183_2016.dat
# word count:
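
The same sampling idea can be sketched in Python. This is a minimal illustration, not the author's method: the function name `sample_head_tail` and its parameters are hypothetical, and unlike the bash commands above it writes the head rows before the tail rows and never duplicates rows when the file has fewer than `2 * n` lines.

```python
from collections import deque
from itertools import islice


def sample_head_tail(src_path, dst_path, n=25_000):
    """Write the first n and last n lines of src_path to dst_path.

    Hypothetical helper: the file is streamed once, so only about
    2 * n lines are held in memory at any time.
    """
    with open(src_path, "r", encoding="utf-8", errors="replace") as src:
        # Consume the first n lines from the file iterator.
        head = list(islice(src, n))
        # Keep only the last n of the remaining lines.
        tail = deque(src, maxlen=n)
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.writelines(head)
        dst.writelines(tail)
    # Number of lines written to the sample file.
    return len(head) + len(tail)
```

Because the tail is collected only from lines that follow the head, a short file is copied once in full rather than sampled twice.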
@atrisovic
atrisovic / rewriting_history_process.md
Last active March 3, 2022 15:35
Rewriting git history for data_requests


What happened

Beneficiary ID numbers were shared in a private GitHub repository, in the following directories:

data_requests/request_projects/medicaid_duplicate_check_2019_09_27

Dataset stats from Dataverse

The stats are in the format DOI, release_year, mime_type.

SQL DB query:

SELECT p.authority, p.identifier, f.contenttype, p.publicationdate 
FROM datafile f, dvobject o, dataset s, dvobject p 
WHERE f.id = o.id AND o.owner_id = s.id AND s.id = p.id AND s.harvestingclient_id IS NULL
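
As a rough sketch of how one result row could be turned into the DOI, release_year, mime_type format, the Python function below may help. It rests on assumptions not stated in the source: that the DOI is formed as `authority/identifier`, that `publicationdate` arrives as a `datetime` (or `None` for unpublished datasets), and that the name `format_row` is purely illustrative.

```python
from datetime import datetime


def format_row(authority, identifier, contenttype, publicationdate):
    """Format one query row as 'DOI, release_year, mime_type'.

    Assumptions (not from the source): the DOI is authority/identifier,
    and publicationdate is a datetime or None.
    """
    doi = f"{authority}/{identifier}"
    # Leave the year empty when there is no publication date.
    year = publicationdate.year if publicationdate is not None else ""
    return f"{doi}, {year}, {contenttype}"


# Hypothetical row values, for illustration only.
line = format_row("10.1234", "ABC/XYZ", "text/csv", datetime(2019, 5, 1))
```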
from flask import Flask, redirect, url_for
from celery import Celery
from celery import Task
from subprocess import PIPE, Popen
import logging, os
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
# Running locally:
# run this as "python clean_code.py $PWD"
import os
import re
import sys
import glob
import codecs
import chardet
import fileinput
list_of_r_files = glob.glob("*.R")