Mark Danese markdanese

  • Outcomes Insights, Inc.
  • Calabasas, CA
@markdanese
markdanese / load_nis_data_2017.R
Created September 10, 2021 01:24
NIS 2017 load script
# copied from the original "load_data.R" program created for the 2016 data
# the primary change is that the 2017 data have more fields
library(data.table)
library(magrittr)
library(readr)
library(fst)
# load core data --------------------------------------------------------------------
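The preview stops at the core-data section. As a minimal sketch of one way to read a fixed-width NIS file in base R, assume a specification table with hypothetical `name` and `width` columns (the real spec file layout is not shown here). `utils::read.fwf()` can then take the widths directly. `read_fixed_width()` is a hypothetical helper, and the demo rows are made up.

```r
# Sketch: read a fixed-width file using a column specification table.
# ASSUMPTION: the spec has columns `name` and `width` (hypothetical names).
read_fixed_width <- function(input_file, spec) {
  utils::read.fwf(
    input_file,
    widths = spec$width,          # one width per field, in file order
    col.names = spec$name,
    colClasses = "character"      # keep codes such as "0001" with leading zeros
  )
}

# Self-contained demo: a two-row file and a three-field spec (made-up values)
spec <- data.frame(name = c("key", "age", "dx1"), width = c(4, 2, 5))
tmp <- tempfile(fileext = ".asc")
writeLines(c("000134I2510", "000267E1190"), tmp)
core <- read_fixed_width(tmp, spec)
```

Reading everything as character first and converting selected columns afterwards avoids silently dropping leading zeros in identifier and diagnosis fields.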
markdanese / flat_fread.R
Last active January 9, 2024 09:23
data.table fread fixed width file reader
# for reading fixed width files, which are files with no delimiter (see the readr package and read_fwf())
# col_widths is a vector of column widths (e.g., c(8, 4, 2, 9))
# input_file is a character string giving the path to the input file (e.g., "./data/read.txt")
# on a 300 MB file with 143 columns, timings on a 2018 MacBook Pro were as follows:
# read_fwf from readr package: 10.8 sec
# non-parallel use of gawk: 10.5 sec
# parallel use of gawk: 4.4 sec (below function)
flat_fread <- function(col_widths, input_file){
  col_spec <- paste0(col_widths, collapse = " ")  # space-separated widths, e.g. "8 4 2 9"
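The gawk-based body of the function is not shown. For comparison, the core operation of any fixed-width reader (turning column widths into start and end positions and cutting each line at those positions) can be sketched in base R. This is not the author's gawk method and will be much slower on a 300 MB file; `split_fixed_width()` is a hypothetical name.

```r
# Sketch: split fixed-width lines into columns using cumulative widths.
# A base-R illustration of the slicing, NOT the parallel gawk approach.
split_fixed_width <- function(lines, col_widths, col_names = NULL) {
  ends   <- cumsum(col_widths)            # last character of each field
  starts <- ends - col_widths + 1         # first character of each field
  cols <- lapply(seq_along(col_widths), function(i) {
    trimws(substring(lines, starts[i], ends[i]))
  })
  names(cols) <- if (is.null(col_names)) paste0("V", seq_along(col_widths)) else col_names
  as.data.frame(cols, stringsAsFactors = FALSE)
}

# Two demo lines laid out with the widths from the comment: c(8, 4, 2, 9)
lines <- c(paste0(strrep("A", 8), strrep("B", 4), strrep("C", 2), strrep("D", 9)),
           paste0(strrep("1", 8), strrep("2", 4), strrep("3", 2), strrep("4", 9)))
res <- split_fixed_width(lines, c(8, 4, 2, 9))
```

On a real file, `lines` would come from `readLines(input_file)`.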
markdanese / nis2016_hospital_read.R
Last active April 17, 2024 18:15
National Inpatient Sample (NIS) read program
# this loads the 2016 NIS fixed width (asc) files into R
# it also saves the result as an fst file for much faster re-reading into R
library(data.table)
library(readr)
library(fst)
# load core data --------------------------------------------------------------------
nis_specs <- fread("./docs/nis_specs_core.csv")
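The header says the result is also saved as an fst file so later sessions can skip the slow fixed-width parse. A minimal sketch of that read-once, cache-afterwards pattern follows. To stay dependency-free it uses base `saveRDS()`/`readRDS()` as a stand-in for `fst::write_fst()`/`fst::read_fst()`; `load_cached()` and the demo reader are hypothetical.

```r
# Sketch of a read-once, cache-afterwards loader.
# saveRDS/readRDS stand in for fst::write_fst/fst::read_fst here.
load_cached <- function(cache_file, reader) {
  if (file.exists(cache_file)) {
    return(readRDS(cache_file))           # fast path: reuse the saved copy
  }
  dat <- reader()                         # slow path: parse the raw file
  saveRDS(dat, cache_file)
  dat
}

cache <- tempfile(fileext = ".rds")
calls <- 0
slow_reader <- function() {
  calls <<- calls + 1                     # count how often the raw parse runs
  data.frame(key = 1:3, los = c(2, 5, 1))
}
first  <- load_cached(cache, slow_reader)
second <- load_cached(cache, slow_reader)  # served from the cache
```

With fst installed, the two cache calls could become `fst::write_fst(dat, cache_file)` and `fst::read_fst(cache_file, as.data.table = TRUE)`.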
markdanese / termplot_coxph.R
Last active December 8, 2017 19:13
Plot spline based coefficients from a coxph model from the survival package in R
# based on https://cran.r-project.org/web/packages/survival/vignettes/splines.pdf from Terry Therneau
# call termplot with plot = FALSE so that it returns the results for every term in the model instead of drawing them
# y is the object in which the coxph model results have been saved
d <- termplot(y, se = TRUE, plot = FALSE)
# takes the termplot object (tp_obj), a specific variable name from the model as a string (var_name), and outputs a plot
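The plotting function itself is cut off in this preview. Following the idea in the survival splines vignette, one term of a `termplot(..., se = TRUE, plot = FALSE)` result (a data frame with `x`, `y`, and `se`) can be turned into a hazard ratio curve with a pointwise 95% band by centering at a reference value and exponentiating. This is a sketch, not the author's function; `term_hr()`, `plot_term_hr()`, and the `ref_value` argument are hypothetical, while `tp_obj` and `var_name` follow the comment above.

```r
# Sketch: hazard-ratio curve with a 95% band for one term of a termplot() result.
# tp_obj: list from termplot(fit, se = TRUE, plot = FALSE)
# ref_value (hypothetical): covariate value assigned HR = 1
term_hr <- function(tp_obj, var_name, ref_value) {
  term <- tp_obj[[var_name]]                          # data frame: x, y, se
  center <- term$y[which.min(abs(term$x - ref_value))]
  z <- qnorm(0.975)
  data.frame(
    x     = term$x,
    hr    = exp(term$y - center),
    lower = exp(term$y - z * term$se - center),
    upper = exp(term$y + z * term$se - center)
  )
}

plot_term_hr <- function(tp_obj, var_name, ref_value) {
  d <- term_hr(tp_obj, var_name, ref_value)
  matplot(d$x, as.matrix(d[, c("hr", "lower", "upper")]), type = "l",
          lty = c(1, 2, 2), col = 1, log = "y",
          xlab = var_name, ylab = "Hazard ratio")
  abline(h = 1, col = "grey")
  invisible(d)
}
```

Usage: `d <- termplot(y, se = TRUE, plot = FALSE)`, then check `names(d)` and call `plot_term_hr(d, "<term name>", <reference value>)`. A log scale on the y axis keeps the band symmetric around the fitted curve.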
markdanese / adjusted_survival.R
Last active June 15, 2017 01:36
Simple approach to generating adjusted survival curves
# load libraries --------------------------------------------------------------------
library(survival)
library(data.table)
library(magrittr)
library(ggplot2)
options(stringsAsFactors = FALSE, scipen = 10)
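Only the setup of this script is visible. One common way to produce covariate-adjusted survival curves, often called direct adjustment, is to set the exposure to a fixed value for every subject, predict each subject's curve from a Cox model, and average those curves. The sketch below illustrates that technique on the `lung` data shipped with the survival package. It is not necessarily the approach the original script takes, and `adjusted_curve()` is a hypothetical helper.

```r
library(survival)

# Direct adjustment: set the exposure to one value for everyone,
# predict each subject's survival curve from the Cox model, then average.
adjusted_curve <- function(fit, covariates, var, value) {
  nd <- covariates
  nd[[var]] <- value
  sf <- survfit(fit, newdata = nd)   # one curve (column of sf$surv) per row of nd
  data.frame(time = sf$time, surv = rowMeans(sf$surv))
}

lung2 <- na.omit(lung[, c("time", "status", "age", "sex")])
fit   <- coxph(Surv(time, status) ~ age + sex, data = lung2)
covs  <- lung2[, c("age", "sex")]
male   <- adjusted_curve(fit, covs, "sex", 1)
female <- adjusted_curve(fit, covs, "sex", 2)
```

Because both curves are averaged over the same age distribution, they differ only by the sex effect. They can be drawn with `plot(male$time, male$surv, type = "s")` followed by `lines(female$time, female$surv, type = "s", lty = 2)`.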
markdanese / feather_test.R
Last active April 22, 2016 11:35
A test of the new feather package in R using Medicare Part D drug reimbursement data
# load libraries --------------------------------------------------------------------
library(data.table)
library(feather)
# US Part D Drug prices 2013: 500 MB zip, 2.9 GB uncompressed -----------------------
pde_link <- "http://download.cms.gov/Research-Statistics-Data-and-Systems/Statistics-Trends-and-Reports/Medicare-Provider-Charge-Data/Downloads/PartD_Prescriber_PUF_NPI_DRUG_13.zip"
tf <- tempfile()
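The benchmark body is cut off after the temporary file is created. A minimal sketch of the kind of write/read round-trip timing such a test performs is below. It uses base `saveRDS()`/`readRDS()` so it runs without extra packages, and `time_round_trip()` and the simulated data are hypothetical. The feather functions `feather::write_feather(x, path)` and `feather::read_feather(path)` can be passed in the same way.

```r
# Time writing and reading a data frame with a given writer/reader pair,
# and check that the round trip preserves the data.
time_round_trip <- function(df, writer, reader, path = tempfile()) {
  write_time <- system.time(writer(df, path))[["elapsed"]]
  read_time  <- system.time(back <- reader(path))[["elapsed"]]
  unlink(path)
  list(write_sec = write_time, read_sec = read_time,
       same = isTRUE(all.equal(df, as.data.frame(back), check.attributes = FALSE)))
}

# Simulated stand-in data (not the CMS file)
set.seed(1)
n  <- 1e5
df <- data.frame(npi  = sample(1e6, n, replace = TRUE),
                 drug = sample(c("A", "B", "C"), n, replace = TRUE),
                 cost = round(runif(n, 1, 500), 2))
res_rds <- time_round_trip(df, saveRDS, readRDS)
```

With feather installed: `time_round_trip(df, feather::write_feather, feather::read_feather)`.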
markdanese / download_synpuf.R
Last active January 14, 2021 04:32
An R script to download the Medicare SynPUF (synthetic public use files)
# main web page: "https://www.cms.gov/Research-Statistics-Data-and-Systems/Downloadable-Public-Use-Files/SynPUFs/DESample01.html"
# list of files to be downloaded for each 1/20 of the data
dl_list <-
  c(
    "https://www.cms.gov/Research-Statistics-Data-and-Systems/Downloadable-Public-Use-Files/SynPUFs/Downloads/DE1_0_2008_Beneficiary_Summary_File_Samplezzz.zip",
    "http://downloads.cms.gov/files/DE1_0_2008_to_2010_Carrier_Claims_SamplezzzA.zip",
    "http://downloads.cms.gov/files/DE1_0_2008_to_2010_Carrier_Claims_SamplezzzB.zip",
    "https://www.cms.gov/Research-Statistics-Data-and-Systems/Downloadable-Public-Use-Files/SynPUFs/Downloads/DE1_0_2008_to_2010_Inpatient_Claims_Samplezzz.zip",
    "https://www.cms.gov/Research-Statistics-Data-and-Systems/Downloadable-Public-Use-Files/SynPUFs/Downloads/DE1_0_2008_to_2010_Outpatient_Claims_Samplezzz.zip",
    "http://downloads.cms.gov/files/DE1_0_2008_to_2010_Prescription_Drug_Events_Samplezzz.zip",
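The URL list uses `zzz` as a stand-in for the sample identifier. Assuming `zzz` is meant to be replaced by a suffix such as `_1` through `_20` (an assumption; check the actual file names on the CMS page first), the per-sample URLs can be generated with a fixed-string substitution. `sample_urls()` and its `suffix` default are hypothetical.

```r
# Substitute a sample suffix into each URL template.
# ASSUMPTION: "zzz" is a placeholder and the suffix looks like "_1" ... "_20".
sample_urls <- function(templates, sample_num, placeholder = "zzz",
                        suffix = paste0("_", sample_num)) {
  gsub(placeholder, suffix, templates, fixed = TRUE)
}

templates <- c(
  "http://downloads.cms.gov/files/DE1_0_2008_to_2010_Carrier_Claims_SamplezzzA.zip",
  "http://downloads.cms.gov/files/DE1_0_2008_to_2010_Prescription_Drug_Events_Samplezzz.zip"
)
urls_1 <- sample_urls(templates, 1)

# Downloading one sample (not run here): keep each file's own name
# for (u in urls_1) download.file(u, file.path(tempdir(), basename(u)), mode = "wb")
```

`fixed = TRUE` treats the placeholder literally, so no regular-expression escaping is needed if the placeholder changes.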
markdanese / part_d.R
Last active October 8, 2015 04:20
script to read in Medicare Part D Prescriber data for 2013
# ---------- US Part D Drug prices 2013 ---------- #
library(data.table)
library(magrittr)
# data from http://download.cms.gov/Research-Statistics-Data-and-Systems/Statistics-Trends-and-Reports/Medicare-Provider-Charge-Data/Downloads/PartD_Prescriber_PUF_NPI_DRUG_13.zip
# 500 MB ZIP file download, 2.9 GB uncompressed
pde <- "http://download.cms.gov/Research-Statistics-Data-and-Systems/Statistics-Trends-and-Reports/Medicare-Provider-Charge-Data/Downloads/PartD_Prescriber_PUF_NPI_DRUG_13.zip"
tf <- tempfile()
download.file(pde, tf)
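After `download.file()` the archive still has to be extracted and read. A minimal sketch of that step uses base `unzip()` and a reader function. It assumes the archive's first file is tab-delimited text (the default `reader = utils::read.delim` is an assumption; the CMS file layout is not shown here), and `read_first_in_zip()` is a hypothetical helper. For a 2.9 GB file, passing `data.table::fread` as the reader would be much faster.

```r
# Extract a downloaded zip archive and read the first file it contains.
# ASSUMPTION: the first file is tab-delimited; pass another reader if not.
read_first_in_zip <- function(zip_path, reader = utils::read.delim, ...) {
  exdir <- tempfile("unzipped_")
  files <- utils::unzip(zip_path, exdir = exdir)   # returns extracted paths
  on.exit(unlink(exdir, recursive = TRUE), add = TRUE)
  reader(files[1], ...)
}
```

In the script above this would be `pd <- read_first_in_zip(tf)` followed by `unlink(tf)` to remove the downloaded archive.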
markdanese / get_nhanes.R
Last active May 2, 2023 06:42
Scrape NHANES website and generate listing of all data (.xpt) and documentation (.htm) files
library(magrittr)
library(rvest)
library(xml2)
get_nhanes_listing <- function(){
  nhanes_url <- "http://wwwn.cdc.gov/Nchs/Nhanes/Search/DataPage.aspx"
  tbl <- xml2::read_html(nhanes_url)
  table_text <-
    rvest::html_table(tbl) %>%
    data.frame(stringsAsFactors = FALSE) # just gets table, not hyperlinks in table
  names(table_text) <- gsub("\\.", "_", names(table_text)) %>% tolower()
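As the comment on `html_table()` notes, the parsed table loses its hyperlinks. A minimal sketch of recovering them with xml2 collects each anchor's text and `href`, resolves relative links against the page URL, and tags each link by file extension so data (.xpt) and documentation (.htm) files can be separated. The inline HTML is made up for the demo, and `get_links()` is a hypothetical helper. On the real page one would pass the document returned by `xml2::read_html(nhanes_url)`.

```r
library(xml2)

# Collect link text, absolute URL, and file extension for every <a href>.
get_links <- function(doc, base_url) {
  a   <- xml_find_all(doc, ".//a[@href]")
  url <- url_absolute(xml_attr(a, "href"), base_url)
  data.frame(
    text = trimws(xml_text(a)),
    url  = url,
    ext  = tolower(tools::file_ext(url)),   # e.g. "xpt" or "htm"
    stringsAsFactors = FALSE
  )
}

# Made-up HTML for the demo
html <- '<table><tr><td><a href="/data/example.xpt">Example data</a></td>
<td><a href="/data/example.htm">Example doc</a></td></tr></table>'
links <- get_links(read_html(html),
                   "http://wwwn.cdc.gov/Nchs/Nhanes/Search/DataPage.aspx")
```

`links[links$ext == "xpt", ]` then gives only the data files.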