https://tilburgsciencehub.com

Hannes Datta hannesdatta

@hannesdatta
hannesdatta / rolling.do
Last active November 21, 2019 11:27
Rolling window merge in Stata
/*
===================================
ROLLING WINDOW AGGREGATION IN STATA
===================================
Problem:
--------
Assume you have a data set with time/datestamps, noting
@hannesdatta
hannesdatta / proc_auxilary.R
Created December 2, 2019 13:50
Auxiliary functions to report regression results in R
psignstars <- function(x) {
sapply(x, function(p) ifelse(p < .01, "***", ifelse(p < .05, "**", ifelse(p < .1, "*", " "))))
}
# Function to run regression model (#mmix, with spec formula)
regmodel <- function(formula=list(~1+I(country_class=='linc') + as.factor(category) + as.factor(brand)),
dat, model = 'lm') {
lmerctrl = lmerControl(optimizer ="Nelder_Mead", check.conv.singular="ignore")
@hannesdatta
hannesdatta / script.R
Last active December 12, 2019 13:56
Customized column names when aggregating in data.table
# PROBLEM:
# I would like to give the new column in the new DT
# the name "mean_price"; however, I cannot figure out how to do this.
# It should be possible but I don't know how.
# Here is someone with a related issue: https://stackoverflow.com/questions/12391950/select-assign-to-data-table-when-variable-names-are-stored-in-a-character-vect
# Do you know how to resolve this issue?
# EXAMPLE:
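One way to approach the naming problem described above, as a minimal sketch rather than the author's own solution. The table `dt` and its columns `brand` and `price` are made up for illustration. A fixed name can be given directly inside `.()`; a name held in a character variable can be applied afterwards with `setnames()`.

```r
library(data.table)

# Hypothetical example data (not from the original gist)
dt <- data.table(brand = c("a", "a", "b", "b"), price = c(1, 3, 10, 20))

# Case 1: the new name is known in advance, so name it inside .()
agg1 <- dt[, .(mean_price = mean(price)), by = brand]

# Case 2: the new name is stored in a character variable.
# Aggregate first (data.table calls the unnamed result "V1"), then rename by reference.
newcol <- "mean_price"
agg2 <- dt[, mean(price), by = brand]
setnames(agg2, "V1", newcol)
```

Both `agg1` and `agg2` end up with the columns `brand` and `mean_price`.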
@hannesdatta
hannesdatta / proc_unitroots.R
Last active February 12, 2020 21:44
Augmented Dickey-Fuller Test / Enders procedure when the data generating process is unknown (e.g., inclusion of deterministic trend, or not)
####################################
# #
# UNIT ROOT TESTS #
# IN THE ABSENCE OF #
# KNOWLEDGE ON THE ACTUAL #
# DATA GENERATING PROCESS #
# #
# Enders 1995, #
# Applied Econometric Time Series #
# pp. 254 - 258 and #
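For orientation, the regression behind an augmented Dickey-Fuller test can be sketched in base R as below. This is an illustrative assumption, not the code of this gist: the function name `adf_tstat` and its arguments are hypothetical, and it only returns the t-statistic on the lagged level for one of three specifications (with trend, with drift only, or with neither). The statistic has to be compared against Dickey-Fuller critical values, which are not included here.

```r
# Hypothetical helper: t-statistic on y_{t-1} in the ADF test regression
#   diff(y)_t = [a] + [b * t] + gamma * y_{t-1} + sum_j c_j * diff(y)_{t-j} + e_t
adf_tstat <- function(y, lags = 1, type = c("trend", "drift", "none")) {
  type <- match.arg(type)
  dy <- diff(y)
  n <- length(dy)
  # embed(): column 1 holds the current difference, columns 2.. hold its lags
  E <- embed(dy, lags + 1)
  dat <- data.frame(dep = E[, 1],
                    ylag = y[(lags + 1):n],      # level one period before dep
                    trend = seq_len(nrow(E)))
  rhs <- "ylag"
  for (j in seq_len(lags)) {
    dat[[paste0("dlag", j)]] <- E[, j + 1]
    rhs <- c(rhs, paste0("dlag", j))
  }
  if (type == "trend") rhs <- c(rhs, "trend")
  fit <- lm(reformulate(rhs, response = "dep", intercept = type != "none"),
            data = dat)
  coef(summary(fit))["ylag", "t value"]
}

set.seed(42)
white_noise <- rnorm(300)          # stationary series
random_walk <- cumsum(rnorm(300))  # series with a unit root
stat_wn <- adf_tstat(white_noise, lags = 1, type = "drift")
stat_rw <- adf_tstat(random_walk, lags = 1, type = "trend")
```

A strongly negative statistic, as for the white-noise series, speaks against a unit root; which specification to use when the data generating process is unknown is the question the procedure referenced above addresses.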
@hannesdatta
hannesdatta / clean_artistnames.R
Created March 24, 2020 15:56
clean clear-text artist names from collaborations and secondary artists
require(stringi)
require(stringr) # str_trim() used below comes from stringr, not stringi
spelling_variants <- function(x, remove_collabs=FALSE, remove_parentheses=TRUE) {
qualifiers = c(" feat .*", " feat[.].*", " ft.*", " ft[.].*"," featuring.*"," vs[.].*"," vs.*"," versus.*"," with.*","[-].*"," / .*",
"/.*","[|].*", "[[].*[]]", "[)].*", ";.*","[+].*","[&] .*","[&].*",",.*"," and .*", " con .*", " e .*", " et .*",
" x .*")
# remove articles (a, the)
ret = gsub(" a ", "", tolower(str_trim(x)))
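The idea of cutting a clear-text artist name at a collaboration qualifier can be sketched in base R as follows. This is a simplified illustration, not the gist's full function: `strip_collaborators` is a hypothetical name, and it uses only a small subset of the qualifier patterns listed above.

```r
# Hypothetical, simplified helper: keep only the primary artist
strip_collaborators <- function(x) {
  # Each pattern matches a qualifier and everything after it
  qualifiers <- c(" feat[.]? .*", " ft[.]? .*", " featuring .*",
                  " vs[.]? .*", " & .*")
  out <- tolower(trimws(x))
  for (q in qualifiers) out <- sub(q, "", out)
  trimws(out)
}

cleaned <- strip_collaborators(c("Drake feat. Rihanna", "Jay-Z & Kanye West", "Adele"))
```

Patterns this broad can also truncate legitimate band names that contain "&", so any real list of qualifiers needs checking against the data.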
@hannesdatta
hannesdatta / deprecated-classify-labels.R
Last active July 23, 2020 06:05
Classifying music labels into major- and independent labels
This gist has been replaced by an R package with an updated list of labels.
Get it on GitHub: https://github.com/hannesdatta/musicMetadata
LEGACY CODE
#################################################
# #
# Classify music labels #
# into major labels (Sony, Warner, Universal), #
@hannesdatta
hannesdatta / convert-dates-in-data.table.R
Created September 27, 2020 11:53
Fast conversion of `character` data columns to Date using data.table
# Quick conversion of `character` date columns to Date format using data.table
# fread(..., colClasses = c(date='Date')) is slow for large data sets, especially when
# the number of unique dates is small, but the number of cross-sectional units is large.
# The intuition of this algorithm is to convert only the UNIQUE dates using as.Date,
# and then merge them back into the original data.table.
library(data.table)
data.table.date <- function(dt, datecol) {
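The intuition in the comments above can be sketched as follows. This is a minimal sketch under stated assumptions, not the gist's implementation: the function name `convert_unique_dates` and its `format` argument are hypothetical, and the parsed dates are mapped back with `match()` rather than a merge.

```r
library(data.table)

# Hypothetical sketch: parse each distinct date string once, then map back
convert_unique_dates <- function(dt, datecol, format = "%Y-%m-%d") {
  # Lookup table with one row per distinct date string
  lookup <- data.table(chr = unique(dt[[datecol]]))
  lookup[, parsed := as.Date(chr, format = format)]
  # Position of every row's date string in the lookup table
  idx <- match(dt[[datecol]], lookup$chr)
  # Overwrite the character column with the Date values, by reference
  set(dt, j = datecol, value = lookup$parsed[idx])
  invisible(dt)
}

# Usage on made-up data: many rows, few distinct dates
panel <- data.table(id = 1:6,
                    date = rep(c("2020-01-01", "2020-01-02"), times = 3))
convert_unique_dates(panel, "date")
```

Because `as.Date()` runs only on the distinct values, the cost of parsing depends on the number of unique dates rather than the number of rows.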
@hannesdatta
hannesdatta / compare_regression_results.R
Last active December 16, 2020 15:02
Compare regression results with stargazer, and compare coefficients using the delta method (in R)
# Code snippet to compare the outcome of different regressions
# - showing regression results in a publication-ready table
# - conducting a statistical test for differences in coefficients using the delta method
## Let's generate some data
set.seed(1234) # initialize random number generator
y= runif(1000)
x1= runif(1000)
x2= runif(1000)
x3= runif(1000)
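As a hedged illustration of the coefficient comparison mentioned above, the sketch below tests whether two coefficients from one linear model differ, using the delta method with base R only. The data-generating process and all object names are assumptions made for this example; the table output with stargazer is not covered.

```r
set.seed(1234)
n <- 1000
x1 <- runif(n)
x2 <- runif(n)
# Hypothetical data-generating process with different effects of x1 and x2
y <- 1 + 0.5 * x1 + 0.2 * x2 + rnorm(n, sd = 0.1)

fit <- lm(y ~ x1 + x2)
b <- coef(fit)[c("x1", "x2")]
V <- vcov(fit)[c("x1", "x2"), c("x1", "x2")]

# Delta method for g(b) = b_x1 - b_x2: Var(g) = grad' V grad, with grad = (1, -1)
grad <- c(1, -1)
est <- sum(grad * b)
se <- sqrt(drop(t(grad) %*% V %*% grad))
z <- est / se
p_value <- 2 * pnorm(-abs(z))
```

For a linear function such as a difference, the delta method reproduces the exact variance of the difference; for nonlinear functions, such as a ratio of coefficients, `grad` would hold the partial derivatives evaluated at the estimates.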
@hannesdatta
hannesdatta / draw.R
Created March 1, 2021 12:11
Draw from four-dimensional normal distribution with specific correlation structure
#' Draw from variance-covariance matrix, given correlation rho (buggy!)
#'
#' The purpose of this function is to generate draws from a multivariate normal
#' distribution whose variance-covariance matrix has a specific correlation structure.
#' In particular, a four-dimensional correlation structure is drawn, in which
#' the first dimension (sales) is correlated with the remaining 3 dimensions
#' (marketing mix instruments) with *rho*, and the correlations among the 3 dimensions
#' (marketing mix instruments) are zero (or about zero).
#'
#' @param rho Correlation between the first dimension, and the remaining three dimensions
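The correlation structure described above can be written down directly and sampled via a Cholesky factor. The sketch below is an alternative illustration, not the gist's function: `draw_correlated` and the column names are hypothetical, and unit variances are assumed. A correlation matrix of this shape is positive definite only when |rho| < 1/sqrt(3), because its eigenvalues are 1 + rho*sqrt(3), 1 - rho*sqrt(3), 1 and 1.

```r
# Hypothetical sketch: n draws from a 4-dimensional normal with unit variances,
# correlation rho between dimension 1 and dimensions 2-4, and zero otherwise
draw_correlated <- function(n, rho) {
  stopifnot(abs(rho) < 1 / sqrt(3))  # otherwise Sigma is not positive definite
  Sigma <- diag(4)
  Sigma[1, 2:4] <- rho
  Sigma[2:4, 1] <- rho
  # chol() returns upper-triangular R with t(R) %*% R == Sigma,
  # so rows of Z %*% R have covariance Sigma
  Z <- matrix(rnorm(n * 4), nrow = n)
  X <- Z %*% chol(Sigma)
  colnames(X) <- c("sales", "mmix1", "mmix2", "mmix3")
  X
}

set.seed(1)
draws <- draw_correlated(20000, rho = 0.4)
sample_cor <- cor(draws)
```

In a large sample, the first row of `sample_cor` should be close to `rho` and the correlations among the last three columns close to zero.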
@hannesdatta
hannesdatta / install_packages.R
Created June 25, 2021 07:48
automatically install all R packages used in a project (scans all source code files)
################################
# FIND AND INSTALL R PACKAGES #
# #
# #
# Searches source code for #
# references to packages, #
# and installs all #
# uninstalled packages. #
# #
# Put this script in the #
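The scan-and-install idea described in this header can be sketched in base R as below. This is a minimal sketch under stated assumptions, not the gist's script: the function names are hypothetical, only `library()` and `require()` calls are detected (not `pkg::fun` references), and the install step is left commented out.

```r
# Hypothetical helper: package names referenced via library()/require() in .R files
find_packages <- function(dir = ".") {
  files <- list.files(dir, pattern = "[.][Rr]$", recursive = TRUE, full.names = TRUE)
  code <- unlist(lapply(files, readLines, warn = FALSE))
  if (length(code) == 0) return(character(0))
  # Matches e.g. library(data.table), require("stringr"), library('MASS')
  pattern <- "(library|require)\\(['\"]?([A-Za-z][A-Za-z0-9.]*)"
  hits <- unlist(regmatches(code, gregexpr(pattern, code)))
  sort(unique(sub(pattern, "\\2", hits)))
}

# Hypothetical helper: which of these packages are not installed yet
missing_packages <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

# Usage: write a small script to a temporary folder and scan it
project_dir <- tempfile()
dir.create(project_dir)
writeLines(c("library(data.table)", "require(\"stringr\")", "x <- 1"),
           file.path(project_dir, "analysis.R"))
found <- find_packages(project_dir)
# install.packages(missing_packages(found))  # uncomment to actually install
```

A regular expression of this kind also picks up calls inside comments or strings, so the result is best treated as a candidate list.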