Srikanth K S talegari
@adrianolszewski
adrianolszewski / logistic_regression_testing_hypotheses.md
Last active February 18, 2024 23:38
Logistic regression is often used for testing hypotheses, replacing a variety of common classic tests

Despite the widespread and nonsensical claim that "logistic regression is not a regression", it is one of the key regression and hypothesis-testing tools used in experimental research (such as clinical trials).

Let me show you how logistic regression (with a few extensions) can be used to test hypotheses about fractions (%) of successes, replacing the classic "tests for proportions". Namely, it can replicate the results of:

  1. Wald's (normal approximation) z test for 2 proportions with non-pooled standard errors (common in clinical trials), via LS-means on the prediction scale or the AME (average marginal effect)
  2. Rao's score (normal approximation) z test for 2 proportions with pooled standard errors (what prop.test() does in R when the continuity correction is turned off)
  3. the z test for multiple (2+) proportions
  4. an ANOVA-like (joint) test for multiple categorical predictors (n-way ANOVA), or (n-way) ANCOVA if you include numerical covariates
  5. [the **Cochran-Mantel-Haenszel
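
Items 1 and 2 of the list above can be checked by hand. The following is a minimal sketch in Python (not the gist's own R code), using made-up counts of 30/100 versus 45/100 successes; all function names are illustrative. It assumes a saturated logistic model with the group as the only predictor, in which case the fitted probabilities equal the sample proportions and the delta method applied to the logit-scale variances gives the same standard error as the non-pooled Wald test.

```python
import math


def wald_z_unpooled(x1, n1, x2, n2):
    """Wald z statistic for p1 - p2 with non-pooled standard errors."""
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p1 - p2) / se


def score_z_pooled(x1, n1, x2, n2):
    """Score z statistic for p1 - p2 with the pooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    p = (x1 + x2) / (n1 + n2)  # pooled proportion under H0: p1 == p2
    se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se


def logistic_ame_z(x1, n1, x2, n2):
    """z statistic for the difference in fitted probabilities of a saturated
    logistic model (assumption: group is the only predictor)."""
    diff, var = 0.0, 0.0
    for sign, x, n in ((1, x1, n1), (-1, x2, n2)):
        p = x / n                           # fitted probability = sample proportion
        var_logit = 1 / (n * p * (1 - p))   # variance of the fitted logit
        var_p = (p * (1 - p)) ** 2 * var_logit  # delta method back to the probability scale
        diff += sign * p
        var += var_p
    return diff / math.sqrt(var)


def two_sided_p(z):
    """Two-sided p-value from the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))


if __name__ == "__main__":
    counts = (30, 100, 45, 100)  # hypothetical data
    for name, fn in (("Wald (unpooled)", wald_z_unpooled),
                     ("Score (pooled)", score_z_pooled),
                     ("Logistic, delta method", logistic_ame_z)):
        z = fn(*counts)
        print(f"{name}: z = {z:.4f}, p = {two_sided_p(z):.4f}")
```

The first and third statistics agree up to floating-point error, while the pooled score statistic differs slightly because its standard error is evaluated under the null hypothesis.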
@wch
wch / grow_vector.R
Last active December 24, 2023 17:33
Tests with growing vectors in a loop in R
# The code below demonstrates that in R, growing a vector in a loop can be fast,
# as long as there is only one reference to the object. When there's only one
# reference to the vector, R grows it in place (in most cases). However, if
# there are other references to the object, R must make a copy of the object
# instead of growing it in place, leading to slower performance.
# =========================================================================
# Timing tests
# =========================================================================
library(tidycensus)
library(ggiraph)
library(tidyverse)
library(patchwork)
vt_income <- get_acs(
  geography = "county",
  variables = "B19013_001",
  state = "VT",
  year = 2019,
@allanbatista
allanbatista / deploy_rstudio_with_dataproc.md
Last active March 27, 2023 09:11
Deploy RStudio with Dataproc

Create cluster

$ gcloud beta dataproc clusters create [CLUSTER-NAME] \
                              --enable-component-gateway \
                              --bucket bucket-name \
                              --region us-central1 \
                              --subnet default \
                              --zone us-central1-a \
                              --master-machine-type n1-standard-4 \
                              --master-boot-disk-size 500 \

@sooheang
sooheang / explainableDL.md
Last active July 20, 2021 17:42
Explainable Deep Learning

Overview of Explainable Deep Learning

Three major research directions in explainable deep learning: understanding, debugging, and refinement/steering

Model understanding

Aims to explain the rationale behind model predictions and the inner workings of deep learning models, and attempts to make these complex models at least partly understandable.

  • Perturbation experiments (CVPR2014): Large Convolutional Network models have recently demonstrated impressive classification performance on the ImageNet benchmark. However there is no clear understanding of why they perform so well, or how they might be improved. In this paper we address both issues. We introduce a novel visualization technique that gives insight into the function of intermediate feature layers and the operation of the classifier. We also perform an ablation study to discover the performance contribution from different model layers. This enables us to find model archite
@conormm
conormm / r-to-python-data-wrangling-basics.md
Last active June 26, 2024 07:56
R to Python: Data wrangling with dplyr and pandas

R to python data wrangling snippets

The dplyr package in R makes data wrangling significantly easier. The beauty of dplyr is that, by design, the options available are limited. Specifically, a set of key verbs forms the core of the package. Using these verbs you can solve a wide range of data problems effectively in a shorter timeframe. Whilst transitioning to Python I have greatly missed the ease with which I can think through and solve problems using dplyr in R. The purpose of this document is to demonstrate how to execute the key dplyr verbs when manipulating data using Python (with the pandas package).
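
As a rough illustration of that mapping, here is a minimal sketch that pairs a few well-known dplyr verbs with one possible pandas equivalent each. The data frame and its column names are invented for the example, and the dplyr calls in the comments are only there for comparison; other pandas idioms would work equally well.

```python
import pandas as pd

# Hypothetical toy data, not taken from the original document.
df = pd.DataFrame({
    "species": ["a", "a", "b", "b"],
    "length": [1.0, 3.0, 2.0, 6.0],
    "width": [0.5, 1.5, 1.0, 2.0],
})

# dplyr: filter(df, length > 1)
filtered = df[df["length"] > 1]

# dplyr: select(df, species, length)
selected = df[["species", "length"]]

# dplyr: arrange(df, desc(length))
arranged = df.sort_values("length", ascending=False)

# dplyr: mutate(df, ratio = length / width)
mutated = df.assign(ratio=df["length"] / df["width"])

# dplyr: df %>% group_by(species) %>% summarise(mean_length = mean(length))
summary = (
    df.groupby("species", as_index=False)
      .agg(mean_length=("length", "mean"))
)

if __name__ == "__main__":
    print(summary)
```

Each pandas expression returns a new object and leaves `df` untouched, which keeps the steps easy to chain in roughly the way a dplyr pipeline reads.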

dplyr is organised around six key verbs:

@vnijs
vnijs / batch2sqlite.R
Last active August 30, 2021 07:00
Reading a csv file into an sqlite database in chunks
library(dplyr)
library(readr)
library(DBI)
library(RSQLite)
read.csv2sqlite <- function(csv_file, sqlite_file, table_name, batch_size = 10000) {
  ## establish a connection to the database
  condb <- dbConnect(SQLite(), sqlite_file)
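
The rest of the R function is cut off in this preview. As a sketch of the same idea (loading a CSV file into SQLite one batch at a time so that the whole file never sits in memory), here is a Python version using only the standard library. It is not a translation of the truncated R code: the function name `csv_to_sqlite`, the untyped columns, and the commit-per-batch policy are all assumptions made for illustration.

```python
import csv
import sqlite3
from itertools import islice


def csv_to_sqlite(csv_file, sqlite_file, table_name, batch_size=10000):
    """Copy a CSV file with a header row into an SQLite table in batches.

    Returns the number of data rows inserted. Assumes `table_name` and the
    header names are trusted, since identifiers cannot be bound as parameters.
    """
    con = sqlite3.connect(sqlite_file)
    try:
        with open(csv_file, newline="") as fh:
            reader = csv.reader(fh)
            header = next(reader)  # first line holds the column names
            cols = ", ".join(f'"{name}"' for name in header)
            marks = ", ".join("?" for _ in header)
            con.execute(f'CREATE TABLE IF NOT EXISTS "{table_name}" ({cols})')
            insert_sql = f'INSERT INTO "{table_name}" VALUES ({marks})'

            total = 0
            while True:
                # Pull at most `batch_size` rows from the reader.
                batch = list(islice(reader, batch_size))
                if not batch:
                    break
                con.executemany(insert_sql, batch)
                con.commit()  # one transaction per batch
                total += len(batch)
        return total
    finally:
        con.close()
```

All values are stored as text here; a real loader would probably declare column types or convert values before inserting.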
@wch
wch / multi_dispatch.R
Last active December 24, 2023 17:20
Multiple dispatch in R without S4
# ---- Multiple dispatch functions -----
multi_dispatch <- function(gen_name) {
  calling_env <- parent.frame()
  parent_call <- sys.call(sys.parent())
  calling_fun <- sys.function(sys.parent())
  arg1 <- eval(parent_call[[2]], calling_env)
  arg2 <- eval(parent_call[[3]], calling_env)
@hadley
hadley / .gitignore
Last active February 25, 2024 02:10
Benchmark different ways of reading a file
.Rproj.user
.Rhistory
.RData
*.Rproj
*.html