Srikanth K S talegari

## logistic_regression_testing_hypotheses.md

      
adrianolszewski / logistic_regression_testing_hypotheses.md
1 file, 2 forks, 0 comments, 16 stars. Last active February 18, 2024 23:38.

Logistic regression is often used for testing hypotheses, replacing a variety of common classic tests

Despite the widespread and nonsensical claim that "logistic regression is not a regression", it is one of the key regression and hypothesis-testing tools used in experimental research (such as clinical trials).
Let me show you how logistic regression (with a few extensions) can be used to test hypotheses about fractions (%) of successes, replacing the classic "test for proportions".
Namely, it can replicate the results of:

- Wald's (normal approximation) z test for 2 proportions with non-pooled standard errors (common in clinical trials), via LS-means on the prediction scale or the AME (average marginal effect)
- Rao's score (normal approximation) z test for 2 proportions with pooled standard errors (just what prop.test() does in R)
- the z test for multiple (2+) proportions
- an ANOVA-like (joint) test for multiple categorical predictors (n-way ANOVA), or (n-way) ANCOVA if you include numerical covariates
- [the **Cochran-Mantel-Haenszel
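
As a rough illustration of the first item in the list, the sketch below fits a logistic regression with a single 0/1 group indicator by Newton-Raphson in plain Python, then compares the Wald z statistic for the average marginal effect (standard error via the delta method) with the classic non-pooled z test for two proportions. The counts and the function names (`expit`, `wald_z_unpooled`, `logit_ame_z`) are made up for this illustration and do not come from the gist, which works in R. It assumes neither group has 0% or 100% successes.

```python
import math


def expit(t):
    """Inverse logit: maps a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-t))


def wald_z_unpooled(x1, n1, x2, n2):
    """Classic Wald z statistic for p2 - p1 with non-pooled standard errors."""
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p2 - p1) / se


def logit_ame_z(x1, n1, x2, n2, iters=25):
    """Wald z for the average marginal effect from logit(p) = b0 + b1 * g.

    g is a 0/1 group indicator. The model is fitted by Newton-Raphson on the
    grouped counts (x successes out of n per group).
    """
    b0 = b1 = 0.0
    for _ in range(iters):
        p1, p2 = expit(b0), expit(b0 + b1)
        # Binomial weights; the Fisher information is [[w1 + w2, w2], [w2, w2]]
        w1, w2 = n1 * p1 * (1 - p1), n2 * p2 * (1 - p2)
        # Score vector (gradient of the log-likelihood)
        u0 = (x1 - n1 * p1) + (x2 - n2 * p2)
        u1 = x2 - n2 * p2
        det = w1 * w2
        # Newton step: b <- b + I^{-1} U
        b0 += (w2 * u0 - w2 * u1) / det
        b1 += (-w2 * u0 + (w1 + w2) * u1) / det

    # Covariance of (b0, b1): inverse Fisher information at the estimate
    p1, p2 = expit(b0), expit(b0 + b1)
    w1, w2 = n1 * p1 * (1 - p1), n2 * p2 * (1 - p2)
    v00, v01, v11 = 1 / w1, -1 / w1, 1 / w1 + 1 / w2

    # AME = p2 - p1; delta method uses its gradient with respect to (b0, b1)
    g0 = p2 * (1 - p2) - p1 * (1 - p1)
    g1 = p2 * (1 - p2)
    var = g0 * g0 * v00 + 2 * g0 * g1 * v01 + g1 * g1 * v11
    return (p2 - p1) / math.sqrt(var)


if __name__ == "__main__":
    # Hypothetical data: 30/100 successes in group 0, 45/100 in group 1
    print(logit_ame_z(30, 100, 45, 100))
    print(wald_z_unpooled(30, 100, 45, 100))
```

With one binary predictor the model is saturated, so the fitted probabilities equal the sample proportions and the delta-method variance reduces to p1(1-p1)/n1 + p2(1-p2)/n2, which is why the two statistics agree up to numerical error.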


## postgresql_is_enough.md

      
cpursley / postgresql_is_enough.md
1 file, 186 forks, 40 comments, 1836 stars. Last active July 17, 2024 00:03.

Postgres is Enough

PostgreSQL is Enough


- Simplify: move code into database functions
- Just Use Postgres for Everything
- PostgreSQL is the worlds’ best database
- Postgres is eating the database world
- Hacker News discussion

Background and Cron Jobs


## grow_vector.R
# The code below demonstrates that in R, growing a vector in a loop can be fast,
# as long as there is only one reference to the object. When there's only one
# reference to the vector, R grows it in place (in most cases). However, if
# there are other references to the object, R must make a copy of the object
# instead of growing it in place, leading to slower performance.

# =========================================================================
# Timing tests
# =========================================================================

## linked_hover.R
library(tidycensus)
library(ggiraph)
library(tidyverse)
library(patchwork)

vt_income <- get_acs(
  geography = "county",
  variables = "B19013_001",
  state = "VT",
  year = 2019,

## deploy_rstudio_with_dataproc.md

      
allanbatista / deploy_rstudio_with_dataproc.md
3 files, 0 forks, 0 comments, 2 stars. Last active March 27, 2023 09:11.

Deploy RStudio with Dataproc

Create cluster

$ gcloud beta dataproc clusters create [CLUSTER-NAME] \
                              --enable-component-gateway \
                              --bucket bucket-name \
                              --region us-central1 \
                              --subnet default \
                              --zone us-central1-a \
                              --master-machine-type n1-standard-4 \
                              --master-boot-disk-size 500 \

  
## explainableDL.md

      
sooheang / explainableDL.md
1 file, 0 forks, 0 comments, 2 stars. Last active July 20, 2021 17:42.

Explainable Deep Learning

Overview of Explainable Deep Learning

Three major research directions in explainable deep learning: understanding, debugging, and refinement/steering
Model understanding

Aims to explain the rationale behind model predictions and the inner workings of deep learning models, and attempts to make these complex models at least partly understandable.

Perturbation experiments (CVPR2014): Large Convolutional Network models have recently demonstrated impressive classification performance on the ImageNet benchmark. However there is no clear understanding of why they perform so well, or how they might be improved. In this paper we address both issues. We introduce a novel visualization technique that gives insight into the function of intermediate feature layers and the operation of the classifier. We also perform an ablation study to discover the performance contribution from different model layers. This enables us to find model archite


## r-to-python-data-wrangling-basics.md

      
conormm / r-to-python-data-wrangling-basics.md
1 file, 102 forks, 38 comments, 403 stars. Last active June 26, 2024 07:56.

R to Python: Data wrangling with dplyr and pandas

R to python data wrangling snippets

The dplyr package in R makes data wrangling significantly easier.
The beauty of dplyr is that, by design, the options available are limited.
Specifically, a set of key verbs form the core of the package.
Using these verbs you can solve a wide range of data problems effectively in a shorter timeframe.
Whilst transitioning to Python, I have greatly missed the ease with which I can think through and solve problems using dplyr in R.
The purpose of this document is to demonstrate how to execute the key dplyr verbs when manipulating data using Python (with the pandas package).
dplyr is organised around six key verbs:

  
## batch2sqlite.R
library(dplyr)
library(readr)
library(DBI)
library(RSQLite)

read.csv2sqlite <- function(csv_file, sqlite_file, table_name, batch_size = 10000) {

  ## establish a connection to the database
  condb <- dbConnect(SQLite(), sqlite_file)

## multi_dispatch.R

# ---- Multiple dispatch functions -----
multi_dispatch <- function(gen_name) {
  calling_env <- parent.frame()
  parent_call <- sys.call(sys.parent())
  calling_fun <- sys.function(sys.parent())

  arg1 <- eval(parent_call[[2]], calling_env)
  arg2 <- eval(parent_call[[3]], calling_env)

## .gitignore
.Rproj.user
.Rhistory
.RData
*.Rproj
*.html