Tal Yarkoni tyarkoni

@tyarkoni
tyarkoni / silly_pie_chart.py
Created April 1, 2018 20:01
code for silly pie chart (by request)
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
rows = [
('Doing research', 50),
('Having meetings', 6),
('Begging funding agencies for money so I can keep my job', 3),
('Doing paperwork', 2),
('Reviewing papers', 2),
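The preview cuts off inside the `rows` list. A minimal, self-contained sketch of how the snippet might continue; the final data entry and the plotting calls are my own placeholders, not the original gist:

```python
# Hypothetical completion: the preview truncates `rows`, so the last entry and
# everything after the list are placeholder assumptions, not the original code.
import matplotlib
matplotlib.use("Agg")  # render off-screen (the gist runs with %matplotlib inline)
import matplotlib.pyplot as plt

rows = [
    ('Doing research', 50),
    ('Having meetings', 6),
    ('Begging funding agencies for money so I can keep my job', 3),
    ('Doing paperwork', 2),
    ('Reviewing papers', 2),
    ('Everything else (placeholder entry)', 37),  # hypothetical filler so shares sum to 100
]

labels, sizes = zip(*rows)
fig, ax = plt.subplots(figsize=(8, 8))
ax.pie(sizes, labels=labels, autopct='%1.0f%%', startangle=90)
ax.axis('equal')  # keep the pie circular
fig.savefig('silly_pie_chart.png')
```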
@tyarkoni
tyarkoni / p_hacked_effect_sizes.py
Created January 25, 2018 23:02
Illustrating the effects of p-hacking on observed effect sizes
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt
%matplotlib inline
def run_study(step_size=50, max_n=200, num_tests=10, alpha=0.05):
''' Run a single study in increments of N until we either (a) achieve
significance, or (b) hit a maximum sample size. To model p-hacking, we
conduct num_tests independent tests after each increment of sampling. '''
X = np.zeros((0, num_tests))
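The preview ends after the first line of `run_study`. A hedged reconstruction of the loop the docstring describes (sample in increments, test repeatedly, stop at first significance); the body past `X = np.zeros(...)` and the stopping/return details are my assumptions, not the original gist:

```python
# Sketch of the p-hacking simulation described in the docstring above; the
# loop body is my reconstruction, not the original gist's code.
import numpy as np
from scipy import stats

def run_study(step_size=50, max_n=200, num_tests=10, alpha=0.05, rng=None):
    """Sample in increments of step_size up to max_n; after each increment,
    run num_tests independent one-sample t-tests (true effect = 0) and stop
    at the first significant result. Return the observed Cohen's d of the
    'winning' test, or the final d of test 0 if nothing reaches significance."""
    rng = np.random.default_rng(rng)
    X = np.zeros((0, num_tests))
    while X.shape[0] < max_n:
        X = np.vstack([X, rng.standard_normal((step_size, num_tests))])
        t, p = stats.ttest_1samp(X, 0)
        if (p < alpha).any():
            winner = int(np.argmin(p))
            return X[:, winner].mean() / X[:, winner].std(ddof=1)
    return X[:, 0].mean() / X[:, 0].std(ddof=1)

# Even with a true effect of zero, "significant" studies report inflated |d|.
ds = [run_study(rng=i) for i in range(200)]
```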
@tyarkoni
tyarkoni / simulate_matching.py
Created April 3, 2016 15:59
Matching on unreliable variables produces residual confounding
'''
A small simulation to demonstrate that matching trials does not solve the
problem of residual confounding. For description of original problem, see
http://dx.doi.org/10.1371/journal.pone.0152719
Here we simulate a situation where we match trials from two conditions that
differ in Y on an indicator M. By hypothesis, there is no difference in Y in
the population after controlling for M. But because of measurement error,
matching on M will, on average, leave a residual mean difference in the Y's.
Raising the reliability of M (REL_M) will decrease this difference, and setting
it to 1.0 will eliminate it completely, demonstrating that matching works just
fine when M is measured without error.
'''
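The preview shows only the docstring. A hedged sketch of the simulation it describes; the sampling scheme, matching rule, and parameter values below are my assumptions, not the original gist:

```python
# Sketch of the residual-confounding simulation described above; the matching
# window and all parameter values are my own assumptions.
import numpy as np

REL_M = 0.5      # reliability of the matching variable M (1.0 removes the bias)
N = 200_000      # trials per condition
rng = np.random.default_rng(0)

# Latent true scores: condition B is shifted on T, and Y depends only on T,
# so controlling for T perfectly would remove the group difference in Y.
t_a = rng.standard_normal(N)
t_b = rng.standard_normal(N) + 1.0
y_a, y_b = t_a.copy(), t_b.copy()

# Observed M = true score + noise, scaled so Var(T) / Var(M) = REL_M.
noise_sd = np.sqrt((1 - REL_M) / REL_M)
m_a = t_a + rng.normal(0, noise_sd, N)
m_b = t_b + rng.normal(0, noise_sd, N)

# Crude matching: keep only trials whose observed M falls in a narrow window.
lo, hi = 0.4, 0.6
keep_a = (m_a > lo) & (m_a < hi)
keep_b = (m_b > lo) & (m_b < hi)

# Despite matching on M, the latent T (and hence Y) still differs on average,
# because M regresses toward each condition's own mean.
residual = y_b[keep_b].mean() - y_a[keep_a].mean()
```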
@tyarkoni
tyarkoni / t1_t2_correlation_sim.py
Last active February 23, 2016 02:15
Simulates correlation between effect sizes of original studies and replication studies
import numpy as np
import scipy.stats as ss
import matplotlib.pyplot as plt
g1_d_mu = 0.4
g1_d_sd = 0.4
prop_null = 0.3
n_subs = 20
n_studies = 400
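Only the parameter block survives the preview. A hedged sketch of one way the simulation might proceed from these parameters; the two-group t-test design and effect-size mixture below are my assumptions, not necessarily the gist's:

```python
# Sketch of an original-vs-replication effect size simulation using the
# parameters shown above; the study design is my assumption.
import numpy as np
import scipy.stats as ss

g1_d_mu, g1_d_sd = 0.4, 0.4   # distribution of true effect sizes
prop_null = 0.3               # proportion of studies with a true effect of 0
n_subs, n_studies = 20, 400
rng = np.random.default_rng(1)

def observed_d(true_d):
    """Two-group study with n_subs per group; return observed Cohen's d."""
    a = rng.normal(0, 1, n_subs)
    b = rng.normal(true_d, 1, n_subs)
    pooled_sd = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
    return (b.mean() - a.mean()) / pooled_sd

# Mixture of true effects: some null, the rest drawn from N(g1_d_mu, g1_d_sd).
true_d = rng.normal(g1_d_mu, g1_d_sd, n_studies)
true_d[rng.random(n_studies) < prop_null] = 0.0

# Run each study twice (original and replication) and correlate observed d's.
orig = np.array([observed_d(d) for d in true_d])
rep = np.array([observed_d(d) for d in true_d])
r, _ = ss.pearsonr(orig, rep)
```

With small samples, sampling error dilutes the correlation well below 1 even when both runs share identical true effects.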
@tyarkoni
tyarkoni / predict_from_text.py
Last active March 10, 2020 02:10
simple example predicting binary outcome from text features with sklearn
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
import pandas as pd
import numpy as np
# Grab just two categories from the 20 newsgroups dataset
categories = ['sci.space', 'rec.autos']
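The preview stops before the pipeline is built. A minimal sketch of the TF-IDF + logistic regression pipeline the imports suggest, run on a toy corpus so it works offline; the toy documents are placeholders (the gist itself loads the `sci.space` and `rec.autos` categories via `fetch_20newsgroups`):

```python
# Sketch of a text-classification pipeline like the one the gist builds; the
# tiny corpus below is a stand-in for the 20 newsgroups data.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

docs = [
    "the rocket launch reached low earth orbit",
    "nasa delayed the shuttle mission again",
    "the new sedan gets great gas mileage",
    "i replaced the brake pads on my car",
] * 10  # repeat so each class has a few dozen examples
labels = np.array([0, 0, 1, 1] * 10)  # 0 = sci.space, 1 = rec.autos

# Chain TF-IDF feature extraction and a logistic regression classifier.
clf = Pipeline([
    ('tfidf', TfidfVectorizer()),
    ('logreg', LogisticRegression()),
])
clf.fit(docs, labels)
preds = clf.predict(["orbit around the moon", "my car needs new tires"])
```

`Pipeline` keeps vectorization and classification as one estimator, so the same object handles raw text at both fit and predict time.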