@eddjberry
eddjberry / r_vs_py.R
Created March 15, 2022 16:04
R versus numpy when applying functions to rows and columns
# R -------------------------
x <- matrix(c(1, 2, 0, 4, 3, 7), ncol = 3, byrow = TRUE)
x
#      [,1] [,2] [,3]
# [1,]    1    2    0
# [2,]    4    3    7
# row means (NB: R indexes from 1)
apply(x, MARGIN = 1, mean)
# [1] 1.000000 4.666667
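The preview above shows only the R half of the comparison. As a minimal sketch of the numpy half (not taken from the gist, and assuming the same matrix), `axis=1` gives one mean per row, mirroring `MARGIN = 1`, while `axis=0` corresponds to `MARGIN = 2`:

```python
import numpy as np

# same 2 x 3 matrix as the R example, filled row by row
x = np.array([[1, 2, 0],
              [4, 3, 7]])

# row means: axis=1 collapses the columns (R: apply(x, MARGIN = 1, mean))
row_means = x.mean(axis=1)

# column means: axis=0 collapses the rows (R: apply(x, MARGIN = 2, mean))
col_means = x.mean(axis=0)

# rough analogue of apply() for an arbitrary function of a 1-D slice
row_ranges = np.apply_along_axis(lambda r: r.max() - r.min(), 1, x)
```

Note that numpy axes are numbered from 0, whereas R margins are numbered from 1.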
\documentclass{article}
\usepackage{pdfpages}
\begin{document}
\includepdf[pages={30, 31, 32, 37}]{/path/to/file}
\end{document}
import timeit
import numpy as np
from faker import Faker
# create the faker object
fake = Faker()
# np.random.choice function
def np_choice(N=1000):
    np.random.choice(N+1, N, replace = False)
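The preview stops before any timing code, so how the author benchmarks this is unknown. A minimal sketch of timing such a function with `timeit` might look like the following; the repetition count and the added `return` are assumptions made for illustration:

```python
import timeit
import numpy as np

def np_choice(N=1000):
    # draw N distinct integers from 0..N (sampling without replacement);
    # returning the sample is an addition for this sketch
    return np.random.choice(N + 1, N, replace=False)

# time 100 calls; the repetition count is an arbitrary illustrative choice
elapsed = timeit.timeit(np_choice, number=100)
sample = np_choice()
print(f"100 calls took {elapsed:.4f} s")
```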
import pandas as pd
from faker import Faker
# set the seed
Faker.seed(10)
# set the locale to GB
fake = Faker("en_GB")
# how many customers to fake
@eddjberry
eddjberry / shap_dependence_plot_grid.py
Last active November 23, 2022 07:32
Plot a grid of shap.dependence_plots
# Dependencies ----------------------
import math
import shap
import matplotlib.pyplot as plt
# shap_dependence_plot_grid ---------
def shap_dependence_plot_grid(cols,
                              shap_values,
                              X,
                              interaction_index = None,
@eddjberry
eddjberry / partial_dependence_data.py
Created July 17, 2020 14:42
Generate the data required for a partial dependence plot from a PySpark model
def partial_dependency_data(df, model, col, values, sample_fraction = 0.1):
    # empty list for predictions
    avg_predictions = list()
    # take a sample of the data to use
    df_sample = df.sample(fraction = sample_fraction)
    # loop through the values
    for val in values:
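The loop body is cut off in the preview. The general idea behind partial dependence data can be sketched in plain numpy rather than PySpark: force the column of interest to each value in turn and average the model's predictions. The names `partial_dependence` and `toy_predict` below are hypothetical and do not come from the gist:

```python
import numpy as np

def partial_dependence(X, predict, col, values):
    """Average prediction when column `col` is forced to each value in turn."""
    avg_predictions = []
    for val in values:
        X_mod = X.copy()       # leave the original data untouched
        X_mod[:, col] = val    # overwrite the feature of interest
        avg_predictions.append(float(predict(X_mod).mean()))
    return avg_predictions

# hypothetical stand-in for a fitted model: y = 2 * x0 + x1
def toy_predict(X):
    return 2 * X[:, 0] + X[:, 1]

X = np.array([[0.0, 1.0], [1.0, 3.0], [2.0, 5.0]])
pd_values = partial_dependence(X, toy_predict, col=0, values=[0.0, 1.0])
```

With a real model, plotting `values` against the returned averages gives the partial dependence curve for that feature.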
@eddjberry
eddjberry / knit_dir.R
Last active July 27, 2020 12:50
Knit a directory of files from the command line
#!/usr/bin/env Rscript
# to run from command line:
## chmod +x knit_dir.R
## ./knit_dir.R <dir-name>
# from https://stackoverflow.com/a/49950761
# to avoid conflicts between packages
# breaking things
clean_search <- function() {
@eddjberry
eddjberry / plot_by_group.R
Last active November 8, 2019 14:21
Create a plot of percentages for some groups with lines for different sub-groups. E.g. regions along the x-axis with lines for the percentage of the population in different age-bands
#=================================================
# geom_line() + geom_ribbon()
#=================================================
# plots by group
plot_by_group <- function(df, x, colour) {
  # create the summary data using group_prop()
  df_summary <- df %>%
    dplyr::filter(!is.na({{ colour }})) %>%
    group_prop({{ x }}, {{ colour }})
@eddjberry
eddjberry / sim_ab_tests.md
Last active October 18, 2019 15:42
Some functions to simulate simple A/B Tests making use of data.table. (The inline code will work when copied into an Rmd file)
title: Simulating A/B Tests
author: Ed Berry
date: 04/10/2019
library(broom)
library(janitor)
library(data.table)
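The gist's own functions are written in R with data.table and are not visible in this preview. As a rough Python sketch of the same kind of simulation (every name and parameter here is an assumption), one can draw binomial conversion counts for two arms, apply a pooled two-proportion z-test, and repeat to estimate power:

```python
import numpy as np
from math import sqrt
from statistics import NormalDist

def simulate_ab_test(n, p_a, p_b, rng):
    """Simulate one test with n users per arm; return a two-sided p-value."""
    conv_a = int(rng.binomial(n, p_a))   # conversions in arm A
    conv_b = int(rng.binomial(n, p_b))   # conversions in arm B
    pooled = (conv_a + conv_b) / (2 * n)
    se = sqrt(pooled * (1 - pooled) * 2 / n)
    if se == 0:
        return 1.0                       # no variation, so no evidence of a difference
    z = (conv_b / n - conv_a / n) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

rng = np.random.default_rng(42)
# illustrative settings: 2000 users per arm, 10% vs 13% conversion, 500 repeats
p_values = [simulate_ab_test(2000, 0.10, 0.13, rng) for _ in range(500)]
estimated_power = float(np.mean(np.array(p_values) < 0.05))
```

The share of simulated p-values below 0.05 estimates the power of the test at that sample size and effect.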
@eddjberry
eddjberry / prop_test_power_curves.R
Created July 12, 2019 15:40
Power curves for a prop.test created using pwr & ggplot2
#========================================================#
# Setup
#========================================================#
library(dplyr)
library(ggplot2)
library(here)
library(pwr)
library(scales)
library(stringr)
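The plotting code itself is not shown in this preview. To illustrate what a power curve for a two-proportion test contains, here is a Python sketch using Cohen's effect size h and a normal approximation. It is not the gist's code, and the proportions and sample sizes are made-up values:

```python
from math import asin, sqrt
from statistics import NormalDist

def cohens_h(p1, p2):
    # arcsine-transformed difference between two proportions
    return 2 * asin(sqrt(p1)) - 2 * asin(sqrt(p2))

def power_two_prop(p1, p2, n, sig_level=0.05):
    """Approximate two-sided power with n observations per group."""
    norm = NormalDist()
    z_crit = norm.inv_cdf(1 - sig_level / 2)
    shift = cohens_h(p1, p2) * sqrt(n / 2)
    # probability of landing in either rejection region
    return (1 - norm.cdf(z_crit - shift)) + norm.cdf(-z_crit - shift)

# illustrative inputs: 10% vs 12% conversion at several sample sizes
sample_sizes = [100, 500, 1000, 5000]
curve = [power_two_prop(0.10, 0.12, n) for n in sample_sizes]
```

Plotting `sample_sizes` against `curve` gives one power curve. Repeating this for several effect sizes gives a family of curves.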