@eddjberry
eddjberry / plot_by_group.R
Last active November 8, 2019 14:21
Create a plot of percentages for some groups, with lines for different sub-groups. For example, regions along the x-axis with a line for the percentage of the population in each age band.
#=================================================
# geom_line() + geom_ribbon()
#=================================================
# plots by group
plot_by_group <- function(df, x, colour) {
  # create the summary data using group_prop()
  df_summary <- df %>%
    dplyr::filter(!is.na({{ colour }})) %>%
    group_prop({{ x }}, {{ colour }})
eddjberry / group_prop.R
Last active April 27, 2020 11:07
Get counts and proportions by group(s)
group_prop <- function(df, ...) {
  # enquo the dots
  vars <- enquos(...)
  # count then calculate proportions
  df_count <- df %>%
    count(!!!vars)
  if (length(vars) > 1) {
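The count-then-proportion idea can also be sketched in Python using only the standard library. This is not the gist's R code: the name `group_prop_py`, the list-of-dicts input, and the choice to compute proportions within the first grouping key are all assumptions made for illustration, since the preview cuts off before the multi-variable branch.

```python
from collections import Counter


def group_prop_py(rows, *keys):
    """Count rows per key combination and attach a proportion.

    Assumption: with more than one key, proportions are computed
    within each value of the first key; with a single key they are
    computed over all rows.
    """
    counts = Counter(tuple(row[k] for k in keys) for row in rows)

    # totals per value of the first key (only needed for 2+ keys)
    totals = None
    if len(keys) > 1:
        totals = Counter()
        for combo, n in counts.items():
            totals[combo[0]] += n
    grand_total = sum(counts.values())

    result = []
    for combo, n in sorted(counts.items()):
        denom = totals[combo[0]] if totals is not None else grand_total
        result.append({**dict(zip(keys, combo)), "n": n, "prop": n / denom})
    return result


# hypothetical toy data: regions and age bands
rows = [
    {"region": "North", "age_band": "18-34"},
    {"region": "North", "age_band": "18-34"},
    {"region": "North", "age_band": "35-54"},
    {"region": "South", "age_band": "35-54"},
]
summary = group_prop_py(rows, "region", "age_band")
```

Here each entry of `summary` holds the grouping values, the count `n`, and the share `prop` of that age band within its region.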
eddjberry / partial_dependence_data.py
Created July 17, 2020 14:42
Generate the data required for a partial dependence plot from a PySpark model
def partial_dependency_data(df, model, col, values, sample_fraction=0.1):
    # empty list for predictions
    avg_predictions = []
    # take a sample of the data to use
    df_sample = df.sample(fraction=sample_fraction)
    # loop through the values
    for val in values:
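The preview stops at the loop, so the following is only a sketch of the general partial dependence recipe, written in plain Python rather than PySpark. The function name `partial_dependence_sketch`, the list-of-dicts data, the `predict` callable, and the fixed-size sample are all assumptions standing in for the DataFrame, the fitted model, and `df.sample` in the original.

```python
import random


def partial_dependence_sketch(rows, predict, col, values,
                              sample_fraction=0.1, seed=0):
    """Average prediction for each candidate value of `col`.

    `rows` is a list of dicts and `predict` is any callable that takes
    one row dict and returns a number (hypothetical stand-ins for the
    PySpark DataFrame and model).
    """
    rng = random.Random(seed)
    n_sample = max(1, round(len(rows) * sample_fraction))
    sample = rng.sample(rows, n_sample)

    avg_predictions = []
    for val in values:
        # overwrite the column of interest with the fixed value,
        # leaving every other feature as observed
        preds = [predict({**row, col: val}) for row in sample]
        avg_predictions.append(sum(preds) / len(preds))
    return list(zip(values, avg_predictions))


# toy data and a toy linear "model"
rows = [{"x1": i, "x2": 10} for i in range(100)]


def predict(row):
    return 2 * row["x1"] + row["x2"]


pd_data = partial_dependence_sketch(rows, predict, "x1", [0, 1, 2],
                                    sample_fraction=0.2)
```

Plotting the first element of each pair against the second gives the partial dependence curve for `x1`.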
eddjberry / knit_dir.R
Last active July 27, 2020 12:50
Knit a directory of files from the command line
#!/usr/bin/env Rscript
# to run from command line:
## chmod +x knit_dir.R
## ./knit_dir.R <dir-name>
# from https://stackoverflow.com/a/49950761
# to avoid conflicts between packages breaking things
clean_search <- function() {
import pandas as pd
from faker import Faker
# set the seed
Faker.seed(10)
# set the locale to GB
fake = Faker("en_GB")
# how many customers to fake
import timeit
import numpy as np
from faker import Faker
# create the faker object
fake = Faker()
# np.random.choice function
def np_choice(N=1000):
    np.random.choice(N+1, N, replace=False)
\documentclass{article}
\usepackage{pdfpages}
\begin{document}
\includepdf[pages={30, 31, 32, 37}]{/path/to/file}
\end{document}
eddjberry / r_vs_py.R
Created March 15, 2022 16:04
R versus numpy when applying functions to rows and columns
# R -------------------------
x = matrix(c(1,2,0, 4,3,7), ncol = 3, byrow = TRUE)
x
#      [,1] [,2] [,3]
# [1,]    1    2    0
# [2,]    4    3    7
# row means (NB: R indexes from 1)
apply(x, MARGIN = 1, mean)
# [1] 1.000000 4.666667
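The preview shows only the R half of the comparison. A minimal sketch of what a NumPy counterpart could look like follows; it is not taken from the gist. The mapping assumed here is that R's `MARGIN = 1` (rows) corresponds to `axis=1` in NumPy and `MARGIN = 2` (columns) to `axis=0`.

```python
import numpy as np

x = np.array([[1, 2, 0],
              [4, 3, 7]])

# row means: axis=1 collapses the columns (NB: NumPy indexes axes from 0)
row_means = x.mean(axis=1)

# column means: axis=0 collapses the rows
col_means = x.mean(axis=0)

# for an arbitrary function, np.apply_along_axis is the closer
# analogue of R's apply(); here, the range of each row
row_ranges = np.apply_along_axis(lambda r: r.max() - r.min(), 1, x)
```

The row means match the R output above: 1.0 for the first row and roughly 4.667 for the second.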
eddjberry / shap_dependence_plot_grid.py
Last active November 23, 2022 07:32
Plot a grid of shap.dependence_plots
# Dependencies ----------------------
import math
import shap
import matplotlib.pyplot as plt
# shap_dependence_plot_grid ---------
def shap_dependence_plot_grid(cols,
                              shap_values,
                              X,
                              interaction_index=None,