Gordon McDonald gdmcdonald

@gdmcdonald
gdmcdonald / create_6
Last active October 19, 2022 05:53
Create a six-character random alphanumeric code in Excel (digits and lowercase letters only)
=CONCAT(
CHAR(48+MOD(RANDBETWEEN(49,84),75)),
CHAR(48+MOD(RANDBETWEEN(49,84),75)),
CHAR(48+MOD(RANDBETWEEN(49,84),75)),
CHAR(48+MOD(RANDBETWEEN(49,84),75)),
CHAR(48+MOD(RANDBETWEEN(49,84),75)),
CHAR(48+MOD(RANDBETWEEN(49,84),75))
)
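The arithmetic behind the formula can be checked outside Excel. The R sketch below is an illustration, not part of the gist: it assumes `RANDBETWEEN(49,84)` draws uniformly from 36 integers, and `random_code` is a hypothetical helper name. It shows that `MOD(x,75)` leaves 49..74 unchanged (giving `a`-`z` after adding 48) and wraps 75..84 to 0..9 (giving `0`-`9`).

```r
# Reproduce the formula's arithmetic for every possible draw.
draws <- 49:84                      # all values RANDBETWEEN(49,84) can return
codes <- 48 + (draws %% 75)         # 49..74 -> 97..122, 75..84 -> 48..57
alphabet <- intToUtf8(codes, multiple = TRUE)

# Hypothetical helper (not in the gist): build an n-character code
# by sampling the 36-character alphabet with replacement.
random_code <- function(n = 6) {
  paste(sample(alphabet, n, replace = TRUE), collapse = "")
}

print(alphabet)
print(random_code())
```

Because each of the 36 draws maps to a distinct character, every character is equally likely.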
gdmcdonald / eval_classifier.R
Last active March 9, 2020 13:48
Function to evaluate caret classifiers with ROC, AUC and confusion matrix
eval_classifier <- function(trained_model, test_data) {
outcome_var <- as.character(
trained_model$terms[[2]]
)
y_test <- test_data[[outcome_var]]
# make predictions and probabilities on the test set
y_pred <- predict(trained_model, test_data, type = "raw")
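The preview above is cut off before the metrics are computed. As a rough illustration of what AUC and a confusion matrix measure, here is a base R sketch that does not use caret. The helper `auc_from_scores`, the toy data, and the 0.5 threshold are all assumptions made for this example and are not taken from the gist.

```r
# Hypothetical helper (not the gist's code): AUC via the rank-sum identity.
# scores: predicted probabilities for the positive class; labels: 0/1 truth.
auc_from_scores <- function(scores, labels) {
  pos <- labels == 1
  n_pos <- sum(pos)
  n_neg <- sum(!pos)
  r <- rank(scores)                 # ties receive average ranks
  (sum(r[pos]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Toy data, assumed for illustration only.
scores <- c(0.1, 0.4, 0.35, 0.8)
labels <- c(0, 0, 1, 1)

auc <- auc_from_scores(scores, labels)

# Confusion matrix at an assumed 0.5 threshold.
predicted <- factor(as.integer(scores >= 0.5), levels = c(0, 1))
actual <- factor(labels, levels = c(0, 1))
confusion <- table(predicted = predicted, actual = actual)

print(auc)
print(confusion)
```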
gdmcdonald / normalised_mutual_information.R
Created April 4, 2018 07:03
An example of using normalised mutual information in R
#load the mutual information library
library(mpmi)
#define normalised continuous mutual information, bias corrected
ncmi<-function(cts,...){
MIunnorm<-cmi(cts,...)
MIbcmiself<-diag(MIunnorm$bcmi)
MIbcminorm<-outer(MIbcmiself,MIbcmiself,FUN = "*")
MInormed<-MIunnorm$bcmi / sqrt(MIbcminorm)
colnames(MInormed)<-colnames(cts)
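The normalisation step in `ncmi` divides each entry by the geometric mean of the two matching diagonal ("self") entries, so the diagonal becomes 1. The sketch below isolates that step on a small made-up matrix so it runs without `mpmi`. The helper name `normalise_by_diag` and the example values are assumptions for illustration.

```r
# Hypothetical helper (not in the gist): divide each entry m[i, j]
# by sqrt(m[i, i] * m[j, j]), mirroring the outer()/sqrt() step above.
normalise_by_diag <- function(m) {
  self <- diag(m)
  m / sqrt(outer(self, self, FUN = "*"))
}

# Made-up symmetric matrix standing in for the bias-corrected MI matrix.
m <- matrix(c(4, 2,
              2, 9),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("x", "y"), c("x", "y")))

normalised <- normalise_by_diag(m)
print(normalised)
```

This is the same rescaling that base R's `cov2cor()` applies when it turns a covariance matrix into a correlation matrix.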
gdmcdonald / eff_fuzzy_match.R
Last active February 1, 2022 13:01
Efficient fuzzy match of two data frames by one common string column in R, outputting a list of the matching and non-matching rows
#Efficient fuzzy match of two data frames by one common column
library(dplyr)
library(fuzzyjoin)
library(stringdist)
eff_fuzzy_match<-function(data_frame_A,
data_frame_B,
by_what,
choose_p = 0.1,
choose_max_dist = 0.4,
gdmcdonald / Recursive_join.R
Created September 21, 2017 01:48
A recursive join function that matches the left data frame to the right data frame on as many columns as possible, in order of importance.
#Make Example Data
df_a<-data.frame(A = c(1:9,11), B = letters[1:10], C = sample(1:4,10,replace = TRUE))
df_b<-data.frame(A = c(1:10,1:10), B = letters[c(1:5,10,9,8,7,5,6:15)], C = sample(1:4,20,replace = TRUE))
order_of_importance<-c("A"="A","B"="B")
#Define Recursive Join Function
recursive_join<-function(left_df,right_df,variable_order){
gdmcdonald / matchTypos.R
Last active September 4, 2017 00:30
Finding and matching typos in strings in a dataframe in R. See the question at https://stackoverflow.com/questions/45990947/how-to-find-a-typo-in-a-data-frame-and-replace-it/
library(stringdist)
library(dplyr)
#Example Data Frame to find and correct typos in
my_df<-data.frame(BIRTH = c(1,1,2,3,1,5,3,3,1),
NAME = c("Luke","Luke","Leia","Han","Ben","Lando","Han","Ham","Luke"),
SURNAME = c("Skywalker","Skywalker","Organa","Solo","Solo","Calrissian","Solo","Solo","Wkywalker"),
random_value = c(1,2,3,7,1,3,4,4,9))
#Concatenate the birthday and name columns
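The rest of the gist is not shown in this preview. As a rough idea of how typo matching by edit distance can work, the sketch below uses base R's `adist()` in place of the `stringdist` package and looks only at the surname column. The distance cutoff of 1 and the rule "replace with the most frequent close spelling" are assumptions made for this example, not the gist's method.

```r
# Surnames from the example data frame above.
surnames <- c("Skywalker", "Skywalker", "Organa", "Solo", "Solo",
              "Calrissian", "Solo", "Solo", "Wkywalker")

counts <- table(surnames)           # how often each spelling occurs
candidates <- names(counts)

# Levenshtein distance from every value to every distinct spelling.
d <- adist(surnames, candidates)

# Assumed rule: among spellings within distance 1 (including the value
# itself), keep the most frequent one.
corrected <- vapply(seq_along(surnames), function(i) {
  near <- candidates[d[i, ] <= 1]
  near[which.max(counts[near])]
}, character(1))

print(data.frame(original = surnames, corrected = corrected))
```

With this rule the single "Wkywalker" is mapped to "Skywalker", while spellings with no close neighbour are left unchanged.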
gdmcdonald / AlignDateXAxes.R
Last active August 21, 2017 01:55
How to align the x-axis of two different time series plots in R, adapted from https://gist.github.com/boboppie/f115b2e1f0004a9d0623
library(ggplot2)
library(grid)
library(lubridate)
# Create some data to play with. Two time series with offset timestamp.
df1 <- data.frame(DateTime = ymd("2010-07-01") + c(0:8760) * hours(2), series1 = rnorm(8761))
df2 <- data.frame(DateTime = ymd("2011-07-01") + c(0:8760) * hours(2), series1 = rnorm(8761))
# Create the two plots.
plotme <- function(inputdf,titletext,whatcolor){