Skip to content

Instantly share code, notes, and snippets.

View mathzero's full-sized avatar

Matt Whitaker mathzero

  • Imperial College London
View GitHub Profile
@mathzero
mathzero / drop_na_search.R
Last active January 7, 2024 17:54
Search algorithm to drop NAs from data set with minimum data loss
# Main function to find the 'optimal' combination of rows and columns to drop in order to
# maximise the number of remaining data points in a data set
# Search space rapidly becomes enormous as data size grows, so this random search solution will
# only ever be a rough approximation of the optimal data set.
optimiseDataset <- function(data, max_iterations = 10000) {
best_solution <- data
dims=dim(data[complete.cases(data),])
@mathzero
mathzero / crossref_doi_prefix_search.R
Last active February 5, 2023 22:27
Query CrossRef through rcrossref package, using DOI prefix
# install.packages("pubmedR")
# install.packages("rcrossref")
# devtools::install_github("ropensci/rAltmetric")
library(pubmedR)
library(rcrossref)
library(rAltmetric)
pacman::p_load(curl,readr, tidyverse,purrr) # load required packages
@mathzero
mathzero / pubmed_lit_search_and_metadata.R
Last active November 6, 2023 04:58
This script runs a literature search on PubMed programatically, then pulls some article metadata on citations and altmetrics
install.packages("pubmedR")
install.packages("rcrossref")
devtools::install_github("ropensci/rAltmetric")
install.packages("tidyverse")
install.packages("janitor")
library(pubmedR)
library(rcrossref)
library(tidyverse)
library(janitor)
@mathzero
mathzero / altmetrics_updated.R
Last active October 4, 2022 18:33
Hacky fix for rAltmetrics function
altmetrics_new <-
function(doi = NULL,
apikey = NULL,
...) {
base_url <- "https://api.altmetric.com/v1/"
args <- list(key = apikey)
request <-
httr::GET(paste0(base_url, "doi/",doi))
if(httr::status_code(request) == 404) {
@mathzero
mathzero / simulate_high_dimensional_data.R
Last active May 10, 2021 17:41
Generate a data frame with multiple normally distributed correlated variables in X and simulate X-y signals
library(ggplot2)
library(dplyr)
library(tidyr)
library(faux)
library(randomForest)
# Set parameters ----------------------------------------------------------
@mathzero
mathzero / xgboost_permutation.R
Last active March 29, 2021 09:36
Permutation function to calculate variable importance in xgboost
# xgboost permutation function
#' This function takes an XGBoost model and some X and y data
#' (ideally this would be unseen holdout data but you could also use the training data)
#' and returns a data frame with an estimation of the contribution that each variable makes to the overall AUC
#' can take a long time to run with a large data set – nperm can be reduced to reduce compute time
PermuteImportXBG <- function(model, X, y, nperm = 100){
predictors=model$feature_names
@mathzero
mathzero / gist:98a10fc4d7d8b35dd7d765dfa291425e
Last active February 26, 2021 10:50
RMarkdown report automator
#' This short function takes some input from whatever R script you are working in and creates an
#' rmarkdown document and corresponding knitted HTML file on the fly
#' Basis for code is: https://stackoverflow.com/questions/60110904/how-to-generate-html-report-directly-from-r-script
my_markdown_rederer <- function(text, myList=myList) {
# The file name for the rmd doc
rmd_file_name <- "temp.Rmd"
@mathzero
mathzero / x_tab.R
Last active February 19, 2021 11:30
Three functions to create table ones / simple cross tabs in R
#### 1. Cross tab function
# Simple cross-tab function, with %s
# Creates a cross tab of categorical variables, with %s in brackets after the number
# You can specify the names of the levels within the variables if you want, or leave blank to use the existing names
# (Be sure to enter the level names in the right order if you do specify them)
crossTab <- function(dat = dfRes, rowvar, colvar, rowvar_levels = NULL,colvar_levels = NULL){
tab <- addmargins(table(pull(dat,rowvar), pull(dat,colvar))) %>% as.data.frame.matrix()
tab.prop <- round(100*prop.table(table(pull(dat,rowvar), pull(dat,colvar)),1),1) %>% as.data.frame.matrix()
@mathzero
mathzero / pheatmap_save.R
Created June 14, 2020 09:15
Save pheatmap function (png or pdf)
### slightly adapted from here to save png or pdf and specify resolution etc: https://stackoverflow.com/questions/43051525/how-to-draw-pheatmap-plot-to-screen-and-also-save-to-file
# function detects '.png' or '.pdf' in the declared filename and assigns that file type
### pheatmap save function
save_pheatmap <- function(x, filename, width=12, height=12){
stopifnot(!missing(x))
stopifnot(!missing(filename))
if(grepl(".png",filename)){
png(filename, width=width, height=height, units = "in", res=300)
@mathzero
mathzero / pheatmap_save.R
Created June 14, 2020 09:15
Save pheatmap function (png or pdf)
### slightly adapted from here to save png or pdf and specify resolution etc: https://stackoverflow.com/questions/43051525/how-to-draw-pheatmap-plot-to-screen-and-also-save-to-file
# function detects '.png' or '.pdf' in the declared filename and assigns that file type
### pheatmap save function
save_pheatmap <- function(x, filename, width=12, height=12){
stopifnot(!missing(x))
stopifnot(!missing(filename))
if(grepl(".png",filename)){
png(filename, width=width, height=height, units = "in", res=300)