Skip to content

Instantly share code, notes, and snippets.

View enigmoe's full-sized avatar

Moritz Zajonz enigmoe

View GitHub Profile
@richpauloo
richpauloo / README.md
Last active December 24, 2019 01:43
Cumulative Variable Importance for Random Forest Models

Cumulative Variable Importance for Random Forest (RF) 🌲🌳 Models

Motivation

What does an interpretable RF visualization look like? Out-of-the-box 📦 RF implementations in R and Python compute variable importance over all trees, but how do we get there?

In other words, what would a cumulative variable importance for a RF look like?

Approach

# Load the packages we’re going to be using:
# Alongside the usual stuff like tidyverse and magrittr, we’ll be using rvest for some web-scraping, jsonline to parse some JSON, and extrafont to load some nice custom fonts
needs(tidyverse, magrittr, rvest, jsonlite, extrafont)
# Before we go on, two things to note:
# First, on web scraping:
# You should always check the terms of the site you are extracting data from, to make sure scraping (often referred to as `crawling`) is not prohibited. One way to do this is to visit the website’s `robots.txt` page, and ensure that a) there is nothing explicitly stating that crawlers are not permitted, and b) ideally, the site simply states that all user agents are permitted (indicated by a line saying `User-Agect: *`). Both of those are the case for our use-case today (see https://www.ultimatetennisstatistics.com/robots.txt).
# And second, about those custom fonts:
# Prepare world data
# First up, we need to load the built-up area data that we’re going to be plotting. We download this from the European Commission’s Global Human Settlement Data portal [https://ghsl.jrc.ec.europa.eu/datasets.php] — specifically using the links from this page [http://cidportal.jrc.ec.europa.eu/ftp/jrc-opendata/GHSL/GHS_BUILT_LDSMT_GLOBE_R2015B/]. We want the 250m-resolution rasters for 1975 and 2015 (GHS_BUILT_LDS1975_GLOBE_R2016A_54009_250 and GHS_BUILT_LDS2014_GLOBE_R2016A_54009_250).
# Once you’ve downloaded these (they’re BIG, so might take a little while...), we can save ourselves a lot of hassle later on by re-projecting them into the same co-ordinate space as the other data we’re going to be using. Specifically we want to change their units from metres to lat/lon. We do this by:
# 1) Unzipping the archive, and then
# 2) Running the following script on the command-line:
# gdalwarp -t_srs EPSG:4326 -tr 0.01 0.01 path/to/your/built-up-area.tif path/to/your/built-up-area_reprojected.
# Data is the UN's Medium-variant population projections, available at https://population.un.org/wpp/
data %>%
filter(Sex != "Both" & A3 %in% c("GBR", "RUS", "IND", "CHN", "RWA", "GRC") & Year %in% 2018:2060) %>%
as.tibble %>%
mutate(
group = paste0(Year, Sex), AgeGrp = as.numeric(AgeGrp),
Location = Location %>% gsub("n Federation","",.)
) %>%
ggplot(aes(AgeGrp, Value, col=Sex, group=group)) +
@vinayak-mehta
vinayak-mehta / disease_outbreaks_camelot.ipynb
Last active November 5, 2023 18:54
A jupyter notebook showing how Camelot can be used to extract tables from PDFs scraped from the IDSP website.
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
# Takes an ordered vector of numeric values and returns a small bar chart made
# out of Unicode block elements. Works well inside dplyr mutate() or summarise()
# calls on grouped data frames.
sparkbar <- function(values) {
span <- max(values) - min(values)
if(span > 0 & !is.na(span)) {
steps <- round(values / (span / 7))
blocks <- c('▁', '▂', '▃', '▄', '▅', '▆', '▇', '█')
paste(sapply(steps - (min(steps) - 1), function(i) blocks[i]), collapse = '')
@tjukanovt
tjukanovt / twitterbot.py
Created July 2, 2018 18:06
A super simple Twitter bot application posting random csv content every 2 hours
import tweepy
import random
import pandas as pd
import time
#get your codes from https://apps.twitter.com/
consumer_key = 'your_code_here'
consumer_secret = 'your_code_here'
access_token = 'your_code_here'
access_token_secret = 'your_code_here'
@clauswilke
clauswilke / animate_labels.R
Created May 15, 2018 14:25
rotate plot labels
library(ggplot2) # requires 2.3.0
library(purrr)
make_plot <- function(frame) {
ggplot(mtcars, aes(mpg, hp, color = factor(cyl))) +
geom_point() +
scale_color_brewer(
palette = 2, type = "qual", name = "cyl",
guide = guide_legend(
direction = "horizontal",
@dylanmckay
dylanmckay / facebook-contact-info-summary.rb
Last active March 12, 2024 22:46
A Ruby script for collecting phone record statistics from a Facebook user data dump
#! /usr/bin/env ruby
# NOTE: Requires Ruby 2.1 or greater.
# This script can be used to parse and dump the information from
# the 'html/contact_info.htm' file in a Facebook user data ZIP download.
#
# It prints all cell phone call + SMS message + MMS records, plus a summary of each.
#
# It also dumps all of the records into CSV files inside a 'CSV' folder, that is created
@TheMapSmith
TheMapSmith / flights.js
Last active March 17, 2024 15:51
Fetching flight info
var fs = require('fs');
var request = require('request-promise');
var moment = require('moment')
// Globals
global.timestamp = moment().unix()
global.allPlaybacks = [];
global.geojson = {};
global.geojson['type'] = 'FeatureCollection';
global.geojson['features'] = [];