Moritz Zajonz enigmoe

@richpauloo
richpauloo / README.md
Last active Dec 24, 2019
Cumulative Variable Importance for Random Forest Models
View README.md

Cumulative Variable Importance for Random Forest (RF) 🌲🌳 Models

Motivation

What does an interpretable RF visualization look like? Out-of-the-box 📦 RF implementations in R and Python compute variable importance over all trees, but how do we get there?

In other words, what would a cumulative variable importance for a RF look like?
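
One possible reading of "cumulative" here, offered as an assumption rather than the author's confirmed method, is the running mean of per-tree importances as trees are added to the forest. The sketch below uses made-up importance values and a hypothetical helper name, `cumulative_importance`; in practice the per-tree values would come from a fitted model.

```python
from typing import List


def cumulative_importance(per_tree: List[List[float]]) -> List[List[float]]:
    """Running mean of per-tree importances after 1, 2, ..., n trees.

    per_tree[k][j] is the importance of feature j in tree k.
    """
    n_features = len(per_tree[0])
    totals = [0.0] * n_features
    history = []
    for k, tree_importance in enumerate(per_tree, start=1):
        # Add this tree's importances to the running totals.
        totals = [t + v for t, v in zip(totals, tree_importance)]
        # Average over the k trees seen so far.
        history.append([t / k for t in totals])
    return history


# Hypothetical importances for 3 trees and 2 features (made-up numbers).
per_tree = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
history = cumulative_importance(per_tree)
```

Plotting each feature's column of `history` against the number of trees would show how the forest-level importance settles as trees accumulate.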

Approach

View federer-ATP-100.R
# Load the packages we’re going to be using:
# Alongside the usual stuff like tidyverse and magrittr, we’ll be using rvest for some web-scraping, jsonlite to parse some JSON, and extrafont to load some nice custom fonts
needs(tidyverse, magrittr, rvest, jsonlite, extrafont)
# Before we go on, two things to note:
# First, on web scraping:
# You should always check the terms of the site you are extracting data from, to make sure scraping (often referred to as `crawling`) is not prohibited. One way to do this is to visit the website’s `robots.txt` page, and ensure that a) there is nothing explicitly stating that crawlers are not permitted, and b) ideally, the site simply states that all user agents are permitted (indicated by a line saying `User-agent: *`). Both of those are the case for our use-case today (see https://www.ultimatetennisstatistics.com/robots.txt).
# And second, about those custom fonts:
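
The robots.txt check described in the comments above can also be done programmatically. This is a minimal sketch in Python using the standard library's `urllib.robotparser`. The two robots.txt strings, the `my-bot` user agent, the `example.com` URL, and the `may_fetch` helper are illustrative assumptions, not the contents of any real site's file.

```python
from urllib.robotparser import RobotFileParser


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text allows user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt bodies: one allows everything, one forbids everything.
permissive = "User-agent: *\nDisallow:\n"
restrictive = "User-agent: *\nDisallow: /\n"

allowed = may_fetch(permissive, "my-bot", "https://example.com/players")
blocked = may_fetch(restrictive, "my-bot", "https://example.com/players")
```

A robots.txt check only covers crawler rules; the site's terms of use still need to be read separately.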
View urbanisation_mountains.R
# Prepare world data
# First up, we need to load the built-up area data that we’re going to be plotting. We download this from the European Commission’s Global Human Settlement Data portal [https://ghsl.jrc.ec.europa.eu/datasets.php] — specifically using the links from this page [http://cidportal.jrc.ec.europa.eu/ftp/jrc-opendata/GHSL/GHS_BUILT_LDSMT_GLOBE_R2015B/]. We want the 250m-resolution rasters for 1975 and 2015 (GHS_BUILT_LDS1975_GLOBE_R2016A_54009_250 and GHS_BUILT_LDS2014_GLOBE_R2016A_54009_250).
# Once you’ve downloaded these (they’re BIG, so might take a little while...), we can save ourselves a lot of hassle later on by re-projecting them into the same co-ordinate space as the other data we’re going to be using. Specifically we want to change their units from metres to lat/lon. We do this by:
# 1) Unzipping the archive, and then
# 2) Running the following script on the command-line:
# gdalwarp -t_srs EPSG:4326 -tr 0.01 0.01 path/to/your/built-up-area.tif path/to/your/built-up-area_reprojected.
View populationCurves.R
# Data is the UN's Medium-variant population projections, available at https://population.un.org/wpp/
data %>%
filter(Sex != "Both" & A3 %in% c("GBR", "RUS", "IND", "CHN", "RWA", "GRC") & Year %in% 2018:2060) %>%
as_tibble() %>%
mutate(
group = paste0(Year, Sex), AgeGrp = as.numeric(AgeGrp),
Location = Location %>% gsub("n Federation","",.)
) %>%
ggplot(aes(AgeGrp, Value, col=Sex, group=group)) +
@vinayak-mehta
vinayak-mehta / disease_outbreaks_camelot.ipynb
Last active May 20, 2020
A jupyter notebook showing how Camelot can be used to extract tables from PDFs scraped from the IDSP website.
View disease_outbreaks_camelot.ipynb
View sparkbar.R
# Takes an ordered vector of numeric values and returns a small bar chart made
# out of Unicode block elements. Works well inside dplyr mutate() or summarise()
# calls on grouped data frames.
sparkbar <- function(values) {
span <- max(values) - min(values)
if (!is.na(span) && span > 0) {
steps <- round(values / (span / 7))
blocks <- c('▁', '▂', '▃', '▄', '▅', '▆', '▇', '█')
paste(sapply(steps - (min(steps) - 1), function(i) blocks[i]), collapse = '')
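
The preview above is cut off, so the following is a rough Python sketch of the same idea rather than a port of the author's function. It uses plain min-max scaling onto the eight Unicode block elements, which is a simpler mapping than the step arithmetic in the R code, and it assumes that flat input should produce an empty string.

```python
BLOCKS = "▁▂▃▄▅▆▇█"  # Unicode block elements, shortest to tallest


def sparkbar(values):
    """Map numeric values onto block characters by min-max scaling."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span <= 0:
        # Assumption: with no variation there is nothing to draw.
        return ""
    top = len(BLOCKS) - 1
    # Scale each value to an index between 0 and top, then pick the block.
    return "".join(BLOCKS[round((v - lo) / span * top)] for v in values)


two_bars = sparkbar([0, 7])
flat = sparkbar([1, 1, 1])
```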
@tjukanovt
tjukanovt / twitterbot.py
Created Jul 2, 2018
A super simple Twitter bot application posting random csv content every 2 hours
View twitterbot.py
import tweepy
import random
import pandas as pd
import time
#get your codes from https://apps.twitter.com/
consumer_key = 'your_code_here'
consumer_secret = 'your_code_here'
access_token = 'your_code_here'
access_token_secret = 'your_code_here'
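
The gist description mentions posting random CSV content every 2 hours, but the preview stops after the credentials. The sketch below shows one way the selection and scheduling could work using only the standard library. The `text` column name, the sample rows, and the `pick_random_row` and `run` helpers are hypothetical, and the actual posting call is passed in as a function so that no Twitter API usage is assumed.

```python
import csv
import io
import random
import time

INTERVAL_SECONDS = 2 * 60 * 60  # "every 2 hours", per the gist description


def pick_random_row(csv_text, rng=random):
    """Return one randomly chosen row (as a dict) from CSV text with a header."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return rng.choice(rows)


def run(post, csv_text, rounds, sleep=time.sleep):
    """Call post(text) once per round, pausing INTERVAL_SECONDS after each post."""
    for _ in range(rounds):
        post(pick_random_row(csv_text)["text"])
        sleep(INTERVAL_SECONDS)


# Hypothetical CSV content with a single "text" column.
sample = "text\nfirst fact\nsecond fact\nthird fact\n"
posted = []
# Collect the "posts" in a list and skip the real pause for demonstration.
run(posted.append, sample, rounds=3, sleep=lambda seconds: None)
```

Injecting `post` and `sleep` keeps the loop easy to try out without credentials or a two-hour wait.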
View animate_labels.R
library(ggplot2) # requires 2.3.0
library(purrr)
make_plot <- function(frame) {
ggplot(mtcars, aes(mpg, hp, color = factor(cyl))) +
geom_point() +
scale_color_brewer(
palette = 2, type = "qual", name = "cyl",
guide = guide_legend(
direction = "horizontal",
@dylanmckay
dylanmckay / facebook-contact-info-summary.rb
Last active May 12, 2020
A Ruby script for collecting phone record statistics from a Facebook user data dump
View facebook-contact-info-summary.rb
#! /usr/bin/env ruby
# NOTE: Requires Ruby 2.1 or greater.
# This script can be used to parse and dump the information from
# the 'html/contact_info.htm' file in a Facebook user data ZIP download.
#
# It prints all cell phone call + SMS message + MMS records, plus a summary of each.
#
# It also dumps all of the records into CSV files inside a 'CSV' folder, that is created
@TheMapSmith
TheMapSmith / flights.js
Last active Aug 16, 2018
Fetching flight info
View flights.js
var fs = require('fs');
var request = require('request-promise');
var moment = require('moment');
// Globals
global.timestamp = moment().unix();
global.allPlaybacks = [];
global.geojson = {};
global.geojson['type'] = 'FeatureCollection';
global.geojson['features'] = [];