Moritz Zajonz enigmoe

## README.md

      
richpauloo / README.md (last active December 24, 2019 01:43)

Cumulative Variable Importance for Random Forest (RF) 🌲🌳 Models

Motivation

What does an interpretable RF visualization look like? Out-of-the-box 📦 RF implementations in R and Python compute variable importance over all trees, but how do we get there?
In other words, what would cumulative variable importance for an RF look like?
Approach


## federer-ATP-100.R
# Load the packages we’re going to be using:
# Alongside the usual stuff like tidyverse and magrittr, we’ll be using rvest for some web-scraping, jsonlite to parse some JSON, and extrafont to load some nice custom fonts.
needs(tidyverse, magrittr, rvest, jsonlite, extrafont)

# Before we go on, two things to note:

# First, on web scraping:
# You should always check the terms of the site you are extracting data from, to make sure scraping (often referred to as `crawling`) is not prohibited. One way to do this is to visit the website’s `robots.txt` page, and ensure that a) there is nothing explicitly stating that crawlers are not permitted, and b) ideally, the site simply states that all user agents are permitted (indicated by a `User-agent: *` line with no blanket `Disallow: /` rule beneath it). Both of those are the case for our use-case today (see https://www.ultimatetennisstatistics.com/robots.txt).
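
The robots.txt check described above can also be done programmatically. The following is a minimal sketch in Python (not part of the original R script), using the standard library's `urllib.robotparser`. The sample rules, the `is_allowed` helper, the `my-scraper` user agent and the `example.com` URLs are all hypothetical, and the rules are inlined so the sketch runs without network access.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the sketch runs offline.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt, url, user_agent="my-scraper"):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines rather than one string.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Paths outside /private/ are allowed; paths inside it are not.
print(is_allowed(SAMPLE_ROBOTS_TXT, "https://example.com/stats"))
print(is_allowed(SAMPLE_ROBOTS_TXT, "https://example.com/private/data"))
```

For a live site, the same parser can fetch the file itself via `set_url()` and `read()`. Passing the rules to this check does not replace reading the site's terms of use.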

# And second, about those custom fonts:

## urbanisation_mountains.R
# Prepare world data

# First up, we need to load the built-up area data that we’re going to be plotting. We download this from the European Commission’s Global Human Settlement Data portal [https://ghsl.jrc.ec.europa.eu/datasets.php] — specifically using the links from this page [http://cidportal.jrc.ec.europa.eu/ftp/jrc-opendata/GHSL/GHS_BUILT_LDSMT_GLOBE_R2015B/]. We want the 250m-resolution rasters for 1975 and 2015 (GHS_BUILT_LDS1975_GLOBE_R2016A_54009_250 and GHS_BUILT_LDS2014_GLOBE_R2016A_54009_250).

# Once you’ve downloaded these (they’re BIG, so this might take a little while...), we can save ourselves a lot of hassle later on by re-projecting them into the same co-ordinate space as the other data we’re going to be using. Specifically, we want to convert them from a projection measured in metres to lat/lon degrees (EPSG:4326). We do this by:
# 1) Unzipping the archive, and then
# 2) Running the following command on the command line:

# gdalwarp -t_srs EPSG:4326 -tr 0.01 0.01 path/to/your/built-up-area.tif path/to/your/built-up-area_reprojected.

## populationCurves.R
# Data is the UN's Medium-variant population projections, available at https://population.un.org/wpp/

data %>%
  filter(Sex != "Both" & A3 %in% c("GBR", "RUS", "IND", "CHN", "RWA", "GRC") & Year %in% 2018:2060) %>%
  as_tibble() %>%
  mutate(
    group = paste0(Year, Sex), AgeGrp = as.numeric(AgeGrp),
    Location = Location %>% gsub("n Federation","",.)
    ) %>%
  ggplot(aes(AgeGrp, Value, col=Sex, group=group)) +

## disease_outbreaks_camelot.ipynb

      
vinayak-mehta / disease_outbreaks_camelot.ipynb (last active November 5, 2023 18:54)

A Jupyter notebook showing how Camelot can be used to extract tables from PDFs scraped from the IDSP website.

## sparkbar.R
# Takes an ordered vector of numeric values and returns a small bar chart made
# out of Unicode block elements. Works well inside dplyr mutate() or summarise()
# calls on grouped data frames.

sparkbar <- function(values) {
  span <- max(values) - min(values)
  if (!is.na(span) && span > 0) {
    steps <- round(values / (span / 7))
    blocks <- c('▁', '▂', '▃', '▄', '▅', '▆', '▇', '█')
    paste(sapply(steps - (min(steps) - 1), function(i) blocks[i]), collapse = '')
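
As a rough illustration of the same idea (scale each value onto eight block heights, then join the blocks into one string), here is a minimal Python sketch. It is not a port of the R function above: the rounding differs slightly, and the handling of a flat series is an assumption, since that case is not shown in the preview. The names `BLOCKS` and `sparkbar` are chosen for illustration.

```python
# Eight Unicode block elements, from shortest to tallest.
BLOCKS = "▁▂▃▄▅▆▇█"


def sparkbar(values):
    """Return a small bar chart string, one block character per value."""
    values = list(values)
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # Assumption: a flat series is drawn with the shortest block.
        return BLOCKS[0] * len(values)
    # Rescale each value to an index between 0 and 7 and pick its block.
    return "".join(BLOCKS[round((v - lo) / span * 7)] for v in values)


print(sparkbar([0, 7]))      # → ▁█
print(sparkbar([3, 3, 3]))   # → ▁▁▁
```

The minimum always maps to the shortest block and the maximum to the tallest, so bars are comparable within one series but not across series with different ranges.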

## twitterbot.py
import tweepy
import random
import pandas as pd
import time

#get your codes from https://apps.twitter.com/
consumer_key = 'your_code_here'
consumer_secret = 'your_code_here'
access_token = 'your_code_here'
access_token_secret = 'your_code_here'

## animate_labels.R
library(ggplot2) # requires 2.3.0
library(purrr)

make_plot <- function(frame) {
  ggplot(mtcars, aes(mpg, hp, color = factor(cyl))) +
    geom_point() +
    scale_color_brewer(
      palette = 2, type = "qual", name = "cyl",
      guide = guide_legend(
        direction = "horizontal",

## facebook-contact-info-summary.rb
#! /usr/bin/env ruby

# NOTE: Requires Ruby 2.1 or greater.

# This script can be used to parse and dump the information from
# the 'html/contact_info.htm' file in a Facebook user data ZIP download.
#
# It prints all cell phone call + SMS message + MMS records, plus a summary of each.
#
# It also dumps all of the records into CSV files inside a 'CSV' folder, that is created

## flights.js
var fs = require('fs');
var request = require('request-promise');
var moment = require('moment');

// Globals
global.timestamp = moment().unix();
global.allPlaybacks = [];
global.geojson = {};
global.geojson['type'] = 'FeatureCollection';
global.geojson['features'] = [];