
@internaut
internaut / voronoize.py
Created Feb 10, 2021
Voronoi regions of schools in East Germany. An example using the geovoronoi package (https://pypi.org/project/geovoronoi/).
"""
Voronoi regions of schools in East Germany.
An example using the geovoronoi package (https://pypi.org/project/geovoronoi/).
Feb. 2021
Markus Konrad <markus.konrad@wzb.eu>
"""
import os
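geovoronoi computes exact Voronoi polygons clipped to a geographic shape. The underlying rule can be illustrated without it: every location belongs to the region of its nearest generator point (here, a school). The following is a minimal, discrete sketch in plain Python that assigns grid cells to their nearest school. It is not geovoronoi's API. The school names, coordinates and grid size are made up for illustration, and it uses planar Euclidean distance, which assumes the coordinates are already projected to a metric CRS.

```python
import math

# Hypothetical school locations in projected (planar) coordinates.
SCHOOLS = {"A": (0.0, 0.0), "B": (10.0, 0.0), "C": (5.0, 8.0)}


def nearest_school(point, schools):
    """Return the name of the school closest to `point` (Euclidean distance)."""
    return min(schools, key=lambda name: math.dist(point, schools[name]))


def discrete_voronoi(schools, width, height, step=1.0):
    """Assign each grid cell centre to its nearest school.

    The set of cells assigned to one school approximates that school's
    Voronoi region; the approximation improves as `step` gets smaller.
    """
    regions = {name: [] for name in schools}
    for iy in range(int(height / step)):
        for ix in range(int(width / step)):
            centre = ((ix + 0.5) * step, (iy + 0.5) * step)
            regions[nearest_school(centre, schools)].append(centre)
    return regions


if __name__ == "__main__":
    regions = discrete_voronoi(SCHOOLS, width=10, height=8)
    for name, cells in regions.items():
        print(name, len(cells))
```

geovoronoi instead returns exact region polygons restricted to the given boundary shape; the grid version only conveys the nearest-point rule.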
internaut / sponscraper_v1.py
Created Dec 1, 2020
Sample scripts for blog post "Robust data collection via web scraping and web APIs".
"""
Sample scripts for blog post "Robust data collection via web scraping and web APIs"
(https://datascience.blog.wzb.eu/2020/12/01/robust-data-collection-via-web-scraping-and-web-apis/).
Script 1. Starting point – baseline (unreliable) web scraping script.
December 2020, Markus Konrad <markus.konrad@wzb.eu>
"""
from datetime import datetime, timedelta
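One widely used technique for making such a scraper more robust (not necessarily the exact approach taken in the later scripts of the post) is to retry failed requests with exponentially growing pauses. A minimal stdlib sketch, where `fetch` stands for any hypothetical function that downloads one page and raises an exception on failure, and `fetch_with_retries` is likewise a hypothetical name:

```python
import time


def fetch_with_retries(fetch, url, max_attempts=4, base_delay=1.0,
                       retry_on=(OSError,), sleep=time.sleep):
    """Call fetch(url), retrying on the given exception types.

    The pause before retry number k (k = 1, 2, ...) is base_delay * 2 ** (k - 1),
    i.e. 1 s, 2 s, 4 s, ... with the defaults. If all attempts fail, the last
    exception is re-raised.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch(url)
        except retry_on:
            if attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
```

`urllib.error.URLError` and socket timeouts are subclasses of `OSError`, so the default `retry_on` covers typical network failures raised by `urllib`, while programming errors such as `TypeError` are not retried. Passing `sleep` as a parameter keeps the function testable without real waiting.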
internaut / cooc.py
Last active Nov 8, 2019
Function to calculate word co-occurrence from a document-term matrix, and a test using the hypothesis package
import numpy as np

def word_cooccurrence(dtm):
    """
    Calculate the co-document frequency (aka word co-occurrence) matrix for a document-term matrix `dtm`, i.e. how often
    each pair of tokens occurs together at least once in the same document.
    :param dtm: (sparse) document-term matrix of size NxM (N documents, M vocabulary size) with raw term counts.
    :return: co-document frequency (aka word co-occurrence) matrix with shape MxM
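The docstring defines the co-document frequency as the number of documents in which two tokens both occur at least once. One common way to compute it (an assumption here, since the function body is not shown) is to binarize the document-term matrix B (1 if a term occurs in a document, else 0) and form B^T B: entry (i, j) then counts the documents containing both term i and term j, and the diagonal holds each term's document frequency. A dependency-free sketch using nested lists instead of a (sparse) matrix; `cooccurrence_sketch` is a hypothetical name:

```python
def cooccurrence_sketch(dtm):
    """Co-document frequency matrix (M x M) for an N x M document-term matrix.

    `dtm` is a list of N rows with M raw term counts each. Counts are first
    binarized, so a term occurring several times in one document still counts
    only once for that document.
    """
    if not dtm:
        return []
    m = len(dtm[0])
    # Binarize: 1 if the term occurs in the document at all, else 0.
    binary = [[1 if count > 0 else 0 for count in row] for row in dtm]
    # Accumulate B^T B: every pair of terms present in a document gets +1.
    cooc = [[0] * m for _ in range(m)]
    for row in binary:
        present = [j for j, flag in enumerate(row) if flag]
        for i in present:
            for j in present:
                cooc[i][j] += 1
    return cooc


dtm = [[2, 1, 0],
       [0, 3, 1],
       [1, 1, 1]]
print(cooccurrence_sketch(dtm))  # [[2, 2, 1], [2, 3, 2], [1, 2, 2]]
```

With a SciPy sparse matrix the same idea is roughly `b = (dtm > 0).astype(int)` followed by `b.T @ b`, which avoids the Python loops.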
internaut / multisplit.py
Last active Nov 7, 2019
Split a string by multiple characters/strings. Test the function with pytest and hypothesis.
def str_multisplit(s, sep):
    """
    Split string `s` by all characters/strings in `sep`.
    :param s: a string to split
    :param sep: sequence or set of characters to use for splitting
    :return: list of split string parts
    """
    if not isinstance(s, (str, bytes)):
        raise ValueError('`s` must be of type `str` or `bytes`')
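Only the beginning of `str_multisplit` is shown. One straightforward way to get the described behaviour (an assumption, not necessarily the gist's code) is to split by each separator in turn, re-splitting all parts produced so far. The sketch below handles `str` input only, and `multisplit_sketch` is a hypothetical name:

```python
def multisplit_sketch(s, sep):
    """Split string `s` by every character/string in `sep`.

    Each separator is applied in turn to all parts produced so far. Empty
    parts (e.g. from adjacent separators) are kept, as with str.split.
    """
    if not isinstance(s, str):
        raise ValueError('`s` must be of type `str`')
    parts = [s]
    for separator in sep:
        parts = [piece for part in parts for piece in part.split(separator)]
    return parts


print(multisplit_sketch("a,b;c d", ",; "))  # ['a', 'b', 'c', 'd']
print(multisplit_sketch("a,,b", ","))       # ['a', '', 'b']
```

With overlapping multi-character separators (e.g. `"ab"` and `"b"`), the result depends on the order in which the separators are applied, which is one reason property-based tests with hypothesis are useful for this kind of function.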
internaut / zoom.R
Created Apr 30, 2019
Zooming in on maps with sf and ggplot2
# Source for blog post "Zooming in on maps with sf and ggplot2"
# URL: https://datascience.blog.wzb.eu/2019/04/30/zooming-in-on-maps-with-sf-and-ggplot2/
#
# Markus Konrad <markus.konrad@wzb.eu>
# Wissenschaftszentrum Berlin für Sozialforschung
# April 30, 2019
#
#### world map ####
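One way to zoom in on a map (assumed here for illustration, in the spirit of the post rather than copied from it) is to derive the display window from a centre point and a zoom level, halving the visible longitude and latitude spans with each level. A minimal Python sketch; `zoom_window` is a hypothetical name and the example centre is roughly Berlin:

```python
def zoom_window(center_lon, center_lat, zoom_level):
    """Return (lon_min, lon_max, lat_min, lat_max) around a centre point.

    At zoom level 0 the window spans the whole globe (360 x 180 degrees);
    each additional level halves both spans. Longitudes are not wrapped and
    latitudes are not clamped, so windows near the poles or the antimeridian
    need extra handling.
    """
    lon_span = 360 / 2 ** zoom_level
    lat_span = 180 / 2 ** zoom_level
    return (center_lon - lon_span / 2, center_lon + lon_span / 2,
            center_lat - lat_span / 2, center_lat + lat_span / 2)


print(zoom_window(0, 0, 0))          # (-180.0, 180.0, -90.0, 90.0)
print(zoom_window(13.38, 52.52, 3))  # window of 45 x 22.5 degrees around Berlin
```

In ggplot2, such limits would be passed to `coord_sf(xlim = ..., ylim = ...)`; if the map is drawn in a projected CRS, the window corners first have to be transformed into that CRS (e.g. with `sf::st_transform`).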
internaut / networkmap.R
Created May 30, 2018
Three ways of plotting a network graph of nodes with geographic coordinates on a map
# Plot a network graph of nodes with geographic coordinates on a map.
#
# Author: Markus Konrad <markus.konrad@wzb.eu>
# May 2018
#
# This script shows three ways of plotting a network graph on a map.
# The following information should be visualized (with the respective
# aesthetics added):
#
# * graph nodes with:
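Whichever plotting approach is used, edges are usually drawn as segments or curves between node coordinates, so each edge first has to be joined with the coordinates of its two endpoints (in ggplot2 terms, the `x`, `y`, `xend` and `yend` aesthetics of `geom_segment` or `geom_curve`). A minimal sketch of that preparation step in Python; the node names, coordinates, weights and the function name `edge_segments` are made up for illustration:

```python
# Hypothetical nodes: name -> (longitude, latitude).
NODES = {
    "Berlin": (13.40, 52.52),
    "Hamburg": (9.99, 53.55),
    "Munich": (11.58, 48.14),
}

# Hypothetical edges: (from, to, weight).
EDGES = [("Berlin", "Hamburg", 3), ("Berlin", "Munich", 1)]


def edge_segments(nodes, edges):
    """Join each edge with its endpoints' coordinates.

    Returns one dict per edge with x/y (start) and xend/yend (end), the layout
    expected by segment-style geometries. Edges referring to unknown nodes
    raise a KeyError instead of being silently dropped.
    """
    segments = []
    for source, target, weight in edges:
        x, y = nodes[source]
        xend, yend = nodes[target]
        segments.append({"from": source, "to": target, "weight": weight,
                         "x": x, "y": y, "xend": xend, "yend": yend})
    return segments


for seg in edge_segments(NODES, EDGES):
    print(seg)
```

Raising on unknown nodes is a deliberate choice: a silently missing edge on a map is easy to overlook.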
internaut / parallelized.py
Last active Feb 2, 2018
Runtime optimization through vectorization and parallelization
"""
Runtime optimization through vectorization and parallelization.
Script 3: Parallel and vectorized calculation of haversine distance.
Please note that this might be slower than the single-core vectorized version because of the overhead caused
by multiprocessing (starting worker processes and passing data between them).
January 2018
Markus Konrad <markus.konrad@wzb.eu>
"""
internaut / balloon_plot_alt_heatmap.R
Created Jan 24, 2017
Create a "balloon plot" as alternative to a heatmap with ggplot2
# Create a "balloon plot" as alternative to a heatmap with ggplot2
#
# January 2017
# Author: Markus Konrad <markus.konrad@wzb.eu>, WZB Berlin Social Science Center
library(dplyr)
library(tidyr)
library(ggplot2)
# define the variables that will be displayed in the columns
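A balloon plot draws one circle per cell of a two-way table, with circle size encoding the cell value, so the table first has to be in long format (one record per cell). ggplot2's `scale_size()` maps values to point area, so circle area, not radius, should be proportional to the value. The following Python sketch shows the reshaping and an area-proportional radius; the example table and the function names are made up for illustration:

```python
import math

# Hypothetical wide table: row label -> {column label: value}.
TABLE = {
    "group 1": {"A": 4, "B": 16, "C": 0},
    "group 2": {"A": 9, "B": 1, "C": 25},
}


def to_long(table):
    """Reshape a wide table into (row, column, value) records, one per cell."""
    return [(row, col, value)
            for row, cols in table.items()
            for col, value in cols.items()]


def balloon_radii(records, max_radius=10.0):
    """Radius per cell so that circle *area* is proportional to the value.

    Assumes non-negative values. The largest value gets `max_radius`;
    zero values get radius 0.
    """
    max_value = max(value for _, _, value in records)
    if max_value == 0:
        return [(row, col, 0.0) for row, col, _ in records]
    return [(row, col, max_radius * math.sqrt(value / max_value))
            for row, col, value in records]


for record in balloon_radii(to_long(TABLE)):
    print(record)
```

Scaling the radius linearly with the value would make large cells look disproportionately big, since the perceived size of a circle follows its area.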
internaut / pcp.R
Created Sep 27, 2016
Comparison of Parallel Coordinate Plots for Discrete and Categorical Data in R
### generate questionnaire data
library(triangle)
set.seed(0)
q1_d1 <- round(rtriangle(1000, 1, 7, 5))
q1_d2 <- round(rtriangle(1000, 1, 7, 6))
q1_d3 <- round(rtriangle(1000, 1, 7, 2))
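Similar questionnaire data can be simulated in Python with `random.triangular(low, high, mode)`, which corresponds to `rtriangle(n, a, b, c)` with minimum `a`, maximum `b` and mode `c` (one value per call instead of a vector of `n`). Because the answers are discrete (1 to 7), many respondents share exactly the same line in a parallel coordinate plot; counting identical response patterns is one way to make that overlap visible, e.g. as line width. A sketch with a fixed seed (the generated values will differ from R's):

```python
import random
from collections import Counter

random.seed(0)
N = 1000

# Three 7-point items with different modes, mirroring the R code above.
q1_d1 = [round(random.triangular(1, 7, 5)) for _ in range(N)]
q1_d2 = [round(random.triangular(1, 7, 6)) for _ in range(N)]
q1_d3 = [round(random.triangular(1, 7, 2)) for _ in range(N)]

# Each respondent's answers form one line across the parallel axes;
# identical lines are counted so overlap can be encoded (e.g. as width).
patterns = Counter(zip(q1_d1, q1_d2, q1_d3))

for pattern, count in patterns.most_common(5):
    print(pattern, count)
```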