
Created Feb 10, 2021
Voronoi regions of schools in East Germany. An example using the geovoronoi package (https://pypi.org/project/geovoronoi/).
View voronoize.py
"""
Voronoi regions of schools in East Germany. An example using the geovoronoi package
(https://pypi.org/project/geovoronoi/).

Feb. 2021
Markus Konrad
"""

import os
Created Dec 1, 2020
Sample scripts for blog post "Robust data collection via web scraping and web APIs".
View sponscraper_v1.py
"""
Sample scripts for blog post "Robust data collection via web scraping and web APIs"
(https://datascience.blog.wzb.eu/2020/12/01/robust-data-collection-via-web-scraping-and-web-apis/).

Script 1. Starting point – baseline (unreliable) web scraping script.

December 2020, Markus Konrad
"""

from datetime import datetime, timedelta
Last active Nov 8, 2019
Function to calculate word co-occurrence from document-term matrix and a test using the hypothesis package
View cooc.py
import numpy as np


def word_cooccurrence(dtm):
    """
    Calculate the co-document frequency (aka word co-occurrence) matrix for a document-term matrix `dtm`, i.e. how often
    each pair of tokens occurs together at least once in the same document.

    :param dtm: (sparse) document-term-matrix of size NxM (N docs, M is vocab size) with raw term counts.
    :return: co-document frequency (aka word co-occurrence) matrix with shape MxM
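The preview above is cut off before the function body. As a rough illustration of the computation the docstring describes, here is a minimal sketch for dense input only. The name `word_cooccurrence_sketch` is hypothetical and this is not necessarily how the gist implements it. The idea is to binarize the document-term matrix and multiply its transpose by itself.

```python
import numpy as np


def word_cooccurrence_sketch(dtm):
    """Co-document frequency for a dense NxM document-term matrix (illustrative sketch)."""
    dtm = np.asarray(dtm)
    if dtm.ndim != 2:
        raise ValueError('`dtm` must be a 2D array')

    # 1 where a term occurs at least once in a document, 0 otherwise
    present = (dtm > 0).astype(int)

    # entry (i, j) counts the documents that contain both term i and term j;
    # the diagonal holds each term's plain document frequency
    return present.T @ present


# three documents, vocabulary of three terms
dtm = np.array([[1, 0, 2],
                [0, 3, 1],
                [1, 1, 0]])
cooc = word_cooccurrence_sketch(dtm)
```

In this sketch the diagonal is left in place. Whether the original function keeps or zeroes it is not visible in the truncated preview.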
Last active Nov 7, 2019
Split a string by multiple characters/strings. Test the function with pytest and hypothesis.
View multisplit.py
def str_multisplit(s, sep):
    """
    Split string `s` by all characters/strings in `sep`.

    :param s: a string to split
    :param sep: sequence or set of characters to use for splitting
    :return: list of split string parts
    """
    if not isinstance(s, (str, bytes)):
        raise ValueError('`s` must be of type `str` or `bytes`')
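The preview stops after the type check. One way the splitting described in the docstring could be done is sketched below. The name `str_multisplit_sketch` and the loop are assumptions for illustration, not the gist's actual body. Each separator is applied in turn to every part produced so far.

```python
def str_multisplit_sketch(s, sep):
    """Split `s` by every separator in `sep` (illustrative sketch)."""
    if not isinstance(s, (str, bytes)):
        raise ValueError('`s` must be of type `str` or `bytes`')

    parts = [s]
    for c in sep:
        # split every part produced so far by the current separator
        next_parts = []
        for p in parts:
            next_parts.extend(p.split(c))
        parts = next_parts

    return parts


result = str_multisplit_sketch('a,b;c d', [',', ';', ' '])
```

Like `str.split` with an explicit separator, this keeps empty strings where separators are adjacent, and an empty separator raises `ValueError`.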
Created Apr 30, 2019
Zooming in on maps with sf and ggplot2
View zoom.R
# Source for blog post "Zooming in on maps with sf and ggplot2"
# URL: https://datascience.blog.wzb.eu/2019/04/30/zooming-in-on-maps-with-sf-and-ggplot2/
#
# Markus Konrad
# Wissenschaftszentrum Berlin für Sozialforschung
# April 30, 2019
#

#### world map ####
Created May 30, 2018
Three ways of plotting a network graph of nodes with geographic coordinates on a map
View networkmap.R
# Plot a network graph of nodes with geographic coordinates on a map.
#
# Author: Markus Konrad
# May 2018
#
# This script shows three ways of plotting a network graph on a map.
# The following information should be visualized (with the respective
# aesthetics added):
#
# * graph nodes with:
Last active Feb 2, 2018
Runtime optimization through vectorization and parallelization
View parallelized.py
"""
Runtime optimization through vectorization and parallelization.
Script 3: Parallel and vectorized calculation of haversine distance.

Please note that this might be slower than the single-core vectorized version because of the overhead that is caused
by multiprocessing.

January 2018
Markus Konrad
"""
Created Jan 24, 2017
Create a "balloon plot" as alternative to a heatmap with ggplot2
View balloon_plot_alt_heatmap.R
# Create a "balloon plot" as alternative to a heatmap with ggplot2
#
# January 2017
# Author: Markus Konrad, WZB Berlin Social Science Center

library(dplyr)
library(tidyr)
library(ggplot2)

# define the variables that will be displayed in the columns
Created Sep 27, 2016
Comparison of Parallel Coordinate Plots for Discrete and Categorical Data in R
View pcp.R
### generate questionnaire data

library(triangle)

set.seed(0)

q1_d1 <- round(rtriangle(1000, 1, 7, 5))
q1_d2 <- round(rtriangle(1000, 1, 7, 6))
q1_d3 <- round(rtriangle(1000, 1, 7, 2))
Last active Aug 29, 2016
View map-1.svg