Markus Konrad internaut

@internaut
internaut / xsfpcopy.py
Created April 28, 2022 16:15
Copy the contents of an XSPF music playlist to a target folder
#!/usr/bin/env python3
# Copy the contents of an XSPF music playlist to a target folder
#
# requires two arguments: path to the XSPF file, target path
# requires Python >= 3.8
#
# author: Markus Konrad <post@mkonrad.net>
import os.path
import sys
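The gist preview above stops after the imports. As a minimal sketch of the general approach (not the author's code), the script below reads the `<location>` elements of an XSPF playlist with the standard library's `xml.etree.ElementTree`, converts `file://` URIs to local paths, and copies the files. The function names `playlist_file_paths` and `copy_playlist` are hypothetical; it assumes Linux/macOS-style file URIs and that files with the same name from different folders may overwrite each other in the target folder.

```python
import shutil
import sys
import xml.etree.ElementTree as ET
from pathlib import Path
from urllib.parse import unquote, urlparse

# XML namespace used by XSPF version 1 playlists
XSPF_NS = {"x": "http://xspf.org/ns/0/"}


def playlist_file_paths(xspf_path):
    """Return the local file paths of all tracks in an XSPF playlist (hypothetical helper)."""
    root = ET.parse(xspf_path).getroot()
    paths = []
    for loc in root.findall("x:trackList/x:track/x:location", XSPF_NS):
        uri = (loc.text or "").strip()
        parsed = urlparse(uri)
        if parsed.scheme == "file":
            # file:///home/me/Music/a%20b.mp3 -> /home/me/Music/a b.mp3
            paths.append(Path(unquote(parsed.path)))
        elif parsed.scheme == "":
            # assumption: a plain path is relative to the playlist's folder
            paths.append(Path(xspf_path).parent / unquote(uri))
    return paths


def copy_playlist(xspf_path, target_dir):
    """Copy every existing playlist file into `target_dir`; return the copied paths."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in playlist_file_paths(xspf_path):
        if src.is_file():
            dest = target / src.name
            shutil.copy2(src, dest)  # copy2 also preserves file timestamps
            copied.append(dest)
        else:
            print(f"skipping missing file: {src}", file=sys.stderr)
    return copied


if __name__ == "__main__":
    if len(sys.argv) != 3:
        sys.exit("usage: xsfpcopy.py PLAYLIST.xspf TARGET_DIR")
    copy_playlist(sys.argv[1], sys.argv[2])
```

Missing files are reported on stderr and skipped rather than aborting the whole copy, which seems a reasonable default for playlists pointing at removable media.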
internaut / transfer.py
Created February 22, 2022 09:33
Transfer a user's GitLab projects to a new group.
"""
Transfer all GitLab projects of the user authenticated with a supplied personal access token (PAT) to a new
namespace (i.e. a group identified by its group ID).
To generate a PAT, log in to your GitLab account and go to "User settings > Access tokens".
To find out the ID of a group to which you want to transfer the projects, go to the group's page. The group ID is shown
under the title of the group.
Requirements: Python 3 with requests package installed (tested with Python 3.8 and requests 2.27.1).
internaut / voronoize.py
Created February 10, 2021 19:25
Voronoi regions of schools in East Germany. An example using the geovoronoi package (https://pypi.org/project/geovoronoi/).
"""
Voronoi regions of schools in East Germany.
An example using the geovoronoi package (https://pypi.org/project/geovoronoi/).
Feb. 2021
Markus Konrad <markus.konrad@wzb.eu>
"""
import os
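The script uses the geovoronoi package, whose API is not shown in this preview. To illustrate only the underlying idea, the sketch below applies the definition of a Voronoi region (all points closer to one site than to any other) on a sampled grid: each grid cell is assigned to its nearest site, which approximates each region's share of a bounding rectangle. This is a swapped-in discrete approximation, not geovoronoi's method; the names `nearest_site` and `region_area_shares` are hypothetical, and it assumes coordinates are already in a projected (metric) CRS so that Euclidean distance is meaningful.

```python
import math
from collections import Counter


def nearest_site(point, sites):
    """Index of the site closest to `point` (Euclidean distance; ties go to the lower index)."""
    px, py = point
    return min(range(len(sites)),
               key=lambda i: math.hypot(px - sites[i][0], py - sites[i][1]))


def region_area_shares(sites, xmin, xmax, ymin, ymax, n=200):
    """Approximate each site's Voronoi region area share within a rectangle.

    The rectangle is sampled on an n x n grid of cell centres; each cell is
    assigned to its nearest site, so a site's share of cells approximates the
    share of the rectangle covered by its Voronoi region.
    """
    dx = (xmax - xmin) / n
    dy = (ymax - ymin) / n
    counts = Counter(
        nearest_site((xmin + (i + 0.5) * dx, ymin + (j + 0.5) * dy), sites)
        for i in range(n)
        for j in range(n)
    )
    return [counts[k] / (n * n) for k in range(len(sites))]
```

A polygon-based library such as geovoronoi computes exact region shapes clipped to a geographic boundary, which is what a real map needs; the grid version is only useful for building intuition.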
internaut / sponscraper_v1.py
Created December 1, 2020 13:36
Sample scripts for blog post "Robust data collection via web scraping and web APIs".
"""
Sample scripts for blog post "Robust data collection via web scraping and web APIs"
(https://datascience.blog.wzb.eu/2020/12/01/robust-data-collection-via-web-scraping-and-web-apis/).
Script 1. Starting point – baseline (unreliable) web scraping script.
December 2020, Markus Konrad <markus.konrad@wzb.eu>
"""
from datetime import datetime, timedelta
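The preview ends at the imports of the baseline (unreliable) script. A common first step towards robustness is to retry failed requests with exponential backoff. The sketch below is a generic illustration of that technique, not code from the blog post; `fetch_with_retries` is a hypothetical name, `fetch` is any zero-argument callable (e.g. one that downloads a page), and the `sleep` parameter exists only so the waiting can be replaced in tests.

```python
import time
from datetime import datetime, timedelta


def fetch_with_retries(fetch, max_attempts=5, base_delay=timedelta(seconds=1), sleep=time.sleep):
    """Call `fetch()` until it succeeds, waiting 1x, 2x, 4x, ... `base_delay` between attempts."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch()
        except Exception as exc:  # in real code, catch only network-related errors
            if attempt == max_attempts:
                raise  # give up and re-raise the last error
            delay = base_delay * 2 ** (attempt - 1)
            print(f"{datetime.now():%H:%M:%S} attempt {attempt} failed ({exc!r}), "
                  f"retrying in {delay.total_seconds():.0f}s")
            sleep(delay.total_seconds())
```

Doubling the delay gives a temporarily overloaded server progressively more time to recover, instead of hammering it at a fixed rate.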
internaut / cooc.py
Last active November 8, 2019 14:16
Function to calculate word co-occurrence from a document-term matrix, and a test using the hypothesis package
import numpy as np


def word_cooccurrence(dtm):
    """
    Calculate the co-document frequency (aka word co-occurrence) matrix for a document-term matrix `dtm`, i.e. how often
    each pair of tokens occurs together at least once in the same document.
    :param dtm: (sparse) document-term matrix of size NxM (N documents, M vocabulary size) with raw term counts.
    :return: co-document frequency (aka word co-occurrence) matrix with shape MxM
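The docstring defines co-document frequency as the number of documents in which two tokens both appear. One standard way to compute this, sketched here under the assumption that it matches the intended semantics, is to binarize the matrix and multiply it with its transpose: entry (i, j) of `B.T @ B` counts the documents where both term i and term j occur. `word_cooccurrence_sketch` is a hypothetical name; how the original function treats the diagonal (here: each term's document frequency) is not visible in the preview.

```python
import numpy as np


def word_cooccurrence_sketch(dtm):
    """Co-document frequency matrix (MxM) for an NxM document-term matrix of raw counts."""
    # 1 if the term occurs at least once in the document, else 0
    bin_dtm = (dtm > 0).astype(np.int64)
    # (i, j) = number of documents containing both term i and term j;
    # the diagonal holds each term's document frequency
    return bin_dtm.T @ bin_dtm
```

The same two lines also work for SciPy sparse matrices, since `dtm > 0`, `.astype()` and `@` are supported there and the result stays sparse.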
internaut / multisplit.py
Last active November 7, 2019 15:24
Split a string by multiple characters/strings. Test the function with pytest and hypothesis.
def str_multisplit(s, sep):
    """
    Split string `s` by all characters/strings in `sep`.
    :param s: a string to split
    :param sep: sequence or set of characters/strings to use for splitting
    :return: list of split string parts
    """
    if not isinstance(s, (str, bytes)):
        raise ValueError('`s` must be of type `str` or `bytes`')
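The rest of `str_multisplit` is cut off in the preview. For comparison, the same behavior can be sketched with a single regular expression built from the escaped separators. This is an alternative technique, not necessarily how the gist implements it; `multisplit_regex` is a hypothetical name and handles `str` input only. Sorting separators by length makes a longer separator such as `"--"` win over `"-"` at the same position.

```python
import re


def multisplit_regex(s, sep):
    """Split `s` at every occurrence of any separator in `sep` (str only)."""
    seps = [x for x in sep if x]  # drop empty separators, they would split everywhere
    if not seps:
        return [s]
    # escape regex metacharacters; longest separators first so "--" beats "-"
    pattern = "|".join(re.escape(x) for x in sorted(seps, key=len, reverse=True))
    return re.split(pattern, s)
```

A useful property to check with hypothesis-style tests: with a single separator, the result equals `s.split(sep)`.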
internaut / zoom.R
Created April 30, 2019 08:18
Zooming in on maps with sf and ggplot2
# Source for blog post "Zooming in on maps with sf and ggplot2"
# URL: https://datascience.blog.wzb.eu/2019/04/30/zooming-in-on-maps-with-sf-and-ggplot2/
#
# Markus Konrad <markus.konrad@wzb.eu>
# Wissenschaftszentrum Berlin für Sozialforschung
# April 30, 2019
#
#### world map ####
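Zooming in on a map comes down to choosing axis limits around a center point. The arithmetic below is a sketch under one simple assumption (zoom level 0 shows the whole globe, 360° by 180°, and each further level halves the visible span, similar to web map zoom levels); it is written in Python for illustration and `zoom_bounds` is a hypothetical name. Latitudes are not clamped to ±90°, and the bounds are in longitude/latitude, so they would need to be transformed before use with a projected CRS.

```python
def zoom_bounds(center_lon, center_lat, zoom_level):
    """Return ((lon_min, lon_max), (lat_min, lat_max)) for a view centred on a point."""
    # assumption: level 0 = whole globe, each level halves the visible span
    lon_span = 360 / 2 ** zoom_level
    lat_span = 180 / 2 ** zoom_level
    lon_bounds = (center_lon - lon_span / 2, center_lon + lon_span / 2)
    lat_bounds = (center_lat - lat_span / 2, center_lat + lat_span / 2)
    return lon_bounds, lat_bounds
```

In ggplot2, such bounds would be passed as `coord_sf(xlim = ..., ylim = ...)`; for Berlin (13.38, 52.52) at zoom level 3, the view spans 45° of longitude and 22.5° of latitude.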
internaut / networkmap.R
Created May 30, 2018 13:23
Three ways of plotting a network graph of nodes with geographic coordinates on a map
# Plot a network graph of nodes with geographic coordinates on a map.
#
# Author: Markus Konrad <markus.konrad@wzb.eu>
# May 2018
#
# This script shows three ways of plotting a network graph on a map.
# The following information should be visualized (with the respective
# aesthetics added):
#
# * graph nodes with:
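Whatever plotting approach is used, drawing a network on a map needs the edges as line segments between node coordinates; ggplot2's `geom_segment()` and `geom_curve()` expect the aesthetics `x`, `y`, `xend` and `yend`. The sketch below shows that data transformation in Python for illustration only; `edge_segments`, the node dictionary layout and the `weight` field are hypothetical assumptions, not taken from the R script.

```python
def edge_segments(nodes, edges):
    """Turn an edge list into line segments between node coordinates.

    nodes: dict mapping node id -> (lon, lat)
    edges: iterable of (from_id, to_id, weight) tuples
    returns: list of dicts with x, y, xend, yend and weight
    """
    segments = []
    for src, dst, weight in edges:
        x, y = nodes[src]       # start point = coordinates of the source node
        xend, yend = nodes[dst]  # end point = coordinates of the target node
        segments.append({"x": x, "y": y, "xend": xend, "yend": yend, "weight": weight})
    return segments
```

The `weight` column could then be mapped to line width or transparency.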
internaut / parallelized.py
Last active February 2, 2018 14:06
Runtime optimization through vectorization and parallelization
"""
Runtime optimization through vectorization and parallelization.
Script 3: Parallel and vectorized calculation of haversine distance.
Please note that this might be slower than the single-core vectorized version because of the overhead that is caused
by multiprocessing.
January 2018
Markus Konrad <markus.konrad@wzb.eu>
"""
internaut / balloon_plot_alt_heatmap.R
Created January 24, 2017 10:55
Create a "balloon plot" as alternative to a heatmap with ggplot2
# Create a "balloon plot" as alternative to a heatmap with ggplot2
#
# January 2017
# Author: Markus Konrad <markus.konrad@wzb.eu>, WZB Berlin Social Science Center
library(dplyr)
library(tidyr)
library(ggplot2)
# define the variables that will be displayed in the columns
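A balloon plot needs the data in long format (one record per row/column cell), with the cell value mapped to point size. Scaling the size by the square root of the value makes the point *area* proportional to the value, which is what ggplot2's `scale_size()` does by default. The Python sketch below illustrates that reshaping and scaling only; `balloon_data`, the nested-dict input and `max_size` are hypothetical, and values are assumed to be non-negative.

```python
import math


def balloon_data(table, max_size=10.0):
    """Reshape a wide table into long (row, column, value, size) records.

    table: dict mapping row label -> dict mapping column label -> non-negative value
    size is scaled so that the area of a point is proportional to its value
    """
    max_value = max(v for row in table.values() for v in row.values())
    records = []
    for row_label, row in table.items():
        for col_label, value in row.items():
            # sqrt scaling: doubling the value doubles the area, not the diameter
            size = max_size * math.sqrt(value / max_value) if max_value > 0 else 0.0
            records.append((row_label, col_label, value, size))
    return records
```

Scaling by area rather than radius avoids visually exaggerating large values, since a point with twice the radius has four times the area.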