Markus Konrad internaut

@internaut
internaut / README.md
Last active August 29, 2015 14:00 — forked from JanDupal/README.md

Jekyll sorted_for plugin

Quick'n'dirty Jekyll plugin that provides a sorted for loop.

Modification

This fork fixes two issues:

  • problems when specifying quoted sort fields like sort_by:'weight' (with ' or " characters)
  • problems when a collection entry does not have the specified sort field
internaut / pcp.R
Created September 27, 2016 13:07
Comparison of Parallel Coordinate Plots for Discrete and Categorical Data in R
### generate questionnaire data
library(triangle)
set.seed(0)
q1_d1 <- round(rtriangle(1000, 1, 7, 5))
q1_d2 <- round(rtriangle(1000, 1, 7, 6))
q1_d3 <- round(rtriangle(1000, 1, 7, 2))
internaut / parallelized.py
Last active February 2, 2018 14:06
Runtime optimization through vectorization and parallelization
"""
Runtime optimization through vectorization and parallelization.
Script 3: Parallel and vectorized calculation of haversine distance.
Please note that this might be slower than the single-core vectorized version because of the overhead that is caused
by multiprocessing.
January 2018
Markus Konrad <markus.konrad@wzb.eu>
"""
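The preview above only shows the script's docstring. As a rough illustration of the idea it describes, here is a minimal sketch of a vectorized haversine calculation with NumPy, plus an optional wrapper that splits the input across processes. The function names, the `n_procs` parameter and the Earth radius constant are assumptions made for this sketch and are not taken from the gist.

```python
import multiprocessing as mp

import numpy as np

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius for this sketch


def haversine_vec(lat1, lng1, lat2, lng2):
    """Great-circle distance in km for arrays of coordinates given in degrees."""
    lat1, lng1, lat2, lng2 = (
        np.radians(np.asarray(a, dtype=float)) for a in (lat1, lng1, lat2, lng2)
    )
    dlat = lat2 - lat1
    dlng = lng2 - lng1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlng / 2) ** 2
    # clip guards against rounding errors pushing `a` slightly outside [0, 1]
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))


def haversine_parallel(lat1, lng1, lat2, lng2, n_procs=2):
    """Split the inputs into `n_procs` chunks and process each chunk in its own process."""
    chunks = zip(
        *(np.array_split(np.asarray(a, dtype=float), n_procs)
          for a in (lat1, lng1, lat2, lng2))
    )
    with mp.Pool(n_procs) as pool:
        parts = pool.starmap(haversine_vec, chunks)
    return np.concatenate(parts)
```

On platforms that start worker processes with "spawn", `haversine_parallel` should be called from inside an `if __name__ == '__main__':` block. As the docstring warns, the cost of starting processes and copying the chunks can outweigh the gain, so it is worth timing both variants on the actual data size.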
internaut / multisplit.py
Last active November 7, 2019 15:24
Split a string by multiple characters/strings. Test the function with pytest and hypothesis.
def str_multisplit(s, sep):
    """
    Split string `s` by all characters/strings in `sep`.
    :param s: a string to split
    :param sep: sequence or set of characters to use for splitting
    :return: list of split string parts
    """
    if not isinstance(s, (str, bytes)):
        raise ValueError('`s` must be of type `str` or `bytes`')
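The preview stops after the type check, so the rest of the function is not shown here. One possible way to implement the splitting described in the docstring is sketched below. The name `str_multisplit_sketch` and the decision to skip empty separators are assumptions for illustration and may differ from the gist's actual code.

```python
def str_multisplit_sketch(s, sep):
    """Split `s` by every separator in `sep`, one separator after another."""
    if not isinstance(s, (str, bytes)):
        raise ValueError('`s` must be of type `str` or `bytes`')

    parts = [s]
    for c in sep:
        if not c:
            # assumption: ignore empty separators, since str.split('') raises
            continue
        next_parts = []
        for p in parts:
            # split every part obtained so far by the current separator
            next_parts.extend(p.split(c))
        parts = next_parts
    return parts


# a property that should hold for any input: no part contains a separator
result = str_multisplit_sketch('a,b;c', [',', ';'])
assert all(c not in p for p in result for c in [',', ';'])
```

The final assertion is the kind of invariant that a property-based test with hypothesis could check against generated strings and separator sets instead of a single hand-written case.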
internaut / cooc.py
Last active November 8, 2019 14:16
Function to calculate word co-occurrence from document-term matrix and a test using the hypothesis package
import numpy as np


def word_cooccurrence(dtm):
    """
    Calculate the co-document frequency (aka word co-occurrence) matrix for a document-term matrix `dtm`, i.e. how often
    each pair of tokens occurs together at least once in the same document.
    :param dtm: (sparse) document-term-matrix of size NxM (N docs, M is vocab size) with raw term counts.
    :return: co-document frequency (aka word co-occurrence) matrix with shape MxM
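The function body is cut off in this preview. The computation the docstring describes can be sketched as follows: binarize the document-term matrix so that each entry only records whether a term occurs in a document, then multiply the transposed binary matrix with itself. The name `cooccurrence_sketch` is hypothetical, and this version only handles dense arrays, whereas the docstring says the original also accepts sparse matrices.

```python
import numpy as np


def cooccurrence_sketch(dtm):
    """Co-document frequency matrix (MxM) for a dense NxM document-term matrix."""
    dtm = np.asarray(dtm)
    if dtm.ndim != 2:
        raise ValueError('`dtm` must be a 2D array')

    # 1 where a term occurs at least once in a document, 0 otherwise
    bin_dtm = (dtm > 0).astype(int)

    # entry (i, j) counts the documents that contain both term i and term j
    return bin_dtm.T @ bin_dtm


dtm = np.array([[1, 0, 2],
                [0, 3, 1],
                [1, 1, 0]])
print(cooccurrence_sketch(dtm))
```

The result is symmetric, and its diagonal holds the plain document frequency of each term, i.e. the number of documents in which the term appears.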
internaut / pandas_crossjoin_example.py
Last active June 12, 2020 14:30
Shows how to do a cross join (i.e. cartesian product) between two pandas DataFrames using an example on calculating the distances between origin and destination cities. See https://mkonrad.net/2016/04/16/cross-join--cartesian-product-between-pandas-dataframes.html
"""
Shows how to do a cross join (i.e. cartesian product) between two pandas DataFrames using an example on
calculating the distances between origin and destination cities.
Tested with pandas 0.17.1 and 0.18 on Python 3.4 and Python 3.5
Best run this with Spyder (see https://github.com/spyder-ide/spyder)
Author: Markus Konrad <post@mkonrad.net>
April 2016
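Only the docstring of this script is visible in the preview. A common way to build a cross join that also works on older pandas versions is to merge both frames on a temporary constant key. The sketch below illustrates that approach; the helper name `cross_join`, the `_key` column and the sample city data are assumptions for this example and not taken from the gist.

```python
import pandas as pd


def cross_join(left, right):
    """Cartesian product of two DataFrames via a temporary constant key."""
    # assumption: neither frame already has a column named `_key`
    merged = left.assign(_key=1).merge(right.assign(_key=1), on='_key')
    return merged.drop(columns='_key')


origins = pd.DataFrame({'origin': ['Berlin', 'Hamburg']})
destinations = pd.DataFrame({'destination': ['Munich', 'Cologne', 'Leipzig']})

pairs = cross_join(origins, destinations)
print(len(pairs))  # one row per origin/destination combination
```

Newer pandas releases (1.2 and later) support `left.merge(right, how='cross')` directly, which makes the temporary key unnecessary. Once every origin is paired with every destination, a distance column can be computed row by row or with a vectorized function.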
internaut / sponscraper_v1.py
Created December 1, 2020 13:36
Sample scripts for blog post "Robust data collection via web scraping and web APIs".
"""
Sample scripts for blog post "Robust data collection via web scraping and web APIs"
(https://datascience.blog.wzb.eu/2020/12/01/robust-data-collection-via-web-scraping-and-web-apis/).
Script 1. Starting point – baseline (unreliable) web scraping script.
December 2020, Markus Konrad <markus.konrad@wzb.eu>
"""
from datetime import datetime, timedelta
internaut / README.md
Last active December 21, 2020 13:53 — forked from vanto/README.md

OEmbed Liquid Tag for Jekyll

This is a simple Liquid tag that makes it easy to embed images, videos or slides from OEmbed-enabled providers. It uses Magnus Holm's oembed gem, which connects to the OEmbed endpoint of the link's provider and retrieves the HTML code needed to embed the content properly (e.g. an in-place YouTube player, an image tag for Flickr, an in-place SlideShare viewer). By default it supports the following OEmbed providers (and can fall back to Embed.ly or oohEmbed for other providers):

  • YouTube
  • Flickr
  • Viddler
  • Qik
  • Revision3
  • Hulu
  • Vimeo
internaut / balloon_plot_alt_heatmap.R
Created January 24, 2017 10:55
Create a "balloon plot" as alternative to a heatmap with ggplot2
# Create a "balloon plot" as alternative to a heatmap with ggplot2
#
# January 2017
# Author: Markus Konrad <markus.konrad@wzb.eu>, WZB Berlin Social Science Center
library(dplyr)
library(tidyr)
library(ggplot2)
# define the variables that will be displayed in the columns