@internaut
internaut / cooc.py
Last active Nov 8, 2019
Function to calculate word co-occurrence from document-term matrix and a test using the hypothesis package
import numpy as np


def word_cooccurrence(dtm):
    """
    Calculate the co-document frequency (aka word co-occurrence) matrix for a document-term matrix `dtm`, i.e. how often
    each pair of tokens occurs together at least once in the same document.

    :param dtm: (sparse) document-term-matrix of size NxM (N docs, M is vocab size) with raw term counts.
    :return: co-document frequency (aka word co-occurrence) matrix with shape MxM
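The preview above stops inside the docstring, so the function body is not shown. One possible way to compute such a co-document frequency matrix is sketched below. It is not the gist's actual code: it handles dense NumPy arrays only, the name `word_cooccurrence_sketch` is made up for illustration, and zeroing the diagonal is an assumption.

```python
import numpy as np


def word_cooccurrence_sketch(dtm):
    """Co-document frequency for a dense NxM document-term matrix (hypothetical helper)."""
    dtm = np.asarray(dtm)
    if dtm.ndim != 2:
        raise ValueError('`dtm` must be a 2D array')
    # 1 where a term occurs at least once in a document, 0 otherwise
    bin_dtm = (dtm > 0).astype(int)
    # entry (i, j) counts the documents that contain both term i and term j
    cooc = bin_dtm.T @ bin_dtm
    # assumption: a term's co-occurrence with itself is not of interest
    np.fill_diagonal(cooc, 0)
    return cooc


dtm = np.array([
    [1, 0, 2],   # doc 1 contains terms 0 and 2
    [0, 3, 1],   # doc 2 contains terms 1 and 2
    [2, 1, 1],   # doc 3 contains all three terms
])
cooc = word_cooccurrence_sketch(dtm)
print(cooc)
```

Binarizing first is what makes the product count documents rather than raw term counts; the result is symmetric with shape MxM.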
@internaut
internaut / multisplit.py
Last active Nov 7, 2019
Split a string by multiple characters/strings. Test the function with pytest and hypothesis.
def str_multisplit(s, sep):
    """
    Split string `s` by all characters/strings in `sep`.

    :param s: a string to split
    :param sep: sequence or set of characters to use for splitting
    :return: list of split string parts
    """
    if not isinstance(s, (str, bytes)):
        raise ValueError('`s` must be of type `str` or `bytes`')
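The preview ends after the type check. A minimal sketch of how the remaining splitting step could work is given below, assuming the separators are applied one after another; the name `str_multisplit_sketch` is hypothetical and the gist's real body may differ.

```python
def str_multisplit_sketch(s, sep):
    """Split `s` by every separator in `sep` (hypothetical completion)."""
    if not isinstance(s, (str, bytes)):
        raise ValueError('`s` must be of type `str` or `bytes`')
    parts = [s]
    for c in sep:
        # split every current part by separator `c` and flatten the result
        parts = [piece for part in parts for piece in part.split(c)]
    return parts


parts = str_multisplit_sketch('US-Student, 2019', ('-', ', '))
print(parts)
```

As with `str.split`, adjacent separators produce empty strings in the result, and an empty separator raises a `ValueError`.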
@internaut
internaut / zoom.R
Created Apr 30, 2019
Zooming in on maps with sf and ggplot2
# Source for blog post "Zooming in on maps with sf and ggplot2"
# URL: https://datascience.blog.wzb.eu/2019/04/30/zooming-in-on-maps-with-sf-and-ggplot2/
#
# Markus Konrad <markus.konrad@wzb.eu>
# Wissenschaftszentrum Berlin für Sozialforschung
# April 30, 2019
#
#### world map ####
@internaut
internaut / networkmap.R
Created May 30, 2018
Three ways of plotting a network graph of nodes with geographic coordinates on a map
# Plot a network graph of nodes with geographic coordinates on a map.
#
# Author: Markus Konrad <markus.konrad@wzb.eu>
# May 2018
#
# This script shows three ways of plotting a network graph on a map.
# The following information should be visualized (with the respective
# aesthetics added):
#
# * graph nodes with:
@internaut
internaut / parallelized.py
Last active Feb 2, 2018
Runtime optimization through vectorization and parallelization
"""
Runtime optimization through vectorization and parallelization.
Script 3: Parallel and vectorized calculation of haversine distance.
Please note that this might be slower than the single-core vectorized version because of the overhead that is caused
by multiprocessing.
January 2018
Markus Konrad <markus.konrad@wzb.eu>
"""
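The script body is not part of the preview. As an illustration of the vectorized half of the idea, the sketch below computes haversine distances for whole NumPy arrays at once. It is a sketch under assumptions, not the gist's code: the function name `haversine_km` and the Earth radius constant are choices made here, and the multiprocessing part is left out.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km; inputs in degrees, scalars or arrays (broadcast)."""
    lat1, lon1, lat2, lon2 = (
        np.radians(np.asarray(x, dtype=float)) for x in (lat1, lon1, lat2, lon2)
    )
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


# one origin at (0, 0) against three destinations, computed in a single call
dest_lat = np.array([0.0, 0.0, 90.0])
dest_lon = np.array([0.0, 90.0, 0.0])
dist = haversine_km(0.0, 0.0, dest_lat, dest_lon)
print(dist)
```

A parallel variant would typically split the destination arrays into chunks and hand each chunk to a worker process. For small inputs the cost of starting processes and copying data can outweigh the gain, which matches the caveat in the docstring.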
@internaut
internaut / balloon_plot_alt_heatmap.R
Created Jan 24, 2017
Create a "balloon plot" as alternative to a heatmap with ggplot2
# Create a "balloon plot" as alternative to a heatmap with ggplot2
#
# January 2017
# Author: Markus Konrad <markus.konrad@wzb.eu>, WZB Berlin Social Science Center
library(dplyr)
library(tidyr)
library(ggplot2)
# define the variables that will be displayed in the columns
@internaut
internaut / pcp.R
Created Sep 27, 2016
Comparison of Parallel Coordinate Plots for Discrete and Categorical Data in R
### generate questionnaire data
library(triangle)
set.seed(0)
q1_d1 <- round(rtriangle(1000, 1, 7, 5))
q1_d2 <- round(rtriangle(1000, 1, 7, 6))
q1_d3 <- round(rtriangle(1000, 1, 7, 2))
@internaut
internaut / pandas_crossjoin_example.py
Last active Jun 12, 2020
Shows how to do a cross join (i.e. cartesian product) between two pandas DataFrames using an example on calculating the distances between origin and destination cities. See https://mkonrad.net/2016/04/16/cross-join--cartesian-product-between-pandas-dataframes.html
"""
Shows how to do a cross join (i.e. cartesian product) between two pandas DataFrames using an example on
calculating the distances between origin and destination cities.
Tested with pandas 0.17.1 and 0.18 on Python 3.4 and Python 3.5
Best run this with Spyder (see https://github.com/spyder-ide/spyder)
Author: Markus Konrad <post@mkonrad.net>
April 2016
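The preview shows only the docstring. One common way to build a cross join in pandas versions as old as those named above is to merge on a constant helper column, sketched below. The city names and the `_key` column are made up for illustration, and this is not necessarily the approach the gist itself takes.

```python
import pandas as pd

origins = pd.DataFrame({'origin': ['Berlin', 'Hamburg']})
destinations = pd.DataFrame({'destination': ['Munich', 'Cologne', 'Dresden']})

# add the same constant key to both frames, merge on it, then drop it again
pairs = (
    origins.assign(_key=1)
    .merge(destinations.assign(_key=1), on='_key')
    .drop(columns='_key')
)
print(pairs)
```

Every origin is paired with every destination, so the result has 2 x 3 rows; a distance column could then be computed row-wise or vectorized over the paired frame. From pandas 1.2 on, `origins.merge(destinations, how='cross')` gives the same pairs without the helper column.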
View README.md

Jekyll sorted_for plugin

Quick'n'dirty Jekyll plugin for sorted cycle.

Modification

This fork fixes two issues:

  • problems when specifying sort fields like sort_by:'weight' (with ' or " characters)
  • problems when a collection entry does not have the specified sort field