Markus Konrad internaut

## README.md

      
Last active August 29, 2015 14:00 (forked from JanDupal/README.md)

Jekyll sorted_for plugin

Quick'n'dirty Jekyll plugin for a sorted `for` loop.

Modification

This fork fixes two issues:

- problems when specifying quoted sort fields like sort_by:'weight' (with ' or " characters)
- problems when a collection entry does not have the specified sort field
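The plugin itself is a Ruby Liquid tag. The two fixes can be illustrated with a small Python sketch. It assumes the tag argument arrives as a string like `sort_by:'weight'` and that entries behave like dicts. `parse_sort_field` and `sort_entries` are hypothetical names, and placing entries without the field at the end is one plausible choice, not necessarily what the fork does.

```python
def parse_sort_field(arg):
    """Return the field name from a tag argument such as sort_by:'weight' (hypothetical parser)."""
    _, _, field = arg.partition(':')
    # fix 1: strip whitespace and surrounding single or double quotes
    return field.strip().strip('\'"')


def sort_entries(entries, field):
    """Sort dict-like entries by `field`; entries lacking it go last, keeping their original order."""
    def key(entry):
        value = entry.get(field)
        # fix 2: (1, 0) sorts after every (0, value), so a missing field is never compared to a real value
        return (1, 0) if value is None else (0, value)
    return sorted(entries, key=key)


pages = [{'title': 'b', 'weight': 2}, {'title': 'x'}, {'title': 'a', 'weight': 1}]
ordered = sort_entries(pages, parse_sort_field("sort_by:'weight'"))
```

Because `sorted` is stable, entries without the field keep their relative order at the end.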


## map-1.svg

      
Last active August 29, 2016 09:57
            
          
    
## pcp.R
### generate questionnaire data

library(triangle)

set.seed(0)

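# simulate 1000 answers per variable on a 1-7 scale: draws from triangular
# distributions (min. 1, max. 7, modes 5, 6 and 2), rounded to integers
q1_d1 <- round(rtriangle(1000, 1, 7, 5))
q1_d2 <- round(rtriangle(1000, 1, 7, 6))
q1_d3 <- round(rtriangle(1000, 1, 7, 2))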
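The same simulation can be sketched in Python with the standard library. Note that `random.triangular` takes its arguments in the order `(low, high, mode)`, while R's `rtriangle` takes `(n, a, b, c)` with `c` as the mode. A single generator plays the role of `set.seed(0)`, but the numbers will differ from R's because the generators differ. `simulate_answers` is a hypothetical helper.

```python
import random

# one seeded generator for all draws, similar to a single set.seed(0) call in R
rng = random.Random(0)


def simulate_answers(n, mode, low=1, high=7):
    """n answers on a low..high scale, drawn from a triangular distribution and rounded to integers."""
    return [round(rng.triangular(low, high, mode)) for _ in range(n)]


q1_d1 = simulate_answers(1000, mode=5)
q1_d2 = simulate_answers(1000, mode=6)
q1_d3 = simulate_answers(1000, mode=2)
```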

## parallelized.py
"""
Runtime optimization through vectorization and parallelization.
Script 3: Parallel and vectorized calculation of haversine distance.

Please note that this might be slower than the single-core vectorized version because of the overhead caused by
multiprocessing (starting worker processes and passing data between them).

January 2018
Markus Konrad <markus.konrad@wzb.eu>
"""

## multisplit.py
def str_multisplit(s, sep):
    """
    Split string `s` by all characters/strings in `sep`.

    :param s: a string to split
    :param sep: sequence or set of characters to use for splitting
    :return: list of split string parts
    """
    if not isinstance(s, (str, bytes)):
        raise ValueError('`s` must be of type `str` or `bytes`')
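The preview stops after the type check. Here is one possible way to finish such a function: build a regular expression that matches any of the separators and pass it to `re.split`. This is a sketch, not the author's implementation. It assumes the separators have the same type as `s` (`str` or `bytes`), and it keeps empty parts the way `str.split` does with an explicit separator. The original function may handle these cases differently.

```python
import re


def str_multisplit(s, sep):
    """Split string `s` by all characters/strings in `sep` (sketch using a regex alternation)."""
    if not isinstance(s, (str, bytes)):
        raise ValueError('`s` must be of type `str` or `bytes`')
    # ignore empty separators, which would otherwise split between every character
    seps = [x for x in sep if x]
    if not seps:
        return [s]
    # longest separators first, so that e.g. '--' wins over '-' at the same position
    seps.sort(key=len, reverse=True)
    joiner = b'|' if isinstance(s, bytes) else '|'
    pattern = joiner.join(re.escape(x) for x in seps)
    return re.split(pattern, s)
```

For example, `str_multisplit('a,b;c', [',', ';'])` gives `['a', 'b', 'c']`.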

## cooc.py
import numpy as np


def word_cooccurrence(dtm):
    """
    Calculate the co-document frequency (aka word co-occurrence) matrix for a document-term matrix `dtm`, i.e. for
    each pair of tokens, the number of documents in which both tokens occur at least once.

    :param dtm: (sparse) document-term matrix of size NxM (N documents, M vocabulary size) with raw term counts.
    :return: co-document frequency (aka word co-occurrence) matrix with shape MxM
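This is a minimal sketch of the computation the docstring describes, assuming a dense NumPy array as input (the original also accepts sparse matrices). Binarize the counts, then multiply the transposed binary matrix by itself. Entry (i, j) then counts the documents that contain both term i and term j, and the diagonal holds each term's document frequency. The original function may treat the diagonal differently.

```python
import numpy as np


def word_cooccurrence(dtm):
    """Co-document frequency matrix (M x M) for a dense N x M document-term matrix of raw counts."""
    # 1 if the term occurs at least once in the document, else 0
    bin_dtm = (np.asarray(dtm) > 0).astype(int)
    # (M x N) @ (N x M): sums over documents where both terms are present
    return bin_dtm.T @ bin_dtm


dtm = np.array([[2, 0, 1],
                [0, 3, 1],
                [1, 1, 0]])
cooc = word_cooccurrence(dtm)
```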

## pandas_crossjoin_example.py
"""
Shows how to do a cross join (i.e. Cartesian product) of two pandas DataFrames, using the example of calculating
the distances between origin and destination cities.

Tested with pandas 0.17.1 and 0.18 on Python 3.4 and Python 3.5

Best run with Spyder (see https://github.com/spyder-ide/spyder)
Author: Markus Konrad <post@mkonrad.net>

April 2016
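A common way to do a cross join in older pandas versions is to give both DataFrames the same constant key column, merge on it, and drop it afterwards. The sketch below shows that technique; it is not necessarily the code in the original script. `cross_join` is a hypothetical helper, and the city coordinates are approximate values for illustration only.

```python
import pandas as pd


def cross_join(left, right, key='_tmpkey'):
    """Cartesian product of two DataFrames: merge on a temporary constant key, then drop it."""
    return (left.assign(**{key: 1})
                .merge(right.assign(**{key: 1}), on=key)
                .drop(key, axis=1))


# approximate city coordinates, for illustration only
origins = pd.DataFrame({'origin': ['Berlin', 'Hamburg'],
                        'o_lat': [52.52, 53.55], 'o_lon': [13.40, 9.99]})
destinations = pd.DataFrame({'destination': ['Paris', 'Rome', 'Vienna'],
                             'd_lat': [48.86, 41.90, 48.21], 'd_lon': [2.35, 12.50, 16.37]})

pairs = cross_join(origins, destinations)  # 2 x 3 = 6 rows, one per origin/destination pair
```

Each row of `pairs` now holds both coordinate pairs, so distances can be computed column-wise in one vectorized step. In pandas 1.2 and later, `origins.merge(destinations, how='cross')` produces the same product directly.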

## sponscraper_v1.py
"""
Sample scripts for blog post "Robust data collection via web scraping and web APIs"
(https://datascience.blog.wzb.eu/2020/12/01/robust-data-collection-via-web-scraping-and-web-apis/).

Script 1. Starting point – baseline (unreliable) web scraping script.

December 2020, Markus Konrad <markus.konrad@wzb.eu>
"""

from datetime import datetime, timedelta
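The blog post is about making such a script more robust. One common building block is retrying a failing request with exponentially growing waits. This is a generic sketch, not code from the post. `with_retries` is a hypothetical helper, and `fetch` stands for any function that performs a single request. In practice you would catch only network-related exceptions instead of `Exception`.

```python
import time


def with_retries(fetch, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch() until it succeeds, waiting base_delay * 2**k seconds after the k-th failure."""
    for attempt in range(max_attempts):
        try:
            return fetch()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # give up and re-raise the last error
            sleep(base_delay * 2 ** attempt)
```

The `sleep` parameter is only there so the waiting can be replaced, e.g. in tests.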

## README.md

      
Last active December 21, 2020 13:53 (forked from vanto/README.md)

OEmbed Liquid Tag for Jekyll

This is a simple Liquid tag that makes it easy to embed images, videos or slides from OEmbed-enabled providers. It uses Magnus Holm's great oembed gem, which connects to the OEmbed endpoint of the link's provider and retrieves the HTML code needed to embed the content properly (e.g. an in-place YouTube player, an image tag for Flickr, an in-place SlideShare viewer). By default it supports the following OEmbed providers (but can fall back to Embed.ly or OoEmbed for other providers):

- YouTube
- Flickr
- Viddler
- Qik
- Revision3
- Hulu
- Vimeo
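The exchange the gem performs follows the oEmbed specification. The consumer requests the provider's endpoint with the page URL as the `url` query parameter. It then turns the JSON response into HTML: `video` and `rich` responses carry ready-made `html`, while `photo` responses only carry an image `url`, `width` and `height`, so the consumer builds the image tag itself. This Python sketch is not the gem's API. The function names are hypothetical, and the endpoint in the usage line is assumed for illustration.

```python
import html
from urllib.parse import urlencode


def oembed_request_url(endpoint, resource_url, maxwidth=None):
    """Build an oEmbed request: the provider endpoint plus `url` and `format` query parameters."""
    params = {'url': resource_url, 'format': 'json'}
    if maxwidth is not None:
        params['maxwidth'] = maxwidth
    return endpoint + '?' + urlencode(params)


def embed_html(response):
    """Turn a parsed oEmbed JSON response (a dict) into HTML, or None if nothing is embeddable."""
    kind = response.get('type')
    if kind in ('video', 'rich'):
        return response['html']  # provider supplies ready-to-use HTML
    if kind == 'photo':
        # photo responses only describe the image, so the consumer builds the <img> tag
        return '<img src="{}" width="{}" height="{}" alt="{}">'.format(
            html.escape(response['url']), response['width'], response['height'],
            html.escape(response.get('title', '')))
    return None  # 'link' responses carry no embeddable content


request = oembed_request_url('https://www.youtube.com/oembed', 'https://www.youtube.com/watch?v=abc')
```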


## balloon_plot_alt_heatmap.R
# Create a "balloon plot" as alternative to a heatmap with ggplot2
#
# January 2017
# Author: Markus Konrad <markus.konrad@wzb.eu>, WZB Berlin Social Science Center

library(dplyr)
library(tidyr)
library(ggplot2)

# define the variables that will be displayed in the columns
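The R script builds the balloon plot with ggplot2. For comparison, here is a rough Python/matplotlib translation of the idea, not the original code. Each cell of a matrix becomes a circle on a grid, and the circle's area is proportional to the cell value (matplotlib's `s` argument is the marker area in points squared). The sketch assumes non-negative values with a positive maximum. `balloon_plot` and `max_size` are hypothetical names.

```python
import matplotlib
matplotlib.use("Agg")  # render without a display
import matplotlib.pyplot as plt
import numpy as np


def balloon_plot(values, row_labels, col_labels, max_size=1000):
    """Draw a matrix as circles whose area is proportional to each cell value."""
    values = np.asarray(values, dtype=float)
    n_rows, n_cols = values.shape
    # xs[i, j] = j (column), ys[i, j] = i (row); ravel() keeps row-major order like values.ravel()
    xs, ys = np.meshgrid(np.arange(n_cols), np.arange(n_rows))
    sizes = max_size * values / values.max()
    fig, ax = plt.subplots()
    ax.scatter(xs.ravel(), ys.ravel(), s=sizes.ravel())
    ax.set_xticks(range(n_cols))
    ax.set_xticklabels(col_labels)
    ax.set_yticks(range(n_rows))
    ax.set_yticklabels(row_labels)
    ax.invert_yaxis()  # first row at the top, as in a table or heatmap
    return fig, ax


fig, ax = balloon_plot([[1, 2], [3, 4]], ['row A', 'row B'], ['col X', 'col Y'])
```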