Jeroen Janssens jeroenjanssens

jeroenjanssens / remove-header.py
Created June 6, 2014 00:29
Remove header without streaming entire file
#!/usr/bin/env python
# The trick is to overwrite the file with spaces till the first newline.
# Only works if the program that reads it ignores empty lines.
import sys
filename = sys.argv[1]
f = open(filename, "r+b")
n = 0
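# Compare against bytes (the file is opened in binary mode) and stop at EOF
# so that a file without a newline cannot loop forever.
while f.read(1) not in (b"\n", b""):
    n += 1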
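
The listing above stops at the counting loop. As a minimal, self-contained sketch of the trick its comments describe (assuming Python 3; `blank_header` is a hypothetical name, and the seek-and-overwrite step is inferred from the comment, not taken from the listing), the whole operation might look like this:

```python
def blank_header(path):
    """Overwrite the first line of `path` with spaces, in place.

    Returns the number of bytes blanked. The file size does not change,
    so nothing after the header has to be read or rewritten.
    """
    with open(path, "r+b") as f:
        n = 0
        # Count the bytes before the first newline (or EOF if there is none).
        while True:
            byte = f.read(1)
            if byte in (b"\n", b""):
                break
            n += 1
        # Go back to the start and replace the header bytes with spaces.
        f.seek(0)
        f.write(b" " * n)
    return n
```

As the original comment notes, this only helps if the program that later reads the file ignores blank lines, because the newline that ended the header is left in place.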
jeroenjanssens / topwords.R
Created April 25, 2014 02:17
Get top N words from STDIN using Bash, Python, and R. All three scripts produce the same output, but R scales very badly w.r.t. input size. What am I doing wrong?
#!/usr/bin/env Rscript
num.words <- as.integer(commandArgs(trailingOnly = TRUE))
f <- file("stdin")
input.lines <- readLines(f)
close(f)
full.text <- tolower(paste(input.lines, collapse = " "))
splits <- gregexpr("\\w+", full.text)
words.all <- (regmatches(full.text, splits)[[1]])
words.unique <- as.data.frame(table(words.all))
words.sorted <- words.unique[order(-words.unique$Freq),]
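
For comparison, the steps in the R listing (lowercase the text, extract `\w+` matches, tabulate, sort by descending frequency) can be sketched in Python. This is not the author's Python script, which is not shown here; `top_words` is a hypothetical name, and breaking ties alphabetically is an assumption made to get a deterministic order.

```python
import re
from collections import Counter


def top_words(text, num_words):
    """Return the `num_words` most frequent words as (word, count) pairs."""
    # Lowercase first, then pull out runs of word characters.
    words = re.findall(r"\w+", text.lower())
    counts = Counter(words)
    # Sort by descending count, then alphabetically for ties.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:num_words]
```

To read from standard input as the scripts do, one could call `top_words(sys.stdin.read(), n)` after importing `sys`.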
jeroenjanssens / chat.sh
Last active February 15, 2022 21:44
Simple chat server in bash, demonstrating websocketd.
#!/bin/bash
# Hacked together by JeroenJanssens.com on 2013-12-10
# Requires: https://github.com/joewalnes/websocketd
# Run: websocketd --devconsole --port 8080 ./chat.sh
echo "Please enter your name:"; read USER
echo "[$(date)] ${USER} joined the chat" >> chat.log
echo "[$(date)] Welcome to the chat ${USER}!"
tail -n 0 -f chat.log --pid=$$ | grep --line-buffered -v "] ${USER}>" &
while read MSG; do echo "[$(date)] ${USER}> ${MSG}" >> chat.log; done
jeroenjanssens / README.md
Last active December 29, 2015 03:39
Detecting anomalous senators

This interactive visualization demonstrates Stochastic Outlier Selection (SOS) applied to roll call voting data. It was first presented at the NYC Machine Learning meetup on November 21, 2013. SOS is an unsupervised outlier-selection algorithm by J.H.M. Janssens, F. Huszar, E.O. Postma, and H.J. van den Herik (2012). It employs the concept of affinity to quantify the relationship between data points and subsequently computes an outlier probability for each data point. Intuitively, a data point is selected as an outlier when the other data points have insufficient affinity with it.
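
The affinity-to-probability idea can be illustrated with a small pure-Python sketch. It is a simplification, not the visualization's code: it assumes squared Euclidean dissimilarity and a single fixed `variance` for every point, whereas the published algorithm chooses a variance per point from a perplexity parameter. The function name `sos_outlier_probabilities` is hypothetical.

```python
import math


def sos_outlier_probabilities(points, variance=1.0):
    """Simplified SOS: one outlier probability per point.

    Assumption: the same variance is used for every point.
    """
    n = len(points)

    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # binding[i][j]: affinity that point i has with point j,
    # normalized so each row sums to 1 (a point has no affinity with itself).
    binding = []
    for i in range(n):
        aff = [
            0.0 if i == j
            else math.exp(-sqdist(points[i], points[j]) / (2.0 * variance))
            for j in range(n)
        ]
        total = sum(aff)
        binding.append([a / total if total > 0 else 0.0 for a in aff])

    # A point's outlier probability is the probability that
    # none of the other points binds to it.
    probs = []
    for i in range(n):
        p = 1.0
        for j in range(n):
            if j != i:
                p *= 1.0 - binding[j][i]
        probs.append(p)
    return probs
```

With three points clustered near the origin and one far away, the distant point receives almost no affinity from the others and so gets an outlier probability close to 1.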

The data set contains 103 data points (senators) and 172 features (votes). The dissimilarity between the data points is the Euclidean distance. Each circle in the scatter plot represents a senator, whose location is determined by applying the non-linear dimensionality reduction technique [t-SNE](http://homepage.tudelf

jeroenjanssens / README.md
Last active March 1, 2016 14:08
Stem-and-Leaf Plot

Back in the old days, when many data sets were still small, stem-and-leaf plots were a popular method of representing quantitative data. The example data shown in the text area comes from the cover of John Tukey's Exploratory Data Analysis. The stem-and-leaf plot updates as you change the data. Try adding fractions and negative values. Hover over the leaves to see the original values.
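
For readers unfamiliar with the layout, here is a minimal Python sketch of how such a plot is built. It assumes non-negative integers with the last digit as the leaf, so it does not cover the fractions and negative values the interactive version accepts; `stem_and_leaf` is a hypothetical name and the sample values in the usage note are made up.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return stem-and-leaf lines for non-negative integers."""
    if not values:
        return []
    leaves = defaultdict(list)
    for v in sorted(values):
        # The last digit is the leaf; everything before it is the stem.
        stem, leaf = divmod(v, 10)
        leaves[stem].append(leaf)
    lines = []
    # Include empty stems so gaps in the data stay visible.
    for stem in range(min(leaves), max(leaves) + 1):
        digits = "".join(str(leaf) for leaf in leaves.get(stem, []))
        lines.append("%d | %s" % (stem, digits))
    return lines
```

For example, `stem_and_leaf([12, 15, 21, 30, 34, 34])` yields the lines `1 | 25`, `2 | 1`, and `3 | 044`.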