evanmiltenburg /
Last active January 5, 2022 09:35
Download OHLDM
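import requests
import re
import time

# Fetch the index page (URL elided in this listing) with a browser-like User-Agent.
r = requests.get('',
                 stream=True, headers={'User-agent': 'Mozilla/5.0'})
# Collect every linked PDF: \. matches a literal dot, and [^"]* keeps each match inside one href.
urls = re.findall(r'href="([^"]*\.pdf)"', r.text)
base = ''
# Keep only the book chapters and turn the relative paths into absolute URLs.
urls = [base + path for path in urls if '/book/' in path]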
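The link-extraction step can be tried out on a small, made-up HTML string. This sketch wraps the regex and the '/book/' filter in a hypothetical helper, book_pdf_urls; the HTML and the base URL https://example.org are placeholders for illustration, not the page the script actually downloads. It also shows why the pattern escapes the dot and uses [^"]* instead of .*?: otherwise a name like "xpdf" would match, and a match could run across several links.

```python
import re

# \. matches a literal dot; [^"]* stops a match from spanning several href attributes.
PDF_LINK = re.compile(r'href="([^"]*\.pdf)"')

def book_pdf_urls(html, base):
    """Return absolute URLs for linked PDFs whose path contains '/book/'."""
    return [base + path for path in PDF_LINK.findall(html) if "/book/" in path]

# Hypothetical page content, for illustration only.
sample_html = (
    '<a href="/book/notes.txt">notes</a> '
    '<a href="/book/xpdf">odd name</a> '
    '<a href="/book/ch1.pdf">Chapter 1</a> '
    '<a href="/slides/talk.pdf">Slides</a>'
)
print(book_pdf_urls(sample_html, "https://example.org"))
# prints ['https://example.org/book/ch1.pdf']
```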
evanmiltenburg /
Created April 5, 2020 08:28
Script to find people in Dutch text
import spacy

nlp = spacy.load('nl_core_news_sm')

with open('bordewijk.txt', encoding='utf-8') as f:
    doc = nlp(f.read())

# Collect the text of every entity tagged as a person.
people = [ent.orth_ for ent in doc.ents if ent.label_ == 'PERSON']
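A natural follow-up is to count how often each person is mentioned. The sketch below uses a hypothetical helper, count_people, that only relies on the .orth_ and .label_ attributes of spaCy entity spans, so it can be tested without loading a model. The label is a parameter because the entity label set depends on the model version; check nlp.get_pipe('ner').labels for the model you have installed.

```python
from collections import Counter

def count_people(ents, label="PERSON"):
    """Count how often each entity text with the given label occurs."""
    return Counter(ent.orth_ for ent in ents if ent.label_ == label)
```

With a parsed document, count_people(doc.ents).most_common(10) would give the ten most frequently mentioned names.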
evanmiltenburg /
Created June 17, 2019 10:14
Generate an Excel worksheet to provide word-level annotations
import xlsxwriter

# Create workbook with a new worksheet.
workbook = xlsxwriter.Workbook('hello.xlsx')
worksheet = workbook.add_worksheet()

# Write the tokens.
worksheet.write('A1', 'Hello')
worksheet.write('B1', 'world')
worksheet.write('C1', '!')

# XlsxWriter only writes the file to disk when the workbook is closed.
workbook.close()
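The write() calls above hard-code the cells A1, B1 and C1. For a sentence of arbitrary length, the cell names can be generated from the token positions. The helpers column_letter and token_cells below are hypothetical, not part of the gist. Note that XlsxWriter's write() also accepts zero-based (row, col) integers, e.g. worksheet.write(0, i, token), which avoids the letter conversion altogether.

```python
def column_letter(index):
    """Convert a zero-based column index to an Excel column name (0 -> 'A', 26 -> 'AA')."""
    number = index + 1  # Excel column names are bijective base-26 numbers starting at 1
    name = ""
    while number > 0:
        number, remainder = divmod(number - 1, 26)
        name = chr(ord("A") + remainder) + name
    return name

def token_cells(tokens, row=1):
    """Map cell names in the given (1-based) row to tokens, left to right."""
    return {f"{column_letter(i)}{row}": token for i, token in enumerate(tokens)}

print(token_cells(["Hello", "world", "!"]))
# prints {'A1': 'Hello', 'B1': 'world', 'C1': '!'}
```

Each pair can then be written with worksheet.write(cell, token).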
import numpy as np

def get_lengths(num_lines, line_length):
    "Get the lengths of n lines that together add up to line_length."
    lengths = np.random.random(num_lines)
    # Rescale the random draws so that they sum exactly to line_length.
    lengths *= line_length / np.sum(lengths)
    return lengths

def lines(line_length, page_width):
    "Get a random number of lines, with n-1 gaps of varying length in between."
    num_lines = np.random.randint(1, 10)  # 1 to 9 lines; the upper bound is exclusive
    lengths = get_lengths(num_lines, line_length)
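get_lengths draws random weights and rescales them so that they add up to line_length. The sketch below shows the same normalization with NumPy's Generator API, which makes the draws reproducible through a seed. The function name get_lengths_seeded and its rng parameter are additions for illustration, not part of the original gist.

```python
import numpy as np

def get_lengths_seeded(num_lines, line_length, rng=None):
    """Random line lengths that sum to line_length, drawn from an optional seeded Generator."""
    rng = np.random.default_rng() if rng is None else rng
    lengths = rng.random(num_lines)          # uniform draws in [0, 1)
    lengths *= line_length / lengths.sum()   # rescale so the total equals line_length
    return lengths

lengths = get_lengths_seeded(4, 100.0, np.random.default_rng(0))
print(lengths.sum())  # 100.0, up to floating-point rounding
```

Passing the same seed twice yields the same lengths, which is handy when regenerating the same synthetic layout.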
evanmiltenburg / levelt.tex
Created April 26, 2018 12:08
Levelt's model of speech production
evanmiltenburg /
Created February 13, 2018 15:02
Circles in legend
import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib.lines import Line2D

my_palette = sns.color_palette("cubehelix", 3)

def legend_circles(labels, palette, loc=1, markersize=10, marker='o', padding=0):
    "Make a legend where the color is indicated by a circle."

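The body of legend_circles is cut off in this preview. A common way to build such a legend is to create one marker-only Line2D proxy handle per label and pass the list to plt.legend. The sketch below follows that idiom under the assumption that this is roughly what the gist does; the name legend_circles_sketch is hypothetical, plain color names replace the seaborn palette so it runs without seaborn, and the original padding parameter is left out because its meaning is not visible here.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D

def legend_circles_sketch(labels, palette, loc=1, markersize=10, marker="o"):
    """Draw a legend with one colored marker (no line) per label."""
    handles = [
        Line2D([], [], linestyle="None", marker=marker, markersize=markersize,
               markerfacecolor=color, markeredgecolor=color, label=label)
        for label, color in zip(labels, palette)
    ]
    return plt.legend(handles=handles, loc=loc)

legend = legend_circles_sketch(["a", "b", "c"], ["red", "green", "blue"])
```

Because the handles are proxies that are never plotted, the legend works regardless of what the axes actually contain.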
Papers at INLG

Here's a list of all the papers presented at INLG 2017, sourced from here. I made this list because a plain list is easier to read and print.

Please refer to the INLG website for the official schedule, which may be subject to change and which also lists other events, such as the invited talks and the hackathon.


import csv
import numpy as np
from gensim.models import Word2Vec
from keras.models import Sequential
from keras.layers import Activation, Dense
from keras.callbacks import EarlyStopping
from collections import Counter
import re
import glob

class ConllEntry:
    "One token (row) of a CoNLL-formatted dependency treebank."
    def __init__(self, id, form, pos, cpos, parent_id=None, relation=None):
        self.id = id
        self.form = form
        # normalize() is not shown in this excerpt; it must be defined elsewhere in the script.
        self.norm = normalize(form)
        self.cpos = cpos.upper()
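normalize() is called but not defined in this excerpt. Parser code of this kind commonly maps numbers to a single placeholder token and lowercases everything else, so that rare numeric forms share one representation. The function below is a hypothetical stand-in along those lines, not the gist's actual definition; the placeholder string 'NUM' and the number pattern are assumptions.

```python
import re

# Hypothetical: integers and decimals such as 2017, 3.14 or 1,000.
NUMBER_RE = re.compile(r"[0-9]+(?:[.,][0-9]+)*")

def normalize(word):
    """Map purely numeric tokens to 'NUM' and lowercase everything else."""
    return "NUM" if NUMBER_RE.fullmatch(word) else word.lower()
```

Using fullmatch means mixed tokens such as '3rd' are lowercased rather than collapsed into NUM.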

Training a Dutch parser


  1. Get the text data: wget
  2. Get the code for the structured n-grams: wget
  3. Run unzip ; rm
  4. Build the word vector code: Run cd wang2vec-master/ ; make ; cd ..
  5. Train CBOW vectors: Run ./wang2vec-master/word2vec -train wikicorpus.txt -output cbow.vectors -type 0 -size 50 -window 5 -negative 10 -nce 0 -hs 0 -sample 1e-4 -threads 1 -iter 5 -cap 0 >> training.log 2>&1 &
  6. Train structured skip-ngram vectors: Run ./wang2vec-master/word2vec -train wikicorpus.txt -output structured_ngram.vectors -type 3 -size 50 -window 5 -negative 10 -nce 0 -hs 0 -sample 1e-4 -threads 1 -iter 5 -cap 0 >> training_ssg.log 2>&1 &
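Steps 5 and 6 write their vectors to cbow.vectors and structured_ngram.vectors. Since the commands do not pass -binary 1, these files should be in word2vec's plain-text format (an assumption based on word2vec's default): a header line with the vocabulary size and the dimensionality, then one word per line followed by its values. A minimal reader, with hypothetical function names:

```python
import numpy as np

def parse_text_vectors(lines):
    """Parse word2vec text format from an iterable of lines.

    The first line holds '<vocab_size> <dim>'; every following line holds a
    word and dim floating-point values separated by spaces.
    """
    lines = iter(lines)
    _, dim = map(int, next(lines).split())
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")  # rstrip drops the trailing space and newline
        if len(parts) != dim + 1:
            continue  # skip malformed lines instead of failing
        vectors[parts[0]] = np.array(parts[1:], dtype=np.float32)
    return vectors, dim

def load_text_vectors(path):
    """Read a word2vec text file such as cbow.vectors from disk."""
    with open(path, encoding="utf-8") as f:
        return parse_text_vectors(f)
```

If gensim is available, KeyedVectors.load_word2vec_format('cbow.vectors', binary=False) reads the same format and adds similarity queries on top.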