Emiel van Miltenburg (evanmiltenburg)

# modified from: https://gist.github.com/dellis23/6174914/
# - Added NLTK, which simplifies the chain and ngram logic.
# To use this script, you need to have downloaded the punkt
# data like this:
#
# import nltk
# nltk.download('punkt')
#
# - No more occasional KeyErrors.
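The comment block above is all the preview shows of this gist. As a rough illustration of how NLTK can simplify the chain and ngram logic (a minimal sketch with my own function names, not the gist's actual code; a defaultdict is one way to avoid the KeyErrors mentioned above):

import random
from collections import defaultdict
import nltk

def build_chain(text, n=3):
    "Map each (n-1)-gram of the text to the tokens that can follow it."
    chain = defaultdict(list)
    tokens = nltk.word_tokenize(text)  # requires the 'punkt' data
    for ngram in nltk.ngrams(tokens, n):
        chain[ngram[:-1]].append(ngram[-1])
    return chain

def generate(chain, length=20):
    "Generate text by repeatedly sampling a continuation of the current context."
    context = random.choice(list(chain))
    out = list(context)
    for _ in range(length):
        candidates = chain.get(tuple(out[-len(context):]))
        if not candidates:
            break
        out.append(random.choice(candidates))
    return ' '.join(out)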
evanmiltenburg / colorcombos.py
Created June 12, 2015 23:09
Script to easily obtain lists of RGB tuples from a .txt file containing URLs from colorcombos.com
from urlparse import urlparse
import colorsys, sys

def keyword_tuples(query):
    """Create (keyword, value) tuples for the query."""
    return map(lambda x: tuple(x.split('=')), query.split('&'))

def get_colors(url):
    """Get colors from the URL, returns a list of hex color values (without #)."""
    result = urlparse(url)
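The preview cuts off inside get_colors, but the description promises RGB tuples, so the remaining step is presumably a hex-to-RGB conversion. A minimal sketch of that step (the helper name is mine, not necessarily the gist's):

def hex_to_rgb(hex_value):
    "Convert a six-digit hex string (without #) to an (r, g, b) tuple of ints."
    return tuple(int(hex_value[i:i + 2], 16) for i in (0, 2, 4))

# Example: hex_to_rgb('ff8000') -> (255, 128, 0)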
evanmiltenburg / json_argument_example.py
Last active April 18, 2016 20:00
Proposal for serialising arguments
import argparse
import json

def get_parser():
    "Get an argument parser for this module."
    parser = argparse.ArgumentParser(
        description="Train a word embedding model using an LSTM network")
    parser.add_argument("--run_string", default="", type=str,
                        help="Optional string to help you identify the run")

Training a Dutch parser

Steps

  1. Get the text data: wget http://kyoto.let.vu.nl/~miltenburg/public_data/wikicorpus/corpus/wikicorpus.txt.gz and unpack it with gunzip wikicorpus.txt.gz (the training commands below expect the plain wikicorpus.txt)
  2. Get the code for the structured n-grams: wget https://github.com/wlin12/wang2vec/archive/master.zip
  3. Run unzip master.zip ; rm master.zip
  4. Build the word vector code: Run cd wang2vec-master/ ; make ; cd ..
  5. Train CBOW vectors: Run ./wang2vec-master/word2vec -train wikicorpus.txt -output cbow.vectors -type 0 -size 50 -window 5 -negative 10 -nce 0 -hs 0 -sample 1e-4 -threads 1 -iter 5 -cap 0 >> training.log 2>&1 &
  6. Train Structured skipngram vectors: Run ./wang2vec-master/word2vec -train wikicorpus.txt -output structured_ngram.vectors -type 3 -size 50 -window 5 -negative 10 -nce 0 -hs 0 -sample 1e-4 -threads 1 -iter 5 -cap 0 >> training_ssg.log 2>&1 &
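Once both jobs have finished, the .vectors files should be in the standard word2vec text format, so they can be inspected from Python. A minimal sketch, assuming a reasonably recent gensim and that the example word is in the vocabulary (this step is not part of the original instructions):

from gensim.models import KeyedVectors

# wang2vec writes plain-text vectors by default, hence binary=False.
cbow = KeyedVectors.load_word2vec_format('cbow.vectors', binary=False)
ssg = KeyedVectors.load_word2vec_format('structured_ngram.vectors', binary=False)

print(cbow.most_similar('fiets', topn=5))  # nearest neighbours of 'fiets' (bicycle)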
from collections import Counter
import re
import glob
class ConllEntry:
    def __init__(self, id, form, pos, cpos, parent_id=None, relation=None):
        self.id = id
        self.form = form
        self.norm = normalize(form)
        self.cpos = cpos.upper()
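The normalize function called above is not shown in the preview. A common choice in CoNLL readers, and only an assumption here, is to lowercase tokens and collapse anything numeric into a placeholder:

numberRegex = re.compile(r'[0-9]+|[0-9]+\.[0-9]+|[0-9]+[0-9,]+')

def normalize(word):
    "Lowercase the token and map numbers to a NUM placeholder."
    return 'NUM' if numberRegex.match(word) else word.lower()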
import csv
import numpy as np
from gensim.models import Word2Vec
np.random.seed(1234)
from keras.models import Sequential
from keras.layers.core import Activation, Dense
from keras.callbacks import EarlyStopping
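The preview shows only the imports. A minimal sketch of the kind of classifier these imports support (the layer sizes, input dimensionality, and training call are placeholders of mine, not taken from the gist):

model = Sequential()
model.add(Dense(50, input_dim=300))   # e.g. 300-dimensional word embeddings as input
model.add(Activation('relu'))
model.add(Dense(1))
model.add(Activation('sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

early_stopping = EarlyStopping(monitor='val_loss', patience=2)
# model.fit(X_train, y_train, validation_split=0.1, callbacks=[early_stopping])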

Papers at INLG

Here's a list of all the papers presented at INLG 2017, sourced from here. I made this list because it's easier to read and print.

Please refer to the INLG website for the official schedule, which may still change and which also lists other events, such as the invited talks and the hackathon.

Tuesday

evanmiltenburg / legend_circles.py
Created February 13, 2018 15:02
Circles in legend
import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib.lines import Line2D
my_palette = sns.color_palette("cubehelix", 3)
sns.set_palette(my_palette)
def legend_circles(labels, palette, loc=1, markersize=10, marker='o', padding=0):
    "Make a legend where the color is indicated by a circle."
evanmiltenburg / levelt.tex
Created April 26, 2018 12:08
Levelt's model of speech production
\documentclass[12pt]{standalone}
\usepackage{tgtermes}
\usepackage{tgheros}
\usepackage[T1]{fontenc}
\usepackage{tikz}
\usetikzlibrary{arrows}
\usetikzlibrary{arrows.meta}
\usetikzlibrary{calc}
import numpy as np

def get_lengths(num_lines, line_length):
    "Get n lines, totaling a particular length."
    lengths = np.random.random(num_lines)
    lengths *= line_length / np.sum(lengths)
    return lengths

def lines(line_length, page_width):
    "Get a random number of lines, with n-1 gaps of varying length in between."
    num_lines = np.random.randint(1, 10)
    lengths = get_lengths(num_lines, line_length)