- Get the text data:
wget http://kyoto.let.vu.nl/~miltenburg/public_data/wikicorpus/corpus/wikicorpus.txt.gz
- Decompress it (the training commands below read wikicorpus.txt): Run
gunzip wikicorpus.txt.gz
- Get the code for the structured n-grams:
wget https://github.com/wlin12/wang2vec/archive/master.zip
- Unpack the archive and delete the zip file: Run
unzip master.zip ; rm master.zip
- Build the word vector code: Run
cd wang2vec-master/ ; make ; cd ..
- Train CBOW vectors: Run
./wang2vec-master/word2vec -train wikicorpus.txt -output cbow.vectors -type 0 -size 50 -window 5 -negative 10 -nce 0 -hs 0 -sample 1e-4 -threads 1 -iter 5 -cap 0 >> training.log 2>&1 &
- Train structured skip-ngram vectors: Run
./wang2vec-master/word2vec -train wikicorpus.txt -output structured_ngram.vectors -type 3 -size 50 -window 5 -negative 10 -nce 0 -hs 0 -sample 1e-4 -threads 1 -iter 5 -cap 0 >> training_ssg.log 2>&1 &
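Once training finishes, the vectors can be inspected from Python. The following is a minimal sketch, not part of wang2vec. It assumes the output files are in the word2vec text format (a "vocab_size dim" header line, then one word and its values per line), which should be the case when -binary is not passed. The helper names load_text_vectors and cosine are hypothetical, and the three-word sample stands in for cbow.vectors.

```python
import io
import math

def load_text_vectors(handle):
    """Parse word2vec-style text vectors from an open file handle."""
    header = handle.readline().split()
    dim = int(header[1])  # header is "vocab_size dim"
    vectors = {}
    for line in handle:
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Tiny in-memory stand-in for cbow.vectors (assumed layout, made-up values)
sample = io.StringIO("3 2\ncat 1.0 0.0 \ndog 0.8 0.6 \ncar 0.0 1.0 \n")
vectors = load_text_vectors(sample)
print(cosine(vectors["cat"], vectors["dog"]))
```

For the real files, open cbow.vectors or structured_ngram.vectors with open(path, encoding="utf-8") and pass the handle to load_text_vectors.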
# modified from: https://gist.github.com/dellis23/6174914/
# - Added NLTK, which simplifies the chain and ngram logic.
#   To use this script, you need to have downloaded the punkt
#   data like this:
#
#   import nltk
#   nltk.download('punkt')
#
# - No more occasional KeyErrors.
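The chain and ngram logic mentioned in the header can be sketched roughly as follows. This is not the gist's code: build_chain and generate are hypothetical names, and a plain whitespace split stands in for NLTK tokenization so the example runs without the punkt data. Stopping at an unseen state is one way to avoid a KeyError; the gist may handle it differently.

```python
import random
from collections import defaultdict

def build_chain(tokens, order=2):
    """Map each n-gram of length `order` to the tokens that followed it."""
    chain = defaultdict(list)
    for i in range(len(tokens) - order):
        state = tuple(tokens[i:i + order])
        chain[state].append(tokens[i + order])
    return chain

def generate(chain, length=10, seed=None):
    """Walk the chain from a random state; stop early at a dead end."""
    rng = random.Random(seed)
    state = rng.choice(sorted(chain))
    output = list(state)
    while len(output) < length:
        followers = chain.get(state)
        if not followers:
            break  # unseen state: stop instead of raising a KeyError
        output.append(rng.choice(followers))
        state = tuple(output[-len(state):])
    return output

text = "the cat sat on the mat and the cat ran off the mat"
tokens = text.split()  # stand-in for NLTK tokenization
chain = build_chain(tokens, order=2)
print(" ".join(generate(chain, length=8, seed=1)))
```

Every three-word window of the output occurs somewhere in the source text, which is what makes the result locally plausible.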
# Python 2 module; in Python 3 use: from urllib.parse import urlparse
from urlparse import urlparse
import colorsys, sys

def keyword_tuples(query):
    """Create (keyword,value) tuples for the query."""
    return map(lambda x:tuple(x.split('=')), query.split('&'))

def get_colors(url):
    """Get colors from the URL, returns a list of hex color values (without #)"""
    result = urlparse(url)
import argparse
import json

def get_parser():
    "Get an argument parser for this module."
    parser = argparse.ArgumentParser(
        description="Train a word embedding model using an LSTM network")
    parser.add_argument("--run_string", default="", type=str,
                        help="Optional string to help you identify the run")
from collections import Counter
import re
import glob

class ConllEntry:
    def __init__(self, id, form, pos, cpos, parent_id=None, relation=None):
        self.id = id
        self.form = form
        self.norm = normalize(form)
        self.cpos = cpos.upper()
import csv
import numpy as np
from gensim.models import Word2Vec
np.random.seed(1234)
from keras.models import Sequential
from keras.layers.core import Activation, Dense
from keras.callbacks import EarlyStopping
Here's a list of all the papers presented at INLG 2017, sourced from here. I made this list because it's easier to read and print.
Please refer to the INLG website for the official schedule, which may be subject to change, and also contains other events, like invited talks and the hackathon.
import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib.lines import Line2D

my_palette = sns.color_palette("cubehelix", 3)
sns.set_palette(my_palette)

def legend_circles(labels, palette, loc=1, markersize=10, marker='o', padding=0):
    "Make a legend where the color is indicated by a circle."
\documentclass[12pt]{standalone}
\usepackage{tgtermes}
\usepackage{tgheros}
\usepackage[T1]{fontenc}
\usepackage{tikz}
\usetikzlibrary{arrows}
\usetikzlibrary{arrows.meta}
\usetikzlibrary{calc}
import numpy as np

def get_lengths(num_lines, line_length):
    "Get n lines, totaling a particular length."
    lengths = np.random.random(num_lines)
    lengths *= line_length / np.sum(lengths)
    return lengths

def lines(line_length, page_width):
    "Get a random number of lines, with n-1 gaps of varying length in between."
    num_lines = np.random.randint(1,10)
    lengths = get_lengths(num_lines, line_length)
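The rescaling step in get_lengths can be checked in isolation. Below is a small sketch of the same idea using only the standard library's random module instead of NumPy. The name get_lengths_plain, the seed, and the sample arguments are hypothetical and chosen only for illustration.

```python
import random

def get_lengths_plain(num_lines, line_length, rng=random):
    """Draw num_lines random weights and rescale them to sum to line_length."""
    weights = [rng.random() for _ in range(num_lines)]
    scale = line_length / sum(weights)
    return [w * scale for w in weights]

rng = random.Random(0)  # arbitrary seed, only to make the run repeatable
lengths = get_lengths_plain(4, 100.0, rng)
print(lengths, sum(lengths))
```

Dividing by the sum of the weights is what guarantees that the pieces add up to line_length (up to floating-point rounding), whatever the individual draws are.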