ottokart / punct_server.py
Created May 1, 2019 18:34
Simple script to serve a punctuation API for https://github.com/ottokart/punctuator2 (similar to http://bark.phon.ioc.ee/punctuator). You need to change the configuration at the beginning of the file to point to the model file. The script should also be in the same directory as the other punctuator scripts. The Europarl model that I use in the demo (http:…
# coding: utf-8
from __future__ import division
from nltk.tokenize import word_tokenize
import models
import data
import theano
import tornado.ioloop
ottokart / word2vec-binary-to-text.py
Last active February 1, 2021 17:25
Python script to convert word2vec pre-trained word embeddings from a binary format into a text format where each line starts with a word followed by corresponding embedding vector entries separated by spaces. E.g., "dog 0.41231234567890 0.355122341578123 ..."
# coding: utf-8
from __future__ import division
import struct
import sys
import gzip
FILE_NAME = "GoogleNews-vectors-negative300.bin.gz" # outputs GoogleNews-vectors-negative300.bin.gz.txt
MAX_VECTORS = 100000 # Top words to take
FLOAT_SIZE = 4 # 32bit float
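The conversion this gist performs boils down to parsing word2vec's binary layout: a text header with the vocabulary size and dimensionality, then repeated records of a space-terminated word followed by `FLOAT_SIZE * dims` bytes of 32-bit floats. A minimal sketch of that loop (my own illustration, assuming an uncompressed `.bin` file rather than the gzipped `FILE_NAME` above; the function name `binary_to_text` is mine):

```python
# Minimal sketch of the word2vec binary-to-text conversion idea.
# Not the gist's full code; assumes an uncompressed .bin file.
import struct

def binary_to_text(path, out_path, max_vectors=1000, float_size=4):
    with open(path, "rb") as f, open(out_path, "w") as out:
        header = f.readline().split()            # b"<vocab_size> <dims>\n"
        vocab_size, dims = int(header[0]), int(header[1])
        for _ in range(min(vocab_size, max_vectors)):
            # The word is a space-terminated byte string.
            word = bytearray()
            while True:
                ch = f.read(1)
                if ch == b" ":
                    break
                if ch != b"\n":                  # skip optional record separator
                    word.extend(ch)
            vec = struct.unpack("%df" % dims, f.read(float_size * dims))
            out.write(word.decode("utf-8", "ignore") + " "
                      + " ".join("%f" % v for v in vec) + "\n")
```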
ottokart / som.py
Created December 12, 2016 13:58
Simple Self-Organizing Map
from __future__ import division
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(1)
# Replace X with your own data if you wish
##########################################
N = 300 # num samples
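The core of any SOM is one training step: find the grid node closest to the sample (the best-matching unit), then pull that node and its grid neighbors toward the sample with a Gaussian neighborhood and decaying learning rate. A sketch of that step (my own illustration, not taken from som.py; the function name and decay schedules are assumptions):

```python
# One self-organizing-map training step: find the BMU, then pull it and
# its grid neighbors toward the sample.
import numpy as np

def som_step(weights, x, t, n_iters, sigma0=2.0, lr0=0.5):
    """weights: (rows, cols, dim) grid of node vectors; x: (dim,) sample."""
    frac = t / n_iters
    sigma = sigma0 * (1.0 - frac) + 1e-6        # shrinking neighborhood radius
    lr = lr0 * (1.0 - frac)                     # decaying learning rate
    # Best-matching unit: grid node closest to x in input space.
    d = np.linalg.norm(weights - x, axis=2)
    bmu = np.unravel_index(np.argmin(d), d.shape)
    # Gaussian neighborhood around the BMU, measured on the grid.
    rows, cols = np.indices(d.shape)
    grid_dist2 = (rows - bmu[0]) ** 2 + (cols - bmu[1]) ** 2
    h = np.exp(-grid_dist2 / (2 * sigma ** 2))
    # Move each node toward the sample, weighted by its neighborhood value.
    weights += lr * h[:, :, None] * (x - weights)
    return bmu
```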
# coding: utf-8
# A little demo illustrating the effect of momentum in neural network training.
# Try using different values for MOMENTUM constant below (e.g. compare 0.0 with 0.9).
# This neural network is actually more like logistic regression, but I have used
# squared error to make the error surface more interesting.
import numpy as np
import pylab
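The momentum trick the demo illustrates is small: keep a velocity vector that accumulates past gradients, so the update direction is smoothed over steps. A sketch on a toy quadratic (my own illustration, not the gist's network code; names and constants are assumptions):

```python
# Gradient descent with momentum on a toy quadratic loss ||w||^2.
# Compare momentum=0.0 (plain gradient descent) with momentum=0.9.
import numpy as np

def descend(momentum, n_steps=100, lr=0.1):
    w = np.array([3.0, -2.0])
    v = np.zeros_like(w)
    for _ in range(n_steps):
        grad = 2 * w                   # gradient of ||w||^2
        v = momentum * v - lr * grad   # velocity accumulates past gradients
        w = w + v
    return np.linalg.norm(w)
```

With momentum the iterates overshoot and oscillate around the minimum before settling, which is exactly the behavior the demo's more interesting error surface makes visible.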
ottokart / nn.py
Last active August 27, 2021 05:52
3-layer neural network example with dropout in 2nd layer
# Tiny example of 3-layer neural network with dropout in 2nd hidden layer
# Output layer is linear with L2 cost (regression model)
# Hidden layer activation is tanh
import numpy as np
n_epochs = 100
n_samples = 100
n_in = 10
n_hidden = 5
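The dropout part of such a network can be written as a small forward-pass helper. The gist's own implementation isn't shown in this preview, so the sketch below uses the common "inverted dropout" variant (kept units are scaled up at train time, so inference needs no rescaling); the function name is mine:

```python
# Inverted dropout on a hidden-layer activation: zero each unit with
# probability p_drop, and scale the survivors by 1/(1 - p_drop).
import numpy as np

def dropout_forward(h, p_drop, rng, train=True):
    if not train or p_drop == 0.0:
        return h
    mask = (rng.random(h.shape) >= p_drop) / (1.0 - p_drop)
    return h * mask
```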
ottokart / word2vec-binary-to-python-dict.py
Last active July 25, 2019 22:41
Python script to convert a binary file containing word2vec pre-trained word embeddings into a pickled python dict.
# coding: utf-8
from __future__ import division
import struct
import sys
FILE_NAME = "GoogleNews-vectors-negative300.bin"
MAX_VECTORS = 200000 # This script takes a lot of RAM (>2 GB for 200K vectors); to use the full 3M embeddings you will probably need to insert the vectors into some kind of database instead
FLOAT_SIZE = 4 # 32bit float
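Once the `(word, vector)` pairs are parsed out of the binary file, the remaining step this gist adds over the text-format variant is collecting them into a dict and pickling it. A sketch of that step (my own illustration; the function name is an assumption):

```python
# Collect (word, vector) pairs into a dict and pickle it to disk.
import pickle

def save_vectors_as_dict(vectors, out_path):
    """vectors: iterable of (word, vector) pairs, e.g. from the parsing loop."""
    d = {word: vec for word, vec in vectors}
    with open(out_path, "wb") as f:
        pickle.dump(d, f, protocol=pickle.HIGHEST_PROTOCOL)
    return len(d)
```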