jim geovedi geovedi

@quadrismegistus
quadrismegistus / gensim_word2vec_procrustes_align.py
Last active November 16, 2023 01:57
Code for aligning two gensim word2vec models using Procrustes matrix alignment. Code ported from HistWords <https://github.com/williamleif/histwords> by William Hamilton <wleif@stanford.edu>. [NOTE: This code is DEPRECATED for latest versions of gensim. Please see instead this updated version of the code <https://gist.github.com/zhicongchen/9e23…
def smart_procrustes_align_gensim(base_embed, other_embed, words=None):
    """Procrustes align two gensim word2vec models (to allow for comparison between same word across models).
    Code ported from HistWords <https://github.com/williamleif/histwords> by William Hamilton <wleif@stanford.edu>.
    (With help from William. Thank you!)
    First, intersect the vocabularies (see `intersection_align_gensim` documentation).
    Then do the alignment on the other_embed model.
    Replace the other_embed model's syn0 and syn0norm numpy matrices with the aligned version.
    Return other_embed.
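The alignment step described in the docstring above can be sketched with plain NumPy. This is a minimal illustration of orthogonal Procrustes alignment, not the gist's actual code: the function name `procrustes_align` and its arguments are hypothetical, and it assumes both matrices already hold vectors for the same words in the same row order (that is, the vocabulary intersection has already been done).

```python
import numpy as np


def procrustes_align(base_vecs, other_vecs):
    """Rotate other_vecs onto base_vecs with an orthogonal matrix.

    Hypothetical helper for illustration. Both inputs are (n_words, dim)
    arrays whose rows refer to the same words in the same order.
    """
    # Cross-covariance between the two embedding spaces.
    m = other_vecs.T @ base_vecs
    # The SVD of the cross-covariance gives the best orthogonal map.
    u, _, vt = np.linalg.svd(m)
    ortho = u @ vt
    # Apply the rotation to the embedding being aligned.
    return other_vecs @ ortho
```

Because the map is orthogonal, distances and cosine similarities within the rotated space are unchanged; only its orientation relative to the base space is adjusted.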
@mbollmann
mbollmann / state_transfer_lstm.py
Created June 18, 2016 08:59
StateTransferLSTM for Keras 1.x
# Source:
# https://github.com/farizrahman4u/seq2seq/blob/master/seq2seq/layers/state_transfer_lstm.py
from keras import backend as K
from keras.layers.recurrent import LSTM
class StateTransferLSTM(LSTM):
    """LSTM with the ability to transfer its hidden state.
    This layer behaves just like an LSTM, except that it can transfer (or
@basaundi
basaundi / multi_bleu.py
Last active September 20, 2020 07:28
python rewrite of Moses' multi-bleu.perl; usable as a library
#!/usr/bin/env python
# Ander Martinez Sanchez
from __future__ import division, print_function
from math import exp, log
from collections import Counter
def ngram_count(words, n):
    if n <= len(words):
from numpy.random import choice as random_choice, randint as random_randint, rand
MAX_INPUT_LEN = 40
AMOUNT_OF_NOISE = 0.2 / MAX_INPUT_LEN
CHARS = list("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ .")
def add_noise_to_string(a_string, amount_of_noise):
    """Add some artificial spelling mistakes to the string"""
    if rand() < amount_of_noise * len(a_string):
        # Replace a character with a random character
        random_char_position = random_randint(len(a_string))
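The preview above is cut off, so the following is only a sketch of the one operation it shows: replacing a single character with a random one, with a probability that scales with string length. The function name `replace_random_char` and the `rng` parameter are hypothetical, and the standard-library `random` module is swapped in for `numpy.random` to keep the example dependency-free.

```python
import random

CHARS = list("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ .")


def replace_random_char(a_string, amount_of_noise, rng=random):
    """Possibly replace one character of a_string with a random one.

    Hypothetical helper: rng can be the random module or a random.Random
    instance (useful for reproducible runs).
    """
    # Longer strings are proportionally more likely to be corrupted.
    if rng.random() < amount_of_noise * len(a_string):
        position = rng.randrange(len(a_string))
        a_string = a_string[:position] + rng.choice(CHARS) + a_string[position + 1:]
    return a_string
```

Passing `random.Random(0)` as `rng` gives repeatable noise, which helps when regenerating a training set.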
@odashi
odashi / mert.py
Last active May 1, 2016 14:17
Minimum error-rate training for statistical machine translation
#!/usr/bin/python3
import math
import random
import sys
from argparse import ArgumentParser
from collections import defaultdict
from util.functions import trace
def parse_args():
@odashi
odashi / bleu.py
Last active September 20, 2019 06:46
BLEU calculator
# usage (single sentence):
#   ref = ['This', 'is', 'a', 'pen', '.']
#   hyp = ['There', 'is', 'a', 'pen', '.']
#   stats = get_bleu_stats(ref, hyp)
#   bleu = calculate_bleu(stats) # => 0.668740
#
# usage (multiple sentences):
#   stats = defaultdict(int)
#   for ref, hyp in zip(refs, hyps):
#     for k, v in get_bleu_stats(ref, hyp).items():
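The gist's own `get_bleu_stats` and `calculate_bleu` are not visible in this preview, so the following is an independent sketch of the same two-step pattern: collect per-sentence counts, then turn the (possibly summed) counts into a score. The names `bleu_stats` and `bleu_score` and the dictionary keys are hypothetical. It assumes standard BLEU with n-grams up to 4, clipped matches, and a brevity penalty.

```python
from collections import Counter
from math import exp, log

MAX_N = 4  # assumed maximum n-gram order


def ngrams(words, n):
    """Count all n-grams of order n in a token list."""
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))


def bleu_stats(ref, hyp):
    """Collect lengths and clipped n-gram matches for one sentence pair."""
    stats = {"ref_len": len(ref), "hyp_len": len(hyp)}
    for n in range(1, MAX_N + 1):
        ref_counts = ngrams(ref, n)
        hyp_counts = ngrams(hyp, n)
        # Counter intersection clips each match to the reference count.
        stats["match_%d" % n] = sum((hyp_counts & ref_counts).values())
        stats["total_%d" % n] = sum(hyp_counts.values())
    return stats


def bleu_score(stats):
    """Turn (possibly accumulated) statistics into a BLEU score."""
    if any(stats["match_%d" % n] == 0 for n in range(1, MAX_N + 1)):
        return 0.0
    log_precision = sum(
        log(stats["match_%d" % n] / stats["total_%d" % n])
        for n in range(1, MAX_N + 1)
    ) / MAX_N
    # Brevity penalty in log space: zero when the hypothesis is long enough.
    brevity = min(0.0, 1.0 - stats["ref_len"] / stats["hyp_len"])
    return exp(brevity + log_precision)
```

For the single-sentence example in the usage comment, the n-gram precisions are 4/5, 3/4, 2/3 and 1/2, whose geometric mean is 0.2 ** 0.25, about 0.668740, which agrees with the value quoted there. For a corpus, summing the per-sentence dictionaries key by key before calling `bleu_score` gives a corpus-level score.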
@entron
entron / imdb_cnn_kim_small_embedding.py
Last active September 16, 2023 16:23
Keras implementation of Kim's paper "Convolutional Neural Networks for Sentence Classification" with a very small embedding size. The test accuracy is 0.853.
'''This script implements Kim's paper "Convolutional Neural Networks for Sentence Classification"
with a much smaller embedding size (20) than the commonly used values (100 - 300), as it gives better
results with far fewer parameters.
Run on GPU: THEANO_FLAGS=mode=FAST_RUN,device=gpu,floatX=float32 python imdb_cnn.py
Get to 0.853 test accuracy after 5 epochs. 13s/epoch on Nvidia GTX980 GPU.
'''
from __future__ import print_function
@odashi
odashi / chainer_encoder_decoder.py
Last active January 22, 2021 14:03
Training and generation processes for neural encoder-decoder machine translation.
#!/usr/bin/python3
import datetime
import sys
import math
import numpy as np
from argparse import ArgumentParser
from collections import defaultdict
from chainer import FunctionSet, Variable, functions, optimizers
@kachayev
kachayev / concurrency-in-go.md
Last active March 11, 2024 11:27
Channels Are Not Enough or Why Pipelining Is Not That Easy
@syllog1sm
syllog1sm / gist:10343947
Last active November 7, 2023 13:09
A simple Python dependency parser
"""A simple implementation of a greedy transition-based parser. Released under BSD license."""
from os import path
import os
import sys
from collections import defaultdict
import random
import time
import pickle
SHIFT = 0; RIGHT = 1; LEFT = 2;