Skip to content

Instantly share code, notes, and snippets.

View geovedi's full-sized avatar

jim geovedi geovedi

View GitHub Profile
@odashi
odashi / mert.py
Last active May 1, 2016 14:17
Minimum error-rate training for statistical machine translation
#!/usr/bin/python3
import math
import random
import sys
from argparse import ArgumentParser
from collections import defaultdict
from util.functions import trace
def parse_args():
from numpy.random import choice as random_choice, randint as random_randint, rand
MAX_INPUT_LEN = 40
AMOUNT_OF_NOISE = 0.2 / MAX_INPUT_LEN
CHARS = list("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ .")
def add_noise_to_string(a_string, amount_of_noise):
"""Add some artificial spelling mistakes to the string"""
if rand() < amount_of_noise * len(a_string):
# Replace a character with a random character
random_char_position = random_randint(len(a_string))
@basaundi
basaundi / multi_bleu.py
Last active September 20, 2020 07:28
python rewrite of Moses' multi-bleu.perl; usable as a library
#!/usr/bin/env python
# Ander Martinez Sanchez
from __future__ import division, print_function
from math import exp, log
from collections import Counter
def ngram_count(words, n):
if n <= len(words):
@mbollmann
mbollmann / state_transfer_lstm.py
Created June 18, 2016 08:59
StateTransferLSTM for Keras 1.x
# Source:
# https://github.com/farizrahman4u/seq2seq/blob/master/seq2seq/layers/state_transfer_lstm.py
from keras import backend as K
from keras.layers.recurrent import LSTM
class StateTransferLSTM(LSTM):
"""LSTM with the ability to transfer its hidden state.
This layer behaves just like an LSTM, except that it can transfer (or
@quadrismegistus
quadrismegistus / gensim_word2vec_procrustes_align.py
Last active November 16, 2023 01:57
Code for aligning two gensim word2vec models using Procrustes matrix alignment. Code ported from HistWords <https://github.com/williamleif/histwords> by William Hamilton <wleif@stanford.edu>. [NOTE: This code is DEPRECATED for latest versions of gensim. Please see instead this updated version of the code <https://gist.github.com/zhicongchen/9e23…
def smart_procrustes_align_gensim(base_embed, other_embed, words=None):
"""Procrustes align two gensim word2vec models (to allow for comparison between same word across models).
Code ported from HistWords <https://github.com/williamleif/histwords> by William Hamilton <wleif@stanford.edu>.
(With help from William. Thank you!)
First, intersect the vocabularies (see `intersection_align_gensim` documentation).
Then do the alignment on the other_embed model.
Replace the other_embed model's syn0 and syn0norm numpy matrices with the aligned version.
Return other_embed.