import tensorflow as tf
import numpy as np
corpus_raw = 'He is the king . The king is royal . She is the royal queen '
# convert to lower case
corpus_raw = corpus_raw.lower()
words = []
for word in corpus_raw.split():

Notes on LSTM POS Tagger Shapes

X : numpy array of shape (No. of samples, Padding Length)
    Example : (64, 1000)

    [[0, 0, ..., 52, 16, 23],
     [0, 0, ..., 23, 64, 12]]
    ^ this has shape (2, 1000) since the padding length is 1000;
      each row corresponds to one sentence
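A minimal sketch of how such an X could be built from sentences already encoded as word indices. It assumes index 0 is reserved for padding and that padding goes on the left (inferred from the leading zeros in the example above); the helper name `pad_sequences_left` is hypothetical.

```python
import numpy as np


def pad_sequences_left(sequences, pad_len, pad_value=0):
    """Left-pad (and left-truncate) lists of word indices to a fixed length."""
    X = np.full((len(sequences), pad_len), pad_value, dtype=np.int64)
    for row, seq in enumerate(sequences):
        if not seq:
            continue  # an empty sentence stays all padding
        trimmed = seq[-pad_len:]  # keep only the last pad_len tokens
        X[row, -len(trimmed):] = trimmed
    return X


sentences = [[52, 16, 23], [23, 64, 12]]
X = pad_sequences_left(sentences, pad_len=1000)
print(X.shape)  # (2, 1000)
```

Each row of X is one sentence, so the first dimension grows with the number of samples while the second stays fixed at the padding length.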

17 May, 2018 Discussion:

Predecided Objectives:

  • Come up with a way of evaluating models (in the form of a script)
  • Look for more data sets to evaluate models

Datasets:

  • WikiQA [Ranking/Regression]
  • QuoraQP [Binary Classification]
  • The Stanford Natural Language Inference (SNLI) Corpus [Multi-class Classification]

import os

from gensim.similarity_learning import WikiQAExtractor
wikiqa = WikiQAExtractor(os.path.join("..", "data", "WikiQACorpus", "WikiQA-train.tsv"))
data = wikiqa.get_data()
# Below commented code is for making a dict for word vectors and pickling it
# w2v = {}
# with open('glove.6B.50d.txt') as f:
#     for line in f:
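
The commented-out idea above (build a dict of word vectors, then pickle it) might look roughly like the sketch below. It assumes the usual GloVe text layout of one word followed by space-separated floats per line; the function name `load_word_vectors` and the sample values are made up for illustration, and an in-memory buffer stands in for the real file.

```python
import io
import pickle


def load_word_vectors(lines):
    """Parse GloVe-style lines ("word v1 v2 ...") into {word: [floats]}."""
    w2v = {}
    for line in lines:
        parts = line.rstrip().split(' ')
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        w2v[parts[0]] = [float(x) for x in parts[1:]]
    return w2v


# Tiny in-memory stand-in for a file such as glove.6B.50d.txt (made-up values).
sample = io.StringIO("king 0.1 0.2 0.3\nqueen 0.4 0.5 0.6\n")
w2v = load_word_vectors(sample)

# Round-trip through pickle, as one would when caching the dict on disk.
restored = pickle.loads(pickle.dumps(w2v))
print(restored['king'])
```

With a real file, the same function can be fed an open file handle, and `pickle.dump` can write the result to disk so the text file is parsed only once.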

MZ: the MatchZoo evaluation run on my machine
Mine: my evaluation script run on my machine

ANMM

MZ:
map=0.610744
ndcg@1=0.459916
ndcg@3=0.603051
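
For reference, here is a minimal sketch of how map and ndcg@k are commonly computed from ranked relevance labels. It is not taken from MatchZoo or from the evaluation script being compared; toolkits differ in details such as the gain function and how queries without relevant candidates are handled, so small differences between implementations are to be expected.

```python
import math


def average_precision(labels):
    """labels: relevance (0/1) of one query's candidates, in ranked order."""
    hits, total = 0, 0.0
    for rank, rel in enumerate(labels, start=1):
        if rel:
            hits += 1
            total += hits / rank  # precision at this relevant position
    return total / hits if hits else 0.0


def mean_average_precision(ranked_label_lists):
    """Mean of the per-query average precision values."""
    return sum(average_precision(l) for l in ranked_label_lists) / len(ranked_label_lists)


def ndcg_at_k(labels, k):
    """nDCG@k with gain 2^rel - 1 and a log2 position discount."""
    def dcg(rels):
        return sum((2 ** rel - 1) / math.log2(i + 2) for i, rel in enumerate(rels[:k]))

    ideal = dcg(sorted(labels, reverse=True))
    return dcg(labels) / ideal if ideal else 0.0


queries = [[0, 1, 0, 1], [1, 0, 0]]  # made-up rankings for two queries
print(mean_average_precision(queries))  # 0.75
print(ndcg_at_k(queries[0], 1))         # 0.0
```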

How to reproduce my benchmark

This document explains the newly introduced files, how to use them, and how to reproduce my benchmarks.

Additional dependencies:

Unfortunately, the code currently depends on pandas, a module for handling .csv, .tsv and similar files. I use it to group the datapoints by document id. This can be done without pandas, and a version that drops the dependency will be pushed soon.

So, you will have to install pandas first by running the command: pip install pandas
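
As a rough illustration of the pandas-free route, the grouping step can be sketched with the standard library alone. This is only a sketch, not the code in the branch: the helper name `group_rows_by_key`, the column names and the sample rows are assumptions made for the example.

```python
import csv
import io
from collections import defaultdict


def group_rows_by_key(tsv_file, key):
    """Group the rows of a tab-separated file by the value in column `key`."""
    reader = csv.DictReader(tsv_file, delimiter='\t', quoting=csv.QUOTE_NONE)
    groups = defaultdict(list)
    for row in reader:
        groups[row[key]].append(row)
    return dict(groups)


# In-memory stand-in for a .tsv file; column names and rows are made up.
sample = io.StringIO(
    "DocumentID\tSentence\tLabel\n"
    "D1\tfirst candidate\t0\n"
    "D1\tsecond candidate\t1\n"
    "D2\tthird candidate\t0\n"
)
groups = group_rows_by_key(sample, key="DocumentID")
for doc_id, rows in groups.items():
    print(doc_id, [r["Label"] for r in rows])
```

Each value in `groups` is the list of datapoints sharing one document id, which is the unit that ranking metrics are computed over.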

import sys
import os
sys.path.append(os.path.join('..'))
import csv
import re
import gensim.downloader as api
from gensim.utils import simple_preprocess
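import numpy as np

| WikiQA test set | w2v 300 dim | MP     | FT 300 dim | DRMM_TKS | biMPM  |
|-----------------|-------------|--------|------------|----------|--------|
| map             | 0.6277      | 0.6515 | 0.5276     | 0.6259   | 0.3856 |
| gm_map          | 0.4968      | 0.5147 | 0.3923     | 0.4966   | 0.269  |
| Rprec           | 0.4667      | 0.5089 | 0.3429     | 0.4613   | 0.1965 |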
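
The gm_map and Rprec rows are less familiar than map. Assuming the names follow the trec_eval conventions (an assumption, since the scoring code is not shown here), gm_map is the geometric mean of the per-query average precision and Rprec is the precision at rank R, where R is the number of relevant candidates for the query. A small sketch, with the floor value chosen only to keep the logarithm defined:

```python
import math


def r_precision(labels):
    """Precision within the top R results, R = number of relevant candidates."""
    r = sum(1 for rel in labels if rel)
    if r == 0:
        return 0.0
    return sum(1 for rel in labels[:r] if rel) / r


def geometric_mean(values, floor=1e-5):
    """Geometric mean; values are floored so that a zero does not break log()."""
    logs = [math.log(max(v, floor)) for v in values]
    return math.exp(sum(logs) / len(logs))


print(r_precision([0, 1, 0, 1]))  # 0.5
```

Because it multiplies rather than adds, the geometric mean is pulled down sharply by queries with near-zero average precision, which is why gm_map sits well below map in every column.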

Current Situation

For the task of similarity learning, we are evaluating on the WikiQA Dataset