
Notes on LSTM POS Tagger Shapes

X : numpy array of shape (No. of samples, Padding Length)
    Example: (64, 1000)

    [[0, 0, ..., 52, 16, 23],
     [0, 0, ..., 23, 64, 12]]

    The array above has shape (2, 1000), since the padding length is 1000.
    Each row corresponds to one sentence.
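
The shape described above can be reproduced with a small padding helper. This is only a sketch: the function name `pad_sequences_left` is hypothetical, and it assumes, as the example rows suggest, that zeros are placed on the left and the word indices sit at the end of each row.

```python
import numpy as np

def pad_sequences_left(sequences, pad_len, pad_value=0):
    """Left-pad (or left-truncate) lists of word indices to a fixed length."""
    X = np.full((len(sequences), pad_len), pad_value, dtype=np.int64)
    for row, seq in enumerate(sequences):
        if not seq:
            continue  # an empty sentence stays all padding
        trimmed = seq[-pad_len:]          # keep at most the last pad_len tokens
        X[row, -len(trimmed):] = trimmed  # right-align so the zeros come first
    return X

sentences = [[52, 16, 23], [23, 64, 12]]
X = pad_sequences_left(sentences, pad_len=1000)
print(X.shape)    # (2, 1000)
print(X[0, -3:])  # [52 16 23]
```

With 64 sentences instead of 2, the same call would give the (64, 1000) shape from the example.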

17 May, 2018 Discussion:

Predecided Objectives:

  • Come up with a way of evaluating models (in the form of a script)
  • Look for more data sets to evaluate models

Datasets:

  • WikiQA [Ranking/Regression]
  • QuoraQP [Binary Classification]
  • The Stanford Natural Language Inference (SNLI) Corpus [Multi-Class Classification]

import os

from gensim.similarity_learning import WikiQAExtractor
wikiqa = WikiQAExtractor(os.path.join("..", "data", "WikiQACorpus", "WikiQA-train.tsv"))
data = wikiqa.get_data()
# Below commented code is for making a dict for word vectors and pickling it
# w2v = {}
# with open('glove.6B.50d.txt') as f:
# for line in f:

MZ: the MatchZoo evaluation run on my machine
Mine: my evaluation script run on my machine

ANMM

MZ:
map=0.610744
ndcg@1=0.459916
ndcg@3=0.603051
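
For readers unfamiliar with the metric names above, here is a minimal sketch of how MAP and nDCG@k are commonly computed for one ranked candidate list per query. It is not the MatchZoo code or the author's script; the function names are hypothetical, labels are assumed to be binary, and the gain `2**rel - 1` is one common convention among several.

```python
import math

def average_precision(relevances):
    """relevances: labels (1 = relevant) in ranked order, best-scored first."""
    hits, precision_sum = 0, 0.0
    for rank, rel in enumerate(relevances, start=1):
        if rel:
            hits += 1
            precision_sum += hits / rank  # precision at each relevant hit
    return precision_sum / hits if hits else 0.0

def ndcg_at_k(relevances, k):
    """Discounted cumulative gain of the top k, normalised by the ideal ordering."""
    def dcg(rels):
        return sum((2 ** rel - 1) / math.log2(pos + 1)
                   for pos, rel in enumerate(rels, start=1))
    ideal = dcg(sorted(relevances, reverse=True)[:k])
    return dcg(relevances[:k]) / ideal if ideal > 0 else 0.0

def mean_over_queries(metric, ranked_lists):
    """Average a per-query metric over all queries."""
    return sum(metric(rels) for rels in ranked_lists) / len(ranked_lists)

# Two toy queries: the relevant answer is ranked 2nd in one and 1st in the other.
queries = [[0, 1, 0], [1, 0, 0]]
print(mean_over_queries(average_precision, queries))            # 0.75
print(mean_over_queries(lambda r: ndcg_at_k(r, 1), queries))    # 0.5
```

Tools can differ in details such as how queries without any relevant candidate are handled, so two evaluation scripts need not agree to the last digit.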

How to reproduce your benchmark

This document will explain the newly introduced files, how they are to be used and how to reproduce my benchmarks.

Additional dependencies:

Unfortunately, the code currently has an additional dependency on pandas, a module for handling .csv, .tsv, and similar files. I was using it to group the datapoints by document id. This can be done without pandas, and a version that does so will be pushed soon.

So, you will have to install pandas first by running the command: pip install pandas
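
As a rough illustration of grouping without pandas, the sketch below uses only the standard library. It is not the author's implementation: the helper name `group_by_column` and the sample column names (`DocumentID`, `Sentence`, `Label`) are assumptions made for the example.

```python
import csv
import io
from collections import defaultdict

def group_by_column(tsv_file, key_column):
    """Read a tab-separated file object and group its rows by one column."""
    groups = defaultdict(list)
    reader = csv.DictReader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        groups[row[key_column]].append(row)
    return dict(groups)

# A tiny in-memory stand-in for a .tsv file (contents are made up).
sample = io.StringIO(
    "DocumentID\tSentence\tLabel\n"
    "D1\tfirst candidate\t0\n"
    "D1\tsecond candidate\t1\n"
    "D2\tthird candidate\t0\n"
)
groups = group_by_column(sample, "DocumentID")
print({doc_id: len(rows) for doc_id, rows in groups.items()})  # {'D1': 2, 'D2': 1}
```

For a real file, the `io.StringIO` object would be replaced by `open(path, newline="", encoding="utf-8")`.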

import sys
import os
sys.path.append(os.path.join('..'))
import csv
import re
import gensim.downloader as api
from gensim.utils import simple_preprocess
import numpy as np
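
| WikiQA test set | w2v 300 dim | MP     | FT 300 dim | DRMM_TKS | biMPM  |
|-----------------|-------------|--------|------------|----------|--------|
| map             | 0.6277      | 0.6515 | 0.5276     | 0.6259   | 0.3856 |
| gm_map          | 0.4968      | 0.5147 | 0.3923     | 0.4966   | 0.269  |
| Rprec           | 0.4667      | 0.5089 | 0.3429     | 0.4613   | 0.1965 |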
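
The measure names gm_map and Rprec match the naming used by trec_eval. Assuming that is where they come from, the sketch below shows the usual definitions: R-precision is the precision within the top R results, where R is the number of relevant candidates for the query, and gm_map is the geometric mean of the per-query average precision. The function names and the floor value of 1e-5 (used so that a zero score does not collapse the geometric mean) are assumptions of this example.

```python
import math

def r_precision(relevances):
    """Precision within the top R results, R = number of relevant candidates."""
    r = sum(1 for rel in relevances if rel)
    if r == 0:
        return 0.0
    return sum(1 for rel in relevances[:r] if rel) / r

def geometric_mean(values, floor=1e-5):
    """Geometric mean via logs; scores below `floor` are raised to `floor`."""
    logs = [math.log(max(v, floor)) for v in values]
    return math.exp(sum(logs) / len(logs))

# Two relevant candidates, one of them in the top 2.
print(r_precision([1, 0, 1, 0]))  # 0.5

# Per-query average precision values (made up) for two queries.
gm = geometric_mean([0.25, 1.0])  # close to 0.5, the square root of 0.25
```

Because the geometric mean is pulled down by poorly answered queries more than the arithmetic mean is, gm_map is lower than map in every column of the table.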

Current Situation

For the task of similarity learning, we are evaluating on the WikiQA dataset.

aneesh-joshi / Final Report.md
Last active August 5, 2018 22:33
Final Report

Similarity Learning using Neural Networks

Index

  1. Problem Statement
  2. Similarity Learning Tasks
  3. Evaluation Metrics
  4. Establishing Baselines
  5. About Datasets
  6. The journey
  7. Notes on Finetuning Models