Aneesh Joshi aneesh-joshi

## word2vec_tftut.py
import tensorflow as tf
import numpy as np

corpus_raw = 'He is the king . The king is royal . She is the royal  queen '

# convert to lower case
corpus_raw = corpus_raw.lower()

words = []
for word in corpus_raw.split():

## LSTM_POS_Tagger_notes.md

      
aneesh-joshi / LSTM_POS_Tagger_notes.md (created March 8, 2018 05:55)
          
Notes on LSTM POS Tagger Shapes

X : numpy array of shape (number of samples, padding length)
    Example: (64, 1000)

    [[0, 0, ..., 52, 16, 23],
     [0, 0, ..., 23, 64, 12]]
    ^ this has shape (2, 1000), since the padding length is 1000;
      each row corresponds to one sentence
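
The shape described in these notes can be reproduced with a small NumPy sketch. The helper name `pad_sequences_pre` and the toy word indices are hypothetical; the notes only say that X has shape (number of samples, padding length), with the zeros on the left of each row.

```python
import numpy as np


def pad_sequences_pre(sequences, pad_len, pad_value=0):
    """Left-pad (and left-truncate) integer sequences to a fixed length.

    Hypothetical helper for illustration; not taken from the original notes.
    """
    X = np.full((len(sequences), pad_len), pad_value, dtype=np.int64)
    for row, seq in enumerate(sequences):
        if not seq:
            continue  # an empty sentence stays all padding
        trimmed = seq[-pad_len:]  # keep at most the last pad_len tokens
        X[row, -len(trimmed):] = trimmed
    return X


# Two toy sentences, already converted to word indices.
sentences = [[52, 16, 23], [23, 64, 12]]
X = pad_sequences_pre(sentences, pad_len=1000)
print(X.shape)
print(X[0, -3:])
```

Each row is one sentence, so a batch of 64 sentences padded to length 1000 would give the (64, 1000) shape mentioned in the notes.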


## 17 May, 2018, SL Discussion.md

      
aneesh-joshi / 17 May, 2018, SL Discussion.md (last active May 18, 2018 07:10)
          
17 May, 2018 Discussion:

Predecided Objectives:

- Come up with a way of evaluating models (in the form of a script)
- Look for more datasets to evaluate models on

Datasets:

- WikiQA [Ranking/Regression]
- QuoraQP [Binary Classification]
- The Stanford Natural Language Inference (SNLI) Corpus [Multi-Class Classification]


## eval_w2v_avg.py
import os

from gensim.similarity_learning import WikiQAExtractor

wikiqa = WikiQAExtractor(os.path.join("..", "data", "WikiQACorpus", "WikiQA-train.tsv"))
data = wikiqa.get_data()

# Below commented code is for making a dict for word vectors and pickling it
# w2v = {}

# with open('glove.6B.50d.txt') as f:
# 	for line in f:
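
The commented-out lines above are cut off in this preview. As a rough sketch only (not the author's code), a GloVe-format text file, where each line holds a word followed by its vector components, could be parsed into a dict, and a sentence could then be represented by the average of its word vectors. The function names, the in-memory sample and the 3-dimensional vectors below are all made up for illustration.

```python
import io

import numpy as np


def load_glove(handle):
    """Parse GloVe text format: one token per line, followed by its floats."""
    w2v = {}
    for line in handle:
        parts = line.rstrip().split(' ')
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        w2v[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return w2v


def average_vector(tokens, w2v, dim):
    """Mean of the vectors of known tokens; zeros if none are known."""
    vectors = [w2v[t] for t in tokens if t in w2v]
    if not vectors:
        return np.zeros(dim, dtype=np.float32)
    return np.mean(vectors, axis=0)


def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(np.dot(a, b) / denom) if denom else 0.0


# Toy stand-in for a file such as glove.6B.50d.txt (values are invented).
sample = io.StringIO("king 1.0 0.0 0.0\nqueen 0.8 0.6 0.0\nroyal 0.0 1.0 0.0\n")
w2v = load_glove(sample)
query = average_vector(['king', 'royal'], w2v, dim=3)
doc = average_vector(['queen'], w2v, dim=3)
print(cosine(query, doc))
```

Scoring a question against each candidate answer this way gives a simple averaged-word-vector baseline that can be ranked and evaluated like any other model.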

## BenchMarks.md

      
aneesh-joshi / BenchMarks.md (created May 27, 2018 13:31)
          
MZ: the MatchZoo evaluation, run on my machine
Mine: my evaluation script, run on my machine
ANMM

MZ:
map=0.610744
ndcg@1=0.459916
ndcg@3=0.603051
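
The figures above are ranking metrics. The sketch below shows one common way to compute average precision (averaged over queries to give MAP) and nDCG@k from binary relevance labels ordered by model score. The function names and toy data are hypothetical, and the exact formulas used by MatchZoo or by the evaluation script may differ, for example in the gain function.

```python
import math


def average_precision(labels_in_ranked_order):
    """AP for one query; labels are 1 (relevant) or 0, best-scored first."""
    hits, precision_sum = 0, 0.0
    for rank, label in enumerate(labels_in_ranked_order, start=1):
        if label:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def ndcg_at_k(labels_in_ranked_order, k):
    """nDCG@k with gain 2^label - 1 and a log2(rank + 1) discount."""
    def dcg(labels):
        return sum((2 ** l - 1) / math.log2(i + 2) for i, l in enumerate(labels[:k]))
    ideal = dcg(sorted(labels_in_ranked_order, reverse=True))
    return dcg(labels_in_ranked_order) / ideal if ideal else 0.0


def rank_by_score(scores, labels):
    """Order the labels of one query by descending model score."""
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    return [label for _, label in pairs]


# Two toy queries: (model scores, gold labels) per candidate answer.
queries = [
    ([0.9, 0.2, 0.5], [0, 1, 0]),
    ([0.1, 0.8, 0.3], [0, 1, 1]),
]
ranked = [rank_by_score(s, l) for s, l in queries]
mean_ap = sum(average_precision(r) for r in ranked) / len(ranked)
print(mean_ap, ndcg_at_k(ranked[0], 1), ndcg_at_k(ranked[0], 3))
```

Small differences in such definitions (how ties are broken, whether queries with no relevant answer are counted) are a typical reason why two evaluation scripts report different numbers for the same model.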


## Reproducing Benchmarks.md

      
aneesh-joshi / Reproducing Benchmarks.md (created June 4, 2018 14:29)
          
    How to reproduce your benchmark

This document will explain the newly introduced files, how they are to be used and how to reproduce my benchmarks.
Additional dependencies:

Unfortunately, the code currently depends on pandas, a library for handling .csv, .tsv and similar files.
I was using it to group the datapoints by document id. This can be done without pandas, and that change will be pushed soon.
Until then, install pandas first by running:
pip install pandas
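
For reference, grouping rows by an id column does not strictly need pandas. The sketch below uses only the standard library; the column names and the in-memory sample are assumptions made for illustration, and this is not necessarily the change the author planned to push.

```python
import csv
import io
from collections import defaultdict


def group_by_column(handle, key_column):
    """Group the rows of a tab-separated file by the value in key_column."""
    groups = defaultdict(list)
    # QUOTE_NONE keeps stray quote characters in the text from confusing the parser.
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        groups[row[key_column]].append(row)
    return dict(groups)


# Toy stand-in for a .tsv file on disk (column names are hypothetical).
sample = io.StringIO(
    "DocumentID\tSentence\tLabel\n"
    "D1\tfirst candidate\t0\n"
    "D1\tsecond candidate\t1\n"
    "D2\tthird candidate\t0\n"
)
groups = group_by_column(sample, "DocumentID")
for doc_id, rows in groups.items():
    print(doc_id, [(r["Sentence"], r["Label"]) for r in rows])
```

With pandas installed, `DataFrame.groupby` on the same column gives an equivalent grouping.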

  
## eval_script.py
import sys
import os
sys.path.append(os.path.join('..'))

import csv
import re
import gensim.downloader as api
from gensim.utils import simple_preprocess
import numpy as np

## Current Scenario.md

      
aneesh-joshi / Current Scenario.md (last active July 12, 2018 14:40)
          
| WikiQA test set | w2v 300 dim | MP | FT 300 dim | DRMM_TKS | biMPM |
|---|---|---|---|---|---|
| map | 0.6277 | 0.6515 | 0.5276 | 0.6259 | 0.3856 |
| gm_map | 0.4968 | 0.5147 | 0.3923 | 0.4966 | 0.269 |
| Rprec | 0.4667 | 0.5089 | 0.3429 | 0.4613 | 0.1965 |
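
The metric names above match those reported by trec_eval. As a rough sketch, assuming the usual definitions: Rprec is the precision within the top R results, where R is the number of relevant candidates for the query, and gm_map is the geometric mean of the per-query average precision. The floor of 1e-5 and the function names are assumptions of this sketch.

```python
import math


def r_precision(labels_in_ranked_order):
    """Precision among the top R results, R = number of relevant items."""
    r = sum(1 for label in labels_in_ranked_order if label)
    if r == 0:
        return 0.0
    return sum(1 for label in labels_in_ranked_order[:r] if label) / r


def geometric_mean(values, floor=1e-5):
    """Geometric mean with a small floor so one zero does not wipe out the result."""
    logs = [math.log(max(v, floor)) for v in values]
    return math.exp(sum(logs) / len(logs))


# Toy data: labels of one query sorted by model score, and per-query AP values.
print(r_precision([1, 0, 1, 0]))      # R = 2, one relevant item in the top 2
print(geometric_mean([0.5, 0.125]))
```

Because the geometric mean penalises queries with very low average precision, gm_map falls further below map for models that fail badly on some questions.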


Current Situation
For the task of similarity learning, we are evaluating on the WikiQA Dataset

  
## QA Transfer.md

      
aneesh-joshi / QA Transfer.md (created July 19, 2018 14:59)
          
Resources on QA-Transfer Model

QA-Transfer Model uses:

- SQuAD-T dataset
- BiDAF model (with end layers changed)

The BiDAF model has 3 open-source implementations:

- AllenAI-keras
- Original-BiDAF-tf-0.11 / Original-QA-Transfer
- PyTorch


## QA Transfer.md

      
aneesh-joshi / QA Transfer.md (last active July 19, 2018 15:02)
          
Resources on QA-Transfer Model

QA-Transfer Model uses:

- SQuAD-T dataset
- BiDAF model (with end layers changed)

The BiDAF model has 3 open-source implementations:

- AllenAI-keras
- Original-BiDAF-tf-0.11 and Original-QA-Transfer-tf-0.11 (QA-Transfer essentially forks the first repo and makes some changes to it)
- PyTorch