import os

from gensim.similarity_learning import WikiQAExtractor

wikiqa = WikiQAExtractor(os.path.join("..", "data", "WikiQACorpus", "WikiQA-train.tsv"))
data = wikiqa.get_data()

# Below commented code is for making a dict of word vectors and pickling it
# w2v = {}
# with open('glove.6B.50d.txt') as f:
#     for line in f:
#         values = line.split()
#         w2v[values[0]] = np.array(values[1:], dtype='float32')
# pickle.dump(w2v, open('w2v.pkl', 'wb'))

17 May, 2018 Discussion:

Pre-decided objectives:

  • Come up with a way of evaluating models (in the form of a script)
  • Look for more datasets to evaluate models on
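One way the evaluation script could go (a sketch of the standard ranking metrics, not a decided design): for WikiQA-style data, score every candidate answer per query and report MAP and MRR over the per-query rankings.

```python
def mean_reciprocal_rank(queries):
    """queries: list of queries, each a list of (score, relevance_label) pairs."""
    rr = []
    for cands in queries:
        ranked = sorted(cands, key=lambda x: x[0], reverse=True)
        for rank, (_, label) in enumerate(ranked, start=1):
            if label == 1:
                rr.append(1.0 / rank)   # reciprocal rank of first relevant answer
                break
        else:
            rr.append(0.0)              # query with no relevant candidate
    return sum(rr) / len(rr)

def mean_average_precision(queries):
    aps = []
    for cands in queries:
        ranked = sorted(cands, key=lambda x: x[0], reverse=True)
        hits, precisions = 0, []
        for rank, (_, label) in enumerate(ranked, start=1):
            if label == 1:
                hits += 1
                precisions.append(hits / rank)  # precision at each relevant hit
        aps.append(sum(precisions) / len(precisions) if precisions else 0.0)
    return sum(aps) / len(aps)
```

Both functions only assume the model outputs one score per (query, candidate) pair, so any model can be plugged into the same script.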

Datasets:

  • WikiQA [Ranking/Regression]
  • QuoraQP [Binary Classification]
  • The Stanford Natural Language Inference (SNLI) Corpus [Multi Class Classification]
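The three datasets frame sentence-pair learning differently; as a rough illustration (made-up mini-examples, not actual corpus rows), the label formats look like:

```python
# WikiQA: per-query relevance labels, candidates ranked within each query
wikiqa_style = [
    ("who wrote hamlet", "hamlet was written by shakespeare", 1),
    ("who wrote hamlet", "hamlet is a tragedy", 0),
]

# QuoraQP: binary label, duplicate question or not
quoraqp_style = [
    ("how do i learn python", "what is the best way to learn python", 1),
]

# SNLI: one of three classes per premise/hypothesis pair
snli_style = [
    ("a man is eating", "a person consumes food", "entailment"),
]
```

An evaluation script would therefore need per-task metrics: ranking metrics for WikiQA, accuracy/F1 for QuoraQP, and multi-class accuracy for SNLI.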

Notes on LSTM POS Tagger Shapes

X : numpy array of shape (no. of samples, padding length)
    Example: (64, 1000)

    [[0, 0, ..., 52, 16, 23],
     [0, 0, ..., 23, 64, 12]]
    ^ this has shape (2, 1000) since the padding length is 1000;
    each row corresponds to one sentence
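The padding step that produces this shape can be sketched as follows (a hypothetical helper, assuming left-padding with 0 as in the example above):

```python
import numpy as np

def pad_sequences_left(seqs, maxlen, value=0):
    """Left-pad (or truncate) integer-encoded sentences to a fixed length."""
    out = np.full((len(seqs), maxlen), value, dtype=np.int64)
    for i, s in enumerate(seqs):
        trimmed = s[-maxlen:]                    # keep at most the last maxlen tokens
        out[i, maxlen - len(trimmed):] = trimmed  # tokens go on the right, zeros on the left
    return out

X = pad_sequences_left([[52, 16, 23], [23, 64, 12]], maxlen=1000)
# X.shape == (2, 1000)
```

Keras provides the same behaviour via `keras.preprocessing.sequence.pad_sequences(..., padding='pre')`, which is the usual choice here.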
import tensorflow as tf
import numpy as np

corpus_raw = 'He is the king . The king is royal . She is the royal queen '
# convert to lower case
corpus_raw = corpus_raw.lower()

words = []
for word in corpus_raw.split():
    if word != '.':        # skip the period tokens
        words.append(word)
words = set(words)         # vocabulary
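From here the snippet would continue in the usual word2vec fashion: split the corpus into sentences and generate (center, context) skip-gram pairs within a window. A sketch of that next step (window size 2 assumed, not taken from the notes):

```python
corpus_raw = 'He is the king . The king is royal . She is the royal queen '.lower()
sentences = [s.split() for s in corpus_raw.split('.') if s.strip()]

window = 2
pairs = []
for sent in sentences:
    for i, center in enumerate(sent):
        # every word within `window` positions of the center is a context word
        for j in range(max(0, i - window), min(len(sent), i + window + 1)):
            if j != i:
                pairs.append((center, sent[j]))
# e.g. pairs[0] == ('he', 'is')
```

These pairs are then one-hot encoded against the vocabulary and fed to the skip-gram model as (input, target) training examples.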