@tokestermw
Last active May 30, 2020 08:29
Simple example of a bidirectional RNN language model in PyTorch. (blog post: https://medium.com/@plusepsilon/the-bidirectional-language-model-1f3961d1fb27)
import torch
import torch.nn as nn

text = ['BOS', 'How', 'are', 'you', 'EOS']
seq_len = len(text)
batch_size = 1
embedding_size = 1
hidden_size = 1
output_size = 1

# random vectors standing in for the embeddings of the tokens above
# (Variable is deprecated since PyTorch 0.4; plain tensors suffice)
random_input = torch.randn(seq_len, batch_size, embedding_size)

bi_rnn = nn.RNN(
    input_size=embedding_size, hidden_size=hidden_size,
    num_layers=1, batch_first=False, bidirectional=True)

bi_output, bi_hidden = bi_rnn(random_input)

# stagger the two directions so neither sees the word it predicts:
# to predict the word at position t, pair the forward state at t - 1
# (context BOS..t-1) with the backward state at t + 1 (context t+1..EOS),
# hence the offset of 2 between the two slices
forward_output, backward_output = bi_output[:-2, :, :hidden_size], bi_output[2:, :, hidden_size:]
staggered_output = torch.cat((forward_output, backward_output), dim=-1)

linear = nn.Linear(hidden_size * 2, output_size)

# only predict on the inner words (skip BOS and EOS)
labels = random_input[1:-1]

# for real language models, use cross-entropy over the vocabulary :)
# MSE against the input embeddings keeps this toy example self-contained
loss = nn.MSELoss()
output = loss(linear(staggered_output), labels)
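
Since the inline comment above points at cross-entropy for real language models, here is a minimal sketch of that variant. It is not part of the original gist: the toy vocabulary, the nn.Embedding lookup, and the token_ids tensor below are assumptions added for illustration; only the stagger-by-2 slicing is carried over unchanged.

import torch
import torch.nn as nn

vocab_size = 10      # hypothetical toy vocabulary
embedding_size = 8
hidden_size = 8

# (seq_len, batch) of word indices, e.g. BOS How are you EOS
token_ids = torch.tensor([[0], [1], [2], [3], [4]])

embedding = nn.Embedding(vocab_size, embedding_size)
bi_rnn = nn.RNN(input_size=embedding_size, hidden_size=hidden_size, bidirectional=True)
projection = nn.Linear(hidden_size * 2, vocab_size)

embedded = embedding(token_ids)                   # (seq_len, batch, embedding_size)
bi_output, _ = bi_rnn(embedded)

# same stagger as above: forward state at t - 1, backward state at t + 1
forward_output = bi_output[:-2, :, :hidden_size]
backward_output = bi_output[2:, :, hidden_size:]
staggered = torch.cat((forward_output, backward_output), dim=-1)

logits = projection(staggered)                    # (seq_len - 2, batch, vocab_size)
targets = token_ids[1:-1]                         # the inner words being predicted
loss = nn.CrossEntropyLoss()(logits.view(-1, vocab_size), targets.view(-1))

Flattening the logits to (N, vocab_size) and the targets to (N,) matches the shapes nn.CrossEntropyLoss expects.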
pruksmhc commented Jul 24, 2019

Why do you stagger by 2 and not 1?
