How to use pad_packed_sequence in pytorch<1.1.0
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence
seqs = ['gigantic_string','tiny_str','medium_str']
# make <pad> idx 0
vocab = ['<pad>'] + sorted(set(''.join(seqs)))
# make model
embed = nn.Embedding(len(vocab), 10).cuda()
lstm = nn.LSTM(10, 5).cuda()
vectorized_seqs = [[vocab.index(tok) for tok in seq] for seq in seqs]
# get the length of each seq in your batch
seq_lengths = torch.LongTensor([len(seq) for seq in vectorized_seqs]).cuda()
# dump padding everywhere, and place seqs on the left.
# NOTE: you only need a tensor as big as your longest sequence
seq_tensor = torch.zeros((len(vectorized_seqs), seq_lengths.max())).long().cuda()
for idx, (seq, seqlen) in enumerate(zip(vectorized_seqs, seq_lengths)):
    seq_tensor[idx, :seqlen] = torch.LongTensor(seq)
# SORT YOUR TENSORS BY LENGTH!
seq_lengths, perm_idx = seq_lengths.sort(0, descending=True)
seq_tensor = seq_tensor[perm_idx]
# utils.rnn accepts (B,L,D) tensors, where B is the batch size and L is the max length, if you use batch_first=True
# Otherwise, give (L,B,D) tensors
seq_tensor = seq_tensor.transpose(0,1) # (B,L) -> (L,B); the embedding below adds the D dimension
# embed your sequences
seq_tensor = embed(seq_tensor)
# pack them up nicely
packed_input = pack_padded_sequence(seq_tensor, seq_lengths.cpu().numpy())
# throw them through your LSTM (remember to give batch_first=True here if you packed with it)
packed_output, (ht, ct) = lstm(packed_input)
# unpack your output if required
output, _ = pad_packed_sequence(packed_output)
print (output)
# Or if you just want the final hidden state?
print (ht[-1])
# REMEMBER: Your outputs are sorted. If you want the original ordering
# back (to compare to some gt labels) unsort them
_, unperm_idx = perm_idx.sort(0)
output = output[:, unperm_idx] # output is (L,B,D), so index the batch dimension (dim 1)
print (output)

@yuchenlin yuchenlin commented Jul 4, 2017

Great code!
But I wonder why PyTorch doesn't just provide a utility function that receives a list of variable-length sequences and outputs a padded and packed variable...
It's so complicated now.
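
For what it's worth, later PyTorch releases ship helpers close to what is asked for here. A minimal sketch, assuming a version where `pad_sequence` and `pack_sequence` are available in `torch.nn.utils.rnn` (and PyTorch >= 1.1 for `enforce_sorted=False`); the token ids are made-up illustration values:

```python
import torch
from torch.nn.utils.rnn import pad_sequence, pack_sequence

# Three variable-length sequences of token ids (illustrative values only)
seqs = [torch.tensor([1, 2, 3, 4]), torch.tensor([5, 6]), torch.tensor([7, 8, 9])]

# Pad the list into a single (B, L) tensor; padding_value defaults to 0
padded = pad_sequence(seqs, batch_first=True)

# Pack the list directly; enforce_sorted=False (PyTorch >= 1.1) removes the need to sort
packed = pack_sequence(seqs, enforce_sorted=False)

print(padded.shape)       # (B, L)
print(packed.data.shape)  # one entry per real (non-pad) token
```

Note that these operate on tensors, so the embedding lookup still happens separately (typically on the padded tensor, before packing with `pack_padded_sequence`).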

@lgalke lgalke commented Aug 31, 2017

Great demo code! So you don't need to bother with padding_idx of Embedding to ignore the zeros, because the packing does not even show them to the lstm?

@ngarneau ngarneau commented Oct 31, 2017

@Tushar-N thanks for this gist, it's awesome. In fact, it inspired me to write this little demo, which features your code and adds some visualizations to make batching inputs into an LSTM easier to understand. Many thanks :) Cheers.

@hunkim hunkim commented Nov 2, 2017

Very cool!

@Cheneng Cheneng commented Dec 23, 2017

Helps a lot! Thanks!

@donghyeonk donghyeonk commented Dec 28, 2017

Thanks!

Owner Author

@Tushar-N Tushar-N commented Feb 14, 2018

Huh, github never notified me about comments on the gist. Well, better late than never.
@lgalke That's right, you don't have to worry about padding_idx
@ngarneau nice demo! And everyone else, glad I could help :)
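
One way to convince yourself of the `padding_idx` point is to change the token sitting in a padded slot and check that the LSTM's final hidden state does not move. This is only a sketch on CPU with made-up sizes and token ids, not part of the original gist:

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence

torch.manual_seed(0)
embed = nn.Embedding(10, 4)   # hypothetical vocab size and embedding dim
lstm = nn.LSTM(4, 3)

lengths = torch.tensor([3, 2])               # already sorted, descending
a = torch.tensor([[1, 4], [2, 5], [3, 0]])   # (L, B); 0 sits in the padded slot
b = a.clone()
b[2, 1] = 9                                  # put a different token in the padded slot

with torch.no_grad():
    _, (h_a, _) = lstm(pack_padded_sequence(embed(a), lengths))
    _, (h_b, _) = lstm(pack_padded_sequence(embed(b), lengths))

# Packing drops the padded position, so both runs see identical data
same = torch.allclose(h_a, h_b)
print(same)
```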

@nikhiltitus nikhiltitus commented Feb 25, 2018

Can we feed (L,B,D) dimension to embedding layer? The docs say the first dimension should be mini batch size.

Owner Author

@Tushar-N Tushar-N commented Mar 15, 2018

@nikhiltitus You can. Embedding expects a (N,W) tensor, but it pulls out an embedding for each element anyway.
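
A quick shape check illustrates this: the embedding layer does an element-wise lookup, so an (L,B) index tensor comes back as (L,B,D). The sizes below are arbitrary:

```python
import torch
import torch.nn as nn

embed = nn.Embedding(20, 10)            # vocab of 20, D = 10 (arbitrary sizes)
idx = torch.randint(0, 20, (15, 3))     # (L, B) tensor of token indices
out = embed(idx)                        # lookup per element -> (L, B, D)
print(out.shape)
```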

@datduong datduong commented Mar 24, 2018

Hi, I don't understand this part,

# throw them through your LSTM (remember to give batch_first=True here if you packed with it)
packed_output, (ht, ct) = lstm(packed_input)

I used packed_input = pack_padded_sequence(seq_tensor, seq_lengths.numpy(), batch_first=True), then I tried packed_output, (ht, ct) = lstm(packed_input, batch_first=True) and got

TypeError: forward() got an unexpected keyword argument 'batch_first'

Thanks.

@helson73 helson73 commented Mar 25, 2018

@datduong
The batch_first argument belongs to the LSTM constructor; forward() doesn't accept it.
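
A minimal batch-first sketch of where the flag goes, on CPU with random data and arbitrary sizes: `batch_first=True` is passed to the `nn.LSTM` constructor and to the pack/unpack helpers, never to the forward call.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

lstm = nn.LSTM(10, 5, batch_first=True)   # flag goes in the constructor
x = torch.randn(3, 15, 10)                # (B, L, D) padded batch
lengths = torch.tensor([15, 10, 8])       # sorted, descending

packed = pack_padded_sequence(x, lengths, batch_first=True)
packed_output, (ht, ct) = lstm(packed)    # no batch_first argument here
output, out_lengths = pad_packed_sequence(packed_output, batch_first=True)

print(output.shape)  # (B, L, hidden)
print(ht.shape)      # (num_layers, B, hidden); not affected by batch_first
```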

@DuaneNielsen DuaneNielsen commented Apr 8, 2018

I ran this and got the following error in Python 3...

TypeError: torch.cuda.LongTensor constructor received an invalid combination of arguments - got (map), but expected one of...

The fix was to change line 24

seq_lengths = torch.cuda.LongTensor(map(len, vectorized_seqs))

to

seq_lengths = torch.cuda.LongTensor(list(map(len, vectorized_seqs)))

I guess this is because map returns a lazy iterator in Python 3 instead of a list.
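
The underlying behaviour can be seen without torch at all. In this small sketch (example data is made up), `map` yields an iterator object that a tensor constructor cannot consume directly, while wrapping it in `list` gives the plain list of lengths:

```python
vectorized_seqs = [[1, 2, 3], [4, 5], [6]]   # made-up example data

lengths_iter = map(len, vectorized_seqs)     # Python 3: a lazy map object, not a list
lengths = list(map(len, vectorized_seqs))    # materialize it before building a tensor

print(type(lengths_iter).__name__)
print(lengths)
```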

@jojonki jojonki commented Apr 16, 2018

seq_lengths = torch.LongTensor([len(seq) for seq in vectorized_seqs]) also works

@jizg jizg commented May 16, 2018

Great demo, very helpful. I also used this way in my work. Thanks

@b03902130 b03902130 commented May 19, 2018

It is really helpful!! Thanks very much!!

@aerinkim aerinkim commented Jun 25, 2018

Thank you!

@HarshTrivedi HarshTrivedi commented Jul 30, 2018

@Tushar-N Wonderful! A minimal example explaining everything, thanks! Here (and here) is a much more verbose version of this. I think ASCII drawings make it much simpler to visualize and understand what's happening inside.

@allanj allanj commented Aug 14, 2018

Great understanding

Agree that line 24 should be changed

@chaitanya100100 chaitanya100100 commented Aug 29, 2018

@ngarneau Your demo was really helpful. Thank you very much !!

@icesuns icesuns commented Nov 21, 2018

# REMEMBER: Your outputs are sorted. If you want the original ordering
# back (to compare to some gt labels) unsort them
_, unperm_idx = perm_idx.sort(0)
output = output[unperm_idx]
print (output)

If you want to get the original ordering back, you should first add "output = output.transpose(1, 0)".
Otherwise, the index is applied to the wrong dimension of output and can go out of bounds.

@DarryO DarryO commented Jan 9, 2019

Since perm_idx is obtained from the lengths, should we use the following code to reverse the sort?

output = output.transpose(0, 1)  # L x B x D -> B x L x D
hidden = hidden.transpose(0, 1)
output = output[unperm_idx]
hidden = hidden[unperm_idx]

@tang1943 tang1943 commented Apr 8, 2019

Just like @DarryO and @icesuns said, if you want the original ordering, transpose output first.
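
A small self-check of the unsort step, using a dummy (L,B,D) tensor in which every batch item is filled with its own index (a stand-in for real LSTM output, purely for illustration). Indexing the batch dimension directly is equivalent to transposing first and then indexing dim 0:

```python
import torch

L, B, D = 4, 3, 2
lengths = torch.tensor([2, 4, 3])   # unsorted lengths (made-up)

# Dummy "output": batch item i is filled with the value i, so order is easy to verify
original = torch.arange(B).view(1, B, 1).expand(L, B, D).float()

# Sort by length, as the gist does before packing
sorted_lengths, perm_idx = lengths.sort(0, descending=True)
sorted_out = original[:, perm_idx]

# Undo the sort along the batch dimension (dim 1 for an (L, B, D) tensor)
_, unperm_idx = perm_idx.sort(0)
restored = sorted_out[:, unperm_idx]

# Same result via transpose, index on dim 0, transpose back
restored_t = sorted_out.transpose(0, 1)[unperm_idx].transpose(0, 1)
print(torch.equal(restored, original))
```

The same indexing applies to the hidden state, whose shape is (num_layers, B, hidden), so the batch dimension is again dim 1.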

@MikulasZelinka MikulasZelinka commented May 7, 2019

Since pytorch 1.1.0, sorting the sequences by their lengths is no longer needed: pytorch/pytorch#15225.

As an exercise, I tried to replicate this and the version by @HarshTrivedi, maybe it would be useful to someone (although I recommend the two mentioned above more): https://gist.github.com/MikulasZelinka/9fce4ed47ae74fca454e88a39f8d911a (also includes a very basic Dataset and DataLoader example).
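
A minimal sketch of the PyTorch >= 1.1 route, assuming the `enforce_sorted=False` argument of `pack_padded_sequence`; it runs on CPU with random inputs and arbitrary sizes. The batch is passed unsorted, and the unpacked output and lengths come back in the original batch order:

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

torch.manual_seed(0)
lstm = nn.LSTM(10, 5)
x = torch.randn(15, 3, 10)            # (L, B, D), batch NOT sorted by length
lengths = torch.tensor([15, 8, 10])   # unsorted lengths

# enforce_sorted=False lets PyTorch handle the sorting and unsorting internally
packed = pack_padded_sequence(x, lengths, enforce_sorted=False)
packed_output, (ht, ct) = lstm(packed)
output, out_lengths = pad_packed_sequence(packed_output)

print(output.shape)   # (L, B, hidden)
print(out_lengths)    # lengths in the original batch order
```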
