Keras Layer that implements an Attention mechanism, with a context/query vector, for temporal data. Supports Masking. Follows the work of Yang et al. [https://www.cs.cmu.edu/~diyiy/docs/naacl16.pdf] "Hierarchical Attention Networks for Document Classification"
from keras import backend as K
from keras import initializers, regularizers, constraints
from keras.engine.topology import Layer


def dot_product(x, kernel):
    """
    Wrapper for the dot product operation, in order to be compatible with both
    Theano and TensorFlow
    Args:
        x (): input
        kernel (): weights
    Returns:
    """
    if K.backend() == 'tensorflow':
        return K.squeeze(K.dot(x, K.expand_dims(kernel)), axis=-1)
    else:
        return K.dot(x, kernel)


class AttentionWithContext(Layer):
    """
    Attention operation, with a context/query vector, for temporal data.
    Supports Masking.
    Follows the work of Yang et al. [https://www.cs.cmu.edu/~diyiy/docs/naacl16.pdf]
    "Hierarchical Attention Networks for Document Classification"
    by using a context vector to assist the attention

    # Input shape
        3D tensor with shape: `(samples, steps, features)`.
    # Output shape
        2D tensor with shape: `(samples, features)`.

    How to use:
    Just put it on top of an RNN layer (GRU/LSTM/SimpleRNN) with return_sequences=True.
    The dimensions are inferred based on the output shape of the RNN.

    Note: The layer has been tested with Keras 2.0.6

    Example:
        model.add(LSTM(64, return_sequences=True))
        model.add(AttentionWithContext())
        # next add a Dense layer (for classification/regression) or whatever...
    """

    def __init__(self,
                 W_regularizer=None, u_regularizer=None, b_regularizer=None,
                 W_constraint=None, u_constraint=None, b_constraint=None,
                 bias=True, **kwargs):
        self.supports_masking = True
        self.init = initializers.get('glorot_uniform')

        self.W_regularizer = regularizers.get(W_regularizer)
        self.u_regularizer = regularizers.get(u_regularizer)
        self.b_regularizer = regularizers.get(b_regularizer)

        self.W_constraint = constraints.get(W_constraint)
        self.u_constraint = constraints.get(u_constraint)
        self.b_constraint = constraints.get(b_constraint)

        self.bias = bias
        super(AttentionWithContext, self).__init__(**kwargs)

    def build(self, input_shape):
        assert len(input_shape) == 3

        self.W = self.add_weight((input_shape[-1], input_shape[-1],),
                                 initializer=self.init,
                                 name='{}_W'.format(self.name),
                                 regularizer=self.W_regularizer,
                                 constraint=self.W_constraint)
        if self.bias:
            self.b = self.add_weight((input_shape[-1],),
                                     initializer='zero',
                                     name='{}_b'.format(self.name),
                                     regularizer=self.b_regularizer,
                                     constraint=self.b_constraint)

        self.u = self.add_weight((input_shape[-1],),
                                 initializer=self.init,
                                 name='{}_u'.format(self.name),
                                 regularizer=self.u_regularizer,
                                 constraint=self.u_constraint)

        super(AttentionWithContext, self).build(input_shape)

    def compute_mask(self, input, input_mask=None):
        # do not pass the mask to the next layers
        return None

    def call(self, x, mask=None):
        uit = dot_product(x, self.W)

        if self.bias:
            uit += self.b

        uit = K.tanh(uit)

        # use the dot_product wrapper (not K.dot) so this also works with the TensorFlow backend
        ait = dot_product(uit, self.u)

        a = K.exp(ait)

        # apply mask after the exp. will be re-normalized next
        if mask is not None:
            # Cast the mask to floatX to avoid float64 upcasting in theano
            a *= K.cast(mask, K.floatx())

        # in some cases especially in the early stages of training the sum may be almost zero
        # and this results in NaN's. A workaround is to add a very small positive number ε to the sum.
        # a /= K.cast(K.sum(a, axis=1, keepdims=True), K.floatx())
        a /= K.cast(K.sum(a, axis=1, keepdims=True) + K.epsilon(), K.floatx())

        a = K.expand_dims(a)
        weighted_input = x * a
        return K.sum(weighted_input, axis=1)

    def compute_output_shape(self, input_shape):
        return input_shape[0], input_shape[-1]
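For reference, a minimal end-to-end sketch of the intended usage described in the docstring; the vocabulary size, sequence length and layer sizes below are arbitrary placeholders, not values from the gist:

# Minimal usage sketch; all sizes are placeholders.
from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense

model = Sequential()
model.add(Embedding(5000, 32, input_length=500))   # (None, 500, 32)
model.add(LSTM(64, return_sequences=True))         # (None, 500, 64)
model.add(AttentionWithContext())                  # (None, 64)
model.add(Dense(2, activation='softmax'))          # (None, 2)
model.compile(loss='categorical_crossentropy', optimizer='adam')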
@aryopg aryopg commented Mar 15, 2017

Thank you so much for the code. I've tried it, but I get an error about the input dimension in the Dense layer. Here's my code:

model.add(LSTM(100, return_sequences=True))
model.add(AttentionWithContext())
model.add(Dense(2, input_dim=2, activation='softmax'))

I'm using TensorFlow as the backend, and I've fixed the code using your suggestion in Keras issue #4962, but I'm still getting the error. Here's the output and layer diagram:

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
embedding_1 (Embedding)      (None, 500, 32)           160000    
_________________________________________________________________
lstm_1 (LSTM)                (None, 500, 100)          53200     
_________________________________________________________________
attention_with_context_1 (At (None, 500, 100)          10200     
_________________________________________________________________
dense_1 (Dense)              (None, 500, 2)            202       
=================================================================

And here's the error :

ValueError: Error when checking model target: expected dense_1 to have 3 dimensions, but got array with shape (25000, 2)

I'm using the imdb sentiment analysis data for this by the way.

@nigeljyng nigeljyng commented Apr 6, 2017

I'm having the same problem actually. I've narrowed it down to a specific cause: for some reason this layer gives the wrong output shape. It should be (None, 100), but as you can see in your table (row 3), the output shape is (None, 500, 100). I'll see if I can fix it.

EDIT: I found the issue. Custom layers need a compute_output_shape method if the layer modifies the input shape. See my fork for the fix.
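For reference, a minimal sketch of the fix described above, which the current version of the gist already includes (the method is named get_output_shape_for in Keras 1 and compute_output_shape in Keras 2):

# Declare the collapsed output shape: (samples, steps, features) -> (samples, features)
def compute_output_shape(self, input_shape):
    return input_shape[0], input_shape[-1]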

@ni9elf ni9elf commented May 13, 2017

Thank you for releasing your code. Have you also implemented the entire Hierarchical Attention Network (HAN), apart from the attention layer above? Any leads on where to find HAN code, preferably in Keras? I have currently found these two Keras implementations: https://github.com/richliao/textClassifier and https://github.com/synthesio/hierarchical-attention-networks/blob/master/model.py

@ni9elf ni9elf commented May 13, 2017

I need some help with two lines.
In line 40, why is assert len(input_shape) == 3 required? What information is stored in input_shape?
In line 87, why is expand_dims being used?
Thanks.

@Helw150 Helw150 commented Jun 19, 2017

@cbaziotis My loss is still NaN despite the small epsilon. Any recommended paths for debugging?

@cbaziotis cbaziotis commented Jul 30, 2017

Sorry for not replying sooner, but notifications for gist comments apparently don't work.
isaacs/github#21

Regarding some of the errors: the layer was developed using Theano as a backend. I have updated the gist and it now also works with TensorFlow. However, I suggest using Theano, as it has better RNN performance. Please use the new version and let me know.
Also, I have not tested the layer with Keras 2, but I assume it will only need some minor syntactic changes.

@Helw150 do you mind sharing the code for your model?

@shillel shillel commented Aug 9, 2017

Thank you for this! I'm using it in my school project.
One comment: even with your fix on line 105, I still sometimes encountered the NaN issue.
Following BiMPM, I used K.max(sum(...), K.epsilon()) instead, which turned out to be more stable.
Hope this helps.
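For concreteness, a sketch of that alternative normalization inside call; the K.max above presumably refers to the backend's element-wise maximum (K.maximum), used to floor the denominator at epsilon rather than adding epsilon to the sum:

# Floor the denominator at epsilon instead of adding epsilon to it.
a /= K.cast(K.maximum(K.sum(a, axis=1, keepdims=True), K.epsilon()), K.floatx())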

@cbaziotis cbaziotis commented Sep 19, 2017

Updated for Keras 2.

@sreiling sreiling commented Sep 27, 2017

Line 93: I had to replace ait = K.dot(uit, self.u) with ait = dot_product(uit, self.u) to make it work with TF.
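In other words, the fix routes the context-vector dot product through the backend-agnostic wrapper defined at the top of the gist (the version of call shown above already reflects this):

# Theano-only form that breaks under TensorFlow:
#   ait = K.dot(uit, self.u)
# Backend-agnostic form using the dot_product wrapper:
ait = dot_product(uit, self.u)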

@linetor linetor commented Oct 4, 2017

@sreiling With that change there is no error, but the model result is different. If I make that change, AttentionWithContext's output dimension is the LSTM's hidden dim, while compute_output_shape's output dim is the input's last dim (the embedding dim). Is that right?

@aryopg aryopg commented Oct 26, 2017

Really great code @cbaziotis! I've used it several times for classification problems. But I've been wondering: how would one use this in a seq2seq architecture? Many thanks!

@AdriMarteau AdriMarteau commented Nov 23, 2017

L93 was also creating an issue for me with TensorFlow, so I reused the dot_product() function as on L87.

@shanest shanest commented Jan 30, 2018

Thanks so much for this terrific gist (as well as your other Attention one)!

One minor bug: on line 93, K.dot should be replaced by dot_product, so that it works with TensorFlow as backend.

@LeZhengThu LeZhengThu commented Mar 7, 2018

Nice work. Thanks for sharing the code.
I have a problem when using it. My sequences have varying lengths and I'm using bucketing to handle this, so I define the LSTM input shape as (None, None, features), i.e. there are no explicit timesteps. But your code seems to need a fixed timestep count, so it always raises an error. As far as I know, the number of timesteps doesn't need to be fixed, so I wonder if there's a way to modify the code to support that. Thanks.

@huhk-sysu huhk-sysu commented Mar 18, 2018

Thank you for your code.
I want to use the layer (with some adaptive changes) as part of my graduation thesis. How should I cite it?

@LuisPB7 LuisPB7 commented Apr 7, 2018

Hello everyone

I was wondering, does anyone know how to create an attention layer with a custom (fixed, or trainable) context vector? I have tried this:

def call(self, inputs, mask=None):
        x = inputs[0]
        context = inputs[1]
        uit = K.dot(x, self.W)

        if self.bias:
            uit += self.b

        uit = K.tanh(uit)
        ait = K.dot(uit, context)

        a = K.exp(ait)

        # apply mask after the exp. will be re-normalized next
        if mask is not None:
            # Cast the mask to floatX to avoid float64 upcasting in theano
            a *= K.cast(mask, K.floatx())

        # in some cases especially in the early stages of training the sum may be almost zero
        # and this results in NaN's. A workaround is to add a very small positive number ε to the sum.
        # a /= K.cast(K.sum(a, axis=1, keepdims=True), K.floatx())
        a /= K.cast(K.sum(a, axis=1, keepdims=True) + K.epsilon(), K.floatx())

        a = K.expand_dims(a)
        weighted_input = x * a
        return K.sum(weighted_input, axis=1)

having also modified some other aspects such as the get_output_shape_for and compute_output_shape methods. Here is how I create and apply the layer with the custom context vector:

# Some other code...
context = Dropout(0.01)(dense) # 150 dimensional vector
...
H = TimeDistributed(Dense(150))(g2) # (None, 50, 150) tensor
sentence = AttentionWithContext()([H, context])
SentenceEncoder = Model(input_premisse, sent)

However, when attempting to run


input_premisse = Input(shape=(50,))
input_hyp = Input(shape=(50,))
input_overlap = Input(shape=(1,))
input_refuting = Input(shape=(15,))
input_polarity = Input(shape=(2,))
input_hand = Input(shape=(26,))
input_sim = Input(shape=(1,))
input_bleu = Input(shape=(1,))
input_rouge = Input(shape=(3,))
...
premisse_representation = SentenceEncoder(input_premisse)
hyp_representation = SentenceEncoder(input_hyp)
concat = merge([premisse_representation, hyp_representation], mode='concat')
mul = merge([premisse_representation, hyp_representation], mode='mul')
dif = merge([premisse_representation, hyp_representation], mode=lambda x: x[0] - x[1], output_shape=lambda x: x[0])
final_merge = merge([concat, mul, dif, input_overlap, input_refuting, input_polarity, input_hand, input_sim, input_bleu, input_rouge], mode='concat')

I get an error on the final_merge which says:

line 229, in <module>
  sent = AttentionWithContext()([H, context])
File "D:\Anaconda3\Lib\site-packages\keras\engine\topology.py", line 572, in __call__
  self.add_inbound_node(inbound_layers, node_indices, tensor_indices)
File "D:\Anaconda3\Lib\site-packages\keras\engine\topology.py", line 635, in add_inbound_node
  Node.create_node(self, inbound_layers, node_indices, tensor_indices)
File "D:\Anaconda3\Lib\site-packages\keras\engine\topology.py", line 172, in create_node
  output_tensors = to_list(outbound_layer.call(input_tensors, mask=input_masks))
File "C:\Users\Luís Pedro\Desktop\my_layers.py", line 186, in call
  a *= K.cast(mask, K.floatx())
File "D:\Anaconda3\Lib\site-packages\keras\backend\theano_backend.py", line 206, in cast
  return T.cast(x, dtype)
File "D:\Anaconda3\Lib\site-packages\theano\tensor\basic.py", line 1259, in cast
  _x = as_tensor_variable(x)
File "D:\Anaconda3\Lib\site-packages\theano\tensor\basic.py", line 200, in as_tensor_variable
  raise AsTensorError("Cannot convert %s to TensorType" % str_x, type(x))

theano.tensor.var.AsTensorError: ('Cannot convert [None, None] to TensorType', <class 'list'>)

and if I comment out that specific line, I instead get the error:

File "C:\Users\Luís Pedro\Desktop\generate-contexts.py", line 244, in <module>
  final_merge = merge([concat, mul, dif, input_overlap, input_refuting, input_polarity, input_hand, input_sim, input_bleu, input_rouge], mode='concat')
File "D:\Anaconda3\Lib\site-packages\keras\engine\topology.py", line 1680, in merge
  name=name)
File "D:\Anaconda3\Lib\site-packages\keras\engine\topology.py", line 1301, in __init__
  self.add_inbound_node(layers, node_indices, tensor_indices)
File "D:\Anaconda3\Lib\site-packages\keras\engine\topology.py", line 635, in add_inbound_node
  Node.create_node(self, inbound_layers, node_indices, tensor_indices)
File "D:\Anaconda3\Lib\site-packages\keras\engine\topology.py", line 172, in create_node
  output_tensors = to_list(outbound_layer.call(input_tensors, mask=input_masks))
File "D:\Anaconda3\Lib\site-packages\keras\engine\topology.py", line 1394, in call
  return K.concatenate(inputs, axis=self.concat_axis)
File "D:\Anaconda3\Lib\site-packages\keras\backend\theano_backend.py", line 583, in concatenate
  return T.concatenate([to_dense(x) for x in tensors], axis=axis)
File "D:\Anaconda3\Lib\site-packages\theano\tensor\basic.py", line 4734, in concatenate
  return join(axis, *tensor_list)
File "D:\Anaconda3\Lib\site-packages\theano\tensor\basic.py", line 4483, in join
  return join_(axis, *tensors_list)
File "D:\Anaconda3\Lib\site-packages\theano\gof\op.py", line 615, in __call__
  node = self.make_node(*inputs, **kwargs)
File "D:\Anaconda3\Lib\site-packages\theano\tensor\basic.py", line 4216, in make_node
  axis, tensors, as_tensor_variable_args, output_maker)
File "D:\Anaconda3\Lib\site-packages\theano\tensor\basic.py", line 4282, in _make_node_internal
  raise TypeError("Join() can only join tensors with the same "
builtins.TypeError: Join() can only join tensors with the same number of dimensions.

Any ideas? Thanks.

@leocnj leocnj commented Apr 28, 2018

Just wondering whether
self.W = self.add_weight((input_shape[-1], input_shape[-1],)
is necessary. Can we simply use a vector here rather than a matrix?

@ronggong ronggong commented May 30, 2018

https://gist.github.com/cbaziotis/7ef97ccf71cbc14366835198c09809d2#gistcomment-2373145

@LeZhengThu I think this code works for variable-length input. At least it works in my case.
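For reference, a sketch of what a variable-timestep setup might look like; since build() only reads input_shape[-1], the timestep dimension can be left unspecified (all sizes below are placeholders):

# Variable-length sketch; sizes are placeholders.
from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense

model = Sequential()
model.add(Embedding(5000, 32))              # no input_length, so the timestep dim stays None
model.add(LSTM(64, return_sequences=True))  # (None, None, 64)
model.add(AttentionWithContext())           # (None, 64)
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam')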

@skywang329 skywang329 commented Jun 25, 2018

I'm getting negative attention weights for some words using this. Is this supposed to happen? If so, any literature that indicates this should happen? If not, any ideas on how to fix?

@Lhemamou Lhemamou commented Jul 2, 2018

@skywang329 Are you checking the a values or the u values? The attention weights are the a values, and the exponential normally forces those coefficients to be positive.

@Tixierae Tixierae commented Jul 20, 2018

https://gist.github.com/cbaziotis/7ef97ccf71cbc14366835198c09809d2#gistcomment-2605022
@ronggong could you provide a minimal working example? I'm using bucketing like @LeZhengThu (for efficiency reasons), so I set input_length=None since it varies from batch to batch. The output of my Bidirectional GRU layer has shape (?,?,256). When adding an AttentionWithContext layer, I get IndexError: pop index out of range.

@Tixierae Tixierae commented Jul 20, 2018

I fixed the error I was getting by replacing K.dot with dot_product on line 93. The error had nothing to do with the length of the input.

@lzfelix lzfelix commented Aug 3, 2018

Thanks for your implementation @cbaziotis! I have made some modifications to your code here in order to make it compatible with Keras 2.x and to also make it easy to recover the attention weights for visualization. By the way, have you thought about making a PR for the attention layer on keras-contrib?

@ant1pink ant1pink commented Aug 16, 2018

inputs = Input(shape=(100,))
embedding_layer = Embedding(maxnumber_of_tp, embedding_vecor_length, mask_zero=True)(inputs)
hidden = LSTM(64, return_sequences=True)(embedding_layer)
sentence, word_scores = Attention(return_attention=True)(hidden)
output = Dense(1, activation='sigmoid')(sentence)
model = Model(input=inputs, output=output)

I train it on a binary classification problem. My question is: how should I retrieve 'word_scores'?
When I do this:

attention_model = Model(input= model.input, output= model.layers[-2].output)

I get 'sentence' rather than 'word_scores'.

Does anyone know?
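One possible approach, assuming the Attention(return_attention=True) variant used above really does return a (sentence, word_scores) pair as in lzfelix's fork: since word_scores is already a symbolic tensor in scope, a second model over the same graph can expose it directly (x_batch below is a placeholder for an input array):

# Sketch: expose the attention weights as an extra output of a second model.
attention_model = Model(inputs, [output, word_scores])
predictions, scores = attention_model.predict(x_batch)  # scores: (batch, timesteps)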

@nectario nectario commented Feb 3, 2019

Where is the context computed? I need to output a different sequence length than the one of the input.

@IS5882 IS5882 commented Mar 2, 2019

The attention layer outputs a 2D tensor of shape (None, 256). Any idea how to make it output a 3D tensor without reshaping?

I reshaped it to (None, 1, 256), and the TimeDistributed Dense layer that follows now expects (None, 1, 15), but I need it to expect what it is actually receiving, (None, 20, 15), since 20 is my max sentence length. Any ideas?

@iridiumblue iridiumblue commented May 25, 2019

Great work, thanks!

I've made some small updates so that the layer works under TensorFlow 1.13 with Eager Execution (EE is awesome; its imperative model makes debugging so much easier):

AttentionWithContext for TF 1.13 and Eager Execution

@Saichethan Saichethan commented Jun 18, 2019

Will this work for different modalities (e.g. visual and textual)?

@gdbb gdbb commented Aug 31, 2019

(quoting @LuisPB7's earlier comment about using a custom context vector)

@LuisPB7 I combine the context and the key into a single tensor as the input, then split them apart inside the Attention class. But that needs some modification of the Attention code (tensor calculation, input/output shape, and so on).
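A rough sketch of that idea, with hypothetical names rather than actual code from this thread: repeat the context vector across the timesteps, concatenate it onto the timestep features so the attention layer receives a single 3D tensor, and slice the two parts apart again inside call():

# Hypothetical sketch of packing a per-sample context vector into the layer input.
from keras.layers import RepeatVector, Concatenate

d_h = 150                                          # timestep feature size (assumed)
context_seq = RepeatVector(50)(context)            # (None, 50, d_c), 50 = timesteps
packed = Concatenate(axis=-1)([H, context_seq])    # (None, 50, d_h + d_c)

# Inside a modified call(inputs, mask=None), the parts can be recovered with:
#   x = inputs[:, :, :d_h]        # original timestep features
#   context = inputs[:, 0, d_h:]  # the (identical) context vector at any timestep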

@DanyaAlfageh DanyaAlfageh commented Apr 18, 2020

Will this work for images?

@Paryul10 Paryul10 commented May 28, 2020

[screenshot of the error traceback omitted]

I am getting this error. Can anyone please help me resolve it?

CODE:
model.add(Bidirectional(LSTM(lstm_output_size, dropout_W=0.2,dropout_U=0.2, return_sequences=True)))
model.add(Bidirectional(LSTM(lstm_output_size, dropout_W=0.2,dropout_U=0.2, return_sequences=True)))
model.add(AttentionWithContext())
model.add(Dense(numclasses, activation='softmax'))

@junyongyou junyongyou commented Oct 5, 2020

@cbaziotis
Thanks a lot for the code. I have a question about using a mask. Could you please explain how to define and use a mask here? If I have already used a Masking layer before the LSTM, e.g. x = Masking(mask_value=0.)(x), do I still need to define a mask for this layer? I use 0 as the masking value so the LSTM knows which timesteps to ignore, but the LSTM outputs at those timesteps will not be zeros and can be arbitrary, so how should the mask for the attention layer be defined? Should we use the same mask as for the LSTM? Thank you very much.
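For what it's worth, in Keras the mask produced by a Masking layer propagates automatically through mask-aware layers, so it normally does not have to be defined by hand: LSTM(return_sequences=True) passes the incoming mask along, and this layer declares supports_masking = True and receives that mask in call(), where the padded timesteps' attention weights are zeroed out before renormalization. A minimal sketch of the wiring (layer sizes are placeholders):

# Sketch: the mask flows Masking -> LSTM -> AttentionWithContext automatically.
from keras.layers import Input, Masking, LSTM, Dense
from keras.models import Model

inp = Input(shape=(None, 40))            # variable-length sequences, 40 features each
x = Masking(mask_value=0.)(inp)          # marks all-zero timesteps as padding
x = LSTM(64, return_sequences=True)(x)   # propagates the mask to the next layer
x = AttentionWithContext()(x)            # receives the mask via call(x, mask=...)
out = Dense(1, activation='sigmoid')(x)
model = Model(inp, out)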
