max_len_text = 80
max_len_summary = 10
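These length caps are typically used to drop review/summary pairs that exceed them before tokenizing. A minimal pure-Python sketch of that filtering step (the `pairs` list and `filter_by_length` helper are illustrative, not from the original pipeline):

```python
max_len_text = 80
max_len_summary = 10

def filter_by_length(pairs, max_text, max_summary):
    """Keep only (text, summary) pairs whose word counts fit the caps."""
    return [
        (t, s) for t, s in pairs
        if len(t.split()) <= max_text and len(s.split()) <= max_summary
    ]

pairs = [
    ("a short review about a product", "good product"),
    ("word " * 100, "far too long a summary " * 5),  # exceeds both caps
]
kept = filter_by_length(pairs, max_len_text, max_len_summary)
```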
from sklearn.model_selection import train_test_split

x_tr, x_val, y_tr, y_val = train_test_split(
    data['cleaned_text'], data['cleaned_summary'],
    test_size=0.1, random_state=0, shuffle=True
)
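`train_test_split` shuffles the paired columns together and holds out 10% for validation. The same behavior can be sketched in plain Python (illustrative only; in practice use the sklearn call above):

```python
import random

def simple_split(x, y, test_size=0.1, seed=0):
    """Shuffle paired data together and hold out a fraction for validation."""
    idx = list(range(len(x)))
    random.Random(seed).shuffle(idx)
    n_val = max(1, int(len(x) * test_size))
    val_idx, tr_idx = idx[:n_val], idx[n_val:]
    x_tr = [x[i] for i in tr_idx]
    x_val = [x[i] for i in val_idx]
    y_tr = [y[i] for i in tr_idx]
    y_val = [y[i] for i in val_idx]
    return x_tr, x_val, y_tr, y_val

texts = [f"text {i}" for i in range(20)]
summaries = [f"sum {i}" for i in range(20)]
x_tr, x_val, y_tr, y_val = simple_split(texts, summaries)
```

Shuffling a shared index list (rather than each list separately) is what keeps every text aligned with its own summary after the split.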
model.compile(optimizer='rmsprop', loss='sparse_categorical_crossentropy')
from keras.callbacks import EarlyStopping

es = EarlyStopping(monitor='val_loss', mode='min', verbose=1)
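`EarlyStopping` halts training once the monitored validation loss stops improving. The core logic it applies can be sketched as a toy loop (this is an illustration of the idea, not Keras internals):

```python
def early_stop_epoch(val_losses, patience=0):
    """Return the epoch index at which training would stop: the first
    epoch where val_loss has failed to improve for more than `patience`
    consecutive epochs, else the last epoch."""
    best = float('inf')
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait > patience:
                return epoch
    return len(val_losses) - 1

losses = [0.9, 0.7, 0.6, 0.65, 0.64, 0.63]
```

With `patience=0` (the Keras default) training stops at the first epoch with no improvement; a larger `patience` tolerates short plateaus before giving up.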
from matplotlib import pyplot

pyplot.plot(history.history['loss'], label='train')
pyplot.plot(history.history['val_loss'], label='validation')
pyplot.legend()
pyplot.show()
reverse_target_word_index = y_tokenizer.index_word
reverse_source_word_index = x_tokenizer.index_word
target_word_index = y_tokenizer.word_index
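`word_index` and `index_word` are inverse mappings: one turns words into integer ids, the other turns predicted ids back into words. A toy illustration with a hand-built vocabulary (stand-ins for the real tokenizer attributes):

```python
# Toy stand-ins for Keras Tokenizer attributes: word_index maps
# word -> integer id, index_word is its inverse.
word_index = {'start': 1, 'end': 2, 'great': 3, 'coffee': 4}
index_word = {i: w for w, i in word_index.items()}

# A padded target sequence: start great coffee end, then zero padding.
seq = [1, 3, 4, 2, 0, 0]
decoded = ' '.join(index_word[i] for i in seq if i != 0)
```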
# encoder inference
encoder_model = Model(inputs=encoder_inputs, outputs=[encoder_outputs, state_h, state_c])

# decoder inference
# These tensors hold the states of the previous time step
decoder_state_input_h = Input(shape=(latent_dim,))
decoder_state_input_c = Input(shape=(latent_dim,))
decoder_hidden_state_input = Input(shape=(max_len_text, latent_dim))

# Get the embeddings of the decoder sequence
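At inference time the decoder runs one step at a time, feeding each predicted token back in until it emits the end token or hits the summary length cap. A pure-Python sketch of that greedy loop, with a stub in place of the real `decoder_model.predict` (the stub and its tiny vocabulary are illustrative, not the trained model):

```python
word_index = {'start': 1, 'end': 2, 'good': 3, 'taste': 4}
index_word = {i: w for w, i in word_index.items()}
max_len_summary = 10

def stub_predict(token):
    """Stand-in for a decoder step: a fixed next-token lookup table."""
    table = {1: 3, 3: 4, 4: 2}  # start -> good -> taste -> end
    return table[token]

def greedy_decode(predict, max_steps):
    """Feed each prediction back in until 'end' or the length cap."""
    token = word_index['start']
    words = []
    for _ in range(max_steps):
        token = predict(token)
        if token == word_index['end']:
            break
        words.append(index_word[token])
    return ' '.join(words)

summary = greedy_decode(stub_predict, max_len_summary)
```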
def seq2summary(input_seq):
    newString = ''
    for i in input_seq:
        if i != 0 and i != target_word_index['start'] and i != target_word_index['end']:
            newString = newString + reverse_target_word_index[i] + ' '
    return newString

def seq2text(input_seq):
    newString = ''
    for i in input_seq:
        if i != 0:
            newString = newString + reverse_source_word_index[i] + ' '
    return newString
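`seq2summary` skips the zero padding and the start/end markers while mapping ids back to words. A self-contained check of that behavior with a toy vocabulary (the mappings here are illustrative, not the trained tokenizer's):

```python
target_word_index = {'start': 1, 'end': 2, 'tasty': 3, 'snack': 4}
reverse_target_word_index = {i: w for w, i in target_word_index.items()}

def seq2summary(input_seq):
    newString = ''
    for i in input_seq:
        if i != 0 and i != target_word_index['start'] and i != target_word_index['end']:
            newString = newString + reverse_target_word_index[i] + ' '
    return newString

out = seq2summary([1, 3, 4, 2, 0, 0])  # start tasty snack end, padded
```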
# prepare a tokenizer for reviews on training data
x_tokenizer = Tokenizer()
x_tokenizer.fit_on_texts(list(x_tr))

# convert text sequences into integer sequences
x_tr = x_tokenizer.texts_to_sequences(x_tr)
x_val = x_tokenizer.texts_to_sequences(x_val)

# pad with zeros up to the maximum length
x_tr = pad_sequences(x_tr, maxlen=max_len_text, padding='post')
x_val = pad_sequences(x_val, maxlen=max_len_text, padding='post')
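With `padding='post'`, `pad_sequences` appends zeros up to `maxlen`; note that truncation of over-long sequences still defaults to `truncating='pre'`, i.e. dropping tokens from the front. Its effect can be sketched in plain Python (an illustrative reimplementation, not the Keras function):

```python
def pad_post(seqs, maxlen):
    """Pad integer sequences with trailing zeros up to maxlen."""
    out = []
    for seq in seqs:
        seq = seq[-maxlen:]  # truncating='pre' (Keras default): drop from the front
        out.append(seq + [0] * (maxlen - len(seq)))
    return out

padded = pad_post([[5, 6], [1, 2, 3, 4, 5]], maxlen=4)
```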
# prepare a tokenizer for summaries on training data
y_tokenizer = Tokenizer()
y_tokenizer.fit_on_texts(list(y_tr))

# convert summary sequences into integer sequences
y_tr = y_tokenizer.texts_to_sequences(y_tr)
y_val = y_tokenizer.texts_to_sequences(y_val)

# pad with zeros up to the maximum length
y_tr = pad_sequences(y_tr, maxlen=max_len_summary, padding='post')
y_val = pad_sequences(y_val, maxlen=max_len_summary, padding='post')