@Lexie88rus
Created August 30, 2019 20:54
Generation of sequences out of kernel titles
# Generate sequences out of titles.
# Note: clean_title, extract_words, sequenceToTensor, and end_of_sentence
# are not defined in this snippet and must be defined beforehand.

# Define sequence length (number of context words per training example)
sequence_length = 3

# Generate sequences
def generate_sequences(titles):
    sequences = []
    targets = []
    # Loop over all selected titles
    for title in titles:
        # Skip titles that are empty or a single space after cleaning
        if clean_title(title) not in ('', ' '):
            words = extract_words(title)
            words.append(end_of_sentence)
            # Slide a window of sequence_length words; the next word is the target
            for i in range(0, len(words) - sequence_length):
                sequence = words[i:i + sequence_length]
                target = words[i + sequence_length:i + sequence_length + 1]
                sequence_tensor = sequenceToTensor(sequence)
                target_tensor = sequenceToTensor(target)
                sequences.append(sequence_tensor)
                targets.append(target_tensor)
    return sequences, targets

# Generate sequences from only the 5,000 most popular kernel titles
sequences, targets = generate_sequences(popular_kernel_titles[:5000])
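
To see which (sequence, target) pairs the sliding window above produces, here is a minimal standalone sketch that works on plain word lists instead of tensors. The function name `sliding_windows`, the `"<EOS>"` token, and the sample title are hypothetical and chosen only for illustration. Unlike the gist, it returns the target as a single word rather than a one-element slice.

```python
def sliding_windows(words, sequence_length=3, end_token="<EOS>"):
    """Return (context, target) pairs using the same indexing as generate_sequences.

    Hypothetical helper for illustration: it works on plain strings, not tensors.
    """
    # Append the end-of-sentence marker, as the gist does with end_of_sentence
    tokens = list(words) + [end_token]
    pairs = []
    for i in range(len(tokens) - sequence_length):
        context = tokens[i:i + sequence_length]   # sequence_length context words
        target = tokens[i + sequence_length]      # the word that follows them
        pairs.append((context, target))
    return pairs


# Sample title, made up for this example
pairs = sliding_windows("simple titanic survival prediction".split())
for context, target in pairs:
    print(context, "->", target)
```

With this indexing, a title with fewer than `sequence_length` words yields no pairs, and a title with exactly `sequence_length` words yields one pair whose target is the end token.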