  • A neural network is basically a set of functions that can learn patterns
  • TensorFlow allows developers to create dataflow graphs—structures that describe how data moves through a graph, or a series of processing nodes. Each node in the graph represents a mathematical operation, and each connection or edge between nodes is a multidimensional data array, or tensor.
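The node/edge idea can be made concrete with a toy dataflow graph in plain Python. This is only an illustration of the concept, not TensorFlow's actual API: each node holds an operation, and values flow along the edges between nodes.

```python
import operator

# Toy dataflow graph: each node is an operation, each incoming edge
# carries a value (a "tensor") from another node or a constant.
class Node:
    def __init__(self, op, *inputs):
        self.op = op          # the mathematical operation at this node
        self.inputs = inputs  # incoming edges: Nodes or plain constants

    def run(self):
        # Evaluate upstream nodes first, then apply this node's op.
        vals = [i.run() if isinstance(i, Node) else i for i in self.inputs]
        return self.op(*vals)

# Graph for (2 + 3) * 4: the add node's output flows into the mul node.
add = Node(operator.add, 2, 3)
mul = Node(operator.mul, add, 4)
print(mul.run())  # 20
```

In real TensorFlow the graph is built for you from tensor operations and can be optimized and run on accelerators; the toy version only shows the structure.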

NLP

Tokenizer

from tensorflow.keras.preprocessing.text import Tokenizer
sentences = [
  'i love my dog',
  'I, love my cat'
]

tokenizer = Tokenizer(num_words = 100)
tokenizer.fit_on_texts(sentences)
word_index = tokenizer.word_index
print(word_index) # {'i': 1, 'love': 2, 'my': 3, 'dog': 4, 'cat': 5}

Sequences from Text

If a sentence passed to texts_to_sequences contains words the tokenizer never saw during fitting, those words are silently dropped from the output (unless an oov_token was set when the Tokenizer was created).

from tensorflow.keras.preprocessing.text import Tokenizer
sentences = [
  'i love my dog',
  'I, love my cat'
]

tokenizer = Tokenizer(num_words = 100) # (num_words = 100, oov_token="<OOV>")
tokenizer.fit_on_texts(sentences) # model
word_index = tokenizer.word_index # getting dict of mapped words

sequences = tokenizer.texts_to_sequences(sentences) # passing sentences to model to get mapped keywords list

print(word_index) # {'i': 1, 'love': 2, 'my': 3, 'dog': 4, 'cat': 5}
print(sequences) # [[1, 2, 3, 4], [1, 2, 3, 5]]
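To see the difference the commented-out oov_token option makes, refit with it enabled and pass a sentence containing an unseen word ('manatee' here is just an arbitrary example):

```python
from tensorflow.keras.preprocessing.text import Tokenizer

sentences = [
  'i love my dog',
  'I, love my cat'
]

# With oov_token set, unseen words map to the "<OOV>" index instead of
# being dropped. The OOV token itself always receives index 1.
tokenizer = Tokenizer(num_words=100, oov_token="<OOV>")
tokenizer.fit_on_texts(sentences)

test_seq = tokenizer.texts_to_sequences(['i love my manatee'])
print(tokenizer.word_index)
# {'<OOV>': 1, 'i': 2, 'love': 3, 'my': 4, 'dog': 5, 'cat': 6}
print(test_seq)  # [[2, 3, 4, 1]] -- 'manatee' became <OOV>
```

Note that adding the OOV token shifts every other word's index up by one.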

Padding

from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

sentences = [
  'i love my dog',
  'I, love my cat'
]

tokenizer = Tokenizer(num_words = 100) # (num_words = 100, oov_token="<OOV>")
tokenizer.fit_on_texts(sentences) # model
word_index = tokenizer.word_index # getting dict of mapped words

sequences = tokenizer.texts_to_sequences(sentences) # passing sentences to model to get mapped keywords list

padded = pad_sequences(sequences, padding='post', truncating='post', maxlen=5)

print(word_index) # {'i': 1, 'love': 2, 'my': 3, 'dog': 4, 'cat': 5}
print(sequences) # [[1, 2, 3, 4], [1, 2, 3, 5]]
print(padded) # [[1 2 3 4 0]
 # [1 2 3 5 0]]
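For contrast, pad_sequences defaults to pre-padding and pre-truncating. A quick sketch of both defaults, using a made-up pair of sequences of unequal length:

```python
from tensorflow.keras.preprocessing.sequence import pad_sequences

seqs = [[1, 2, 3, 4], [1, 2]]

# Default is pre-padding: zeros go at the front, up to the longest sequence.
pre = pad_sequences(seqs)
print(pre)    # [[1 2 3 4]
              #  [0 0 1 2]]

# truncating='pre' (the default) drops tokens from the front of any
# sequence longer than maxlen.
short = pad_sequences(seqs, maxlen=3)
print(short)  # [[2 3 4]
              #  [0 1 2]]
```

Pre-padding is the default because with RNNs the most recent inputs dominate the final state, so it usually hurts less to bury the zeros at the start.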

Sarcasm Dataset

# !wget --no-check-certificate \
#     https://storage.googleapis.com/laurencemoroney-blog.appspot.com/sarcasm.json \
#     -O /tmp/sarcasm.json

import json

with open('/tmp/sarcasm.json', 'r') as f:
  datastore = json.load(f)

sentences = []
labels = []
urls = []
for item in datastore:
  sentences.append(item['headline'])
  labels.append(item['is_sarcastic'])
  urls.append(item['article_link'])

from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

tokenizer = Tokenizer(oov_token="<OOV>")
tokenizer.fit_on_texts(sentences) # model

word_index = tokenizer.word_index # getting dict of mapped words

sequences = tokenizer.texts_to_sequences(sentences) # passing sentences to model to get mapped keywords list

padded = pad_sequences(sequences, padding='post')

print(sentences[2])
print(padded[2])
print(padded.shape)
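A quick way to sanity-check a padded row is to decode it back to words via the tokenizer's index_word mapping (the reverse of word_index). Sketched below on the earlier toy sentences rather than the sarcasm data, so it stays self-contained:

```python
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

sentences = ['i love my dog', 'I, love my cat']
tokenizer = Tokenizer(oov_token="<OOV>")
tokenizer.fit_on_texts(sentences)
padded = pad_sequences(tokenizer.texts_to_sequences(sentences), padding='post')

# index_word maps index -> word; index 0 is reserved for padding,
# so skip it when decoding.
def decode(row):
    return ' '.join(tokenizer.index_word[i] for i in row if i != 0)

print(decode(padded[0]))  # i love my dog
```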
  • Datasets come pre-downloaded in the TensorFlow image
  • XX

Sequence Models

  • Examples of sequence data

    • Speech Recognition
    • Music Generation
    • Sentiment classification
    • DNA sequence analysis
    • Machine Translation
    • Named entity recognition
  • Recurrent Network Model
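A minimal sketch of such a recurrent model in Keras, wired for binary sentiment classification on padded sequences like the ones built above. All sizes here (vocab 1000, embedding dim 16, 32 RNN units, 10-token inputs) are arbitrary placeholders, not values from the notes:

```python
import numpy as np
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(1000, 16),             # token ids -> dense vectors
    tf.keras.layers.SimpleRNN(32),                   # recurrent layer over the sequence
    tf.keras.layers.Dense(1, activation='sigmoid'),  # binary sentiment score
])
model.compile(loss='binary_crossentropy', optimizer='adam')

dummy = np.zeros((2, 10), dtype='int32')  # a batch of 2 padded sequences
print(model.predict(dummy).shape)  # (2, 1): one score per sequence
```

In practice SimpleRNN is usually swapped for LSTM or GRU layers, which handle longer-range dependencies better.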
