- A neural network is essentially a set of functions that can learn patterns from data
- TensorFlow lets developers build dataflow graphs: structures that describe how data moves through a series of processing nodes. Each node in the graph represents a mathematical operation, and each connection (edge) between nodes carries a multidimensional data array, or tensor.
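A minimal sketch of that ops-and-tensors idea, assuming TensorFlow 2.x (operations run eagerly, but the structure is the same: tensors flowing into operations):

import tensorflow as tf

# two tensors (multidimensional arrays) flowing into a matmul operation (a graph node)
a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
b = tf.constant([[1.0], [1.0]])
c = tf.matmul(a, b)   # c is a new tensor produced by the operation
print(c.numpy())      # [[3.], [7.]]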
from tensorflow.keras.preprocessing.text import Tokenizer

sentences = [
    'i love my dog',
    'I, love my cat'
]

# Tokenizer lowercases and strips punctuation, so 'I,' and 'i' map to the same token
tokenizer = Tokenizer(num_words=100)
tokenizer.fit_on_texts(sentences)    # build the word index (vocabulary) from the sentences

word_index = tokenizer.word_index    # dict mapping each word to its index
print(word_index)  # {'i': 1, 'love': 2, 'my': 3, 'dog': 4, 'cat': 5}
If a sentence passed to texts_to_sequences contains words the tokenizer has never seen, those words are simply dropped from the resulting sequence (unless an oov_token is set, in which case unseen words map to the OOV index).
from tensorflow.keras.preprocessing.text import Tokenizer

sentences = [
    'i love my dog',
    'I, love my cat'
]

tokenizer = Tokenizer(num_words=100)  # add oov_token="<OOV>" to map unseen words to a dedicated index
tokenizer.fit_on_texts(sentences)     # build the vocabulary from the training sentences

word_index = tokenizer.word_index                     # dict mapping each word to its index
sequences = tokenizer.texts_to_sequences(sentences)   # encode each sentence as a list of word indices

print(word_index)  # {'i': 1, 'love': 2, 'my': 3, 'dog': 4, 'cat': 5}
print(sequences)   # [[1, 2, 3, 4], [1, 2, 3, 5]]
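A small sketch of the out-of-vocabulary behaviour, continuing with the same two training sentences; the test sentence 'i really love my dog' is just an illustrative example, and the printed indices are what I would expect given the word_index shown above:

# without oov_token, the unseen word 'really' is silently dropped
print(tokenizer.texts_to_sequences(['i really love my dog']))  # [[1, 2, 3, 4]]

# with an OOV token, unseen words map to the OOV index instead of disappearing
oov_tokenizer = Tokenizer(num_words=100, oov_token="<OOV>")
oov_tokenizer.fit_on_texts(sentences)
print(oov_tokenizer.word_index)  # {'<OOV>': 1, 'i': 2, 'love': 3, 'my': 4, 'dog': 5, 'cat': 6}
print(oov_tokenizer.texts_to_sequences(['i really love my dog']))  # [[2, 1, 3, 4, 5]]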
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

sentences = [
    'i love my dog',
    'I, love my cat'
]

tokenizer = Tokenizer(num_words=100)  # add oov_token="<OOV>" to handle unseen words
tokenizer.fit_on_texts(sentences)     # build the vocabulary
word_index = tokenizer.word_index     # dict mapping each word to its index
sequences = tokenizer.texts_to_sequences(sentences)  # encode sentences as lists of word indices

# pad/truncate every sequence to length 5; padding='post' adds zeros at the end,
# truncating='post' cuts off the end of sequences longer than maxlen
padded = pad_sequences(sequences, padding='post', truncating='post', maxlen=5)

print(word_index)  # {'i': 1, 'love': 2, 'my': 3, 'dog': 4, 'cat': 5}
print(sequences)   # [[1, 2, 3, 4], [1, 2, 3, 5]]
print(padded)      # [[1 2 3 4 0]
                   #  [1 2 3 5 0]]
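By default pad_sequences pads (and truncates) at the front of each sequence; a quick sketch of the difference, reusing the sequences from above:

padded_pre = pad_sequences(sequences, maxlen=5)  # default padding='pre'
print(padded_pre)  # [[0 1 2 3 4]
                   #  [0 1 2 3 5]]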
Loading data from the news-headlines-dataset-for-sarcasm-detection dataset (each record has a headline, a sarcasm label, and an article link)
import json

with open('sarcasm.json', 'r') as f:
    datastore = json.load(f)

sentences = []
labels = []
urls = []
for item in datastore:
    sentences.append(item['headline'])
    labels.append(item['is_sarcastic'])
    urls.append(item['article_link'])
# !wget --no-check-certificate \
# https://storage.googleapis.com/laurencemoroney-blog.appspot.com/sarcasm.json \
# -O /tmp/sarcasm.json
import json

with open('/tmp/sarcasm.json', 'r') as f:
    datastore = json.load(f)

sentences = []
labels = []
urls = []
for item in datastore:
    sentences.append(item['headline'])
    labels.append(item['is_sarcastic'])
    urls.append(item['article_link'])
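A quick sanity check on what was loaded (the printed values depend on the downloaded file, so treat them as illustrative):

print(len(sentences))  # number of headlines in the dataset
print(sentences[0])    # first headline
print(labels[0])       # 1 if sarcastic, 0 otherwise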
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

tokenizer = Tokenizer(oov_token="<OOV>")
tokenizer.fit_on_texts(sentences)     # build the vocabulary from all headlines
word_index = tokenizer.word_index     # dict mapping each word to its index
sequences = tokenizer.texts_to_sequences(sentences)  # encode each headline as a list of word indices
padded = pad_sequences(sequences, padding='post')    # pad every sequence to the length of the longest headline

print(sentences[2])
print(padded[2])
print(padded.shape)  # (number of headlines, length of the longest headline)
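Before training a model on this data, the labels also need to be array-like rather than a plain Python list; a minimal sketch, assuming NumPy:

import numpy as np

labels = np.array(labels)  # convert the list of 0/1 labels to a NumPy array, one label per headline
print(labels.shape)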
Examples of sequence data
- Speech Recognition
- Music Generation
- Sentiment classification
- DNA sequence analysis
- Machine Translation
- Named entity recognition
Recurrent Network Model