Hannes Hapke hanneshapke
import tensorflow_hub as hub
BERT_TFHUB_URL = "https://tfhub.dev/tensorflow/bert_en_uncased_L-12_H-768_A-12/2"
bert_layer = hub.KerasLayer(handle=BERT_TFHUB_URL, trainable=True)
‘This is the best movie I have ever seen ...’ -> 1
‘Probably the worst movie produced in 2019 ...’ -> 0
‘Tom Hanks's performance turns this movie into ...’ -> ?
import tensorflow as tf
import tensorflow_text as text

# Read the vocabulary path and casing flag from the loaded TF Hub BERT layer.
vocab_file_path = bert_layer.resolved_object.vocab_file.asset_path.numpy()
do_lower_case = bert_layer.resolved_object.do_lower_case.numpy()
bert_tokenizer = text.BertTokenizer(
    vocab_lookup_table=vocab_file_path,
    token_out_type=tf.int64,
    lower_case=do_lower_case
)
[
[[b'clara'], [b'is'], [b'playing'], [b'the'], [b'piano'], [b'.']],
[[b'maria'], [b'likes'], [b'to'], [b'play'], [b'soccer'], [b'.']],
[[b'hi'], [b'tom'], [b'!']]
]
[
"Clara is playing the piano.",
"Maria likes to play soccer.",
"Hi Tom!"
]
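The mapping from the three input sentences to the nested token lists can be approximated in plain Python. This is only a sketch of the lowercasing and punctuation-splitting step, assuming every word is in the vocabulary so that no word is broken into several wordpieces; `basic_tokenize` is a hypothetical helper, not part of `tensorflow_text`, and it does not read the BERT vocabulary file.

```python
import string


def basic_tokenize(sentence, lower_case=True):
    """Split a sentence into words and punctuation marks.

    Hypothetical helper: it only mimics the lowercasing and punctuation
    splitting, not the vocabulary-based wordpiece lookup.
    """
    if lower_case:
        sentence = sentence.lower()
    tokens = []
    for chunk in sentence.split():
        current = ""
        for char in chunk:
            if char in string.punctuation:
                # Flush the word collected so far, then emit the punctuation mark.
                if current:
                    tokens.append(current)
                    current = ""
                tokens.append(char)
            else:
                current += char
        if current:
            tokens.append(current)
    # Wrap every token in its own list to mirror the
    # [sentence, word, wordpiece] nesting of the output above.
    return [[token.encode("utf-8")] for token in tokens]


sentences = [
    "Clara is playing the piano.",
    "Maria likes to play soccer.",
    "Hi Tom!",
]
tokenized = [basic_tokenize(sentence) for sentence in sentences]
```

With a real vocabulary, a word missing from it would be split into several wordpieces, so the innermost lists could hold more than one entry.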
@hanneshapke
hanneshapke / tfx-pipeline-for-bert-preprocessing.ipynb
Last active August 21, 2021 06:18
TFX Pipeline for Bert Preprocessing.ipynb
import base64
import googleapiclient.discovery
from example_pb2 import Example
from feature_pb2 import BytesList, Feature, Features
def _convert_to_pb(value):
    """Serialize a given sentence to the ProtoBuf structure required to model the tf.Example data structure.
    Feel free to add more features and different data types if your model requires different inputs. An overview of
hanneshapke / .bashrc
Last active June 12, 2018 20:37
dev setup
export CURRENT_DEV=kreuzberg
alias latest_dev='cd ~/development/$CURRENT_DEV'
# ssh tunnel
alias ssd='~/bin/ssh_host_color.sh ubuntu@remote -p 823 -L 6006:gpu:6006'
# add additional paths to the PYTHONPATH
export PYTHONPATH=$PYTHONPATH:~/development/additional_package
# git shortcuts
hanneshapke / redis.py
Created May 24, 2018 21:38
Load word vectors into a redis db
import bz2
import pickle
from django.conf import settings
from django_redis import get_redis_connection
from tqdm import tqdm
from .constants import GOOGLE_WORD2VEC_MODEL_NAME
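The preview of redis.py stops after its imports. As a rough illustration of what storing word vectors in a key-value store can look like, the sketch below pickles and bz2-compresses each vector before writing it under a prefixed key. Everything here is an assumption rather than the gist's actual code: `InMemoryClient` is a hypothetical stand-in that exposes only `set` and `get` (the same calls a Redis client offers), and `store_vectors`, `load_vector`, and the `w2v:` prefix are illustrative names.

```python
import bz2
import pickle


class InMemoryClient:
    """Hypothetical stand-in offering the set/get calls of a Redis client."""

    def __init__(self):
        self._store = {}

    def set(self, key, value):
        self._store[key] = value
        return True

    def get(self, key):
        return self._store.get(key)


def store_vectors(client, vectors, prefix="w2v:"):
    """Write each word vector as a compressed pickle under a prefixed key."""
    for word, vector in vectors.items():
        client.set(prefix + word, bz2.compress(pickle.dumps(vector)))


def load_vector(client, word, prefix="w2v:"):
    """Read a vector back, or return None if the word was never stored."""
    payload = client.get(prefix + word)
    if payload is None:
        return None
    return pickle.loads(bz2.decompress(payload))


client = InMemoryClient()
store_vectors(client, {"piano": [0.1, 0.2, 0.3], "soccer": [0.4, 0.5, 0.6]})
piano_vector = load_vector(client, "piano")
missing_vector = load_vector(client, "guitar")
```

Because the helpers rely only on `set` and `get`, a connection such as the one returned by `get_redis_connection` could presumably be passed in place of the stand-in. Unpickling is only safe for data you wrote yourself.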