@hanneshapke
Created March 9, 2020 17:48
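import tensorflow as tf

# MAX_SEQ_LEN and _fill_in_missing are referenced below but not shown in
# this snippet; they are assumed to be defined elsewhere in the module.


def preprocessing_fn(inputs):
    """Turn the raw 'text' and 'label' inputs into BERT input features."""

    def tokenize_text(text, sequence_length=MAX_SEQ_LEN):
        ...
        # One row of token ids per example, each of length sequence_length.
        return tf.reshape(tokens, [-1, sequence_length])

    def preprocess_bert_input(text, segment_id=0):
        input_word_ids = tokenize_text(text)
        ...
        return (
            input_word_ids,
            input_mask,
            input_type_ids
        )
    ...
    input_word_ids, input_mask, input_type_ids = \
        preprocess_bert_input(_fill_in_missing(inputs['text']))
    return {
        'input_word_ids': input_word_ids,
        'input_mask': input_mask,
        'input_type_ids': input_type_ids,
        'label': inputs['label']
    }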
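
The snippet elides how `input_mask` and `input_type_ids` are derived from the token ids. As a rough illustration only, here is one common convention in plain Python rather than TensorFlow ops: pad each sequence to a fixed length, mark real tokens with 1 and padding with 0, and fill the type ids with the segment id. The helper name `pad_and_mask`, the `pad_id=0` default, the sequence length of 8 and the example token ids are all assumptions made for this sketch, not part of the gist.

```python
def pad_and_mask(token_ids, sequence_length=8, segment_id=0, pad_id=0):
    """Hypothetical helper: pad one tokenized sentence and derive mask/type ids."""
    # Truncate to the fixed length, then remember how many real tokens remain.
    ids = list(token_ids)[:sequence_length]
    n_real = len(ids)
    n_pad = sequence_length - n_real

    input_word_ids = ids + [pad_id] * n_pad
    # 1 for real tokens, 0 for padding positions.
    input_mask = [1] * n_real + [0] * n_pad
    # Single-sentence input: every position carries the same segment id.
    input_type_ids = [segment_id] * sequence_length
    return input_word_ids, input_mask, input_type_ids


# Made-up token ids for illustration.
word_ids, mask, type_ids = pad_and_mask([101, 7592, 2088, 102])
```

All three returned lists have the same fixed length, which is what lets `preprocessing_fn` return them side by side under the keys `input_word_ids`, `input_mask` and `input_type_ids`.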