@LysandreJik
Created December 16, 2019 22:34
Training GPT-2 LM Head model in Keras
from transformers import GPT2Tokenizer, TFGPT2LMHeadModel
import tensorflow as tf
model = TFGPT2LMHeadModel.from_pretrained("distilgpt2")
tokenizer = GPT2Tokenizer.from_pretrained("distilgpt2")
text = """
A SQUAT grey building of only thirty-four stories. Over the main entrance the
words, CENTRAL LONDON HATCHERY AND CONDITIONING CENTRE,
and, in a shield, the World State’s motto, COMMUNITY, IDENTITY, STABILITY.
The enormous room on the ground floor faced towards the north. Cold for
all the summer beyond the panes, for all the tropical heat of the room itself,
a harsh thin light glared through the windows, hungrily seeking some draped
lay figure, some pallid shape of academic goose-flesh, but finding only the glass
and nickel and bleakly shining porcelain of a laboratory. Wintriness responded
to wintriness. The overalls of the workers were white, their hands gloved with
a pale corpse-coloured rubber. The light was frozen, dead, a ghost. Only from
the yellow barrels of the microscopes did it borrow a certain rich and living
substance, lying along the polished tubes like butter, streak after luscious streak
in long recession down the work tables.
“And this,” said the Director opening the door, “is the Fertilizing Room.”
Bent over their instruments, three hundred Fertilizers were plunged, as the Director of Hatcheries and Conditioning entered the room, in the scarcely breathing silence, the absent-minded, soliloquizing hum or whistle, of absorbed
concentration. A troop of newly arrived students, very young, pink and callow,
followed nervously, rather abjectly, at the Director’s heels. Each of them carried
a notebook, in which, whenever the great man spoke, he desperately scribbled.
Straight from the horse’s mouth. It was a rare privilege. The D. H. C. for Central
London always made a point of personally conducting his new students round
the various departments.
“Just to give you a general idea,” he would explain to them. For of course some
sort of general idea they must have, if they were to do their work intelligentlythough as little of one, if they were to be good and happy members of society, as
possible. For particulars, as every one knows, make for virture and happiness;
generalities are intellectually necessary evils. Not philosophers but fretsawyers
""" * 100
tokenized_text = tokenizer.encode(text)
examples = []
block_size = 100
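# Split the token stream into full blocks of block_size; a trailing partial block is dropped
for i in range(0, len(tokenized_text) - block_size + 1, block_size):
    examples.append(tokenized_text[i:i + block_size])
inputs, labels = [], []
for ex in examples:
    inputs.append(ex[:-1])  # every token except the last
    labels.append(ex[1:])   # the same tokens shifted by one (next-token targets)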
dataset = tf.data.Dataset.from_tensor_slices((inputs, labels))
BATCH_SIZE = 16
BUFFER_SIZE = 10000
dataset = dataset.shuffle(BUFFER_SIZE).batch(BATCH_SIZE, drop_remainder=True)
optimizer = tf.keras.optimizers.Adam(learning_rate=3e-5, epsilon=1e-08, clipnorm=1.0)
loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
metric = tf.keras.metrics.SparseCategoricalAccuracy('accuracy')
model.compile(optimizer=optimizer, loss=[loss, *[None] * model.config.n_layer], metrics=[metric])
model.fit(dataset, epochs=3)
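
The block splitting and one-token shift used above can be checked on a toy sequence. This is a minimal sketch, not part of the original gist: `make_examples` and `toy_ids` are hypothetical names, and the list of integers merely stands in for the output of `tokenizer.encode(text)`.

```python
def make_examples(token_ids, block_size):
    """Split token_ids into full blocks, then build next-token (input, label) pairs."""
    inputs, labels = [], []
    # Step by block_size; a trailing partial block is dropped.
    for start in range(0, len(token_ids) - block_size + 1, block_size):
        block = token_ids[start:start + block_size]
        inputs.append(block[:-1])  # every token except the last
        labels.append(block[1:])   # the same tokens shifted left by one
    return inputs, labels


# Toy stand-in for tokenizer.encode(text): ten consecutive integers.
toy_ids = list(range(10))
inputs, labels = make_examples(toy_ids, block_size=4)
print(inputs)  # [[0, 1, 2], [4, 5, 6]]
print(labels)  # [[1, 2, 3], [5, 6, 7]]
```

Each input and label therefore has block_size - 1 tokens, which is why the script's block_size of 100 shows up as a sequence length of 99 in the shapes reported below.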
@jayendra13

I am getting this error with the above code:

ValueError: in user code:

    /home/jayendra/venv/lib/python3.7/site-packages/tensorflow/python/keras/engine/training.py:806 train_function  *
        return step_function(self, iterator)
    /home/jayendra/venv/lib/python3.7/site-packages/tensorflow/python/keras/engine/training.py:796 step_function  **
        outputs = model.distribute_strategy.run(run_step, args=(data,))
    /home/jayendra/venv/lib/python3.7/site-packages/tensorflow/python/distribute/distribute_lib.py:1211 run
        return self._extended.call_for_each_replica(fn, args=args, kwargs=kwargs)
    /home/jayendra/venv/lib/python3.7/site-packages/tensorflow/python/distribute/distribute_lib.py:2585 call_for_each_replica
        return self._call_for_each_replica(fn, args, kwargs)
    /home/jayendra/venv/lib/python3.7/site-packages/tensorflow/python/distribute/distribute_lib.py:2945 _call_for_each_replica
        return fn(*args, **kwargs)
    /home/jayendra/venv/lib/python3.7/site-packages/tensorflow/python/keras/engine/training.py:789 run_step  **
        outputs = model.train_step(data)
    /home/jayendra/venv/lib/python3.7/site-packages/tensorflow/python/keras/engine/training.py:759 train_step
        self.compiled_metrics.update_state(y, y_pred, sample_weight)
    /home/jayendra/venv/lib/python3.7/site-packages/tensorflow/python/keras/engine/compile_utils.py:409 update_state
        metric_obj.update_state(y_t, y_p, sample_weight=mask)
    /home/jayendra/venv/lib/python3.7/site-packages/tensorflow/python/keras/utils/metrics_utils.py:90 decorated
        update_op = update_state_fn(*args, **kwargs)
    /home/jayendra/venv/lib/python3.7/site-packages/tensorflow/python/keras/metrics.py:176 update_state_fn
        return ag_update_state(*args, **kwargs)
    /home/jayendra/venv/lib/python3.7/site-packages/tensorflow/python/keras/metrics.py:612 update_state  **
        matches = ag_fn(y_true, y_pred, **self._fn_kwargs)
    /home/jayendra/venv/lib/python3.7/site-packages/tensorflow/python/util/dispatch.py:201 wrapper
        return target(*args, **kwargs)
    /home/jayendra/venv/lib/python3.7/site-packages/tensorflow/python/keras/metrics.py:3309 sparse_categorical_accuracy
        return math_ops.cast(math_ops.equal(y_true, y_pred), K.floatx())
    /home/jayendra/venv/lib/python3.7/site-packages/tensorflow/python/util/dispatch.py:201 wrapper
        return target(*args, **kwargs)
    /home/jayendra/venv/lib/python3.7/site-packages/tensorflow/python/ops/math_ops.py:1614 equal
        return gen_math_ops.equal(x, y, name=name)
    /home/jayendra/venv/lib/python3.7/site-packages/tensorflow/python/ops/gen_math_ops.py:3224 equal
        name=name)
    /home/jayendra/venv/lib/python3.7/site-packages/tensorflow/python/framework/op_def_library.py:744 _apply_op_helper
        attrs=attr_protos, op_def=op_def)
    /home/jayendra/venv/lib/python3.7/site-packages/tensorflow/python/framework/func_graph.py:593 _create_op_internal
        compute_device)
    /home/jayendra/venv/lib/python3.7/site-packages/tensorflow/python/framework/ops.py:3485 _create_op_internal
        op_def=op_def)
    /home/jayendra/venv/lib/python3.7/site-packages/tensorflow/python/framework/ops.py:1975 __init__
        control_input_ops, op_def)
    /home/jayendra/venv/lib/python3.7/site-packages/tensorflow/python/framework/ops.py:1815 _create_c_op
        raise ValueError(str(e))

    ValueError: Dimensions must be equal, but are 16 and 12 for '{{node Equal_1}} = Equal[T=DT_FLOAT, incompatible_shape_error=true](Cast_6, Cast_7)' with input shapes: [16,99], [2,16,12,99].

@LysandreJik
Author

I'm not sure this ever worked, to be honest.

@jayendra13

Thanks for your response. I will switch to a custom training loop instead of fit.

@alexol91

Change BATCH_SIZE = 16 to BATCH_SIZE = 12

@jayendra13

jayendra13 commented Oct 30, 2020

So is a batch size of 12 hardcoded inside this pretrained model? Other batch sizes also don't work with a model created from the default config.
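
A note on where the 12 may come from, based only on the shapes in the ValueError above. The second shape, [2,16,12,99], looks like the argmax over the last axis of a cached key/value output of shape (2, batch, n_head, seq_len, head_dim), which would mean the accuracy metric is being applied to every model output and not only to the logits. Under that reading, 12 is the number of attention heads and not a batch size. This is an assumption, as is the value n_head = 12 for distilgpt2. The sketch below is pure Python with a hypothetical helper, `broadcast_shape`, that mimics the usual right-aligned broadcasting rule:

```python
def broadcast_shape(a, b):
    """Return the broadcast of shapes a and b, or raise ValueError if incompatible."""
    result = []
    # Compare dimensions right-aligned; missing leading dimensions count as 1.
    for i in range(1, max(len(a), len(b)) + 1):
        da = a[-i] if i <= len(a) else 1
        db = b[-i] if i <= len(b) else 1
        if da != db and da != 1 and db != 1:
            raise ValueError(f"Dimensions must be equal, but are {da} and {db}")
        result.append(max(da, db))
    return tuple(reversed(result))


seq_len = 99  # block_size - 1 after the input/label shift
n_head = 12   # assumed: number of attention heads in distilgpt2


def check(batch_size):
    """Compare the label shape with the assumed argmax shape of a key/value cache."""
    labels_shape = (batch_size, seq_len)
    cache_argmax_shape = (2, batch_size, n_head, seq_len)
    try:
        return broadcast_shape(labels_shape, cache_argmax_shape)
    except ValueError as err:
        return str(err)


print(check(16))  # Dimensions must be equal, but are 16 and 12
print(check(12))  # (2, 12, 12, 99)
```

If this reading is right, a batch size of 12 only silences the error because it happens to equal the head count, so the labels broadcast against the head axis. The accuracy computed on those outputs would not be meaningful, and restricting the metric to the logits (or using a custom training loop, as mentioned above) would be the more direct fix.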
