- In TensorFlow 2.0, metrics take a brand new form of "stateful" objects that share a uniform API consisting of four methods:
```python
import tensorflow as tf

class MyMetric(tf.keras.metrics.Metric):
    def __init__(self):
        ...  # create state variables, e.g. with self.add_weight(...)
    def update_state(self, y_true, y_pred, sample_weight=None):
        ...  # accumulate statistics for the current batch
    def result(self):
        ...  # compute the metric value from the accumulated state
    def reset_states(self):
        ...  # clear the accumulated state between epochs/evaluations
```
(Details here: https://www.tensorflow.org/beta/guide/keras/training_and_evaluation#writing_custom_losses_and_metrics)
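  To make the call pattern concrete, here is a minimal usage sketch of how such a stateful metric is driven in a custom loop, using the built-in `tf.keras.metrics.SparseCategoricalAccuracy` as a stand-in (`batches` is a placeholder iterable, not anything from this repo):

```python
import tensorflow as tf

accuracy = tf.keras.metrics.SparseCategoricalAccuracy()

# batches: hypothetical iterable of (labels, predictions) pairs
for y_true, y_pred in batches:
    accuracy.update_state(y_true, y_pred)  # accumulate statistics batch by batch

epoch_accuracy = accuracy.result().numpy()  # aggregate value over all batches seen
accuracy.reset_states()                     # clear the state before the next epoch
```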
- Since there is no metric measuring perplexity in the TF API so far, I made a custom one using the formula shown by Kirill Mavreshko in his Keras-Transformer implementation (https://github.com/kpot/keras-transformer) and the logic for calculating the SparseCategoricalCrossentropy loss found at: https://www.tensorflow.org/beta/tutorials/text/transformer#loss_and_metrics
- Perplexity seems to be a better metric than accuracy for language generation models, or at least a more interpretable one. (Shout-out to Aerin Kim for a cool article on perplexity! :) https://towardsdatascience.com/perplexity-intuition-and-derivation-105dd481c8f3)
- The metric expects y_true and y_pred, which is compliant with the new Metrics API. Bear in mind, though, that it expects y_pred in the form of logits - that is exactly what the Transformer model outputs during the training loop (inspect the predictions output in the train_step method to see the logits: https://www.tensorflow.org/beta/tutorials/text/transformer#training_and_checkpointing). Logits are the non-normalised results of the model's predictions (raw vectors); if you apply softmax to them, you get a probability distribution :) (https://developers.google.com/machine-learning/glossary/#logits). Keep that in mind if you use this metric for a different model/scenario - see the sketch below this list.
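  For illustration, here is a minimal sketch of how such a logits-based perplexity metric could be put together under the assumptions above (the class name, state variables, and the padding-id-0 mask are mine, following the loss logic from the TF Transformer tutorial; this is not the exact code from this repo):

```python
import tensorflow as tf

class Perplexity(tf.keras.metrics.Metric):
    """Running perplexity: exp of the mean per-token cross-entropy."""

    def __init__(self, name="perplexity", **kwargs):
        super().__init__(name=name, **kwargs)
        # per-position cross-entropy on logits, no reduction so padding can be masked out
        self.cross_entropy = tf.keras.losses.SparseCategoricalCrossentropy(
            from_logits=True, reduction=tf.keras.losses.Reduction.NONE)
        self.total_ce = self.add_weight(name="total_ce", initializer="zeros")
        self.token_count = self.add_weight(name="token_count", initializer="zeros")

    def update_state(self, y_true, y_pred, sample_weight=None):
        # mask padding positions (token id 0, as assumed in the TF Transformer tutorial)
        mask = tf.cast(tf.not_equal(y_true, 0), tf.float32)
        ce = self.cross_entropy(y_true, y_pred) * mask
        self.total_ce.assign_add(tf.reduce_sum(ce))
        self.token_count.assign_add(tf.reduce_sum(mask))

    def result(self):
        # perplexity = exp(average cross-entropy per non-padding token)
        return tf.exp(self.total_ce / self.token_count)

    def reset_states(self):
        self.total_ce.assign(0.0)
        self.token_count.assign(0.0)
```

  A metric built this way can be passed to `model.compile(..., metrics=[Perplexity()])` or driven manually via `update_state`/`result`/`reset_states`, as long as the model outputs logits rather than softmaxed probabilities.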
- I suggest updating from `print` to `logging`:
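  As an illustration only (the variable names and message are placeholders, not taken from this repo), the swap could look like this:

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

epoch, perplexity_value = 1, 42.7  # placeholder values for illustration

# before: print(f"Epoch {epoch}: perplexity = {perplexity_value}")
logger.info("Epoch %d: perplexity = %.4f", epoch, perplexity_value)
```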