{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"name": "tflite_slow_inference_2.4.0_conversion_bug.ipynb",
"provenance": [],
"collapsed_sections": []
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "Jzl22GNGbDt7"
},
"source": [
"# Slow inference on TF Lite >= 2.4.0 converted model"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "L4RoUpQkjuk5"
},
"source": [
"Conversion of the `Conv1D` layer seems to have changed in TF Lite `2.4.0`. The minimal example below demonstrates that inference on a quantized model converted by `tensorflow>=2.4.0` is ~27x slower than on one converted by `tensorflow==2.3.0`.\n",
"\n",
"Steps to reproduce:\n",
"0. If you have the models from the GitHub issue, upload them to the root directory and jump to step 4.\n",
"1. Run the `Prepare` section and install tensorflow `2.3.0`. Restart the kernel.\n",
"2. Run the `Create Model` and `Conversion` sections.\n",
"3. Repeat steps 1 and 2 for tensorflow `2.6.0`.\n",
"4. Run `Inference Test`.\n",
"\n",
"### Previous Results:\n",
"\n",
"```\n",
"TF Runtime Version: 2.6.0\n",
"Model path: model_2.3.0.tflite\n",
"Test Duration: 0.5242369174957275\n",
"= = = = = = = = = = = = = = = = = = = = \n",
"TF Runtime Version: 2.6.0\n",
"Model path: model_2.3.0_quant.tflite\n",
"Test Duration: 0.7532312870025635\n",
"= = = = = = = = = = = = = = = = = = = = \n",
"TF Runtime Version: 2.6.0\n",
"Model path: model_2.6.0.tflite\n",
"Test Duration: 0.532163143157959\n",
"= = = = = = = = = = = = = = = = = = = = \n",
"TF Runtime Version: 2.6.0\n",
"Model path: model_2.6.0_quant.tflite\n",
"Test Duration: 18.914307594299316\n",
"= = = = = = = = = = = = = = = = = = = = \n",
"= = = = = = = = = = = = = = = = = = = = \n",
"TF Runtime Version: 2.3.0\n",
"Model path: model_2.3.0.tflite\n",
"Test Duration: 0.5360305309295654\n",
"= = = = = = = = = = = = = = = = = = = = \n",
"TF Runtime Version: 2.3.0\n",
"Model path: model_2.3.0_quant.tflite\n",
"Test Duration: 0.5518338680267334\n",
"= = = = = = = = = = = = = = = = = = = = \n",
"TF Runtime Version: 2.3.0\n",
"Model path: model_2.6.0.tflite\n",
"Test Duration: 0.5399584770202637\n",
"= = = = = = = = = = = = = = = = = = = = \n",
"TF Runtime Version: 2.3.0\n",
"Model path: model_2.6.0_quant.tflite\n",
"Cannot create Interpreter! Exception:\n",
"Didn't find op for builtin opcode 'CONV_2D' version '5'\n",
"Registration failed.\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "PwQ5iEETanLq"
},
"source": [
"## Prepare"
]
},
{
"cell_type": "code",
"metadata": {
"id": "Dh5dvbl6arr5"
},
"source": [
"# Converting with TF 2.3.0 produces a much faster quantized model\n",
"use_tf_2_3 = False\n",
"\n",
"if use_tf_2_3:\n",
"    !pip install tensorflow==2.3.0\n",
"else:\n",
"    # 2.4.0 and 2.5.0 perform the same as 2.6.0\n",
"    !pip install tensorflow==2.6.0"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "l9vk-50PZkSj"
},
"source": [
"import tensorflow as tf\n",
"print(tf.__version__)"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "XVyhL41jaIFs"
},
"source": [
"def generate_model(num_hidden_units: int = 500, num_layers: int = 1) -> tf.keras.models.Sequential:\n",
"    model = tf.keras.models.Sequential()\n",
"    for _ in range(num_layers):\n",
"        model.add(tf.keras.layers.Conv1D(filters=num_hidden_units, kernel_size=3, strides=1, padding='SAME', activation='relu'))\n",
"    optimizer = tf.keras.optimizers.Adam()\n",
"    model.compile(optimizer=optimizer, loss='categorical_crossentropy', metrics=['accuracy'])\n",
"    return model"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "7WdyoJQ6aj3Q"
},
"source": [
"## Create model"
]
},
{
"cell_type": "code",
"metadata": {
"id": "dh-uhJw6aKIU"
},
"source": [
"model = generate_model()\n",
"model.build(input_shape=(None, None, 80))"
],
"execution_count": null,
"outputs": []
},
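{
"cell_type": "markdown",
"metadata": {},
"source": [
"Sanity check (not part of the original repro): run the Keras model once to confirm the output shape before converting."
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"# One forward pass through the float Keras model.\n",
"# With the default 500 filters and 'SAME' padding, the output\n",
"# shape for a (1, 50, 80) input should be (1, 50, 500).\n",
"out = model(tf.random.normal((1, 50, 80)))\n",
"print(out.shape)"
],
"execution_count": null,
"outputs": []
},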
{
"cell_type": "markdown",
"metadata": {
"id": "pCZubKlIafJb"
},
"source": [
"## Conversion"
]
},
{
"cell_type": "code",
"metadata": {
"id": "9sIGNMdraXW6"
},
"source": [
"for quantize in (True, False):\n",
"    converter = tf.lite.TFLiteConverter.from_keras_model(model)\n",
"\n",
"    if quantize:\n",
"        converter.optimizations = [tf.lite.Optimize.DEFAULT]\n",
"\n",
"    model_path = f\"model_{tf.__version__}{'_quant' if quantize else ''}.tflite\"\n",
"    with open(model_path, \"wb\") as f:\n",
"        f.write(converter.convert())"
],
"execution_count": null,
"outputs": []
},
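{
"cell_type": "markdown",
"metadata": {},
"source": [
"Optional (not part of the original repro): list the builtin ops in each converted model. This relies on `_get_ops_details()`, a private method of `tf.lite.Interpreter` that may change between releases, but it helps confirm that `Conv1D` is lowered to `CONV_2D` and lets you compare the converted graphs."
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"import glob\n",
"\n",
"# NOTE: _get_ops_details() is a private TF Lite API; treat this\n",
"# cell as a best-effort inspection helper, not a stable interface.\n",
"for path in sorted(glob.glob(\"model_*.tflite\")):\n",
"    try:\n",
"        ops = tf.lite.Interpreter(model_path=path)._get_ops_details()\n",
"        print(path, sorted({op['op_name'] for op in ops}))\n",
"    except Exception as e:\n",
"        print(path, \"-> cannot inspect:\", e)"
],
"execution_count": null,
"outputs": []
},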
{
"cell_type": "markdown",
"metadata": {
"id": "LdqKlzo0acqt"
},
"source": [
"## Inference test"
]
},
{
"cell_type": "code",
"metadata": {
"id": "wjkVCwgYoz4u"
},
"source": [
"import glob\n",
"import time\n",
"\n",
"# Interpreter.set_tensor expects a numpy array, so materialize\n",
"# the random input once up front.\n",
"data = tf.random.normal((1, 50, 80)).numpy()\n",
"\n",
"for inference_model_path in sorted(glob.glob(\"model_*.tflite\")):\n",
"    print(f\"TF Runtime Version: {tf.__version__}\")\n",
"    print(f\"Model path: {inference_model_path}\")\n",
"    try:\n",
"        interpreter = tf.lite.Interpreter(inference_model_path)\n",
"    except Exception as e:\n",
"        print(\"Cannot create Interpreter! Exception:\")\n",
"        print(e)\n",
"        continue\n",
"    interpreter.resize_tensor_input(0, [1, 50, 80])\n",
"    interpreter.allocate_tensors()\n",
"\n",
"    start = time.time()\n",
"    for _ in range(1000):\n",
"        interpreter.set_tensor(0, data)\n",
"        interpreter.invoke()\n",
"    print(\"Test Duration:\", time.time() - start)\n",
"    print(\"= \" * 20)"
],
"execution_count": null,
"outputs": []
}
]
}