{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"name": "tflite_slow_inference_2.4.0_conversion_bug.ipynb",
"provenance": [],
"collapsed_sections": []
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "Jzl22GNGbDt7"
},
"source": [
"# Slow inference on TF Lite >= 2.4.0 converted model"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "L4RoUpQkjuk5"
},
"source": [
"Conversion of the `Conv1D` layer seems to have changed in TF Lite `2.4.0`. The minimal example below demonstrates that inference on a quantized model converted by `tensorflow>=2.4.0` is ~27x slower than on one converted by `tensorflow==2.3.0`.\n",
"\n",
"Steps to reproduce:\n",
"0. If you have the models from the GitHub issue, upload them to the root directory and jump to step 4.\n",
"1. Run the `Prepare` section and install tensorflow `2.3.0`. Restart the kernel.\n",
"2. Run the `Create Model` and `Conversion` sections.\n",
"3. Repeat steps 1 and 2 for tensorflow `2.6.0`.\n",
"4. Run `Inference Test`.\n",
"\n",
"### Previous Results:\n",
"\n",
"```\n",
"TF Runtime Version: 2.6.0\n",
"Model path: model_2.3.0.tflite\n",
"Test Duration: 0.5242369174957275\n",
"= = = = = = = = = = = = = = = = = = = = \n",
"TF Runtime Version: 2.6.0\n",
"Model path: model_2.3.0_quant.tflite\n",
"Test Duration: 0.7532312870025635\n",
"= = = = = = = = = = = = = = = = = = = = \n",
"TF Runtime Version: 2.6.0\n",
"Model path: model_2.6.0.tflite\n",
"Test Duration: 0.532163143157959\n",
"= = = = = = = = = = = = = = = = = = = = \n",
"TF Runtime Version: 2.6.0\n",
"Model path: model_2.6.0_quant.tflite\n",
"Test Duration: 18.914307594299316\n",
"= = = = = = = = = = = = = = = = = = = = \n",
"= = = = = = = = = = = = = = = = = = = = \n",
"TF Runtime Version: 2.3.0\n",
"Model path: model_2.3.0.tflite\n",
"Test Duration: 0.5360305309295654\n",
"= = = = = = = = = = = = = = = = = = = = \n",
"TF Runtime Version: 2.3.0\n",
"Model path: model_2.3.0_quant.tflite\n",
"Test Duration: 0.5518338680267334\n",
"= = = = = = = = = = = = = = = = = = = = \n",
"TF Runtime Version: 2.3.0\n",
"Model path: model_2.6.0.tflite\n",
"Test Duration: 0.5399584770202637\n",
"= = = = = = = = = = = = = = = = = = = = \n",
"TF Runtime Version: 2.3.0\n",
"Model path: model_2.6.0_quant.tflite\n",
"Cannot create Interpreter! Exception:\n",
"Didn't find op for builtin opcode 'CONV_2D' version '5'\n",
"Registration failed.\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "PwQ5iEETanLq"
},
"source": [
"## Prepare"
]
},
{
"cell_type": "code",
"metadata": {
"id": "Dh5dvbl6arr5"
},
"source": [
"# Converting with TF 2.3.0 produces a much faster quantized model\n",
"use_tf_2_3 = False\n",
"\n",
"if use_tf_2_3:\n",
"    !pip install tensorflow==2.3.0\n",
"else:\n",
"    # 2.4.0 and 2.5.0 perform the same as 2.6.0\n",
"    !pip install tensorflow==2.6.0"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "l9vk-50PZkSj"
},
"source": [
"import tensorflow as tf\n",
"print(tf.__version__)"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "XVyhL41jaIFs"
},
"source": [
"def generate_model(num_hidden_units: int = 500, num_layers: int = 1) -> tf.keras.models.Sequential:\n",
"    model = tf.keras.models.Sequential()\n",
"    for _ in range(num_layers):\n",
"        model.add(tf.keras.layers.Conv1D(filters=num_hidden_units, kernel_size=3, strides=1, padding='SAME', activation='relu'))\n",
"    optimizer = tf.keras.optimizers.Adam()\n",
"    model.compile(optimizer=optimizer, loss='categorical_crossentropy', metrics=['accuracy'])\n",
"    return model"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "7WdyoJQ6aj3Q"
},
"source": [
"## Create model"
]
},
{
"cell_type": "code",
"metadata": {
"id": "dh-uhJw6aKIU"
},
"source": [
"model = generate_model()\n",
"model.build(input_shape=(None, None, 80))"
],
"execution_count": null,
"outputs": []
},
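{
"cell_type": "markdown",
"metadata": {},
"source": [
"Sanity check (not part of the original repro): run the Keras model once to confirm the output shape before converting."
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"# One forward pass through the float Keras model.\n",
"# With the default 500 filters and 'SAME' padding, the output\n",
"# shape for a (1, 50, 80) input should be (1, 50, 500).\n",
"out = model(tf.random.normal((1, 50, 80)))\n",
"print(out.shape)"
],
"execution_count": null,
"outputs": []
},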
{
"cell_type": "markdown",
"metadata": {
"id": "pCZubKlIafJb"
},
"source": [
"## Conversion"
]
},
{
"cell_type": "code",
"metadata": {
"id": "9sIGNMdraXW6"
},
"source": [
"for quantize in (True, False):\n",
"    converter = tf.lite.TFLiteConverter.from_keras_model(model)\n",
"\n",
"    if quantize:\n",
"        converter.optimizations = [tf.lite.Optimize.DEFAULT]\n",
"\n",
"    model_path = f\"model_{tf.__version__}{'_quant' if quantize else ''}.tflite\"\n",
"    with open(model_path, \"wb\") as f:\n",
"        f.write(converter.convert())"
],
"execution_count": null,
"outputs": []
},
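{
"cell_type": "markdown",
"metadata": {},
"source": [
"Optional (not part of the original repro): list the builtin ops in each converted model. This relies on `_get_ops_details()`, a private method of `tf.lite.Interpreter` that may change between releases, but it helps confirm that `Conv1D` is lowered to `CONV_2D` and lets you compare the converted graphs."
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"import glob\n",
"\n",
"# NOTE: _get_ops_details() is a private TF Lite API; treat this\n",
"# cell as a best-effort inspection helper, not a stable interface.\n",
"for path in sorted(glob.glob(\"model_*.tflite\")):\n",
"    try:\n",
"        ops = tf.lite.Interpreter(model_path=path)._get_ops_details()\n",
"        print(path, sorted({op['op_name'] for op in ops}))\n",
"    except Exception as e:\n",
"        print(path, \"-> cannot inspect:\", e)"
],
"execution_count": null,
"outputs": []
},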
{
"cell_type": "markdown",
"metadata": {
"id": "LdqKlzo0acqt"
},
"source": [
"## Inference test"
]
},
{
"cell_type": "code",
"metadata": {
"id": "wjkVCwgYoz4u"
},
"source": [
"import glob\n",
"import time\n",
"\n",
"# Interpreter.set_tensor expects a numpy array, so materialize\n",
"# the random input once up front.\n",
"data = tf.random.normal((1, 50, 80)).numpy()\n",
"\n",
"for inference_model_path in sorted(glob.glob(\"model_*.tflite\")):\n",
"    print(f\"TF Runtime Version: {tf.__version__}\")\n",
"    print(f\"Model path: {inference_model_path}\")\n",
"    try:\n",
"        interpreter = tf.lite.Interpreter(inference_model_path)\n",
"    except Exception as e:\n",
"        print(\"Cannot create Interpreter! Exception:\")\n",
"        print(e)\n",
"        continue\n",
"    interpreter.resize_tensor_input(0, [1, 50, 80])\n",
"    interpreter.allocate_tensors()\n",
"\n",
"    start = time.time()\n",
"    for _ in range(1000):\n",
"        interpreter.set_tensor(0, data)\n",
"        interpreter.invoke()\n",
"    print(\"Test Duration:\", time.time() - start)\n",
"    print(\"= \" * 20)"
],
"execution_count": null,
"outputs": []
}
]
}