{"cells": [{"cell_type": "markdown", "id": "uniform-reservation", "metadata": {"id": "moved-collapse"}, "source": "# MONODEPTH on OpenVINO IR Model\n\nThis notebook demonstrates Monocular Depth Estimation with MidasNet in OpenVINO. Model information:"}, {"id": "41f4ef11", "cell_type": "markdown", "source": "## Preparation\n\nInstall the requirements and download the files that are necessary for running this notebook.\n\n**NOTE:** Installation may take a while. It is recommended to restart the Jupyter kernel after installing the packages. Choose *Kernel->Restart Kernel* in Jupyter Notebook or Lab, or *Runtime->Restart runtime* in Google Colab.", "metadata": {}}, {"id": "43efbb46", "cell_type": "code", "metadata": {}, "execution_count": null, "source": "# Install or upgrade required Python packages. Install specific versions of some packages to ensure compatibility.\n# Version specifiers are quoted so that the shell does not treat > as output redirection.\n!pip install openvino-dev opencv-python-headless== \"ipython>7.0\" \"ipywidgets>=7.4\"", "outputs": []}, {"id": "63e85642", "cell_type": "code", "metadata": {}, "execution_count": null, "source": "# Download image and model files\nimport os\nimport urllib.parse\nimport urllib.request\nfrom pathlib import Path\n\nurls = ['', '', '', '', ' Walking in Berkeley.mp4']\n\nnotebook_url = \"\"\n\nfor url in urls:\n save_path = Path(url).relative_to(fr\"\")\n os.makedirs(save_path.parent, exist_ok=True)\n safe_url = urllib.parse.quote(url, safe=\":/\")\n\n urllib.request.urlretrieve(safe_url, save_path.as_posix())", "outputs": []}, {"cell_type": "markdown", "id": "described-cursor", "metadata": {}, "source": "<img src=\"monodepth.gif\">"}, {"cell_type": "markdown", "id": "growing-timber", "metadata": {}, "source": "### What is Monodepth?\nMonocular Depth Estimation is the task of estimating scene depth using a single image. It has many potential applications in robotics, 3D reconstruction, medical imaging and autonomous systems. 
For this demo, we use a neural network model called MiDaS, which was developed by the Intelligent Systems Lab at Intel. Check out their research paper to learn more.\n\nR. Ranftl, K. Lasinger, D. Hafner, K. Schindler and V. Koltun, \"Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-shot Cross-dataset Transfer,\" in IEEE Transactions on Pattern Analysis and Machine Intelligence, doi: 10.1109/TPAMI.2020.3019967."}, {"cell_type": "markdown", "id": "recovered-biology", "metadata": {"id": "creative-cisco"}, "source": "## Preparation "}, {"cell_type": "markdown", "id": "cordless-shelter", "metadata": {"id": "faced-honolulu"}, "source": "### Imports"}, {"cell_type": "code", "execution_count": null, "id": "bigger-worst", "metadata": {"id": "ahead-spider"}, "outputs": [], "source": "import os\nimport time\nimport urllib.request\nfrom pathlib import Path\n\nimport cv2\nimport matplotlib.pyplot as plt\nimport numpy as np\nfrom IPython.display import (\n HTML,\n FileLink,\n Pretty,\n ProgressBar,\n Video,\n clear_output,\n display,\n)\nfrom openvino.inference_engine import IECore"}, {"cell_type": "markdown", "id": "technological-aberdeen", "metadata": {"id": "contained-office"}, "source": "### Settings"}, {"cell_type": "code", "execution_count": null, "id": "norman-introduction", "metadata": {"id": "amber-lithuania"}, "outputs": [], "source": "DEVICE = \"CPU\"\nMODEL_FILE = \"models/MiDaS_small.xml\"\n\nmodel_name = os.path.basename(MODEL_FILE)\nmodel_xml_path = Path(MODEL_FILE).with_suffix(\".xml\")"}, {"cell_type": "markdown", "id": "interim-scroll", "metadata": {}, "source": "## Functions"}, {"cell_type": "code", "execution_count": null, "id": "intermediate-syntax", "metadata": {"id": "endangered-constraint"}, "outputs": [], "source": "def normalize_minmax(data):\n \"\"\"Normalizes the values in `data` between 0 and 1\"\"\"\n return (data - data.min()) / (data.max() - data.min())"}, {"cell_type": "code", "execution_count": null, "id": 
"acknowledged-layer", "metadata": {}, "outputs": [], "source": "def load_image(path: str):\n \"\"\"\n Loads an image from `path` and returns it as BGR numpy array. `path`\n should point to an image file, either a local filename or a URL.\n \"\"\"\n if path.startswith(\"http\"):\n # Set User-Agent to Mozilla because some websites block\n # requests with User-Agent Python\n request = urllib.request.Request(\n path, headers={\"User-Agent\": \"Mozilla/5.0\"}\n )\n response = urllib.request.urlopen(request)\n # Read the raw response bytes into a uint8 array for OpenCV to decode\n array = np.asarray(bytearray(response.read()), dtype=\"uint8\")\n image = cv2.imdecode(array, -1) # Loads the image as BGR\n else:\n image = cv2.imread(path)\n return image"}, {"cell_type": "code", "execution_count": null, "id": "desperate-marina", "metadata": {}, "outputs": [], "source": "def convert_result_to_image(result, colormap=\"viridis\"):\n \"\"\"\n Convert network result of floating point numbers to an RGB image with\n integer values from 0-255 by applying a colormap.\n\n `result` is expected to be a single network result in 1,H,W shape\n `colormap` is a matplotlib colormap.\n See\n \"\"\"\n # Look up the matplotlib colormap by name\n cmap = plt.get_cmap(colormap)\n result = result.squeeze(0)\n result = normalize_minmax(result)\n result = cmap(result)[:, :, :3] * 255\n result = result.astype(np.uint8)\n return result"}, {"cell_type": "markdown", "id": "polyphonic-image", "metadata": {"id": "sensitive-wagner"}, "source": "## Load model and get model information\n\nLoad the model in Inference Engine with `ie.read_network` and load it to the specified device with `ie.load_network`."}, {"cell_type": "code", "execution_count": null, "id": "exciting-collaboration", "metadata": {"id": "complete-brother"}, "outputs": [], "source": "ie = IECore()\nnet = ie.read_network(\n str(model_xml_path), str(model_xml_path.with_suffix(\".bin\"))\n)\nexec_net = ie.load_network(network=net, device_name=DEVICE)\n\ninput_key = list(exec_net.input_info)[0]\noutput_key = list(exec_net.outputs.keys())[0]\n\nnetwork_input_shape = 
exec_net.input_info[input_key].tensor_desc.dims\nnetwork_image_height, network_image_width = network_input_shape[2:]"}, {"cell_type": "markdown", "id": "unusual-winner", "metadata": {"id": "compact-bargain"}, "source": "## Monodepth on Image\n\n### Load, resize and reshape input image\n\nThe input image is read with OpenCV, resized to the network input size, and reshaped to (N,C,H,W) (H=height, W=width, C=number of channels, N=number of images). "}, {"cell_type": "code", "execution_count": null, "id": "postal-radar", "metadata": {"colab": {"base_uri": "https://localhost:8080/"}, "id": "central-psychology", "outputId": "d864ee96-3fbd-488d-da1a-88e730f34aad", "tags": []}, "outputs": [], "source": "IMAGE_FILE = \"coco_bike.jpg\"\nimage = load_image(IMAGE_FILE)\n# resize to input shape for network. cv2.resize expects the target\n# size in (width, height) order.\nresized_image = cv2.resize(image, (network_image_width, network_image_height))\n# reshape image to network input shape NCHW\ninput_image = np.expand_dims(np.transpose(resized_image, (2, 0, 1)), 0)"}, {"cell_type": "markdown", "id": "manual-matter", "metadata": {"id": "taken-spanking"}, "source": "### Do inference on image\n\nDo the inference, convert the result to an image, and resize it to the original image shape."}, {"cell_type": "code", "execution_count": null, "id": "norman-circular", "metadata": {"id": "banner-kruger"}, "outputs": [], "source": "result = exec_net.infer(inputs={input_key: input_image})[output_key]\n# convert network result of disparity map to an image that shows\n# distance as colors\nresult_image = convert_result_to_image(result)\n# resize back to original image shape. 
cv2.resize expects shape\n# in (width, height), [::-1] reverses the (height, width) shape to match this.\nresult_image = cv2.resize(result_image, image.shape[:2][::-1])"}, {"cell_type": "markdown", "id": "desirable-groove", "metadata": {}, "source": "### Display monodepth image"}, {"cell_type": "code", "execution_count": null, "id": "wound-chance", "metadata": {"colab": {"base_uri": "https://localhost:8080/", "height": 867}, "id": "ranging-executive", "outputId": "30373e8e-34e9-4820-e32d-764aa99d4b25"}, "outputs": [], "source": "fig, ax = plt.subplots(1, 2, figsize=(20, 15))\nax[0].imshow(image[:, :, (2, 1, 0)])\nax[1].imshow(result_image);"}, {"cell_type": "markdown", "id": "weekly-ballot", "metadata": {"id": "descending-cache"}, "source": "## Monodepth on Video\n\nBy default, only the first 100 frames are processed, in order to quickly check that everything works. Change NUM_FRAMES in the cell below to modify this. Set NUM_FRAMES to 0 to process the whole video."}, {"cell_type": "markdown", "id": "distant-organizer", "metadata": {}, "source": "### Download and load video"}, {"cell_type": "code", "execution_count": null, "id": "taken-thompson", "metadata": {"colab": {"base_uri": "https://localhost:8080/"}, "id": "terminal-dividend", "outputId": "87f5ada0-8caf-49c3-fe54-626e2b1967f3"}, "outputs": [], "source": "# Video source: (Public Domain)\nVIDEO_FILE = \"videos/Coco Walking in Berkeley.mp4\"\n# Number of video frames to process. 
Set to 0 to process all frames.\nNUM_FRAMES = 100\n# Create Path objects for the input video and the resulting video\nvideo_path = Path(VIDEO_FILE)\nresult_video_path = video_path.with_name(f\"{video_path.stem}_monodepth.mp4\")"}, {"cell_type": "code", "execution_count": null, "id": "convenient-adelaide", "metadata": {}, "outputs": [], "source": "cap = cv2.VideoCapture(str(video_path))\n# Read the first frame to check that the video opens and to get the frame size\nret, image = cap.read()\nif not ret:\n raise ValueError(f\"The video at {video_path} cannot be read.\")\nFPS = cap.get(cv2.CAP_PROP_FPS)\nFRAME_HEIGHT, FRAME_WIDTH = image.shape[:2]\n# The format to use for video encoding. VP90 is slow,\n# but it works on most systems.\n# Try the THEO encoding if you have FFMPEG installed.\nFOURCC = cv2.VideoWriter_fourcc(*\"VP90\")\n# FOURCC = cv2.VideoWriter_fourcc(*\"THEO\")\n\ncap.release()\nprint(\n f\"The input video has a frame width of {FRAME_WIDTH}, \"\n f\"frame height of {FRAME_HEIGHT} and runs at {FPS} fps\"\n)"}, {"cell_type": "markdown", "id": "spoken-seattle", "metadata": {}, "source": "### Do Inference on video and create monodepth video"}, {"cell_type": "code", "execution_count": null, "id": "opposed-energy", "metadata": {"colab": {"base_uri": "https://localhost:8080/"}, "id": "present-albany", "outputId": "600edb69-af12-44dc-ec8e-95005b74179c", "tags": []}, "outputs": [], "source": "frame_nr = 1\nstart_time = time.perf_counter()\ntotal_inference_duration = 0\n\ncap = cv2.VideoCapture(str(video_path))\nout_video = cv2.VideoWriter(\n str(result_video_path),\n FOURCC,\n FPS,\n (FRAME_WIDTH * 2, FRAME_HEIGHT),\n)\n\ntotal_frames = (\n cap.get(cv2.CAP_PROP_FRAME_COUNT) if NUM_FRAMES == 0 else NUM_FRAMES\n)\nprogress_bar = ProgressBar(total=total_frames)\nprogress_bar.display()\n\ntry:\n while cap.isOpened():\n ret, image = cap.read()\n if not ret:\n cap.release()\n break\n\n if frame_nr == total_frames:\n break\n\n # Prepare frame for inference\n # resize to input shape for network; cv2.resize expects\n # the target size in (width, height) order\n resized_image = cv2.resize(\n image, (network_image_width, 
network_image_height)\n )\n # reshape image to network input shape NCHW\n input_image = np.expand_dims(np.transpose(resized_image, (2, 0, 1)), 0)\n\n # Do inference\n inference_start_time = time.perf_counter()\n result = exec_net.infer(inputs={input_key: input_image})[output_key]\n inference_stop_time = time.perf_counter()\n inference_duration = inference_stop_time - inference_start_time\n total_inference_duration += inference_duration\n\n if frame_nr % 10 == 0:\n clear_output(wait=True)\n progress_bar.display()\n display(\n Pretty(\n f\"Processed frame {frame_nr}. \"\n f\"Inference time: {inference_duration:.2f} seconds \"\n f\"({1/inference_duration:.2f} FPS)\"\n )\n )\n\n # Transform network result to an image and reorder the channels\n # from RGB to BGR, which is what OpenCV expects\n result_frame = convert_result_to_image(result)[:, :, (2, 1, 0)]\n # Resize to original image shape\n result_frame = cv2.resize(result_frame, (FRAME_WIDTH, FRAME_HEIGHT))\n # Put image and result side by side\n stacked_frame = np.hstack((image, result_frame))\n # Save frame to video\n out_video.write(stacked_frame)\n\n frame_nr = frame_nr + 1\n progress_bar.progress = frame_nr\n progress_bar.update()\n\nexcept KeyboardInterrupt:\n print(\"Processing interrupted.\")\nfinally:\n out_video.release()\n cap.release()\n end_time = time.perf_counter()\n duration = end_time - start_time\n clear_output()\n print(f\"Monodepth Video saved to '{str(result_video_path)}'.\")\n print(\n f\"Processed {frame_nr} frames in {duration:.2f} seconds. 
\"\n f\"Total FPS (including video processing): {frame_nr/duration:.2f}. \"\n f\"Inference FPS: {frame_nr/total_inference_duration:.2f}\"\n )"}, {"cell_type": "markdown", "id": "periodic-sussex", "metadata": {"id": "bZ89ZI369KjA"}, "source": "### Display monodepth video"}, {"cell_type": "code", "execution_count": null, "id": "neither-version", "metadata": {"tags": []}, "outputs": [], "source": "video = Video(result_video_path, width=800, embed=True)\nif not result_video_path.exists():\n plt.imshow(stacked_frame)\n raise ValueError(\n \"OpenCV was unable to write the video file. Showing one video frame.\"\n )\nelse:\n print(f\"Showing monodepth video saved at\\n{result_video_path.resolve()}\")\n print(\n \"If you cannot see the video in your browser, please click on the \"\n \"following link to download the video.\"\n )\n video_link = FileLink(result_video_path)\n video_link.html_link_str = \"<a href='%s' download>%s</a>\"\n display(HTML(video_link._repr_html_()))\n display(video)"}], "metadata": {"colab": {"collapsed_sections": [], "name": "monodepth.ipynb", "provenance": [], "toc_visible": true}, "kernelspec": {"display_name": "Python 3", "language": "python", "name": "python3"}, "language_info": {"codemirror_mode": {"name": "ipython", "version": 3}, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.8"}}, "nbformat": 4, "nbformat_minor": 5}
