helena-intel/201-vision-monodepth-standalone.ipynb

## 201-vision-monodepth-standalone.ipynb
{"cells": [{"cell_type": "markdown", "id": "uniform-reservation", "metadata": {"id": "moved-collapse"}, "source": "# MONODEPTH on OpenVINO IR Model\n\nThis notebook demonstrates Monocular Depth Estimation with MidasNet in OpenVINO. Model information: https://github.com/openvinotoolkit/open_model_zoo/blob/master/models/public/midasnet/midasnet.md"}, {"id": "41f4ef11", "cell_type": "markdown", "source": "## Preparation\n\nInstall the requirements and download the files that are necessary for running this notebook.\n\n**NOTE:** installation may take a while. It is recommended to restart the Jupyter kernel after installing the packages. Choose *Kernel->Restart Kernel* in Jupyter Notebook or Lab, or *Runtime->Restart runtime* in Google Colab.", "metadata": {}}, {"id": "43efbb46", "cell_type": "code", "metadata": {}, "execution_count": null, "source": "# Install or upgrade required Python packages. Install specific versions of some packages to ensure compatibility.\n!pip install openvino-dev opencv-python-headless==4.2.0.32 ipython>7.0 ipywidgets>=7.4", "outputs": []}, {"id": "63e85642", "cell_type": "code", "metadata": {}, "execution_count": null, "source": "# Download image and model files\nimport os\nimport pip\nimport urllib.parse\nimport urllib.request\nfrom pathlib import Path\n\nurls = ['https://raw.githubusercontent.com/openvinotoolkit/openvino_notebooks/main/notebooks/201-vision-monodepth/coco_bike.jpg', 'https://raw.githubusercontent.com/openvinotoolkit/openvino_notebooks/main/notebooks/201-vision-monodepth/monodepth.gif', 'https://raw.githubusercontent.com/openvinotoolkit/openvino_notebooks/main/notebooks/201-vision-monodepth/models/MiDaS_small.bin', 'https://raw.githubusercontent.com/openvinotoolkit/openvino_notebooks/main/notebooks/201-vision-monodepth/models/MiDaS_small.xml', 'https://raw.githubusercontent.com/openvinotoolkit/openvino_notebooks/main/notebooks/201-vision-monodepth/videos/Coco Walking in Berkeley.mp4']\n\nnotebook_url = \"https://raw.githubusercontent.com/openvinotoolkit/openvino_notebooks/main\"\n\nfor url in urls:\n    save_path = Path(url).relative_to(fr\"https://raw.githubusercontent.com/openvinotoolkit/openvino_notebooks/main/notebooks/201-vision-monodepth\")\n    os.makedirs(save_path.parent, exist_ok=True)\n    safe_url = urllib.parse.quote(url, safe=\":/\")\n\n    urllib.request.urlretrieve(safe_url, save_path.as_posix())", "outputs": []}, {"cell_type": "markdown", "id": "described-cursor", "metadata": {}, "source": "<img src=\"monodepth.gif\">"}, {"cell_type": "markdown", "id": "growing-timber", "metadata": {}, "source": "### What is Monodepth?\nMonocular Depth Estimation is the task of estimating scene depth using a single image. It has many potential applications in robotics, 3D reconstruction, medical imaging and autonomous systems. For this demo, we use a neural network model called [MiDaS](https://github.com/intel-isl/MiDaS) which was developed by the Intelligent Systems Lab at Intel. Check out their research paper to learn more. \n\nR. Ranftl, K. Lasinger, D. Hafner, K. Schindler and V. Koltun, [\"Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-shot Cross-dataset Transfer,\"](https://ieeexplore.ieee.org/document/9178977) in IEEE Transactions on Pattern Analysis and Machine Intelligence, doi: 10.1109/TPAMI.2020.3019967."}, {"cell_type": "markdown", "id": "recovered-biology", "metadata": {"id": "creative-cisco"}, "source": "## Preparation "}, {"cell_type": "markdown", "id": "cordless-shelter", "metadata": {"id": "faced-honolulu"}, "source": "### Imports"}, {"cell_type": "code", "execution_count": null, "id": "bigger-worst", "metadata": {"id": "ahead-spider"}, "outputs": [], "source": "import os\nimport time\nimport urllib\nfrom pathlib import Path\n\nimport cv2\nimport matplotlib.cm\nimport matplotlib.pyplot as plt\nimport numpy as np\nfrom IPython.display import (\n    HTML,\n    FileLink,\n    Pretty,\n    ProgressBar,\n    Video,\n    clear_output,\n    display,\n)\nfrom openvino.inference_engine import IECore"}, {"cell_type": "markdown", "id": "technological-aberdeen", "metadata": {"id": "contained-office"}, "source": "### Settings"}, {"cell_type": "code", "execution_count": null, "id": "norman-introduction", "metadata": {"id": "amber-lithuania"}, "outputs": [], "source": "DEVICE = \"CPU\"\nMODEL_FILE = \"models/MiDaS_small.xml\"\n\nmodel_name = os.path.basename(MODEL_FILE)\nmodel_xml_path = Path(MODEL_FILE).with_suffix(\".xml\")"}, {"cell_type": "markdown", "id": "interim-scroll", "metadata": {}, "source": "## Functions"}, {"cell_type": "code", "execution_count": null, "id": "intermediate-syntax", "metadata": {"id": "endangered-constraint"}, "outputs": [], "source": "def normalize_minmax(data):\n    \"\"\"Normalizes the values in `data` between 0 and 1\"\"\"\n    return (data - data.min()) / (data.max() - data.min())"}, {"cell_type": "code", "execution_count": null, "id": "acknowledged-layer", "metadata": {}, "outputs": [], "source": "def load_image(path: str):\n    \"\"\"\n    Loads an image from `path` and returns it as BGR numpy array. `path`\n    should point to an image file, either a local filename or an url.\n    \"\"\"\n    if path.startswith(\"http\"):\n        # Set User-Agent to Mozilla because some websites block\n        # requests with User-Agent Python\n        request = urllib.request.Request(\n            path, headers={\"User-Agent\": \"Mozilla/5.0\"}\n        )\n        response = urllib.request.urlopen(request)\n        array = np.asarray(bytearray(response.read()), dtype=\"uint8\")\n        image = cv2.imdecode(array, -1)  # Loads the image as BGR\n    else:\n        image = cv2.imread(path)\n    return image"}, {"cell_type": "code", "execution_count": null, "id": "desperate-marina", "metadata": {}, "outputs": [], "source": "def convert_result_to_image(result, colormap=\"viridis\"):\n    \"\"\"\n    Convert network result of floating point numbers to an RGB image with\n    integer values from 0-255 by applying a colormap.\n\n    `result` is expected to be a single network result in 1,H,W shape\n    `colormap` is a matplotlib colormap.\n    See https://matplotlib.org/stable/tutorials/colors/colormaps.html\n    \"\"\"\n    cmap = matplotlib.cm.get_cmap(colormap)\n    result = result.squeeze(0)\n    result = normalize_minmax(result)\n    result = cmap(result)[:, :, :3] * 255\n    result = result.astype(np.uint8)\n    return result"}, {"cell_type": "markdown", "id": "polyphonic-image", "metadata": {"id": "sensitive-wagner"}, "source": "## Load model and get model information\n\nLoad the model in Inference Engine with `ie.read_network` and load it to the specified device with `ie.load_network`"}, {"cell_type": "code", "execution_count": null, "id": "exciting-collaboration", "metadata": {"id": "complete-brother"}, "outputs": [], "source": "ie = IECore()\nnet = ie.read_network(\n    str(model_xml_path), str(model_xml_path.with_suffix(\".bin\"))\n)\nexec_net = ie.load_network(network=net, device_name=DEVICE)\n\ninput_key = list(exec_net.input_info)[0]\noutput_key = list(exec_net.outputs.keys())[0]\n\nnetwork_input_shape = exec_net.input_info[input_key].tensor_desc.dims\nnetwork_image_height, network_image_width = network_input_shape[2:]"}, {"cell_type": "markdown", "id": "unusual-winner", "metadata": {"id": "compact-bargain"}, "source": "## Monodepth on Image\n\n### Load, resize and reshape input image\n\nThe input image is read with OpenCV, resized to network input size, and reshaped to (N,C,H,W) (H=height, W=width, C=number of channels, N=number of images). "}, {"cell_type": "code", "execution_count": null, "id": "postal-radar", "metadata": {"colab": {"base_uri": "https://localhost:8080/"}, "id": "central-psychology", "outputId": "d864ee96-3fbd-488d-da1a-88e730f34aad", "tags": []}, "outputs": [], "source": "IMAGE_FILE = \"coco_bike.jpg\"\nimage = load_image(IMAGE_FILE)\n# resize to input shape for network\nresized_image = cv2.resize(image, (network_image_height, network_image_width))\n# reshape image to network input shape NCHW\ninput_image = np.expand_dims(np.transpose(resized_image, (2, 0, 1)), 0)"}, {"cell_type": "markdown", "id": "manual-matter", "metadata": {"id": "taken-spanking"}, "source": "### Do inference on image\n\nDo the inference, convert the result to an image, and resize it to the original image shape"}, {"cell_type": "code", "execution_count": null, "id": "norman-circular", "metadata": {"id": "banner-kruger"}, "outputs": [], "source": "result = exec_net.infer(inputs={input_key: input_image})[output_key]\n# convert network result of disparity map to an image that shows\n# distance as colors\nresult_image = convert_result_to_image(result)\n# resize back to original image shape. cv2.resize expects shape\n# in (width, height), [::-1] reverses the (height, width) shape to match this.\nresult_image = cv2.resize(result_image, image.shape[:2][::-1])"}, {"cell_type": "markdown", "id": "desirable-groove", "metadata": {}, "source": "### Display monodepth image"}, {"cell_type": "code", "execution_count": null, "id": "wound-chance", "metadata": {"colab": {"base_uri": "https://localhost:8080/", "height": 867}, "id": "ranging-executive", "outputId": "30373e8e-34e9-4820-e32d-764aa99d4b25"}, "outputs": [], "source": "fig, ax = plt.subplots(1, 2, figsize=(20, 15))\nax[0].imshow(image[:, :, (2, 1, 0)])\nax[1].imshow(result_image);"}, {"cell_type": "markdown", "id": "weekly-ballot", "metadata": {"id": "descending-cache"}, "source": "## Monodepth on Video\n\nBy default, only the first 100 frames are processed, in order to quickly check that everything works. Change NUM_FRAMES in the cell below to modify this. Set NUM_FRAMES to 0 to process the whole video."}, {"cell_type": "markdown", "id": "distant-organizer", "metadata": {}, "source": "### Download and load video"}, {"cell_type": "code", "execution_count": null, "id": "taken-thompson", "metadata": {"colab": {"base_uri": "https://localhost:8080/"}, "id": "terminal-dividend", "outputId": "87f5ada0-8caf-49c3-fe54-626e2b1967f3"}, "outputs": [], "source": "# Video source: https://www.youtube.com/watch?v=fu1xcQdJRws (Public Domain)\nVIDEO_FILE = \"videos/Coco Walking in Berkeley.mp4\"\n# Number of video frames to process. Set to 0 to process all frames.\nNUM_FRAMES = 100\n# Create Path objects for the input video and the resulting video\nvideo_path = Path(VIDEO_FILE)\nresult_video_path = video_path.with_name(f\"{video_path.stem}_monodepth.mp4\")"}, {"cell_type": "code", "execution_count": null, "id": "convenient-adelaide", "metadata": {}, "outputs": [], "source": "cap = cv2.VideoCapture(str(video_path))\nret, image = cap.read()\nif not ret:\n    raise ValueError(f\"The video at {video_path} cannot be read.\")\nFPS = cap.get(cv2.CAP_PROP_FPS)\nFRAME_HEIGHT, FRAME_WIDTH = image.shape[:2]\n# The format to use for video encoding. VP90 is slow,\n# but it works on most systems.\n# Try the THEO encoding if you have FFMPEG installed.\nFOURCC = cv2.VideoWriter_fourcc(*\"VP90\")\n# FOURCC = cv2.VideoWriter_fourcc(*\"THEO\")\n\ncap.release()\nprint(\n    f\"The input video has a frame width of {FRAME_WIDTH}, \"\n    f\"frame height of {FRAME_HEIGHT} and runs at {FPS} fps\"\n)"}, {"cell_type": "markdown", "id": "spoken-seattle", "metadata": {}, "source": "### Do Inference on video and create monodepth video"}, {"cell_type": "code", "execution_count": null, "id": "opposed-energy", "metadata": {"colab": {"base_uri": "https://localhost:8080/"}, "id": "present-albany", "outputId": "600edb69-af12-44dc-ec8e-95005b74179c", "tags": []}, "outputs": [], "source": "frame_nr = 1\nstart_time = time.perf_counter()\ntotal_inference_duration = 0\n\ncap = cv2.VideoCapture(str(video_path))\nout_video = cv2.VideoWriter(\n    str(result_video_path),\n    FOURCC,\n    FPS,\n    (FRAME_WIDTH * 2, FRAME_HEIGHT),\n)\n\ntotal_frames = (\n    cap.get(cv2.CAP_PROP_FRAME_COUNT) if NUM_FRAMES == 0 else NUM_FRAMES\n)\nprogress_bar = ProgressBar(total=total_frames)\nprogress_bar.display()\n\ntry:\n    while cap.isOpened():\n        ret, image = cap.read()\n        if not ret:\n            cap.release()\n            break\n\n        if frame_nr == total_frames:\n            break\n\n        # Prepare frame for inference\n        # resize to input shape for network\n        resized_image = cv2.resize(\n            image, (network_image_height, network_image_width)\n        )\n        # reshape image to network input shape NCHW\n        input_image = np.expand_dims(np.transpose(resized_image, (2, 0, 1)), 0)\n\n        # Do inference\n        inference_start_time = time.perf_counter()\n        result = exec_net.infer(inputs={input_key: input_image})[output_key]\n        inference_stop_time = time.perf_counter()\n        inference_duration = inference_stop_time - inference_start_time\n        total_inference_duration += inference_duration\n\n        if frame_nr % 10 == 0:\n            clear_output(wait=True)\n            progress_bar.display()\n            display(\n                Pretty(\n                    f\"Processed frame {frame_nr}. \"\n                    f\"Inference time: {inference_duration:.2f} seconds \"\n                    f\"({1/inference_duration:.2f} FPS)\"\n                )\n            )\n\n        # Transform network result to RGB image\n        result_frame = convert_result_to_image(result)[:, :, (2, 1, 0)]\n        # Resize to original image shape\n        result_frame = cv2.resize(result_frame, (FRAME_WIDTH, FRAME_HEIGHT))\n        # Put image and result side by side\n        stacked_frame = np.hstack((image, result_frame))\n        # Save frame to video\n        out_video.write(stacked_frame)\n\n        frame_nr = frame_nr + 1\n        progress_bar.progress = frame_nr\n        progress_bar.update()\n\nexcept KeyboardInterrupt:\n    print(\"Processing interrupted.\")\nfinally:\n    out_video.release()\n    cap.release()\n    end_time = time.perf_counter()\n    duration = end_time - start_time\n    clear_output()\n    print(f\"Monodepth Video saved to '{str(result_video_path)}'.\")\n    print(\n        f\"Processed {frame_nr} frames in {duration:.2f} seconds. \"\n        f\"Total FPS (including video processing): {frame_nr/duration:.2f}.\"\n        f\"Inference FPS: {frame_nr/total_inference_duration:.2f} \"\n    )"}, {"cell_type": "markdown", "id": "periodic-sussex", "metadata": {"id": "bZ89ZI369KjA"}, "source": "### Display monodepth video"}, {"cell_type": "code", "execution_count": null, "id": "neither-version", "metadata": {"tags": []}, "outputs": [], "source": "video = Video(result_video_path, width=800, embed=True)\nif not result_video_path.exists():\n    plt.imshow(stacked_frame)\n    raise ValueError(\n        \"OpenCV was unable to write the video file. Showing one video frame.\"\n    )\nelse:\n    print(f\"Showing monodepth video saved at\\n{result_video_path.resolve()}\")\n    print(\n        \"If you cannot see the video in your browser, please click on the \"\n        \"following link to download the video \"\n    )\n    video_link = FileLink(result_video_path)\n    video_link.html_link_str = \"<a href='%s' download>%s</a>\"\n    display(HTML(video_link._repr_html_()))\n    display(video)"}], "metadata": {"colab": {"collapsed_sections": [], "name": "monodepth.ipynb", "provenance": [], "toc_visible": true}, "kernelspec": {"display_name": "Python 3", "language": "python", "name": "python3"}, "language_info": {"codemirror_mode": {"name": "ipython", "version": 3}, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.8"}}, "nbformat": 4, "nbformat_minor": 5}

## apt.txt
libpython3.7-dev