helena-intel/201-vision-monocular-depth-estimation-standalone.ipynb

## 201-vision-monocular-depth-estimation-standalone.ipynb
{"cells": [{"cell_type": "markdown", "id": "amino-disclosure", "metadata": {"id": "moved-collapse"}, "source": "# MONODEPTH on OpenVINO IR Model\n\nThis notebook demonstrates Monocular Depth Estimation with MidasNet in OpenVINO. Model information: https://github.com/openvinotoolkit/open_model_zoo/blob/master/models/public/midasnet/midasnet.md\n\nTHIS IS A WORK IN PROGRESS NOTEBOOK. IT IS NOT FOR PUBLIC RELEASE. See the [README](README.md) for instructions on how to run this notebook on your own computer."}, {"id": "1a5425b6", "cell_type": "markdown", "source": "## Preparation\n\nInstall the requirements and download the files that are necessary for running this notebook.", "metadata": {}}, {"id": "6b3dd628", "cell_type": "code", "metadata": {}, "execution_count": null, "source": "# Install required Python packages\nimport pip\npip_arguments = None if pip.__version__ < '20.3' else ' --use-deprecated=legacy-resolver'\n!pip install $pip_argumentsopenvino-dev numpy==1.18.5 ipykernel==5.5.0 matplotlib opencv-python-headless==4.2.0.32", "outputs": []}, {"id": "ab7df814", "cell_type": "code", "metadata": {}, "execution_count": null, "source": "# Download image and model files\nimport os\nimport pip\nimport urllib.parse\nimport urllib.request\nfrom pathlib import Path\n\nurls = ['https://raw.githubusercontent.com/helena-intel/openvino-notebooks/develop/201-vision-monocular-depth-estimation/monodepth.gif', 'https://raw.githubusercontent.com/helena-intel/openvino-notebooks/develop/201-vision-monocular-depth-estimation/install_and_launch_monodepth.bat', 'https://raw.githubusercontent.com/helena-intel/openvino-notebooks/develop/201-vision-monocular-depth-estimation/requirements.txt', 'https://raw.githubusercontent.com/helena-intel/openvino-notebooks/develop/201-vision-monocular-depth-estimation/requirements-image.txt', 'https://raw.githubusercontent.com/helena-intel/openvino-notebooks/develop/201-vision-monocular-depth-estimation/coco_bike.jpg', 'https://raw.githubusercontent.com/helena-intel/openvino-notebooks/develop/201-vision-monocular-depth-estimation/videos/Coco Walking in Berkeley.mp4', 'https://raw.githubusercontent.com/helena-intel/openvino-notebooks/develop/201-vision-monocular-depth-estimation/models/MiDaS_small.bin', 'https://raw.githubusercontent.com/helena-intel/openvino-notebooks/develop/201-vision-monocular-depth-estimation/models/MiDaS_small.xml']\n\nfor url in urls:\n    save_path = Path(url).relative_to(fr\"https:/raw.githubusercontent.com/helena-intel/openvino-notebooks/develop/201-vision-monocular-depth-estimation\")\n    os.makedirs(save_path.parent, exist_ok=True)\n    safe_url = urllib.parse.quote(url, safe=\":/\")\n\n    urllib.request.urlretrieve(safe_url, save_path.as_posix())", "outputs": []}, {"cell_type": "markdown", "id": "determined-debut", "metadata": {}, "source": "<img src=\"monodepth.gif\">"}, {"cell_type": "markdown", "id": "fixed-biotechnology", "metadata": {}, "source": "### What is Monodepth?\nMonocular Depth Estimation is the task of estimating scene depth using a single image. It has many potential applications in robotics, 3D reconstruction, medical imaging and autonomous systems. For this demo, we use a neural network model called [MiDaS](https://github.com/intel-isl/MiDaS) which was developed by the Intelligent Systems Lab at Intel. Check out their research paper to learn more. \n\nR. Ranftl, K. Lasinger, D. Hafner, K. Schindler and V. Koltun, [\"Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-shot Cross-dataset Transfer,\"](https://ieeexplore.ieee.org/document/9178977) in IEEE Transactions on Pattern Analysis and Machine Intelligence, doi: 10.1109/TPAMI.2020.3019967."}, {"cell_type": "markdown", "id": "vulnerable-thread", "metadata": {"id": "creative-cisco"}, "source": "## Preparation "}, {"cell_type": "markdown", "id": "legitimate-timber", "metadata": {"id": "faced-honolulu"}, "source": "### Imports"}, {"cell_type": "code", "execution_count": null, "id": "placed-savage", "metadata": {"id": "ahead-spider"}, "outputs": [], "source": "import os\nimport time\nimport urllib\nfrom pathlib import Path\n\nimport cv2\nimport matplotlib.cm\nimport matplotlib.pyplot as plt\nimport numpy as np\nfrom IPython.display import FileLink, HTML, Pretty, ProgressBar, Video, clear_output, display\nfrom openvino.inference_engine import IECore"}, {"cell_type": "markdown", "id": "exposed-brush", "metadata": {"id": "contained-office"}, "source": "### Settings"}, {"cell_type": "code", "execution_count": null, "id": "essential-faith", "metadata": {"id": "amber-lithuania"}, "outputs": [], "source": "DEVICE = \"CPU\"\n# MODEL_URL = \"models/midasnet.xml\"  # Larger model that is slower, but gives better results\nMODEL_FILE = \"models/MiDaS_small.xml\"  # Small model that is fast and gives good results on some kinds of data\n\nmodel_name = os.path.basename(MODEL_FILE)\nmodel_xml_path = Path(MODEL_FILE).with_suffix(\".xml\")"}, {"cell_type": "markdown", "id": "wired-resistance", "metadata": {}, "source": "## Functions"}, {"cell_type": "code", "execution_count": null, "id": "acute-preview", "metadata": {"id": "endangered-constraint"}, "outputs": [], "source": "def normalize_minmax(data):\n    \"\"\"Normalizes the values in `data` between 0 and 1\"\"\"\n    return (data - data.min()) / (data.max() - data.min())"}, {"cell_type": "code", "execution_count": null, "id": "threatened-neutral", "metadata": {}, "outputs": [], "source": "def load_image(path: str):\n    \"\"\"\n    Loads an image from `path` and returns it as BGR numpy array. `path` should point to an image file,\n    either a local filename or an url.\n    \"\"\"\n    if path.startswith(\"http\"):\n        # Set User-Agent to Mozilla because some websites block requests with User-Agent Python\n        request = urllib.request.Request(path, headers={\"User-Agent\": \"Mozilla/5.0\"})\n        response = urllib.request.urlopen(request)\n        array = np.asarray(bytearray(response.read()), dtype=\"uint8\")\n        image = cv2.imdecode(array, -1)  # Loads the image as BGR\n    else:\n        image = cv2.imread(path)\n    return image"}, {"cell_type": "code", "execution_count": null, "id": "selective-annotation", "metadata": {}, "outputs": [], "source": "def convert_result_to_image(result, colormap=\"viridis\"):\n    \"\"\"\n    Convert network result of floating point numbers to an RGB image with integer values from 0-255\n    by applying a colormap.\n\n    `result` is expected to be a single network result in 1,H,W shape\n    `colormap` is a matplotlib colormap. See https://matplotlib.org/stable/tutorials/colors/colormaps.html\n    \"\"\"\n    cmap = matplotlib.cm.get_cmap(colormap)\n    result = result.squeeze(0)\n    result = normalize_minmax(result)\n    result = cmap(result)[:, :, :3] * 255\n    result = result.astype(np.uint8)\n    return result"}, {"cell_type": "markdown", "id": "damaged-checkout", "metadata": {"id": "sensitive-wagner"}, "source": "## Load model and get model information\n\nLoad the model in Inference Engine with `ie.read_network` and load it to the specified device with `ie.load_network`"}, {"cell_type": "code", "execution_count": null, "id": "differential-noise", "metadata": {"id": "complete-brother"}, "outputs": [], "source": "ie = IECore()\nnet = ie.read_network(str(model_xml_path), str(model_xml_path.with_suffix(\".bin\")))\nexec_net = ie.load_network(network=net, device_name=DEVICE)\n\ninput_key = list(exec_net.input_info)[0]\noutput_key = list(exec_net.outputs.keys())[0]\n\nnetwork_input_shape = exec_net.input_info[input_key].tensor_desc.dims\nnetwork_image_height, network_image_width = network_input_shape[2:]"}, {"cell_type": "markdown", "id": "vietnamese-casino", "metadata": {"id": "compact-bargain"}, "source": "## Monodepth on Image\n\n### Load, resize and reshape input image\n\nThe input image is read with OpenCV, resized to network input size, and reshaped to (N,C,H,W) (H=height, W=width, C=number of channels, N=number of images). "}, {"cell_type": "code", "execution_count": null, "id": "guilty-protein", "metadata": {"colab": {"base_uri": "https://localhost:8080/"}, "id": "central-psychology", "outputId": "d864ee96-3fbd-488d-da1a-88e730f34aad", "tags": []}, "outputs": [], "source": "# Download and load an image\n# Image source (CC license): https://storage.googleapis.com/openimages/web/visualizer/index.html?set=train&type=segmentation&r=false&c=%2Fm%2F02rgn06&id=470c2f96cb938855\nIMAGE_FILE = \"coco_bike.jpg\"\nimage = load_image(IMAGE_FILE)\nresized_image = cv2.resize(image, (network_image_height, network_image_width))  # resize to input shape for network\ninput_image = np.expand_dims(np.transpose(resized_image, (2, 0, 1)), 0)  # reshape image to network input shape NCHW"}, {"cell_type": "markdown", "id": "great-karma", "metadata": {"id": "taken-spanking"}, "source": "### Do inference on image\n\nDo the inference, convert the result to an image, and resize it to the original image shape"}, {"cell_type": "code", "execution_count": null, "id": "loved-terrorism", "metadata": {"id": "banner-kruger"}, "outputs": [], "source": "result = exec_net.infer(inputs={input_key: input_image})[output_key]\n# convert network result of disparity map to an image that shows distance as colors\nresult_image = convert_result_to_image(result)\n# resize back to original image shape. cv2.resize expects shape in (width, height), [::-1] reverses the (height, width) shape to match this.\nresult_image = cv2.resize(result_image, image.shape[:2][::-1])"}, {"cell_type": "markdown", "id": "continuing-arbor", "metadata": {}, "source": "### Display monodepth image"}, {"cell_type": "code", "execution_count": null, "id": "experienced-couple", "metadata": {"colab": {"base_uri": "https://localhost:8080/", "height": 867}, "id": "ranging-executive", "outputId": "30373e8e-34e9-4820-e32d-764aa99d4b25"}, "outputs": [], "source": "fig, ax = plt.subplots(1, 2, figsize=(20, 15))\nax[0].imshow(image[:, :, (2, 1, 0)])  # (2,1,0) converts the image from BGR to RGB\nax[1].imshow(result_image);"}, {"cell_type": "markdown", "id": "republican-advice", "metadata": {"id": "descending-cache"}, "source": "## Monodepth on Video\n\nBy default, only the first 100 frames are processed, in order to quickly check that everything works. Change NUM_FRAMES in the cell below to modify this. Set NUM_FRAMES to 0 to process the whole video."}, {"cell_type": "markdown", "id": "academic-alberta", "metadata": {}, "source": "### Download and load video"}, {"cell_type": "code", "execution_count": null, "id": "mysterious-murder", "metadata": {"colab": {"base_uri": "https://localhost:8080/"}, "id": "terminal-dividend", "outputId": "87f5ada0-8caf-49c3-fe54-626e2b1967f3"}, "outputs": [], "source": "# Video source: https://www.youtube.com/watch?v=fu1xcQdJRws (Public Domain)\nVIDEO_FILE = \"videos/Coco Walking in Berkeley.mp4\"\nNUM_FRAMES = 100  # Number of video frames to process. Set to 0 to process all frames.\n# Create Path objects for the input video and the resulting video\nvideo_path = Path(VIDEO_FILE)\nresult_video_path = video_path.with_name(f\"{video_path.stem}_monodepth.mp4\")"}, {"cell_type": "code", "execution_count": null, "id": "flexible-bundle", "metadata": {}, "outputs": [], "source": "cap = cv2.VideoCapture(str(video_path))\nret, image = cap.read()\nif not ret:\n    raise ValueError(f\"The video at {video_path} cannot be read.\")\nFPS = cap.get(cv2.CAP_PROP_FPS)\nFRAME_HEIGHT, FRAME_WIDTH = image.shape[:2]\n# The format to use for video encoding. VP90 is slow, but it works on most systems.\n# Try the THEO encoding if you have FFMPEG installed.\nFOURCC = cv2.VideoWriter_fourcc(*\"VP90\")\n# FOURCC = cv2.VideoWriter_fourcc(*\"THEO\")\n\ncap.release()\nprint(f\"The input video has a frame width of {FRAME_WIDTH}, frame height of {FRAME_HEIGHT} and runs at {FPS} fps\")"}, {"cell_type": "markdown", "id": "accompanied-herald", "metadata": {}, "source": "### Do Inference on video and create monodepth video"}, {"cell_type": "code", "execution_count": null, "id": "worthy-yugoslavia", "metadata": {"colab": {"base_uri": "https://localhost:8080/"}, "id": "present-albany", "outputId": "600edb69-af12-44dc-ec8e-95005b74179c"}, "outputs": [], "source": "frame_nr = 1\nstart_time = time.perf_counter()\ntotal_inference_duration = 0\n\ncap = cv2.VideoCapture(str(video_path))\nout_video = cv2.VideoWriter(\n    str(result_video_path),\n    FOURCC,\n    FPS,\n    (FRAME_WIDTH * 2, FRAME_HEIGHT),\n)\n\ntotal_frames = cap.get(cv2.CAP_PROP_FRAME_COUNT) if NUM_FRAMES == 0 else NUM_FRAMES\nprogress_bar = ProgressBar(total=total_frames)\nprogress_bar.display()\n\ntry:\n    while cap.isOpened():\n        ret, image = cap.read()\n        if not ret:\n            cap.release()\n            break\n\n        if frame_nr == total_frames:\n            break\n\n        # Prepare frame for inference\n        resized_image = cv2.resize(image, (network_image_height, network_image_width))  # resize to input shape for network\n        input_image = np.expand_dims(np.transpose(resized_image, (2, 0, 1)), 0)  # reshape image to network input shape NCHW\n\n        # Do inference\n        inference_start_time = time.perf_counter()\n        result = exec_net.infer(inputs={input_key: input_image})[output_key]\n        inference_stop_time = time.perf_counter()\n        inference_duration = inference_stop_time - inference_start_time\n        total_inference_duration += inference_duration\n\n        if frame_nr % 10 == 0:\n            clear_output(wait=True)\n            progress_bar.display()\n            display(\n                Pretty(f\"Processed frame {frame_nr}. Inference time: {inference_duration:.2f} seconds ({1/inference_duration:.2f} FPS)\")\n            )\n\n        # Transform network result to image\n        result_frame = convert_result_to_image(result)[:, :, (2, 1, 0)]  # Convert result from RGB to BGR\n        # Resize to original image shape\n        result_frame = cv2.resize(result_frame, (FRAME_WIDTH, FRAME_HEIGHT))\n        # Put image and result side by side\n        stacked_frame = np.hstack((image, result_frame))\n        # Save frame to video\n        out_video.write(stacked_frame)\n\n        frame_nr = frame_nr + 1\n        progress_bar.progress = frame_nr\n        progress_bar.update()\n\nexcept KeyboardInterrupt:\n    print(\"Processing interrupted.\")\nfinally:\n    out_video.release()\n    cap.release()\n    end_time = time.perf_counter()\n    duration = end_time - start_time\n    clear_output()\n    print(f\"Monodepth Video saved to '{str(result_video_path)}'.\")\n    print(\n        f\"Processed {frame_nr} frames in {duration:.2f} seconds. Total FPS (including video processing): {frame_nr/duration:.2f}. Inference FPS: {frame_nr/total_inference_duration:.2f} \"\n    )"}, {"cell_type": "markdown", "id": "military-reggae", "metadata": {"id": "bZ89ZI369KjA"}, "source": "### Display monodepth video"}, {"cell_type": "code", "execution_count": null, "id": "intense-choir", "metadata": {}, "outputs": [], "source": "video = Video(result_video_path, width=800, embed=True)\nif not result_video_path.exists():\n    plt.imshow(stacked_frame)\n    raise ValueError(\"OpenCV was unable to write the video file. Showing one video frame.\")\nelse:\n    print(f\"Showing monodepth video {result_video_path.resolve()}\")\n    display(video)\n    video_link = FileLink(result_video_path)\n    display(HTML(f'If you cannot see the video in your browser, please right click on the following link to download the video {video_link._repr_html_()}'))\n"}], "metadata": {"colab": {"collapsed_sections": [], "name": "monodepth.ipynb", "provenance": [], "toc_visible": true}, "kernelspec": {"display_name": "Python 3", "language": "python", "name": "python3"}, "language_info": {"codemirror_mode": {"name": "ipython", "version": 3}, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.8"}}, "nbformat": 4, "nbformat_minor": 5}

## apt.txt
libpython3.7-dev