Last active
April 25, 2021 17:54
201-vision-monodepth
{"cells": [{"cell_type": "markdown", "id": "uniform-reservation", "metadata": {"id": "moved-collapse"}, "source": "# MONODEPTH on OpenVINO IR Model\n\nThis notebook demonstrates Monocular Depth Estimation with MidasNet in OpenVINO. Model information: https://github.com/openvinotoolkit/open_model_zoo/blob/master/models/public/midasnet/midasnet.md"}, {"id": "41f4ef11", "cell_type": "markdown", "source": "## Preparation\n\nInstall the requirements and download the files that are necessary for running this notebook.\n\n**NOTE:** Installation may take a while. It is recommended to restart the Jupyter kernel after installing the packages. Choose *Kernel->Restart Kernel* in Jupyter Notebook or Lab, or *Runtime->Restart runtime* in Google Colab.", "metadata": {}}, {"id": "43efbb46", "cell_type": "code", "metadata": {}, "execution_count": null, "source": "# Install or upgrade required Python packages. Install specific versions of some packages to ensure compatibility.\n# The version specifiers are quoted so that the shell does not treat > as output redirection.\n!pip install openvino-dev opencv-python-headless==4.2.0.32 \"ipython>7.0\" \"ipywidgets>=7.4\"", "outputs": []}, {"id": "63e85642", "cell_type": "code", "metadata": {}, "execution_count": null, "source": "# Download image and model files\nimport os\nimport urllib.parse\nimport urllib.request\nfrom pathlib import Path\n\nurls = ['https://raw.githubusercontent.com/openvinotoolkit/openvino_notebooks/main/notebooks/201-vision-monodepth/coco_bike.jpg', 'https://raw.githubusercontent.com/openvinotoolkit/openvino_notebooks/main/notebooks/201-vision-monodepth/monodepth.gif', 'https://raw.githubusercontent.com/openvinotoolkit/openvino_notebooks/main/notebooks/201-vision-monodepth/models/MiDaS_small.bin', 'https://raw.githubusercontent.com/openvinotoolkit/openvino_notebooks/main/notebooks/201-vision-monodepth/models/MiDaS_small.xml', 'https://raw.githubusercontent.com/openvinotoolkit/openvino_notebooks/main/notebooks/201-vision-monodepth/videos/Coco Walking in Berkeley.mp4']\n\nfor url in urls:\n    # Local path of the file, relative to the notebook directory\n    save_path = 
Path(url).relative_to(fr\"https://raw.githubusercontent.com/openvinotoolkit/openvino_notebooks/main/notebooks/201-vision-monodepth\")\n    os.makedirs(save_path.parent, exist_ok=True)\n    # Percent-encode characters such as spaces in the URL\n    safe_url = urllib.parse.quote(url, safe=\":/\")\n\n    urllib.request.urlretrieve(safe_url, save_path.as_posix())", "outputs": []}, {"cell_type": "markdown", "id": "described-cursor", "metadata": {}, "source": "<img src=\"monodepth.gif\">"}, {"cell_type": "markdown", "id": "growing-timber", "metadata": {}, "source": "### What is Monodepth?\nMonocular Depth Estimation is the task of estimating scene depth using a single image. It has many potential applications in robotics, 3D reconstruction, medical imaging and autonomous systems. For this demo, we use a neural network model called [MiDaS](https://github.com/intel-isl/MiDaS) which was developed by the Intelligent Systems Lab at Intel. Check out their research paper to learn more. \n\nR. Ranftl, K. Lasinger, D. Hafner, K. Schindler and V. 
Koltun, [\"Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-shot Cross-dataset Transfer,\"](https://ieeexplore.ieee.org/document/9178977) in IEEE Transactions on Pattern Analysis and Machine Intelligence, doi: 10.1109/TPAMI.2020.3019967."}, {"cell_type": "markdown", "id": "recovered-biology", "metadata": {"id": "creative-cisco"}, "source": "## Preparation "}, {"cell_type": "markdown", "id": "cordless-shelter", "metadata": {"id": "faced-honolulu"}, "source": "### Imports"}, {"cell_type": "code", "execution_count": null, "id": "bigger-worst", "metadata": {"id": "ahead-spider"}, "outputs": [], "source": "import os\nimport time\nimport urllib.request\nfrom pathlib import Path\n\nimport cv2\nimport matplotlib.cm\nimport matplotlib.pyplot as plt\nimport numpy as np\nfrom IPython.display import (\n    HTML,\n    FileLink,\n    Pretty,\n    ProgressBar,\n    Video,\n    clear_output,\n    display,\n)\nfrom openvino.inference_engine import IECore"}, {"cell_type": "markdown", "id": "technological-aberdeen", "metadata": {"id": "contained-office"}, "source": "### Settings"}, {"cell_type": "code", "execution_count": null, "id": "norman-introduction", "metadata": {"id": "amber-lithuania"}, "outputs": [], "source": "DEVICE = \"CPU\"\nMODEL_FILE = \"models/MiDaS_small.xml\"\n\nmodel_name = os.path.basename(MODEL_FILE)\nmodel_xml_path = Path(MODEL_FILE).with_suffix(\".xml\")"}, {"cell_type": "markdown", "id": "interim-scroll", "metadata": {}, "source": "## Functions"}, {"cell_type": "code", "execution_count": null, "id": "intermediate-syntax", "metadata": {"id": "endangered-constraint"}, "outputs": [], "source": "def normalize_minmax(data):\n    \"\"\"Normalizes the values in `data` between 0 and 1\"\"\"\n    return (data - data.min()) / (data.max() - data.min())"}, {"cell_type": "code", "execution_count": null, "id": "acknowledged-layer", "metadata": {}, "outputs": [], "source": "def load_image(path: str):\n    \"\"\"\n    Loads an image from `path` and returns it as a BGR NumPy array. 
`path`\n    should point to an image file, either a local filename or a URL.\n    \"\"\"\n    if path.startswith(\"http\"):\n        # Set User-Agent to Mozilla because some websites block\n        # requests with User-Agent Python\n        request = urllib.request.Request(\n            path, headers={\"User-Agent\": \"Mozilla/5.0\"}\n        )\n        response = urllib.request.urlopen(request)\n        array = np.asarray(bytearray(response.read()), dtype=\"uint8\")\n        # IMREAD_COLOR always decodes to a 3-channel BGR image, like cv2.imread\n        image = cv2.imdecode(array, cv2.IMREAD_COLOR)\n    else:\n        image = cv2.imread(path)\n    return image"}, {"cell_type": "code", "execution_count": null, "id": "desperate-marina", "metadata": {}, "outputs": [], "source": "def convert_result_to_image(result, colormap=\"viridis\"):\n    \"\"\"\n    Convert network result of floating point numbers to an RGB image with\n    integer values from 0-255 by applying a colormap.\n\n    `result` is expected to be a single network result in 1,H,W shape\n    `colormap` is a matplotlib colormap.\n    See https://matplotlib.org/stable/tutorials/colors/colormaps.html\n    \"\"\"\n    cmap = matplotlib.cm.get_cmap(colormap)\n    result = result.squeeze(0)\n    result = normalize_minmax(result)\n    result = cmap(result)[:, :, :3] * 255\n    result = result.astype(np.uint8)\n    return result"}, {"cell_type": "markdown", "id": "polyphonic-image", "metadata": {"id": "sensitive-wagner"}, "source": "## Load model and get model information\n\nLoad the model in Inference Engine with `ie.read_network` and load it to the specified device with `ie.load_network`."}, {"cell_type": "code", "execution_count": null, "id": "exciting-collaboration", "metadata": {"id": "complete-brother"}, "outputs": [], "source": "ie = IECore()\nnet = ie.read_network(\n    str(model_xml_path), str(model_xml_path.with_suffix(\".bin\"))\n)\nexec_net = ie.load_network(network=net, device_name=DEVICE)\n\ninput_key = list(exec_net.input_info)[0]\noutput_key = list(exec_net.outputs.keys())[0]\n\nnetwork_input_shape = exec_net.input_info[input_key].tensor_desc.dims\nnetwork_image_height, 
network_image_width = network_input_shape[2:]"}, {"cell_type": "markdown", "id": "unusual-winner", "metadata": {"id": "compact-bargain"}, "source": "## Monodepth on Image\n\n### Load, resize and reshape input image\n\nThe input image is read with OpenCV, resized to the network input size, and reshaped to (N,C,H,W) (H=height, W=width, C=number of channels, N=number of images). "}, {"cell_type": "code", "execution_count": null, "id": "postal-radar", "metadata": {"colab": {"base_uri": "https://localhost:8080/"}, "id": "central-psychology", "outputId": "d864ee96-3fbd-488d-da1a-88e730f34aad", "tags": []}, "outputs": [], "source": "IMAGE_FILE = \"coco_bike.jpg\"\nimage = load_image(IMAGE_FILE)\n# resize to input shape for network. cv2.resize expects the target size\n# in (width, height) order.\nresized_image = cv2.resize(image, (network_image_width, network_image_height))\n# reshape image to network input shape NCHW\ninput_image = np.expand_dims(np.transpose(resized_image, (2, 0, 1)), 0)"}, {"cell_type": "markdown", "id": "manual-matter", "metadata": {"id": "taken-spanking"}, "source": "### Do inference on image\n\nDo the inference, convert the result to an image, and resize it to the original image shape."}, {"cell_type": "code", "execution_count": null, "id": "norman-circular", "metadata": {"id": "banner-kruger"}, "outputs": [], "source": "result = exec_net.infer(inputs={input_key: input_image})[output_key]\n# convert network result of disparity map to an image that shows\n# distance as colors\nresult_image = convert_result_to_image(result)\n# resize back to original image shape. 
cv2.resize expects shape\n# in (width, height), [::-1] reverses the (height, width) shape to match this.\nresult_image = cv2.resize(result_image, image.shape[:2][::-1])"}, {"cell_type": "markdown", "id": "desirable-groove", "metadata": {}, "source": "### Display monodepth image"}, {"cell_type": "code", "execution_count": null, "id": "wound-chance", "metadata": {"colab": {"base_uri": "https://localhost:8080/", "height": 867}, "id": "ranging-executive", "outputId": "30373e8e-34e9-4820-e32d-764aa99d4b25"}, "outputs": [], "source": "fig, ax = plt.subplots(1, 2, figsize=(20, 15))\nax[0].imshow(image[:, :, (2, 1, 0)])\nax[1].imshow(result_image);"}, {"cell_type": "markdown", "id": "weekly-ballot", "metadata": {"id": "descending-cache"}, "source": "## Monodepth on Video\n\nBy default, only the first 100 frames are processed, in order to quickly check that everything works. Change NUM_FRAMES in the cell below to modify this. Set NUM_FRAMES to 0 to process the whole video."}, {"cell_type": "markdown", "id": "distant-organizer", "metadata": {}, "source": "### Download and load video"}, {"cell_type": "code", "execution_count": null, "id": "taken-thompson", "metadata": {"colab": {"base_uri": "https://localhost:8080/"}, "id": "terminal-dividend", "outputId": "87f5ada0-8caf-49c3-fe54-626e2b1967f3"}, "outputs": [], "source": "# Video source: https://www.youtube.com/watch?v=fu1xcQdJRws (Public Domain)\nVIDEO_FILE = \"videos/Coco Walking in Berkeley.mp4\"\n# Number of video frames to process. 
Set to 0 to process all frames.\nNUM_FRAMES = 100\n# Create Path objects for the input video and the resulting video\nvideo_path = Path(VIDEO_FILE)\nresult_video_path = video_path.with_name(f\"{video_path.stem}_monodepth.mp4\")"}, {"cell_type": "code", "execution_count": null, "id": "convenient-adelaide", "metadata": {}, "outputs": [], "source": "cap = cv2.VideoCapture(str(video_path))\nret, image = cap.read()\nif not ret:\n    raise ValueError(f\"The video at {video_path} cannot be read.\")\nFPS = cap.get(cv2.CAP_PROP_FPS)\nFRAME_HEIGHT, FRAME_WIDTH = image.shape[:2]\n# The format to use for video encoding. VP90 is slow,\n# but it works on most systems.\n# Try the THEO encoding if you have FFMPEG installed.\nFOURCC = cv2.VideoWriter_fourcc(*\"VP90\")\n# FOURCC = cv2.VideoWriter_fourcc(*\"THEO\")\n\ncap.release()\nprint(\n    f\"The input video has a frame width of {FRAME_WIDTH}, \"\n    f\"frame height of {FRAME_HEIGHT} and runs at {FPS} fps\"\n)"}, {"cell_type": "markdown", "id": "spoken-seattle", "metadata": {}, "source": "### Do Inference on video and create monodepth video"}, {"cell_type": "code", "execution_count": null, "id": "opposed-energy", "metadata": {"colab": {"base_uri": "https://localhost:8080/"}, "id": "present-albany", "outputId": "600edb69-af12-44dc-ec8e-95005b74179c", "tags": []}, "outputs": [], "source": "# Number of frames processed so far\nframe_nr = 0\nstart_time = time.perf_counter()\ntotal_inference_duration = 0\n\ncap = cv2.VideoCapture(str(video_path))\nout_video = cv2.VideoWriter(\n    str(result_video_path),\n    FOURCC,\n    FPS,\n    (FRAME_WIDTH * 2, FRAME_HEIGHT),\n)\n\ntotal_frames = (\n    cap.get(cv2.CAP_PROP_FRAME_COUNT) if NUM_FRAMES == 0 else NUM_FRAMES\n)\nprogress_bar = ProgressBar(total=total_frames)\nprogress_bar.display()\n\ntry:\n    while cap.isOpened():\n        ret, image = cap.read()\n        if not ret:\n            cap.release()\n            break\n\n        # Stop once the requested number of frames has been processed\n        if frame_nr == total_frames:\n            break\n        frame_nr = frame_nr + 1\n\n        # Prepare frame for inference\n        # resize to input shape for network, in (width, height) order\n        resized_image = cv2.resize(\n            image, 
(network_image_width, network_image_height)\n        )\n        # reshape image to network input shape NCHW\n        input_image = np.expand_dims(np.transpose(resized_image, (2, 0, 1)), 0)\n\n        # Do inference\n        inference_start_time = time.perf_counter()\n        result = exec_net.infer(inputs={input_key: input_image})[output_key]\n        inference_stop_time = time.perf_counter()\n        inference_duration = inference_stop_time - inference_start_time\n        total_inference_duration += inference_duration\n\n        if frame_nr % 10 == 0:\n            clear_output(wait=True)\n            progress_bar.display()\n            display(\n                Pretty(\n                    f\"Processed frame {frame_nr}. \"\n                    f\"Inference time: {inference_duration:.2f} seconds \"\n                    f\"({1/inference_duration:.2f} FPS)\"\n                )\n            )\n\n        # Transform network result to an image and reverse the channel\n        # order from RGB to BGR, which OpenCV expects\n        result_frame = convert_result_to_image(result)[:, :, (2, 1, 0)]\n        # Resize to original image shape\n        result_frame = cv2.resize(result_frame, (FRAME_WIDTH, FRAME_HEIGHT))\n        # Put image and result side by side\n        stacked_frame = np.hstack((image, result_frame))\n        # Save frame to video\n        out_video.write(stacked_frame)\n\n        progress_bar.progress = frame_nr\n        progress_bar.update()\n\nexcept KeyboardInterrupt:\n    print(\"Processing interrupted.\")\nfinally:\n    out_video.release()\n    cap.release()\n    end_time = time.perf_counter()\n    duration = end_time - start_time\n    clear_output()\n    print(f\"Monodepth Video saved to '{str(result_video_path)}'.\")\n    print(\n        f\"Processed {frame_nr} frames in {duration:.2f} seconds. 
\"\n        f\"Total FPS (including video processing): {frame_nr/duration:.2f}. \"\n        f\"Inference FPS: {frame_nr/total_inference_duration:.2f} \"\n    )"}, {"cell_type": "markdown", "id": "periodic-sussex", "metadata": {"id": "bZ89ZI369KjA"}, "source": "### Display monodepth video"}, {"cell_type": "code", "execution_count": null, "id": "neither-version", "metadata": {"tags": []}, "outputs": [], "source": "video = Video(result_video_path, width=800, embed=True)\nif not result_video_path.exists():\n    plt.imshow(stacked_frame)\n    raise ValueError(\n        \"OpenCV was unable to write the video file. Showing one video frame.\"\n    )\nelse:\n    print(f\"Showing monodepth video saved at\\n{result_video_path.resolve()}\")\n    print(\n        \"If you cannot see the video in your browser, please click on the \"\n        \"following link to download the video \"\n    )\n    video_link = FileLink(result_video_path)\n    video_link.html_link_str = \"<a href='%s' download>%s</a>\"\n    display(HTML(video_link._repr_html_()))\n    display(video)"}], "metadata": {"colab": {"collapsed_sections": [], "name": "monodepth.ipynb", "provenance": [], "toc_visible": true}, "kernelspec": {"display_name": "Python 3", "language": "python", "name": "python3"}, "language_info": {"codemirror_mode": {"name": "ipython", "version": 3}, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.8"}}, "nbformat": 4, "nbformat_minor": 5}
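The notebook's download cell derives each local save path by stripping a common URL prefix, and percent-encodes the URL before fetching it. The sketch below illustrates that mapping with the standard library only and performs no network access. `BASE_URL`, `local_path_for` and `quoted` are hypothetical names introduced for illustration; parsing the URL with `urlsplit` is an assumed alternative to the `Path(url).relative_to(...)` call used in the notebook, not the notebook's own method.

```python
from pathlib import PurePosixPath
from urllib.parse import quote, urlsplit

# Assumed base location, copied from the prefix used in the download cell.
BASE_URL = (
    "https://raw.githubusercontent.com/openvinotoolkit/openvino_notebooks/"
    "main/notebooks/201-vision-monodepth"
)


def local_path_for(url: str, base_url: str = BASE_URL) -> PurePosixPath:
    """Return the path of `url` relative to `base_url` (hypothetical helper)."""
    url_path = PurePosixPath(urlsplit(url).path)
    base_path = PurePosixPath(urlsplit(base_url).path)
    # relative_to raises ValueError if `url` is not located below `base_url`.
    return url_path.relative_to(base_path)


def quoted(url: str) -> str:
    """Percent-encode characters such as spaces, keeping ':' and '/' intact."""
    return quote(url, safe=":/")


video_url = BASE_URL + "/videos/Coco Walking in Berkeley.mp4"
print(local_path_for(video_url))  # relative path used as the local save path
print(quoted(video_url))          # URL that is safe to pass to urlretrieve
```

Comparing only the path components of the URL avoids relying on how a filesystem path class normalizes the `//` after the scheme.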
libpython3.7-dev