Skip to content

Instantly share code, notes, and snippets.

Show Gist options
  • Save helena-intel/dd306d93503c70ab38d31982088924fa to your computer and use it in GitHub Desktop.
Save helena-intel/dd306d93503c70ab38d31982088924fa to your computer and use it in GitHub Desktop.
201-vision-monocular-depth-estimation
Display the source blob
Display the rendered blob
Raw
{"cells": [{"cell_type": "markdown", "id": "amino-disclosure", "metadata": {"id": "moved-collapse"}, "source": "# MONODEPTH on OpenVINO IR Model\n\nThis notebook demonstrates Monocular Depth Estimation with MidasNet in OpenVINO. Model information: https://github.com/openvinotoolkit/open_model_zoo/blob/master/models/public/midasnet/midasnet.md\n\nTHIS IS A WORK IN PROGRESS NOTEBOOK. IT IS NOT FOR PUBLIC RELEASE. See the [README](README.md) for instructions on how to run this notebook on your own computer."}, {"id": "1a5425b6", "cell_type": "markdown", "source": "## Preparation\n\nInstall the requirements and download the files that are necessary for running this notebook.", "metadata": {}}, {"id": "6b3dd628", "cell_type": "code", "metadata": {}, "execution_count": null, "source": "# Install required Python packages\nimport pip\npip_arguments = None if pip.__version__ < '20.3' else ' --use-deprecated=legacy-resolver'\n!pip install $pip_argumentsopenvino-dev numpy==1.18.5 ipykernel==5.5.0 matplotlib opencv-python-headless==4.2.0.32", "outputs": []}, {"id": "ab7df814", "cell_type": "code", "metadata": {}, "execution_count": null, "source": "# Download image and model files\nimport os\nimport pip\nimport urllib.parse\nimport urllib.request\nfrom pathlib import Path\n\nurls = ['https://raw.githubusercontent.com/helena-intel/openvino-notebooks/develop/201-vision-monocular-depth-estimation/monodepth.gif', 'https://raw.githubusercontent.com/helena-intel/openvino-notebooks/develop/201-vision-monocular-depth-estimation/install_and_launch_monodepth.bat', 'https://raw.githubusercontent.com/helena-intel/openvino-notebooks/develop/201-vision-monocular-depth-estimation/requirements.txt', 'https://raw.githubusercontent.com/helena-intel/openvino-notebooks/develop/201-vision-monocular-depth-estimation/requirements-image.txt', 'https://raw.githubusercontent.com/helena-intel/openvino-notebooks/develop/201-vision-monocular-depth-estimation/coco_bike.jpg', 'https://raw.githubusercontent.com/helena-intel/openvino-notebooks/develop/201-vision-monocular-depth-estimation/videos/Coco Walking in Berkeley.mp4', 'https://raw.githubusercontent.com/helena-intel/openvino-notebooks/develop/201-vision-monocular-depth-estimation/models/MiDaS_small.bin', 'https://raw.githubusercontent.com/helena-intel/openvino-notebooks/develop/201-vision-monocular-depth-estimation/models/MiDaS_small.xml']\n\nfor url in urls:\n save_path = Path(url).relative_to(fr\"https:/raw.githubusercontent.com/helena-intel/openvino-notebooks/develop/201-vision-monocular-depth-estimation\")\n os.makedirs(save_path.parent, exist_ok=True)\n safe_url = urllib.parse.quote(url, safe=\":/\")\n\n urllib.request.urlretrieve(safe_url, save_path.as_posix())", "outputs": []}, {"cell_type": "markdown", "id": "determined-debut", "metadata": {}, "source": "<img src=\"monodepth.gif\">"}, {"cell_type": "markdown", "id": "fixed-biotechnology", "metadata": {}, "source": "### What is Monodepth?\nMonocular Depth Estimation is the task of estimating scene depth using a single image. It has many potential applications in robotics, 3D reconstruction, medical imaging and autonomous systems. For this demo, we use a neural network model called [MiDaS](https://github.com/intel-isl/MiDaS) which was developed by the Intelligent Systems Lab at Intel. Check out their research paper to learn more. \n\nR. Ranftl, K. Lasinger, D. Hafner, K. Schindler and V. Koltun, [\"Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-shot Cross-dataset Transfer,\"](https://ieeexplore.ieee.org/document/9178977) in IEEE Transactions on Pattern Analysis and Machine Intelligence, doi: 10.1109/TPAMI.2020.3019967."}, {"cell_type": "markdown", "id": "vulnerable-thread", "metadata": {"id": "creative-cisco"}, "source": "## Preparation "}, {"cell_type": "markdown", "id": "legitimate-timber", "metadata": {"id": "faced-honolulu"}, "source": "### Imports"}, {"cell_type": "code", "execution_count": null, "id": "placed-savage", "metadata": {"id": "ahead-spider"}, "outputs": [], "source": "import os\nimport time\nimport urllib\nfrom pathlib import Path\n\nimport cv2\nimport matplotlib.cm\nimport matplotlib.pyplot as plt\nimport numpy as np\nfrom IPython.display import FileLink, HTML, Pretty, ProgressBar, Video, clear_output, display\nfrom openvino.inference_engine import IECore"}, {"cell_type": "markdown", "id": "exposed-brush", "metadata": {"id": "contained-office"}, "source": "### Settings"}, {"cell_type": "code", "execution_count": null, "id": "essential-faith", "metadata": {"id": "amber-lithuania"}, "outputs": [], "source": "DEVICE = \"CPU\"\n# MODEL_URL = \"models/midasnet.xml\" # Larger model that is slower, but gives better results\nMODEL_FILE = \"models/MiDaS_small.xml\" # Small model that is fast and gives good results on some kinds of data\n\nmodel_name = os.path.basename(MODEL_FILE)\nmodel_xml_path = Path(MODEL_FILE).with_suffix(\".xml\")"}, {"cell_type": "markdown", "id": "wired-resistance", "metadata": {}, "source": "## Functions"}, {"cell_type": "code", "execution_count": null, "id": "acute-preview", "metadata": {"id": "endangered-constraint"}, "outputs": [], "source": "def normalize_minmax(data):\n \"\"\"Normalizes the values in `data` between 0 and 1\"\"\"\n return (data - data.min()) / (data.max() - data.min())"}, {"cell_type": "code", "execution_count": null, "id": "threatened-neutral", "metadata": {}, "outputs": [], "source": "def load_image(path: str):\n \"\"\"\n Loads an image from `path` and returns it as BGR numpy array. `path` should point to an image file,\n either a local filename or an url.\n \"\"\"\n if path.startswith(\"http\"):\n # Set User-Agent to Mozilla because some websites block requests with User-Agent Python\n request = urllib.request.Request(path, headers={\"User-Agent\": \"Mozilla/5.0\"})\n response = urllib.request.urlopen(request)\n array = np.asarray(bytearray(response.read()), dtype=\"uint8\")\n image = cv2.imdecode(array, -1) # Loads the image as BGR\n else:\n image = cv2.imread(path)\n return image"}, {"cell_type": "code", "execution_count": null, "id": "selective-annotation", "metadata": {}, "outputs": [], "source": "def convert_result_to_image(result, colormap=\"viridis\"):\n \"\"\"\n Convert network result of floating point numbers to an RGB image with integer values from 0-255\n by applying a colormap.\n\n `result` is expected to be a single network result in 1,H,W shape\n `colormap` is a matplotlib colormap. See https://matplotlib.org/stable/tutorials/colors/colormaps.html\n \"\"\"\n cmap = matplotlib.cm.get_cmap(colormap)\n result = result.squeeze(0)\n result = normalize_minmax(result)\n result = cmap(result)[:, :, :3] * 255\n result = result.astype(np.uint8)\n return result"}, {"cell_type": "markdown", "id": "damaged-checkout", "metadata": {"id": "sensitive-wagner"}, "source": "## Load model and get model information\n\nLoad the model in Inference Engine with `ie.read_network` and load it to the specified device with `ie.load_network`"}, {"cell_type": "code", "execution_count": null, "id": "differential-noise", "metadata": {"id": "complete-brother"}, "outputs": [], "source": "ie = IECore()\nnet = ie.read_network(str(model_xml_path), str(model_xml_path.with_suffix(\".bin\")))\nexec_net = ie.load_network(network=net, device_name=DEVICE)\n\ninput_key = list(exec_net.input_info)[0]\noutput_key = list(exec_net.outputs.keys())[0]\n\nnetwork_input_shape = exec_net.input_info[input_key].tensor_desc.dims\nnetwork_image_height, network_image_width = network_input_shape[2:]"}, {"cell_type": "markdown", "id": "vietnamese-casino", "metadata": {"id": "compact-bargain"}, "source": "## Monodepth on Image\n\n### Load, resize and reshape input image\n\nThe input image is read with OpenCV, resized to network input size, and reshaped to (N,C,H,W) (H=height, W=width, C=number of channels, N=number of images). "}, {"cell_type": "code", "execution_count": null, "id": "guilty-protein", "metadata": {"colab": {"base_uri": "https://localhost:8080/"}, "id": "central-psychology", "outputId": "d864ee96-3fbd-488d-da1a-88e730f34aad", "tags": []}, "outputs": [], "source": "# Download and load an image\n# Image source (CC license): https://storage.googleapis.com/openimages/web/visualizer/index.html?set=train&type=segmentation&r=false&c=%2Fm%2F02rgn06&id=470c2f96cb938855\nIMAGE_FILE = \"coco_bike.jpg\"\nimage = load_image(IMAGE_FILE)\nresized_image = cv2.resize(image, (network_image_height, network_image_width)) # resize to input shape for network\ninput_image = np.expand_dims(np.transpose(resized_image, (2, 0, 1)), 0) # reshape image to network input shape NCHW"}, {"cell_type": "markdown", "id": "great-karma", "metadata": {"id": "taken-spanking"}, "source": "### Do inference on image\n\nDo the inference, convert the result to an image, and resize it to the original image shape"}, {"cell_type": "code", "execution_count": null, "id": "loved-terrorism", "metadata": {"id": "banner-kruger"}, "outputs": [], "source": "result = exec_net.infer(inputs={input_key: input_image})[output_key]\n# convert network result of disparity map to an image that shows distance as colors\nresult_image = convert_result_to_image(result)\n# resize back to original image shape. cv2.resize expects shape in (width, height), [::-1] reverses the (height, width) shape to match this.\nresult_image = cv2.resize(result_image, image.shape[:2][::-1])"}, {"cell_type": "markdown", "id": "continuing-arbor", "metadata": {}, "source": "### Display monodepth image"}, {"cell_type": "code", "execution_count": null, "id": "experienced-couple", "metadata": {"colab": {"base_uri": "https://localhost:8080/", "height": 867}, "id": "ranging-executive", "outputId": "30373e8e-34e9-4820-e32d-764aa99d4b25"}, "outputs": [], "source": "fig, ax = plt.subplots(1, 2, figsize=(20, 15))\nax[0].imshow(image[:, :, (2, 1, 0)]) # (2,1,0) converts the image from BGR to RGB\nax[1].imshow(result_image);"}, {"cell_type": "markdown", "id": "republican-advice", "metadata": {"id": "descending-cache"}, "source": "## Monodepth on Video\n\nBy default, only the first 100 frames are processed, in order to quickly check that everything works. Change NUM_FRAMES in the cell below to modify this. Set NUM_FRAMES to 0 to process the whole video."}, {"cell_type": "markdown", "id": "academic-alberta", "metadata": {}, "source": "### Download and load video"}, {"cell_type": "code", "execution_count": null, "id": "mysterious-murder", "metadata": {"colab": {"base_uri": "https://localhost:8080/"}, "id": "terminal-dividend", "outputId": "87f5ada0-8caf-49c3-fe54-626e2b1967f3"}, "outputs": [], "source": "# Video source: https://www.youtube.com/watch?v=fu1xcQdJRws (Public Domain)\nVIDEO_FILE = \"videos/Coco Walking in Berkeley.mp4\"\nNUM_FRAMES = 100 # Number of video frames to process. Set to 0 to process all frames.\n# Create Path objects for the input video and the resulting video\nvideo_path = Path(VIDEO_FILE)\nresult_video_path = video_path.with_name(f\"{video_path.stem}_monodepth.mp4\")"}, {"cell_type": "code", "execution_count": null, "id": "flexible-bundle", "metadata": {}, "outputs": [], "source": "cap = cv2.VideoCapture(str(video_path))\nret, image = cap.read()\nif not ret:\n raise ValueError(f\"The video at {video_path} cannot be read.\")\nFPS = cap.get(cv2.CAP_PROP_FPS)\nFRAME_HEIGHT, FRAME_WIDTH = image.shape[:2]\n# The format to use for video encoding. VP90 is slow, but it works on most systems.\n# Try the THEO encoding if you have FFMPEG installed.\nFOURCC = cv2.VideoWriter_fourcc(*\"VP90\")\n# FOURCC = cv2.VideoWriter_fourcc(*\"THEO\")\n\ncap.release()\nprint(f\"The input video has a frame width of {FRAME_WIDTH}, frame height of {FRAME_HEIGHT} and runs at {FPS} fps\")"}, {"cell_type": "markdown", "id": "accompanied-herald", "metadata": {}, "source": "### Do Inference on video and create monodepth video"}, {"cell_type": "code", "execution_count": null, "id": "worthy-yugoslavia", "metadata": {"colab": {"base_uri": "https://localhost:8080/"}, "id": "present-albany", "outputId": "600edb69-af12-44dc-ec8e-95005b74179c"}, "outputs": [], "source": "frame_nr = 1\nstart_time = time.perf_counter()\ntotal_inference_duration = 0\n\ncap = cv2.VideoCapture(str(video_path))\nout_video = cv2.VideoWriter(\n str(result_video_path),\n FOURCC,\n FPS,\n (FRAME_WIDTH * 2, FRAME_HEIGHT),\n)\n\ntotal_frames = cap.get(cv2.CAP_PROP_FRAME_COUNT) if NUM_FRAMES == 0 else NUM_FRAMES\nprogress_bar = ProgressBar(total=total_frames)\nprogress_bar.display()\n\ntry:\n while cap.isOpened():\n ret, image = cap.read()\n if not ret:\n cap.release()\n break\n\n if frame_nr == total_frames:\n break\n\n # Prepare frame for inference\n resized_image = cv2.resize(image, (network_image_height, network_image_width)) # resize to input shape for network\n input_image = np.expand_dims(np.transpose(resized_image, (2, 0, 1)), 0) # reshape image to network input shape NCHW\n\n # Do inference\n inference_start_time = time.perf_counter()\n result = exec_net.infer(inputs={input_key: input_image})[output_key]\n inference_stop_time = time.perf_counter()\n inference_duration = inference_stop_time - inference_start_time\n total_inference_duration += inference_duration\n\n if frame_nr % 10 == 0:\n clear_output(wait=True)\n progress_bar.display()\n display(\n Pretty(f\"Processed frame {frame_nr}. Inference time: {inference_duration:.2f} seconds ({1/inference_duration:.2f} FPS)\")\n )\n\n # Transform network result to image\n result_frame = convert_result_to_image(result)[:, :, (2, 1, 0)] # Convert result from RGB to BGR\n # Resize to original image shape\n result_frame = cv2.resize(result_frame, (FRAME_WIDTH, FRAME_HEIGHT))\n # Put image and result side by side\n stacked_frame = np.hstack((image, result_frame))\n # Save frame to video\n out_video.write(stacked_frame)\n\n frame_nr = frame_nr + 1\n progress_bar.progress = frame_nr\n progress_bar.update()\n\nexcept KeyboardInterrupt:\n print(\"Processing interrupted.\")\nfinally:\n out_video.release()\n cap.release()\n end_time = time.perf_counter()\n duration = end_time - start_time\n clear_output()\n print(f\"Monodepth Video saved to '{str(result_video_path)}'.\")\n print(\n f\"Processed {frame_nr} frames in {duration:.2f} seconds. Total FPS (including video processing): {frame_nr/duration:.2f}. Inference FPS: {frame_nr/total_inference_duration:.2f} \"\n )"}, {"cell_type": "markdown", "id": "military-reggae", "metadata": {"id": "bZ89ZI369KjA"}, "source": "### Display monodepth video"}, {"cell_type": "code", "execution_count": null, "id": "intense-choir", "metadata": {}, "outputs": [], "source": "video = Video(result_video_path, width=800, embed=True)\nif not result_video_path.exists():\n plt.imshow(stacked_frame)\n raise ValueError(\"OpenCV was unable to write the video file. Showing one video frame.\")\nelse:\n print(f\"Showing monodepth video {result_video_path.resolve()}\")\n display(video)\n video_link = FileLink(result_video_path)\n display(HTML(f'If you cannot see the video in your browser, please right click on the following link to download the video {video_link._repr_html_()}'))\n"}], "metadata": {"colab": {"collapsed_sections": [], "name": "monodepth.ipynb", "provenance": [], "toc_visible": true}, "kernelspec": {"display_name": "Python 3", "language": "python", "name": "python3"}, "language_info": {"codemirror_mode": {"name": "ipython", "version": 3}, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.8"}}, "nbformat": 4, "nbformat_minor": 5}
libpython3.7-dev
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment