@bazsapeter
Created February 3, 2020 21:46
Created on Cognitive Class Labs
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a href=\"https://cognitiveclass.ai/\">\n",
" <img src=\"https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/CV0101/Logo/SNLogo.png\" width=\"200\" align=\"center\">\n",
"</a>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<h1>Lab - Detecting text in images with OpenCV and Tesseract OCR</h1>\n",
"\n",
"<b>Welcome!</b> In this lab you will learn about about Automatic number-plate recognition. We will use the Tesseract OCR An Optical Character Recognition Engine (OCR Engine) to automatically recognize text in vehicle registration plates. Pretty cool, right? \n",
"\n",
"After completing this lab you will:\n",
"<h5> 1. Learn to download, read and display images using Python, OpenCV and Matplotlib </h5>\n",
"<h5> 2. Learn to use Tesseract OCR for detetcing text in images</h5>\n",
"<h5> 3. Learn about image processing techniques </h5>\n",
"<h5> 4. Compress images using a technique called K Means Clustering</h5>\n",
"<h5> 5. Compress images by up to 90% ! </h5>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<div class=\"alert alert-block alert-info\" style=\"margin-top: 20px\">\n",
" <br>\n",
" <br>\n",
" <h2>Table of Contents</h2>\n",
" <ul>\n",
" <li><a href=\"#ref0\">Tesseract</a></li>\n",
" <li><a href=\"#ref1\">Perform OCR using the Tesseract Engine on license plates</a></li>\n",
" <li><a href=\"#ref2\">Image Processing Techniques</a></li>\n",
" <li><a href=\"#ref4\">Exercises</a></li>\n",
" </ul>\n",
" <br>\n",
" <p>Estimated Time Needed: <strong>1 hr</strong></p>\n",
" </div>\n",
"<hr>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id=\"ref0\"></a>\n",
"<h2 align=\"center\">Tesseract<a href=\"http://opensource.google.com/projects/tesseract\"> Homepage</a></h2>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"- Tesseract is an Optical Character Recognition (OCR) Engine open-sourced and supported by Google\n",
"- Has ability to recognize more than 100 languages out of the box"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"##### We will use Tesseract for recognizing images of License Plates. These images were clicked under different lighting conditions and have a variety of variations in them. Pretty cool, right?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To view the license plates, you could click on the `license_plates` folder which is listed in the files directory in the left-sidebar of the JupyterLab environment. If this side menu is hidden, you can go to `View`>`View Left-Sidebar`. Once you click on the folder, you could click on each file name to view the image of the license plate"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Install the Pytesseract package for using the Tesseract Engine with Python"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Collecting pytesseract\n",
" Downloading https://files.pythonhosted.org/packages/df/4e/42c54b4344cbcb392d949ffb0b1c1e95f03ceaa6a354c8d3aafcd470592e/pytesseract-0.3.2.tar.gz\n",
"Collecting wget\n",
" Downloading https://files.pythonhosted.org/packages/47/6a/62e288da7bcda82b935ff0c6cfe542970f04e29c756b0e147251b2fb251f/wget-3.2.zip\n",
"Requirement already satisfied, skipping upgrade: Pillow in /home/jupyterlab/conda/envs/python/lib/python3.6/site-packages (from pytesseract) (6.2.1)\n",
"Building wheels for collected packages: pytesseract, wget\n",
" Building wheel for pytesseract (setup.py) ... \u001b[?25ldone\n",
"\u001b[?25h Stored in directory: /home/jupyterlab/.cache/pip/wheels/c2/60/55/ec507bce8e8ccb516954accf661ee60c8b34198fafdfb81872\n",
" Building wheel for wget (setup.py) ... \u001b[?25ldone\n",
"\u001b[?25h Stored in directory: /home/jupyterlab/.cache/pip/wheels/40/15/30/7d8f7cea2902b4db79e3fea550d7d7b85ecb27ef992b618f3f\n",
"Successfully built pytesseract wget\n",
"Installing collected packages: pytesseract, wget\n",
"Successfully installed pytesseract-0.3.2 wget-3.2\n"
]
}
],
"source": [
"!pip install --upgrade pytesseract wget"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"# Loading the required python modules\n",
"import pytesseract\n",
"import matplotlib.pyplot as plt\n",
"import cv2\n",
"import glob\n",
"import os"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id=\"ref1\"></a>\n",
"<h2 align=\"center\">Perform OCR using the Tesseract Engine on license plates</h2>"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"import wget, zipfile, os\n",
"\n",
"filename='license-plates'\n",
"\n",
"if not os.path.isfile(filename):\n",
" filename = wget.download('https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/CV0101/Dataset/license-plates.zip')\n",
" with zipfile.ZipFile(\"license-plates.zip\",\"r\") as zip_ref:\n",
" zip_ref.extractall()"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"path_for_license_plates = os.getcwd() + \"/license-plates/**/*.jpg\"\n",
"list_license_plates = []\n",
"predicted_license_plates = []\n",
"\n",
"for path_to_license_plate in glob.glob(path_for_license_plates, recursive=True):\n",
" \n",
" license_plate_file = path_to_license_plate.split(\"/\")[-1]\n",
" license_plate, _ = os.path.splitext(license_plate_file)\n",
" '''\n",
" Here we append the actual license plate to a list\n",
" '''\n",
" list_license_plates.append(license_plate)\n",
" \n",
" '''\n",
" Read each license plate image file using openCV\n",
" '''\n",
" img = cv2.imread(path_to_license_plate)\n",
" \n",
" '''\n",
" We then pass each license plate image file to the Tesseract OCR engine using \n",
" the Python library wrapper for it. We get back a predicted_result for the license plate.\n",
" We append the predicted_result in a list and compare it with the original the license plate\n",
" '''\n",
" predicted_result = pytesseract.image_to_string(img, lang='eng',\n",
" config='--oem 3 --psm 6 -c tessedit_char_whitelist=ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789')\n",
" \n",
" filter_predicted_result = \"\".join(predicted_result.split()).replace(\":\", \"\").replace(\"-\", \"\")\n",
" predicted_license_plates.append(filter_predicted_result)"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Actual License Plate \t Predicted License Plate \t Accuracy\n",
"-------------------- \t ----------------------- \t --------\n",
" OYJ9557 \t\t\t OYJ9557 \t\t 100%\n",
" PJG0783 \t\t\t PJG0783 \t\t 100%\n",
" OUP9563 \t\t\t OUP9563 \t\t 100%\n",
" OLC4728 \t\t\t OLC4728 \t\t 100%\n",
" ODJ1599 \t\t\t ODJ1599 \t\t 100%\n",
" GWT2180 \t\t\t GWT2120 \t\t 86.0%\n",
" OKV8004 \t\t\t QKV8004 \t\t 86.0%\n",
" PJB2414 \t\t\t PJB2414 \t\t 100%\n",
" OLA1208 \t\t\t OLA1208 \t\t 100%\n",
" AYO9034 \t\t\t AYO9034 \t\t 100%\n",
" JSQ1413 \t\t\t JSQ|413 \t\t 86.0%\n",
" OKS0078 \t\t\t OKS0078 \t\t 100%\n",
" NTK5785 \t\t\t NTK5785 \t\t 100%\n",
" PJD2685 \t\t\t PJD2685 \t\t 100%\n",
" NZW2197 \t\t\t NZW2197 \t\t 100%\n",
" PJB7392 \t\t\t PJB7392 \t\t 100%\n",
" NYY1710 \t\t\t NYY1710 \t\t 100%\n",
" OCX4764 \t\t\t OCX4764 \t\t 100%\n"
]
}
],
"source": [
"print(\"Actual License Plate\", \"\\t\", \"Predicted License Plate\", \"\\t\", \"Accuracy\")\n",
"print(\"--------------------\", \"\\t\", \"-----------------------\", \"\\t\", \"--------\")\n",
"\n",
"def calculate_predicted_accuracy(actual_list, predicted_list):\n",
" for actual_plate, predict_plate in zip(actual_list, predicted_list):\n",
" accuracy = \"0%\"\n",
" num_matches = 0\n",
" if actual_plate == predict_plate:\n",
" accuracy = \"100%\"\n",
" else:\n",
" if len(actual_plate) == len(predict_plate):\n",
" for a, p in zip(actual_plate, predict_plate):\n",
" if a == p:\n",
" num_matches += 1\n",
" accuracy = str(round((num_matches/len(actual_plate)), 2) * 100)\n",
" accuracy += \"%\"\n",
" print(\" \", actual_plate, \"\\t\\t\\t\", predict_plate, \"\\t\\t \", accuracy)\n",
"\n",
" \n",
"calculate_predicted_accuracy(list_license_plates, predicted_license_plates)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<h3>Woah! We see that the Tesseract OCR engine mostly predicts all of the license plates correctly with 100% accuracy! </h3>\n",
"\n",
"\n",
"<h3>For the license plates the Tesseract OCR Engine predicted incorrectly (i.e. GWT2180, OKV8004, JSQ1413), we will apply image processing techniques on those license plate files and pass them to the Tesseract OCR again. Applying the image processing techniques would increase the accuracy of the Tesseract Engine for the license plates of GWT2180, OKV8004, JSQ1413</h3>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id=\"ref2\"></a>\n",
"<h2 align=\"center\"> Image Processing Techniques </h2>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Image resizing is a technique in which we scale the image either horizontally, or vertically or both. \n",
"Further documentation on image resizing: [here](https://www.tutorialkart.com/opencv/python/opencv-python-resize-image/)\n",
"\n",
"For our use case, we will read the license plate file of ```GWT2180``` using ```cv2.imread``` and then resize the image file by a factor of 2x in both the horizontal and vertical directions using ```cv2.resize```"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Text(0.5, 1.0, 'GWT2180 license plate')"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"# Read the license plate file and display it\n",
"test_license_plate = cv2.imread(os.getcwd() + \"/license-plates/GWT2180.jpg\")\n",
"plt.imshow(test_license_plate)\n",
"plt.axis('off')\n",
"plt.title('GWT2180 license plate')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### 1. Image resizing"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
"resize_test_license_plate = cv2.resize(test_license_plate, None, fx=2, fy=2, interpolation=cv2.INTER_CUBIC)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### 2. Converting to Grayscale"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Next, we convert our resized image file to gray scale. We learnt about this technique in the image processing lab"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"grayscale_resize_test_license_plate = cv2.cvtColor(resize_test_license_plate, cv2.COLOR_BGR2GRAY)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### 3. Denoising the Image"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Gaussian Blur is a technique for denoising images. Full OpenCV documentation on Gaussian Blur: [here](https://opencv-python-tutroals.readthedocs.io/en/latest/py_tutorials/py_imgproc/py_filtering/py_filtering.html#gaussian-filtering)\n",
"\n",
"We apply a gaussian blur to the greyscale image"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [],
"source": [
"gaussian_blur_license_plate = cv2.GaussianBlur(grayscale_resize_test_license_plate, (5, 5), 0)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Pass the transformed license plate file to the Tesseract OCR engine and see the predicted result\n"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"GWT2180\n"
]
}
],
"source": [
"new_predicted_result_GWT2180 = pytesseract.image_to_string(gaussian_blur_license_plate, lang='eng',\n",
"config='--oem 3 -l eng --psm 6 -c tessedit_char_whitelist=ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789')\n",
"filter_new_predicted_result_GWT2180 = \"\".join(new_predicted_result_GWT2180.split()).replace(\":\", \"\").replace(\"-\", \"\")\n",
"print(filter_new_predicted_result_GWT2180)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
" #### Voila We see that the Tesseract OCR correctly recognises all characters in the GWT2180 license plate after we passed in the transformed license plate"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id=\"ref4\"></a>\n",
"<h2 align=\"center\"> Exercises </h2>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Exercise 1:"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Text(0.5, 1.0, 'JSQ1413 license plate')"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"# 1.1 Read in the license plate file of JSQ1413\n",
"# Write your code below:\n",
"\n",
"test_license_plate_exc = cv2.imread(os.getcwd() + \"/license-plates/JSQ1413.jpg\")\n",
"plt.imshow(test_license_plate_exc)\n",
"plt.axis('off')\n",
"plt.title('JSQ1413 license plate')"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [],
"source": [
"# 1.2 Apply the image processing techniques to the license plate of JSQ1413 described above \n",
"# Write your code below:\n",
"resize_test_license_plate_exc = cv2.resize(test_license_plate_exc, None, fx=2, fy=2, interpolation=cv2.INTER_CUBIC)\n",
"\n",
"grayscale_resize_test_license_plate_exc = cv2.cvtColor(resize_test_license_plate_exc, cv2.COLOR_BGR2GRAY)\n",
"\n",
"gaussian_blur_license_plate_exc = cv2.GaussianBlur(grayscale_resize_test_license_plate_exc, (5, 5), 0)\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"JSQ1413\n"
]
}
],
"source": [
"# 1.3 Pass the modified license plate file to the Tesseract Engine. Report your findings \n",
"# Write your code below:\n",
"\n",
"new_predicted_result_JSQ1413 = pytesseract.image_to_string(gaussian_blur_license_plate_exc, lang='eng',\n",
"config='--oem 3 -l eng --psm 6 -c tessedit_char_whitelist=ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789')\n",
"filter_new_predicted_result_JSQ1413 = \"\".join(new_predicted_result_JSQ1413.split()).replace(\":\", \"\").replace(\"-\", \"\")\n",
"print(filter_new_predicted_result_JSQ1413)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Double-click <font color=\"red\"><b><u>here</b></u></font> for the solution.\n",
"\n",
"<!---\n",
"\n",
"# 1.1 Read the license plate file and display it\n",
"\n",
"test_license_plate_JSQ1413 = cv2.imread(os.getcwd() + \"/license-plates/JSQ1413.jpg\")\n",
"plt.imshow(test_license_plate_OKV8004)\n",
"plt.axis('off')\n",
"plt.title('OKV8004 license plate')\n",
"\n",
"# 1.2 Apply the image processing techniques to the license plate of JSQ1413 described above\n",
"\n",
"resize_test_license_plate_JSQ1413 = cv2.resize(test_license_plate_JSQ1413, None, fx=2, fy=2, interpolation=cv2.INTER_CUBIC)\n",
"grayscale_resize_test_license_plate_JSQ1413 = cv2.cvtColor(resize_test_license_plate_JSQ1413, cv2.COLOR_BGR2GRAY)\n",
"gaussian_blur_license_plate_JSQ1413 = cv2.GaussianBlur(grayscale_resize_test_license_plate_JSQ1413, (5, 5), 0)\n",
"\n",
"# 1.3 Pass the modified license plate file to the Tesseract Engine. Report your findings\n",
"\n",
"new_predicted_result_JSQ1413 = pytesseract.image_to_string(gaussian_blur_license_plate_JSQ1413, lang='eng',\n",
"config='--oem 3 -l eng --psm 6 -c tessedit_char_whitelist=ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789')\n",
"filter_new_predicted_result_JSQ1413 = \"\".join(new_predicted_result_JSQ1413.split()).replace(\":\", \"\").replace(\"-\", \"\")\n",
"print(filter_new_predicted_result_JSQ1413)\n",
"--->"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<h1>Thank you for completing this lab!</h1>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<div class=\"alert alert-block alert-info\" style=\"margin-top: 20px\">\n",
"<h2>Get IBM Watson Studio free of charge!</h2>\n",
" <p><a href=\"https://cocl.us/NotebooksPython101bottom\"><img src=\"https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/CV0101/Logo/BottomAd.png\" width=\"750\" align=\"center\"></a></p>\n",
"</div>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<h3>About the Author:</h3>\n",
"This lab was written by <a href=\"https://www.linkedin.com/in/sacchitchadha/\" target=\"_blank\" >Sacchit Chadha</a> and revised by Nayef Abou Tayoun\n",
"<p><a href=\"https://www.linkedin.com/in/sacchitchadha/\" target=\"_blank\">Sacchit Chadha</a> is a Software Engineer at IBM, and is currently pursuing a Bachelors Degree in Computer Science from the University of Waterloo. His work at IBM focused on Computer Vision, Cloud Computing and Blockchain.</p>\n",
"<p>Nayef Abou Tayoun is a Cognitive Data Scientist at IBM, and is currently pursuing a Master's degree in Artificial Intelligence.</p>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<hr>\n",
"<p>Copyright &copy; 2019 IBM Developer Skills Network. This notebook and its source code are released under the terms of the <a href=\"https://cognitiveclass.ai/mit-license/\">MIT License</a>.</p>"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python",
"language": "python",
"name": "conda-env-python-py"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.7"
}
},
"nbformat": 4,
"nbformat_minor": 4
}