@TheJagStudio
Created October 2, 2023 04:37
@Alexgarciaarb

Alexgarciaarb commented Jan 13, 2024

Hi Jagrat, I have been trying to run your code without success, so I am sharing the error I got. I am using Colab. I installed the package with !pip install "pytesseract" and imported it with "from pytesseract import image_to_string".

The error appears after executing these lines:

text_with_pytesseract = extract_text_with_pytesseract(convert_pdf_to_images)
print(text_with_pytesseract)

"tesseract is not installed or it's not in your PATH. See README file for more information".

Thank you for your help.

@TheJagStudio
Author

Hello Alexgarciaarb,

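The error means the Tesseract OCR engine itself is not installed or is not on your system's PATH. pytesseract is only a Python wrapper around the tesseract executable, so installing it with pip does not install the engine. To resolve this, follow these steps to install Tesseract OCR on Google Colab: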

  1. Install Tesseract OCR in Colab:
!apt install tesseract-ocr
!apt install libtesseract-dev
!pip install pytesseract
  2. Import the pytesseract module in your Colab notebook:

from pytesseract import image_to_string

  3. Verify Tesseract installation:
    After running the above commands, you can check if Tesseract is correctly installed and accessible by running:
import pytesseract
print(pytesseract.get_tesseract_version())

Make sure there are no errors, and the Tesseract version is displayed.

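  4. Update the Tesseract path:
    If the problem persists, you might need to explicitly specify the Tesseract executable path in your Colab notebook. This requires the pytesseract module itself to be imported (import pytesseract), not only image_to_string. You can do this by adding the following line: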

pytesseract.pytesseract.tesseract_cmd = '/usr/bin/tesseract'
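
If you would rather not hard-code the path, here is a minimal sketch that looks for the executable first and only then sets tesseract_cmd. The helper name resolve_tesseract_cmd is hypothetical (it is not part of pytesseract), and the fallback path is the same /usr/bin/tesseract assumed above:

```python
import os
import shutil


def resolve_tesseract_cmd(fallback="/usr/bin/tesseract"):
    """Return a usable path to the tesseract executable, or None.

    Hypothetical helper for illustration, not part of pytesseract.
    """
    # First choice: whatever "tesseract" resolves to on the current PATH.
    found = shutil.which("tesseract")
    if found:
        return found
    # Second choice: the fallback location, if it exists and is executable.
    if os.path.isfile(fallback) and os.access(fallback, os.X_OK):
        return fallback
    return None


cmd = resolve_tesseract_cmd()

try:
    import pytesseract
except ImportError:
    # pytesseract is not installed yet; run the pip install step first.
    pytesseract = None

if pytesseract is not None and cmd is not None:
    # Same attribute as in step 4, set to the path that was actually found.
    pytesseract.pytesseract.tesseract_cmd = cmd
else:
    print("Could not configure pytesseract; path found:", cmd)
```

If cmd comes back as None, the apt install commands from step 1 most likely did not complete, so re-run them before changing any paths.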

Now, try running your code again after these steps. It should resolve the issue, and you should be able to use Tesseract OCR in your Colab environment.
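
Before re-running, you can also confirm that the engine works independently of pytesseract. The sketch below uses only the Python standard library and calls the executable with --version. The function name tesseract_version_line is hypothetical and used just for illustration:

```python
import shutil
import subprocess


def tesseract_version_line(binary="tesseract"):
    """Return the first line printed by `<binary> --version`, or None.

    Hypothetical helper for illustration; it does not use pytesseract.
    """
    path = shutil.which(binary)
    if path is None:
        # The executable is not on PATH, which is what the error reports.
        return None
    result = subprocess.run(
        [path, "--version"],
        capture_output=True,
        text=True,
        check=False,
    )
    # Some Tesseract releases print the version to stderr instead of stdout.
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else None


print(tesseract_version_line() or "tesseract not found on PATH (or it printed no version)")
```

If this prints a version line but pytesseract still raises the same error, the path setting from step 4 is the next thing to check.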

Let me know if you encounter any further issues or if you need additional assistance!
