@t27
Last active March 9, 2021 21:42
Converting OpenCV Mat to Tesseract Compatible formats

I was primarily using the JavaCPP presets for OpenCV (JavaCV) and Tesseract, but I also found some references for C++.

Here's what works for C++:

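#include <tesseract/baseapi.h>
#include <leptonica/allheaders.h>
#include <opencv2/opencv.hpp>
...
char imagename[] = "testimg.jpg";
cv::Mat _mat = cv::imread(imagename);  // loaded as 8-bit BGR by default
// Tesseract expects RGBA byte order for 32-bit data.
// On OpenCV 2.x the constant is spelled CV_BGR2RGBA.
cv::cvtColor(_mat, _mat, cv::COLOR_BGR2RGBA);
// api is an initialised tesseract::TessBaseAPI.
// Arguments: data, width, height, bytes per pixel, bytes per line.
api.SetImage(_mat.data, _mat.cols, _mat.rows, 4, 4 * _mat.cols);
char *outtext = api.GetUTF8Text();  // free with delete[] when done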
...

Source: http://stackoverflow.com/a/38583706/1541263
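
The last two arguments of SetImage in the C++ snippet are bytes per pixel and bytes per line. Here is a minimal, library-free sketch of what those values mean for a buffer that has gone through a BGR to RGBA conversion. The names RawImage and bgrToRgba are hypothetical helpers made up for illustration; they are not part of OpenCV or Tesseract. The sketch also assumes 8-bit channels and allows the source rows to be padded, which is why the source stride is passed separately from the width.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper type (not an OpenCV or Tesseract type): holds the
// values that would be handed to TessBaseAPI::SetImage for a raw buffer.
struct RawImage {
    std::vector<std::uint8_t> data;  // pixel bytes, row after row
    int width = 0;                   // pixels per row
    int height = 0;                  // number of rows
    int bytes_per_pixel = 0;         // 1 for gray, 3 for RGB, 4 for RGBA
    int bytes_per_line = 0;          // byte distance between row starts
};

// Convert an 8-bit BGR buffer into a tightly packed RGBA buffer.
// src_stride is the byte distance between source rows; it can be larger
// than width * 3 when the source rows carry padding.
RawImage bgrToRgba(const std::uint8_t* src, int width, int height,
                   std::size_t src_stride) {
    assert(src != nullptr && width > 0 && height > 0);
    assert(src_stride >= static_cast<std::size_t>(width) * 3);

    RawImage out;
    out.width = width;
    out.height = height;
    out.bytes_per_pixel = 4;
    out.bytes_per_line = 4 * width;  // no padding in the output rows
    out.data.resize(static_cast<std::size_t>(out.bytes_per_line) * height);

    for (int y = 0; y < height; ++y) {
        const std::uint8_t* row = src + static_cast<std::size_t>(y) * src_stride;
        std::uint8_t* dst =
            out.data.data() + static_cast<std::size_t>(y) * out.bytes_per_line;
        for (int x = 0; x < width; ++x) {
            dst[4 * x + 0] = row[3 * x + 2];  // R
            dst[4 * x + 1] = row[3 * x + 1];  // G
            dst[4 * x + 2] = row[3 * x + 0];  // B
            dst[4 * x + 3] = 255;             // opaque alpha
        }
    }
    return out;
}
```

With a result named img, the matching call would be api.SetImage(img.data.data(), img.width, img.height, img.bytes_per_pixel, img.bytes_per_line). For a single-channel grayscale buffer the same idea applies with 1 byte per pixel.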

For JavaCV, after hours of searching the web and debugging, this is what works:

Mat img = imread("file.jpg");
Mat gray = new Mat();
cvtColor(img, gray, CV_BGR2GRAY);
// api is an initialised Tesseract client (TessBaseAPI)
// SetImage arguments: image data, width, height, bytes per pixel, bytes per line
api.SetImage(gray.data().asBuffer(), gray.size().width(), gray.size().height(), gray.channels(), gray.size1());

@eebart

eebart commented Dec 18, 2017

This is exactly what I also spent hours looking for. Thank you!

@sdhegde

sdhegde commented Sep 22, 2018

According to the API docs for void tesseract::TessBaseAPI::SetImage(const Pix* pix), it is better to pass a Pix to SetImage() than raw data:

Pix vs raw, which to use? Use Pix where possible. A future version of Tesseract may choose to use Pix as its internal representation and discard IMAGE altogether. Because of that, an implementation that sources and targets Pix may end up with less copies than an implementation that does not.
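
For anyone going the Pix route, the main difference from a raw buffer is the memory layout: Leptonica stores each raster line in 32-bit words, padded to a word boundary, with 8 bpp pixels packed from the most significant byte of each word down. Below is a rough, library-free sketch of that packing for a grayscale image. PackedGray and packGray8 are hypothetical names used only for illustration. In real code the equivalent would presumably go through pixCreate, pixGetData, pixGetWpl and the SET_DATA_BYTE macro instead of hand-written shifts.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the pixel storage of an 8 bpp Leptonica Pix.
struct PackedGray {
    std::vector<std::uint32_t> words;  // raster data, wpl words per row
    int width = 0;                     // pixels per row
    int height = 0;                    // number of rows
    int wpl = 0;                       // 32-bit words per line
};

// Pack 8-bit grayscale rows into 32-bit words, most significant byte first.
// src_stride is the byte distance between source rows (at least width).
PackedGray packGray8(const std::uint8_t* src, int width, int height,
                     std::size_t src_stride) {
    assert(src != nullptr && width > 0 && height > 0);
    assert(src_stride >= static_cast<std::size_t>(width));

    PackedGray out;
    out.width = width;
    out.height = height;
    out.wpl = (width + 3) / 4;  // four 8-bit pixels fit in one word
    out.words.assign(static_cast<std::size_t>(out.wpl) * height, 0u);

    for (int y = 0; y < height; ++y) {
        const std::uint8_t* row = src + static_cast<std::size_t>(y) * src_stride;
        std::uint32_t* line =
            out.words.data() + static_cast<std::size_t>(y) * out.wpl;
        for (int x = 0; x < width; ++x) {
            const int shift = 8 * (3 - (x % 4));  // pixel 0 lands in the top byte
            line[x / 4] |= static_cast<std::uint32_t>(row[x]) << shift;
        }
    }
    return out;
}
```

Unused bytes at the end of each line stay zero, so a 5-pixel row occupies two words with the last three bytes as padding.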