Hari Gopalakrishna harigopalakrishna

harigopalakrishna / PdfText
Created September 26, 2014 16:28
Extracting text from PDF file using iText and qualifying the document for OCR processing
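The "qualifying for OCR" decision can be sketched separately from the extraction itself. The helper below is a hypothetical illustration, not part of the original gist: the class name `OcrQualifier` and the method `needsOcr` are assumptions, and the input strings are assumed to be the per-page text already returned by iText's `PdfTextExtractor`. It treats a document as an OCR candidate only when no page yields any non-whitespace text.

```java
import java.util.List;

// Hypothetical helper (not in the original gist): decides whether a document
// should be sent to OCR, based on text already extracted from each page.
final class OcrQualifier {
    private OcrQualifier() {}

    // Returns true when no page yielded any non-whitespace text.
    // An empty list (no pages extracted) is also treated as an OCR candidate.
    static boolean needsOcr(List<String> pageTexts) {
        for (String text : pageTexts) {
            if (text != null && !text.trim().isEmpty()) {
                return false; // at least one page carries a text layer
            }
        }
        return true; // every page was empty or whitespace only
    }
}
```

Taking a list of page texts rather than a single string is a design choice made here so the same check could cover multi-page documents; for the single-page case, pass a one-element list.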
/*
Extracts text from a PDF using the iText library.
If no text is found, the document may contain only images or may be a scanned PDF,
which makes it a candidate for OCR processing.
NOTE: This logic works only for a SINGLE-PAGE PDF.
*/
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.parser.PdfTextExtractor;
public class PdfText {