@peaeater
Created November 11, 2014 00:00
OCRs image files to text, with coordinate information, in hOCR format using Tesseract.
# ocr tif/png to hocr (html)
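# requires tesseract (on PATH)
Param(
    [string]$ext = "tif",
    [string]$indir = ".",
    [string]$outdir = $indir
)
# create the output directory if needed; Out-Null hides New-Item's output
if (!(Test-Path $outdir)) {
    New-Item -ItemType Directory -Path $outdir | Out-Null
}
# -Include needs a wildcard in -Path; -File skips directories
$files = Get-ChildItem -Path "$indir\*" -Include "*.$ext" -File
foreach ($file in $files) {
    # tesseract takes an output base name and adds the extension itself
    $o = Join-Path $outdir $file.BaseName
    # the call operator waits for tesseract and quotes paths containing spaces
    & tesseract $file.FullName $o hocr
    if ($LASTEXITCODE -ne 0) {
        Write-Warning "tesseract failed on $($file.Name) (exit code $LASTEXITCODE)"
    }
}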
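The coordinate information lives in each element's `title` attribute: in hOCR, a recognised word is an element with class `ocrx_word`, and its `bbox x0 y0 x1 y1` property gives the top-left and bottom-right corners in image pixels. Below is a minimal sketch of reading word boxes back out of one output file. `Get-HocrWords` is a hypothetical helper, not part of Tesseract. The regex assumes Tesseract's usual single-quoted attribute markup and does not parse HTML properly. Words whose text is wrapped in inline tags such as `<strong>` would come back with an empty `Text`.

```powershell
# Hypothetical helper: extract word text and bounding boxes from hOCR markup.
# Assumes Tesseract-style markup, e.g.
#   <span class='ocrx_word' id='word_1_1' title='bbox 36 92 96 116; x_wconf 93'>Hello</span>
function Get-HocrWords {
    param([Parameter(Mandatory)][string]$Html)

    # groups 1-4: bbox x0 y0 x1 y1; group 5: word text up to the next tag
    $pattern = "class='ocrx_word'[^>]*title='bbox (\d+) (\d+) (\d+) (\d+)[^']*'[^>]*>([^<]*)<"
    foreach ($m in [regex]::Matches($Html, $pattern)) {
        [pscustomobject]@{
            # hOCR escapes characters like & as entities, so decode them
            Text = [System.Net.WebUtility]::HtmlDecode($m.Groups[5].Value)
            X0   = [int]$m.Groups[1].Value
            Y0   = [int]$m.Groups[2].Value
            X1   = [int]$m.Groups[3].Value
            Y1   = [int]$m.Groups[4].Value
        }
    }
}
```

For example, `Get-HocrWords -Html (Get-Content -Raw page1.hocr) | Format-Table` lists one row per word. `page1.hocr` is a placeholder name. Depending on the Tesseract version, the output file may end in `.html` rather than `.hocr`.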