@duhaime
Last active May 27, 2022 02:07
Batch process files with ABBYY FineReader using AppleScript
-- specify input and output directories
set infile_directory to "/Users/doug/Desktop/inputs/"
set outfile_directory to "/Users/doug/Desktop/outputs/"
-- get the basenames of each input file
tell application "System Events"
    set infile_list to files of folder infile_directory
end tell
-- process each input file
repeat with infile in infile_list
    set infile_name to name of infile
    set infile to POSIX file (infile_directory & infile_name)
    set outfile to POSIX file (outfile_directory & infile_name)
    run_ocr(infile, outfile)
end repeat
-- main function: run ocr on an infile and save results to an outfile
on run_ocr(infile, outfile)
    -- identify path to ABBYY FineReader
    set appFile to POSIX file "/Applications/FineReader.app"
    -- set FineReader parameters
    using terms from application "FineReader"
        set langList to {English, Latin}
        set saveType to single file
    end using terms from
    using terms from application "FineReader"
        set toFile to outfile
        set retainLayoutWordLayout to as editable copy
        set keepPageNumberHeadersAndFootersBoolean to yes
        set keepLineBreaksAndHyphenationBoolean to yes
        set keepPageBreaksBoolean to yes
        set increasePaperSizeToFitContentBoolean to yes
        set keepImageBoolean to yes
        set imageOptionsImageQualityEnum to high quality
        set keepTextAndBackgroundColorsBoolean to yes
        set highlightUncertainSymbolsBoolean to yes
        set keepPageNumbersBoolean to yes
    end using terms from
    WaitWhileBusy()
    tell application "FineReader"
        set hasdoc to has document
        if hasdoc then
            close document
        end if
    end tell
    WaitWhileBusy()
    tell application "FineReader"
        set auto_read to auto read new pages false
    end tell
    tell application "Finder"
        open infile using appFile
    end tell
    delay 5
    WaitWhileBusy()
    -- the end of line character below is created by pressing OPTION+ENTER
    tell application "FineReader"
        export to html toFile ¬
            ocr languages enum langList ¬
            saving type saveType ¬
            keep line breaks and hyphenation keepLineBreaksAndHyphenationBoolean ¬
            keep page numbers headers and footers keepPageNumberHeadersAndFootersBoolean ¬
            keep pictures keepImageBoolean ¬
            image quality imageOptionsImageQualityEnum ¬
            keep text and background colors keepTextAndBackgroundColorsBoolean
    end tell
    WaitWhileBusy()
    -- close the current file
    tell application "FineReader"
        auto read new pages auto_read
        close document
    end tell
end run_ocr
-- close ABBYY
tell application "FineReader"
    quit
end tell
-- helpers to wait for thread to open up
on WaitWhileBusy()
    repeat while IsMainApplicationBusy()
        -- pause between polls so the loop does not spin the CPU
        delay 0.5
    end repeat
end WaitWhileBusy
on IsMainApplicationBusy()
    tell application "FineReader"
        set resultBoolean to is busy
    end tell
    return resultBoolean
end IsMainApplicationBusy
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# pip install pyobjc
# pip install py-applescript
import applescript, os, glob, sys
scpt = applescript.AppleScript('''
-- main function: run ocr on an infile and save results to an outfile
on run_ocr(infile, outfile)
    set infile to POSIX file infile
    set outfile to POSIX file outfile
    -- identify path to ABBYY FineReader
    set appFile to POSIX file "/Applications/FineReader.app"
    -- set FineReader parameters
    using terms from application "FineReader"
        set langList to {English, Latin}
        set saveType to single file
    end using terms from
    using terms from application "FineReader"
        set toFile to outfile
        set retainLayoutWordLayout to as editable copy
        set keepPageNumberHeadersAndFootersBoolean to yes
        set keepLineBreaksAndHyphenationBoolean to yes
        set keepPageBreaksBoolean to yes
        set increasePaperSizeToFitContentBoolean to yes
        set keepImageBoolean to yes
        set imageOptionsImageQualityEnum to high quality
        set keepTextAndBackgroundColorsBoolean to yes
        set highlightUncertainSymbolsBoolean to yes
        set keepPageNumbersBoolean to yes
    end using terms from
    WaitWhileBusy()
    tell application "FineReader"
        set hasdoc to has document
        if hasdoc then
            close document
        end if
    end tell
    WaitWhileBusy()
    tell application "FineReader"
        set auto_read to auto read new pages false
    end tell
    tell application "Finder"
        open infile using appFile
    end tell
    delay 5
    WaitWhileBusy()
    -- the end of line character below is created by pressing OPTION+ENTER
    tell application "FineReader"
        export to html toFile ¬
            ocr languages enum langList ¬
            saving type saveType ¬
            keep line breaks and hyphenation keepLineBreaksAndHyphenationBoolean ¬
            keep page numbers headers and footers keepPageNumberHeadersAndFootersBoolean ¬
            keep pictures keepImageBoolean ¬
            image quality imageOptionsImageQualityEnum ¬
            keep text and background colors keepTextAndBackgroundColorsBoolean
    end tell
    WaitWhileBusy()
    -- close the current file
    tell application "FineReader"
        auto read new pages auto_read
        close document
    end tell
end run_ocr
-- close ABBYY
tell application "FineReader"
    quit
end tell
-- helpers to wait for thread to open up
on WaitWhileBusy()
    repeat while IsMainApplicationBusy()
        -- pause between polls so the loop does not spin the CPU
        delay 0.5
    end repeat
end WaitWhileBusy
on IsMainApplicationBusy()
    tell application "FineReader"
        set resultBoolean to is busy
    end tell
    return resultBoolean
end IsMainApplicationBusy
''')
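# run OCR on every file in ./inputs, writing results to ./outputs
infiles = glob.glob('inputs/*')
for infile in infiles:
    # the AppleScript handler expects absolute POSIX paths
    infile = os.path.abspath(infile)
    outfile = os.path.abspath('outputs/' + os.path.basename(infile))
    print(' * processing', infile)
    scpt.call('run_ocr', infile, outfile)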
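
The driver above reuses the input basename for the output, so an HTML export of inputs/scan.pdf would be written to outputs/scan.pdf. A minimal sketch, assuming you would rather give the export an .html extension; the helper name html_outfile is hypothetical and not part of the gist:

```python
import os

def html_outfile(infile, out_dir='outputs'):
    """Map an input path to an absolute output path with an .html extension."""
    # drop the directory and the last extension of the input file
    stem, _ext = os.path.splitext(os.path.basename(infile))
    return os.path.abspath(os.path.join(out_dir, stem + '.html'))

print(html_outfile('inputs/scan.pdf'))
```

In the loop, the outfile line could then become outfile = html_outfile(infile), if that naming suits your workflow.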
@chriscjcj

chriscjcj commented Mar 26, 2021

I attempted to use this script, and line 60 (the Finder open call) generates an error:

error "Finder got an error: Handler can’t handle objects of this class." number -10010

I thought it had to do with setting the volume paths to network shares, but I seem to get the error regardless of the path.

@duhaime
Author

duhaime commented Mar 26, 2021

@chriscjcj interesting! Could it be that your ABBYY app is installed somewhere else? Does /Applications/FineReader.app exist on your machine?
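
One way to answer that question before any AppleScript runs is a quick path check from Python. This is only a sketch: the function name preflight and its messages are hypothetical, and the default paths are the ones used in the gist.

```python
import os

def preflight(app_path='/Applications/FineReader.app',
              in_dir='inputs', out_dir='outputs'):
    """Return a list of problems; an empty list means the paths look usable."""
    problems = []
    # macOS .app bundles are directories, so isdir is the right check
    if not os.path.isdir(app_path):
        problems.append('application bundle not found: ' + app_path)
    if not os.path.isdir(in_dir):
        problems.append('input directory not found: ' + in_dir)
    if not os.path.isdir(out_dir):
        problems.append('output directory not found: ' + out_dir)
    return problems

for problem in preflight():
    print(' ! ' + problem)
```

A clean result only rules out missing paths; it does not show that Finder will accept the file reference it is handed.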

I would also add that, if you're not opposed to it, you may find tesseract a little easier to work with. Using the automator approach above is kind of fun if you must use ABBYY, but otherwise I'd likely go with something that's intended to be automated on macOS, like tesseract...
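
For comparison, here is a rough sketch of what the tesseract route might look like from Python. It assumes the tesseract command-line tool is installed and on PATH, and that the inputs are image files it can read (it does not take PDFs as input). The helper names tesseract_command and ocr_folder are hypothetical. The hocr config asks tesseract for HTML-based hOCR output, and tesseract appends the file extension to the output base itself.

```python
import glob
import os
import shutil
import subprocess

def tesseract_command(infile, out_dir, langs='eng', output='hocr'):
    """Build the tesseract argument list for one image file."""
    stem = os.path.splitext(os.path.basename(infile))[0]
    # tesseract takes an output *base* and adds the extension on its own
    outbase = os.path.join(out_dir, stem)
    return ['tesseract', infile, outbase, '-l', langs, output]

def ocr_folder(in_dir='inputs', out_dir='outputs', langs='eng'):
    """Run tesseract on every file in in_dir; return how many were processed."""
    if shutil.which('tesseract') is None:
        raise RuntimeError('tesseract not found on PATH')
    os.makedirs(out_dir, exist_ok=True)
    count = 0
    for infile in sorted(glob.glob(os.path.join(in_dir, '*'))):
        print(' * processing', infile)
        # check=True raises CalledProcessError if tesseract fails on a file
        subprocess.run(tesseract_command(infile, out_dir, langs), check=True)
        count += 1
    return count
```

Calling ocr_folder('inputs', 'outputs', langs='eng+lat') would mirror the English and Latin setting in the AppleScript, assuming the Latin language data is installed.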

@chriscjcj

chriscjcj commented Mar 27, 2021

@duhaime First of all, thank you so much for the reply.

Here's what I'm trying to do... I have a Synology NAS and a Brother ADS-1700W document scanner. This scanner can scan directly to a network share on the NAS. Of course, it doesn't do OCR. I was hoping to implement an automated (or at least very easy) process by which I could point an OCR program at a folder of scanned documents, and have it dump the OCR'ed documents into another directory.
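
Whatever OCR engine ends up doing the work, that workflow comes down to finding scans that have no matching output yet. A minimal sketch, assuming outputs share the input's basename with a different extension; the function name pending_scans is hypothetical:

```python
import os

def pending_scans(in_dir, out_dir):
    """Return sorted input paths whose basename (minus extension) is not yet in out_dir."""
    done = {os.path.splitext(name)[0] for name in os.listdir(out_dir)}
    pending = []
    for name in sorted(os.listdir(in_dir)):
        path = os.path.join(in_dir, name)
        # skip hidden files such as .DS_Store and anything that is not a regular file
        if name.startswith('.') or not os.path.isfile(path):
            continue
        if os.path.splitext(name)[0] not in done:
            pending.append(path)
    return pending
```

Run on a schedule against the scanner's share, each returned path can be handed to the OCR step, and files that were already processed are skipped on the next pass.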

I do have ABBYY FineReader, and it is located at the path you already denoted in the AppleScript. I did read that AppleScript can be finicky about how paths are expressed (POSIX paths, etc.), so I tried many different ways to denote the path, but none was successful.

Thank you very much for letting me know about tesseract-ocr. I was unaware this existed. I will dive into it this weekend and see if it offers the functionality to get me where I'm trying to go.

I'm grateful for your advice and consultation. Thanks again! :-)

EDIT: I just noticed that tesseract-ocr can run as a Docker container. Synology supports running Docker containers. Perhaps I could run this directly on the NAS. That would be fantastic. Getting there will be interesting. While I'm fairly nerdy, I'm not a coder by trade. We'll see if I have what it takes to pull it off. ;-)

EDIT2: I found this tutorial. It's six years old, but might get me there. I'll report back.
