@peaeater
Created November 29, 2013 17:55
Produces one plain-text file per page of a DjVu file. Each output name includes the input file's total page count and the current page number. Requires djvused and djvutxt from DjVuLibre => http://djvu.sourceforge.net/doc/man/djvutxt.html
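# Extract plain text per page from a DjVu file.
# Requires DjVuLibre (djvused and djvutxt must be on the PATH).
param(
    [Parameter(Mandatory=$true, ValueFromPipeline=$true, Position=0)]
    [string]$in
)
process
{
    # Resolve the input relative to the directory containing this script.
    $basePath = Split-Path $script:MyInvocation.MyCommand.Path
    $file = New-Object System.IO.FileInfo([System.IO.Path]::Combine($basePath, $in))

    # $inputPath and $djvuArgs avoid clobbering the automatic variables $input and $args.
    $inputPath = '"{0}"' -f $file.FullName
    # The djvused command 'n' prints the number of pages in the document.
    $pagecount = [int](& djvused -e 'n' $file.FullName)

    for ($i = 1; $i -le $pagecount; $i++) {
        # Output name: <base>_t<total pages>_<page number>.txt, next to the input file.
        $output = '"{0}\{1}_t{2}_{3}.txt"' -f $file.DirectoryName, $file.BaseName, $pagecount, $i
        $djvuArgs = "-page=$i $inputPath $output"
        Write-Host $djvuArgs
        Start-Process djvutxt $djvuArgs -NoNewWindow -Wait
    }
}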
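
To preview the file names without running DjVuLibre, the naming scheme used in the loop (`<base>_t<total pages>_<page number>.txt`) can be pulled out into a small function. This is only a sketch: `Get-PageTextPath` and the `book` example are hypothetical names and not part of the script above.

```powershell
# Hypothetical helper (not in the original script): builds the per-page
# output path using the same <base>_t<total>_<page>.txt pattern.
function Get-PageTextPath {
    param(
        [string]$Directory,
        [string]$BaseName,
        [int]$PageCount,
        [int]$Page
    )
    Join-Path $Directory ('{0}_t{1}_{2}.txt' -f $BaseName, $PageCount, $Page)
}

# List the paths that would be written for an assumed 3-page book.djvu.
1..3 | ForEach-Object {
    Get-PageTextPath -Directory '.' -BaseName 'book' -PageCount 3 -Page $_
}
```

Because the total page count is part of every name, a later step can tell from any single file how many pages to expect.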