@peaeater
Created November 29, 2013 17:59
Produces one canvas structure XML file (which DjVuLibre calls 'hidden text') per page of a DjVu file. Each output name includes the total page count of the input file and the number of the current page. Requires djvused and djvutoxml from DjVuLibre => http://djvu.sourceforge.net/doc/man/djvuxml.html
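# Extract the hidden-text XML for each page of a DjVu file.
# Requires the DjVuLibre tools djvused and djvutoxml on the PATH.
param(
    [Parameter(Mandatory=$true,ValueFromPipeline=$true,Position=0)]
    [string]$in
)
process
{
    # Resolve relative input paths against the directory containing this script.
    $basePath = Split-Path $script:MyInvocation.MyCommand.Path
    $file = New-Object System.IO.FileInfo([System.IO.Path]::Combine($basePath, $in))
    # Do not assign to $input or $args: both are PowerShell automatic variables.
    $djvuPath = $file.FullName
    Write-Host $djvuPath
    # 'djvused -e n' prints the number of pages; cast it so the loop compares integers.
    $pageCount = [int](& djvused -e 'n' $djvuPath)
    for ($i = 1; $i -le $pageCount; $i++) {
        # Output name: <basename>_t<total pages>_<page number>.xml, next to the input file.
        $outPath = '{0}\{1}_t{2}_{3}.xml' -f $file.DirectoryName, $file.BaseName, $pageCount, $i
        # Quote both paths so that spaces survive in the single argument string.
        $djvuArgs = '--with-text --page {0} "{1}" "{2}"' -f $i, $djvuPath, $outPath
        Write-Host $djvuArgs
        Start-Process djvutoxml -ArgumentList $djvuArgs -NoNewWindow -Wait
    }
}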
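
The output naming scheme (base name, `_t` plus the total page count, then the page number) can be previewed without DjVuLibre installed. The sketch below is not part of the original script: `Get-PageXmlName` is a hypothetical helper, and the file name and page count are made-up inputs. It returns file names only, without the directory.

```powershell
# Hypothetical helper (not in the original gist): lists the per-page XML
# file names the script would produce for a given input file and page count.
function Get-PageXmlName {
    param(
        [Parameter(Mandatory=$true)][string]$Path,
        [Parameter(Mandatory=$true)][int]$PageCount
    )
    $base = [System.IO.Path]::GetFileNameWithoutExtension($Path)
    for ($i = 1; $i -le $PageCount; $i++) {
        # Same pattern as the script: <basename>_t<total pages>_<page number>.xml
        '{0}_t{1}_{2}.xml' -f $base, $PageCount, $i
    }
}

# Example with an assumed 3-page file named book.djvu
$names = @(Get-PageXmlName -Path 'book.djvu' -PageCount 3)
$names
```

Because the total page count is part of every name, a consumer can tell from any single file how many pages to expect.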