@bishabosha
Created March 9, 2024 15:43
Scala script to convert PDF to plain text
//> using repository "https://repo.e-iceblue.com/nexus/content/groups/public/"
//> using dep "e-iceblue:spire.pdf:9.10.3"
//> using dep "jakarta.xml.bind:jakarta.xml.bind-api:4.0.0"
//> using scala 3.4.0
//> using toolkit default

import com.spire.pdf.PdfDocument
import scala.util.Using, Using.Releasable
import scala.util.chaining.given

// First argument is the PDF to read; relative paths are resolved against the working directory.
val path = os.Path(args(0), os.pwd)

// Tell Using how to release a PdfDocument once the block below finishes.
given Releasable[PdfDocument] = _.close()

Using
  .resource(PdfDocument(path.toString)): pdf =>
    val limit = pdf.getPages().nn.getCount()
    // Walk the pages by index, extract each page's text, and print the pages separated by blank lines.
    Iterator
      .iterate(0)(_ + 1)
      .takeWhile(_ < limit)
      .map(pdf.getPages().nn.get(_).nn)
      .map(_.extractText().nn)
      .mkString("\n\n")
      .pipe(println)
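The resource handling in the script can be tried without the PDF dependency. The following is only a minimal sketch of the same `given Releasable` plus `Using.resource` pattern: `FakeDocument`, `pageCount`, `pageText`, `extractAll`, and `demo` are hypothetical names made up for illustration and are not part of Spire.PDF, and the snippet assumes Scala 3.3 or later compiled as a regular `.scala` file.

```scala
import scala.util.Using
import scala.util.Using.Releasable
import scala.util.chaining.*

// Hypothetical stand-in for a PDF document: every page is just a string.
final class FakeDocument(pages: Vector[String]):
  var closed = false
  def pageCount: Int = pages.length
  def pageText(i: Int): String = pages(i)
  def close(): Unit = closed = true

// FakeDocument does not implement AutoCloseable, so Using needs an
// explicit Releasable instance to know how to release it.
given Releasable[FakeDocument] = _.close()

// Joins the text of all pages with blank lines, then releases the document.
def extractAll(doc: FakeDocument): String =
  Using.resource(doc): d =>
    Iterator
      .range(0, d.pageCount)
      .map(d.pageText)
      .mkString("\n\n")

@main def demo(): Unit =
  val doc = FakeDocument(Vector("page one", "page two"))
  extractAll(doc).pipe(println)
  println(s"closed: ${doc.closed}") // prints "closed: true"
```

`Using.resource` releases the resource even if the body throws, which is why the script wraps the whole extraction in it instead of calling `close()` by hand.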