@alexandrevicenzi
Created February 22, 2014 05:46
Plain text PDF with pdf.js
<!doctype html>
<html>
<head>
<script src="http://ajax.googleapis.com/ajax/libs/jquery/1.10.2/jquery.min.js"></script>
<script src="http://mozilla.github.io/pdf.js/build/pdf.js"></script>
<script>
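// Uses the legacy `PDFJS` global; pdf.js 2.0 and later expose `pdfjsLib` instead.
function pdfToPlainText(pdfData) {
  // Parse on the main thread instead of in a web worker.
  PDFJS.disableWorker = true;
  PDFJS.getDocument(pdfData).then(getPages);
}
function getPages(pdf) {
  // Fetch pages one at a time so their text is appended in page order.
  function next(n) {
    if (n > pdf.numPages) return;
    pdf.getPage(n).then(function (page) {
      getPageText(page, function () { next(n + 1); });
    });
  }
  next(1);
}
function getPageText(page, done) {
  page.getTextContent().then(function (textContent) {
    // Later builds return { items: [...] }; older ones return the array itself.
    var items = textContent.items || textContent;
    items.forEach(function (o) {
      // Append as a text node so "<" or "&" in the PDF is not parsed as HTML.
      $("#pdf").append(document.createTextNode(o.str), "<br>");
    });
    done();
  });
}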
</script>
</head>
<body onload="pdfToPlainText('TestDocument.pdf')">
<h1>Plain Text PDF with pdf.js</h1>
<br>
<div id="pdf"></div>
</body>
</html>