@nfreear
Created October 17, 2012 14:41
PDF to HTML conversion in PHP
<?php
// Standalone context.
define('PDFTOHTML_PATH', 'X:/PATH TO/workspace/_ouplayer_data/pdftohtml-0.39/pdftohtml.exe'); # Windows.
#define('PDFTOHTML_PATH', '/usr/bin/pdftohtml'); # Redhat 6.
require_once 'libraries/pdftohtml.php';
//( Or, CodeIgniter context. )
#$config['pdftohtml_path'] = '/usr/bin/pdftohtml';
#$this->load->library('pdftohtml');
$pdf = '/my/data/transcripts/l314audio2.pdf'; #Ok.
$xml = preg_replace('/\.pdf$/i', '.xml', $pdf); // Intermediate XML file; only the trailing extension is swapped.
$ofile = preg_replace('/\.pdf$/i', '_transcript.html', $pdf);
try {
  $parser = new Pdftohtml();
  $out = $parser->parse($pdf, $xml, $delete_xml = TRUE);
  $by = file_put_contents($ofile, $out);
  if ($by === FALSE) {
    throw new Exception("Could not write $ofile");
  }
  echo "Ok, written $by bytes | $ofile".PHP_EOL;
}
catch(Exception $ex) {
  die('EX: '.$ex->getMessage().PHP_EOL);
}
/*
pdftohtml binary install (Redhat Linux).
On Redhat the pdftohtml binary ships in the poppler-utils package.
$ yum info poppler-utils
$ yum install poppler-utils
$ pdftohtml -v
*/
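
The file `libraries/pdftohtml.php` is not part of this gist. As a rough idea of what such a wrapper might do, here is a minimal sketch. It is not the author's library: the names `PdftohtmlSketch`, `sketch_build_command` and `sketch_xml_to_html` are hypothetical, and it assumes the binary accepts the `-xml` and `-i` options, writes `<base>.xml`, and emits `<page>` elements containing `<text>` elements.

```php
<?php
// Hypothetical stand-in for libraries/pdftohtml.php. NOT the author's code.

// Build the shell command. Assumes the binary supports `-xml` (XML output)
// and `-i` (ignore images), and appends ".xml" to the output base name.
function sketch_build_command($binary, $pdf, $xml_base)
{
    return escapeshellarg($binary) . ' -xml -i '
        . escapeshellarg($pdf) . ' ' . escapeshellarg($xml_base);
}

// Turn the XML (assumed shape: <page> elements holding <text> elements)
// into one HTML paragraph per non-empty text line.
function sketch_xml_to_html($xml_string)
{
    $doc = simplexml_load_string($xml_string);
    if ($doc === false) {
        throw new Exception('Could not parse the XML output');
    }
    $html = '';
    foreach ($doc->page as $page) {
        foreach ($page->text as $text) {
            // Drop inline markup such as <b> or <i>; entities stay escaped.
            $line = trim(strip_tags($text->asXML()));
            if ($line !== '') {
                $html .= '<p>' . $line . '</p>' . PHP_EOL;
            }
        }
    }
    return $html;
}

class PdftohtmlSketch
{
    private $binary;

    public function __construct($binary = null)
    {
        // Fall back to the PDFTOHTML_PATH constant if the caller defined it.
        if ($binary === null) {
            $binary = defined('PDFTOHTML_PATH') ? PDFTOHTML_PATH : 'pdftohtml';
        }
        $this->binary = $binary;
    }

    public function parse($pdf, $xml, $delete_xml = false)
    {
        // Strip a trailing ".xml" so the binary can append it again.
        $base = preg_replace('/\.xml$/i', '', $xml);
        $xml_file = $base . '.xml';

        exec(sketch_build_command($this->binary, $pdf, $base), $output, $status);
        if ($status !== 0 || !is_file($xml_file)) {
            throw new Exception("pdftohtml failed for $pdf (exit status $status)");
        }

        $html = sketch_xml_to_html(file_get_contents($xml_file));
        if ($delete_xml) {
            unlink($xml_file);
        }
        return $html;
    }
}
```

With this sketch, `new PdftohtmlSketch()` could stand in for `new Pdftohtml()` in the script above, though the real library's output format may well differ.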
@killua99

I have a question: where is the file loaded by require_once 'libraries/pdftohtml.php'? I'm looking for that file and can't find it. Thanks.