@krmahadevan
Created August 25, 2011 07:18
This sample program fetches a PDF file over the network and extracts its text content using Apache PDFBox.
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;

import org.apache.pdfbox.pdfparser.PDFParser;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.util.PDFTextStripper;

public class PlayWithPDF {

    public static void main(String[] args) throws IOException {
        URL url = new URL("http://illiad.evms.edu/sample.pdf");
        System.out.println(getTextFromPDF(url));
    }

    /** Downloads the PDF at the given URL and returns its text content. */
    public static String getTextFromPDF(URL url) throws IOException {
        InputStream fileToParse = new BufferedInputStream(url.openStream());
        try {
            PDFParser parser = new PDFParser(fileToParse);
            parser.parse();
            PDDocument document = parser.getPDDocument();
            try {
                // The caller decides what to do with the text; it is not printed here.
                return new PDFTextStripper().getText(document);
            } finally {
                // Release the parsed document even if text extraction fails.
                document.close();
            }
        } finally {
            // Always close the network stream.
            fileToParse.close();
        }
    }
}
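
A URL that is expected to serve a PDF can just as easily return an HTML error page, in which case the parser fails with a less obvious message. The sketch below is one possible guard, not part of the original gist: it peeks at the first bytes of the stream and checks for the `%PDF-` marker that a well-formed PDF starts with, then rewinds the stream so it could be handed to `PDFParser` in place of the plain `url.openStream()` call in `getTextFromPDF`. The class name `PdfHeaderCheck` and both method names are hypothetical, and only the JDK is used.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;

// Hypothetical helper, not part of the gist: rejects non-PDF responses early.
final class PdfHeaderCheck {

    // A well-formed PDF file begins with the ASCII marker "%PDF-".
    private static final byte[] PDF_MAGIC = {'%', 'P', 'D', 'F', '-'};

    private PdfHeaderCheck() {
    }

    /** Returns true if the given bytes start with the "%PDF-" marker. */
    static boolean looksLikePdf(byte[] head) {
        if (head == null || head.length < PDF_MAGIC.length) {
            return false;
        }
        for (int i = 0; i < PDF_MAGIC.length; i++) {
            if (head[i] != PDF_MAGIC[i]) {
                return false;
            }
        }
        return true;
    }

    /**
     * Opens the URL, peeks at the leading bytes, and returns the stream
     * rewound to its start. Throws IOException if the content does not
     * look like a PDF.
     */
    static InputStream openPdfStream(URL url) throws IOException {
        BufferedInputStream in = new BufferedInputStream(url.openStream());
        in.mark(PDF_MAGIC.length);
        byte[] head = new byte[PDF_MAGIC.length];
        int read = 0;
        // read() may return fewer bytes than requested, so loop until done.
        while (read < head.length) {
            int n = in.read(head, read, head.length - read);
            if (n < 0) {
                break;
            }
            read += n;
        }
        in.reset();
        if (read < head.length || !looksLikePdf(head)) {
            in.close();
            throw new IOException("Content does not look like a PDF: " + url);
        }
        return in;
    }
}
```

With this in place, `new PDFParser(PdfHeaderCheck.openPdfStream(url))` would fail fast with a clear message when the server does not return a PDF. Some lenient readers accept a header that appears later in the file, so treat the check as a strict heuristic.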
@krmahadevan
Author

Make sure the following Maven dependency is added for this to work.

    <dependency>
        <groupId>org.apache.pdfbox</groupId>
        <artifactId>pdfbox</artifactId>
        <version>1.1.0</version>
    </dependency>
