@GroupDocsGists
Last active May 16, 2022 08:15
Count words, unique words, and their occurrence count in Java
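// Count Words in PDF document using Java
// Requires GroupDocs.Parser for Java (com.groupdocs.parser.Parser, com.groupdocs.parser.data.TextReader)
try (Parser parser = new Parser("path/document.pdf")) {
    TextReader reader = parser.getText();
    // getText() returns null when text extraction is not supported for the document
    String text = reader == null ? "" : reader.readToEnd();
    // Split on runs of whitespace and punctuation so that ". " does not yield an empty token
    String[] words = text.split("[\\s.,?:;]+");
    int wordCount = 0;
    for (String word : words) {
        if (!word.isEmpty()) {  // skip the empty token left by a leading delimiter or empty text
            wordCount++;
        }
    }
    System.out.println("Word count: " + wordCount);
}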
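The word total depends heavily on how the text is tokenized. A minimal sketch of that step on a plain string, with no PDF or GroupDocs dependency: the class name `WordCountSketch` and the helper `countWords` are hypothetical, introduced only for illustration. It compares a pattern that treats every delimiter character as its own separator with one that treats a run of delimiters as a single separator.

```java
import java.util.Arrays;

public class WordCountSketch {
    // Hypothetical helper: counts non-empty tokens after splitting on
    // runs of whitespace and the punctuation characters . , ? : ;
    public static long countWords(String text) {
        return Arrays.stream(text.split("[\\s.,?:;]+"))
                .filter(token -> !token.isEmpty())
                .count();
    }

    public static void main(String[] args) {
        String sample = "Hello, world. How are you?";
        // One separator per punctuation character: ", " and ". " each leave an empty token
        System.out.println(sample.split("\\s+|\\.|\\,|\\?|\\:|\\;").length); // prints 7
        // One separator per run of delimiters, empty tokens filtered out
        System.out.println(countWords(sample)); // prints 5
    }
}
```

The filter step matters even with the run-based pattern, because `String.split` keeps a leading empty string when the text starts with a delimiter and returns a single empty token for empty input.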
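// Count Unique Words and their occurrences in PDF document using Java
// Requires GroupDocs.Parser for Java (com.groupdocs.parser.Parser, com.groupdocs.parser.data.TextReader)
// and java.util.HashMap, java.util.Map
try (Parser parser = new Parser("path/document.pdf")) {
    TextReader reader = parser.getText();
    // getText() returns null when text extraction is not supported for the document
    String text = reader == null ? "" : reader.readToEnd();
    // Split on runs of whitespace and punctuation so that ". " does not yield an empty token
    String[] words = text.split("[\\s.,?:;]+");
    Map<String, Integer> wordCounts = new HashMap<>();
    int minWordLength = 2;
    for (String word : words) {
        // Lowercase so that "The" and "the" are counted as the same word
        String uniqueWord = word.toLowerCase();
        // Ignore tokens shorter than minWordLength characters (this also drops empty tokens)
        if (uniqueWord.length() >= minWordLength) {
            // Insert 1 for a new word, otherwise add 1 to the existing count
            wordCounts.merge(uniqueWord, 1, Integer::sum);
        }
    }
    // HashMap iteration order is unspecified, so the words print in no particular order
    wordCounts.forEach((word, count) -> System.out.println(word + ": " + count));
}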
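A hash-based table prints its entries in no particular order, which makes the result hard to scan for the most common words. A minimal sketch of the counting step with the output ordered by descending frequency, again on a plain string with no PDF or GroupDocs dependency: the class `WordFrequencySketch` and the helper `countByFrequency` are hypothetical names, and treating `minWordLength` as an inclusive minimum is an assumption of this sketch.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class WordFrequencySketch {
    // Hypothetical helper: lowercases the text, counts tokens of at least
    // minWordLength characters, and returns them ordered by descending count.
    public static Map<String, Integer> countByFrequency(String text, int minWordLength) {
        // TreeMap keeps keys alphabetical, so ties in count stay in alphabetical order
        Map<String, Integer> counts = new TreeMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[\\s.,?:;]+")) {
            if (token.length() >= minWordLength) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        // Stable sort by count, collected into a LinkedHashMap to preserve that order
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Integer>comparingByValue().reversed())
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue,
                        (a, b) -> a, LinkedHashMap::new));
    }

    public static void main(String[] args) {
        countByFrequency("The cat saw the dog. The dog ran.", 2)
                .forEach((word, count) -> System.out.println(word + ": " + count));
        // prints: the: 3, dog: 2, cat: 1, ran: 1, saw: 1 (one entry per line)
    }
}
```

To apply this to a document, pass the string returned by `reader.readToEnd()` as the `text` argument.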