Related blog post: Count Words and Occurrences of Each Word in a Document using Java
Last active
May 16, 2022 08:15
Count words, unique words, and their occurrence count in Java
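// Count words in a PDF document using Java
import com.groupdocs.parser.Parser;
import com.groupdocs.parser.data.TextReader;
import java.util.Arrays;

try (Parser parser = new Parser("path/document.pdf")) {
    TextReader reader = parser.getText();
    String text = reader.readToEnd();
    // Split on runs of whitespace and punctuation so that ", " or ". " does not yield empty tokens
    String[] words = text.split("[\\s.,?:;]+");
    // A leading delimiter still produces one empty token, so skip empties when counting
    long wordCount = Arrays.stream(words).filter(word -> !word.isEmpty()).count();
    System.out.println("Length: " + wordCount);
}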
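The word count depends heavily on the split pattern. An alternation of single delimiters such as `\\s+|\\.|\\,` treats a comma followed by a space as two separators with an empty token between them, which inflates the count. The sketch below compares that pattern with a character-class pattern on a plain string, with no PDF parsing involved. The class name `WordSplitDemo`, the helper `tokenize`, and the sample sentence are illustrative assumptions, not part of the gist.

```java
import java.util.Arrays;

public class WordSplitDemo {
    // Hypothetical helper: splits on runs of whitespace and . , ? : ;
    // and drops empty tokens (for example one caused by a leading delimiter).
    static String[] tokenize(String text) {
        return Arrays.stream(text.split("[\\s.,?:;]+"))
                .filter(token -> !token.isEmpty())
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        String sample = "Hello, world. Hello again?";

        // Alternation of single delimiters: ", " and ". " each leave an empty token behind.
        String[] naive = sample.split("\\s+|\\.|\\,|\\?|\\:|\\;");

        // Character class with '+': consecutive delimiters count as one separator.
        String[] tokens = tokenize(sample);

        System.out.println("Alternation pattern: " + naive.length);
        System.out.println("Character class:     " + tokens.length);
    }
}
```

For the sample sentence, the alternation pattern reports 6 entries while the filtered character-class version reports the 4 actual words.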
// Count unique words and their occurrences in a PDF document using Java
import com.groupdocs.parser.Parser;
import com.groupdocs.parser.data.TextReader;
import java.util.HashMap;
import java.util.Map;

try (Parser parser = new Parser("path/document.pdf")) {
    TextReader reader = parser.getText();
    String text = reader.readToEnd();
    // Split on runs of whitespace and punctuation
    String[] words = text.split("[\\s.,?:;]+");
    Map<String, Integer> wordCountTable = new HashMap<>();
    // Words of this length or shorter are skipped, so only words of 3+ characters are counted
    int minWordLength = 2;
    for (String word : words) {
        // Lowercase so that "The" and "the" are counted as the same word
        String uniqueWord = word.toLowerCase();
        if (uniqueWord.length() > minWordLength) {
            // Store 1 for a new word, otherwise add 1 to the existing count
            wordCountTable.merge(uniqueWord, 1, Integer::sum);
        }
    }
    wordCountTable.forEach((word, count) -> System.out.println(word + ": " + count));
}
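The snippet above prints the words in whatever order the map iterates, which is rarely the most useful view. As a possible follow-up, the sketch below applies the same counting logic to a plain string and lists the most frequent words first. The class name `WordFrequencyDemo`, the helpers `countWords` and `sortByCountDescending`, and the sample sentence are hypothetical and only serve the illustration; with a real document, the text would come from `reader.readToEnd()`.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class WordFrequencyDemo {
    // Counts lowercase words longer than minWordLength characters.
    // A TreeMap keeps the keys alphabetical, which makes ties predictable later.
    static Map<String, Integer> countWords(String text, int minWordLength) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String token : text.split("[\\s.,?:;]+")) {
            String word = token.toLowerCase();
            if (word.length() > minWordLength) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Returns a copy ordered by count, highest first; a LinkedHashMap preserves that order.
    static Map<String, Integer> sortByCountDescending(Map<String, Integer> counts) {
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Integer>comparingByValue().reversed())
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue,
                        (first, second) -> first, LinkedHashMap::new));
    }

    public static void main(String[] args) {
        String sample = "The cat saw the other cat. The dog saw it.";
        sortByCountDescending(countWords(sample, 2))
                .forEach((word, count) -> System.out.println(word + ": " + count));
    }
}
```

Separating the counting from the sorting keeps each step easy to test on plain strings before wiring it to a parsed document.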