@GroupDocsGists
Created November 24, 2023 19:00
Regex Search in Files across Folders with Java
// Regex search across multiple files and folders using Java
// Create an index in the index folder and add the documents folder to it
Index index = new Index("path/indexing-folder-path");
index.add("path/parent-folder");
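// Prepare the regex query and run the search
// The regex matches words that contain two or more identical consecutive characters.
// Note: the leading '^' is understood here as GroupDocs.Search's marker for a regex query,
// not a start-of-word anchor; verify this against the library's regex search documentation.
String query = "^(.)\\1{1,}";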
SearchResult result = index.search(query);
// Highlight and print the regex search results for every document found
for (int i = 0; i < result.getDocumentCount(); i++) {
    FoundDocument document = result.getFoundDocument(i);

    // Write an HTML copy of the document with the matches highlighted
    OutputAdapter outputAdapter = new FileOutputAdapter(OutputFormat.Html, "path/Highlight" + i + ".html");
    Highlighter highlighter = new DocumentHighlighter(outputAdapter);
    index.highlight(document, highlighter);

    System.out.println("\tDocument: " + document.getDocumentInfo().getFilePath());
    System.out.println("\tOccurrences: " + document.getOccurrenceCount());

    for (FoundDocumentField field : document.getFoundFields()) {
        System.out.println("\t\tField: " + field.getFieldName());
        System.out.println("\t\tOccurrences: " + field.getOccurrenceCount());

        // Print each found term with its occurrence count
        if (field.getTerms() != null) {
            for (int k = 0; k < field.getTerms().length; k++) {
                System.out.println("\t\t\t" + field.getTerms()[k] + " - " + field.getTermsOccurrences()[k]);
            }
        }
    }
}
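
To see what the pattern in the query matches without building an index, the sketch below runs the same expression through the standard `java.util.regex` package. It is only an illustration: the class name `RepeatedCharDemo` and the helper `wordsWithRepeats` are hypothetical, the leading `^` is dropped on the assumption that it only marks the query type in GroupDocs.Search, and the library's regex dialect and word splitting may differ from `java.util.regex`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class RepeatedCharDemo {
    // Same expression as the query above, minus the leading '^' (assumed to be
    // a query-type marker in GroupDocs.Search rather than part of the pattern).
    // (.) captures one character; \1{1,} requires it to repeat at least once more.
    private static final Pattern REPEATED = Pattern.compile("(.)\\1{1,}");

    // Hypothetical helper: returns the whitespace-separated words that contain
    // at least one run of two or more identical consecutive characters.
    public static List<String> wordsWithRepeats(String text) {
        List<String> hits = new ArrayList<>();
        for (String word : text.split("\\s+")) {
            if (!word.isEmpty() && REPEATED.matcher(word).find()) {
                hits.add(word);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // prints [letter, coffee]
        System.out.println(wordsWithRepeats("a letter about coffee and tea"));
    }
}
```

Checking the expression on a small sample like this before indexing a large folder can help confirm that it selects the intended words.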