For more details, please visit Convert Scanned PDF File to Text in C#
Last active: December 22, 2021 20:31
Convert Scanned PDF to Text in C# .NET
// Convert a scanned PDF to a text file
using System.Collections.Generic;
using System.Text;
using Aspose.OCR;

// Initialize the OCR engine
AsposeOcr api = new AsposeOcr();
// Configure recognition settings; disable automatic area detection
DocumentRecognitionSettings set = new DocumentRecognitionSettings();
set.DetectAreas = false;
// Recognize the scanned PDF; the list holds one result per page
List<RecognitionResult> result = api.RecognizePdf("multi_page_1.pdf", set);
// Collect the recognized text of all pages
StringBuilder builder = new StringBuilder();
foreach (RecognitionResult page in result)
{
    // AppendLine keeps the text of consecutive pages from running together
    builder.AppendLine(page.RecognitionText);
}
// Save the combined text to a TXT file
System.IO.File.WriteAllText("Text.txt", builder.ToString());

// Print the text of a scanned PDF to the console, page by page
using System.Collections.Generic;
using Aspose.OCR;

// Initialize the OCR engine
AsposeOcr api = new AsposeOcr();
// Configure recognition settings; disable automatic area detection
DocumentRecognitionSettings set = new DocumentRecognitionSettings();
set.DetectAreas = false;
// Recognize the scanned PDF; the list holds one result per page
List<RecognitionResult> result = api.RecognizePdf("pages.pdf", set);
// Print the recognized text with a 1-based page number
int pageNumber = 1;
foreach (RecognitionResult page in result)
{
    System.Console.WriteLine($"Page: {pageNumber++} text: {page.RecognitionText}");
}