@aspose-com-gists
Last active December 22, 2021 20:31
Convert Scanned PDF to Text in C# .NET
using System.Collections.Generic;
using System.Text;
using Aspose.OCR;

// Initialize AsposeOcr class instance
AsposeOcr api = new AsposeOcr();
// Specify the settings for recognizing the scanned PDF file
DocumentRecognitionSettings set = new DocumentRecognitionSettings();
// Turn off automatic text-area detection
set.DetectAreas = false;
// Recognize the PDF; the result holds one RecognitionResult per page
List<RecognitionResult> result = api.RecognizePdf("multi_page_1.pdf", set);
// Collect the recognized text of all pages
StringBuilder builder = new StringBuilder();
foreach (RecognitionResult page in result)
{
    // AppendLine keeps the text of consecutive pages from running together
    builder.AppendLine(page.RecognitionText);
}
// Save the combined text in a TXT file
System.IO.File.WriteAllText("Text.txt", builder.ToString());
// Separate example: print the recognized text of each page to the console
using System.Collections.Generic;
using Aspose.OCR;

// Initialize AsposeOcr class instance
AsposeOcr api = new AsposeOcr();
// Specify the settings for recognizing the scanned PDF file
DocumentRecognitionSettings set = new DocumentRecognitionSettings();
// Turn off automatic text-area detection
set.DetectAreas = false;
// Recognize the PDF; the result holds one RecognitionResult per page
List<RecognitionResult> result = api.RecognizePdf("pages.pdf", set);
// Print the recognized text page by page, numbering pages from 1
int pageNumber = 1;
foreach (RecognitionResult page in result)
{
    System.Console.WriteLine($"Page: {pageNumber++} text: {page.RecognitionText}");
}