@groupdocs-cloud-gists
Last active March 17, 2021 05:50
Extract text from PDF documents programmatically using a REST API in Python.
Extract Text from PDF Documents
1. Upload a PDF file to the cloud programmatically.
2. Extract text from the PDF document programmatically using Python.
3. Download the text file from the cloud.
import groupdocs_parser_cloud

# client credentials and API configuration
client_id = "112f0f38-9dae-42d5-b4fc-cc84ae644972"
client_secret = "16ad3fe0bdc39c910f57d2fd48a5d618"
configuration = groupdocs_parser_cloud.Configuration(client_id, client_secret)
configuration.api_base_url = "https://api.groupdocs.cloud"
# storage name (left empty in these snippets)
my_storage = ""
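The configuration above hard-codes the client ID and secret, which is easy to leak when a script is shared. As a minimal sketch, the credentials could instead be read from environment variables. The variable names `GROUPDOCS_CLIENT_ID` and `GROUPDOCS_CLIENT_SECRET` and the helper `load_credentials` are assumptions made for illustration; they are not part of the SDK.

```python
import os


def load_credentials(env=None):
    """Return (client_id, client_secret) read from environment variables.

    The variable names are hypothetical; use whatever your deployment defines.
    """
    env = os.environ if env is None else env
    client_id = env.get("GROUPDOCS_CLIENT_ID", "")
    client_secret = env.get("GROUPDOCS_CLIENT_SECRET", "")
    if not client_id or not client_secret:
        # Fail early with a clear message instead of sending empty credentials.
        raise RuntimeError(
            "Set GROUPDOCS_CLIENT_ID and GROUPDOCS_CLIENT_SECRET first"
        )
    return client_id, client_secret
```

The returned pair can then be passed to `groupdocs_parser_cloud.Configuration(client_id, client_secret)` exactly as in the snippet above.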
# api initialization
parseApi = groupdocs_parser_cloud.ParseApi.from_config(configuration)
# define text options
options = groupdocs_parser_cloud.TextOptions()
options.file_info = groupdocs_parser_cloud.FileInfo()
options.file_info.file_path = "sample.pdf"
request = groupdocs_parser_cloud.TextRequest(options)
result = parseApi.text(request)
print("Text: " + result.text)
# api initialization
parseApi = groupdocs_parser_cloud.ParseApi.from_config(configuration)
# define text options
options = groupdocs_parser_cloud.TextOptions()
options.file_info = groupdocs_parser_cloud.FileInfo()
options.file_info.file_path = "sample.pdf"
options.start_page_number = 1
options.count_pages_to_extract = 2
request = groupdocs_parser_cloud.TextRequest(options)
result = parseApi.text(request)
# print the text of each extracted page
for page in result.pages:
    print("PageIndex: " + str(page.page_index) + ". Text: " + page.text)
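The loop above only prints each page. To keep the extracted text, the pages could be joined and written to a local file. The sketch below relies only on the `page_index` and `text` attributes used above; the helpers `pages_to_text` and `save_text` are hypothetical, and `SimpleNamespace` objects stand in for the API response so the example runs without network access.

```python
from pathlib import Path
from types import SimpleNamespace


def pages_to_text(pages):
    """Join per-page text into one string, labelling each page by its index."""
    parts = []
    for page in pages:
        parts.append("--- Page index " + str(page.page_index) + " ---\n" + page.text)
    return "\n\n".join(parts)


def save_text(text, path):
    """Write the text to `path` as UTF-8 and return the path as a Path object."""
    target = Path(path)
    target.write_text(text, encoding="utf-8")
    return target


# Stand-in objects that mimic the attributes used above (page_index, text).
sample_pages = [
    SimpleNamespace(page_index=0, text="First page"),
    SimpleNamespace(page_index=1, text="Second page"),
]
combined = pages_to_text(sample_pages)
```

With a real response, `pages_to_text(result.pages)` followed by `save_text(combined, "sample.txt")` would produce a single text file; the output file name is only an example.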
# api initialization
parseApi = groupdocs_parser_cloud.ParseApi.from_config(configuration)
# define text options
options = groupdocs_parser_cloud.TextOptions()
options.file_info = groupdocs_parser_cloud.FileInfo()
options.file_info.file_path = "PDF_with_attachements.pdf"
options.file_info.password = "password"
container_info = groupdocs_parser_cloud.ContainerItemInfo()
container_info.relative_path = "template-document.pdf"
options.container_item_info = container_info
options.start_page_number = 2
options.count_pages_to_extract = 1
request = groupdocs_parser_cloud.TextRequest(options)
result = parseApi.text(request)
print("Text: " + result.pages[0].text)
# api initialization
file_api = groupdocs_parser_cloud.FileApi.from_config(configuration)
my_storage = ""
request = groupdocs_parser_cloud.UploadFileRequest("sample.pdf", "C:\\Files\\sample.pdf", my_storage)
response = file_api.upload_file(request)
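The upload request takes a local file path, so a wrong path only surfaces once the request is made. One possible guard, sketched here with the standard library only, is to validate the path before building the request. The helper name `check_local_pdf` is hypothetical and not part of the SDK.

```python
from pathlib import Path


def check_local_pdf(path):
    """Return `path` as a Path if it is an existing .pdf file, otherwise raise."""
    candidate = Path(path)
    if candidate.suffix.lower() != ".pdf":
        raise ValueError(f"Not a .pdf file: {candidate}")
    if not candidate.is_file():
        raise FileNotFoundError(f"No such file: {candidate}")
    return candidate
```

Calling `check_local_pdf("C:\\Files\\sample.pdf")` before creating the `UploadFileRequest` would raise a descriptive error locally instead of failing during the upload.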