@davidmezzetti
Created July 16, 2024 17:11
from txtai.pipeline import Textractor

# Install the [pipeline-data] extra to support extracting text from docx/pdf/xlsx
# sections=True returns the extracted text split into sections
textractor = Textractor(sections=True)

# Extract text from the URL and print each section
for section in textractor("https://github.com/neuml/txtai"):
    print(f"\n[SECTION]\n{section}")