Priority | Best Option |
---|---|
🪶 Ease of setup & low cost | OpenAI API |
🔐 Data privacy & control | Local LLM (gpt-oss / Llama 3.2) |
⚡ Speed on a single machine | Local LLM (if GPU available) |
🌍 Scalability / multi-tenant processing | OpenAI API |

File | Purpose |
---|---|
image_data_extractor.py | Handles image-based invoice OCR |
pdf_data_extractor.py | Extracts text and metadata from PDFs |
prompt_creator.py | Generates schema-based AI prompts |
openai_api.py | Integrates OpenAI models (e.g. GPT-4.1-mini) |
local_llms_api.py | Connects to local models like Llama3.2 |
util.py | Shared utilities and helpers |
run.sh | Batch extraction script |
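
The priority table can be read as a small decision rule for choosing between the OpenAI backend (`openai_api.py`) and the local backend (`local_llms_api.py`). The sketch below is only an illustration of that rule: `Priority`, `choose_backend`, and the `"openai"` / `"local"` labels are hypothetical names that do not come from the repository, and falling back to the API when no GPU is present is an assumption.

```python
from enum import Enum


class Priority(Enum):
    """Priorities from the comparison table (hypothetical names)."""
    EASY_SETUP = "ease of setup & low cost"
    PRIVACY = "data privacy & control"
    SINGLE_MACHINE_SPEED = "speed on a single machine"
    SCALABILITY = "scalability / multi-tenant processing"


def choose_backend(priority: Priority, gpu_available: bool = False) -> str:
    """Return "openai" or "local" according to the priority table."""
    if priority is Priority.PRIVACY:
        # Data never leaves the machine with a local model.
        return "local"
    if priority is Priority.SINGLE_MACHINE_SPEED:
        # The table recommends a local model only if a GPU is available;
        # using the API otherwise is an assumption of this sketch.
        return "local" if gpu_available else "openai"
    # Ease of setup and scalability both point to the hosted API.
    return "openai"


if __name__ == "__main__":
    for p in Priority:
        print(f"{p.value}: {choose_backend(p, gpu_available=True)}")
```

A caller could use the returned label to decide which of the two integration modules to import, keeping the rest of the extraction pipeline unchanged.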