- Walk through the disk(s) or directories that contain the PDFs.
- For each PDF, store the following data in a "document store" of some kind that supports full-text search, for later retrieval:
  {
    "sha256": "<sha256 hash of the file>",
    "filename": "<filename>",
    "path": "<path>",
    "contents": "<text extracted with pdftotext (or a similar tool) from the first 100 pages>"
  }
- A local web server interface with a "search box" for searching this data.
- Optionally, a way to add notes and tags to each document above, stored uniquely per document (e.g. keyed by its SHA-256 hash).
- Optionally, flag duplicate files (e.g. files with the same SHA-256 hash) so that the extra copies are not returned in search results.
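For the first two items (walking the directories and building one record per PDF), a minimal Python sketch might look like the following. It assumes poppler's `pdftotext` command-line tool is installed and on `PATH`; its `-l` option limits extraction to the first N pages. The function names (`sha256_of_file`, `pdftotext_first_pages`, `iter_pdf_records`) are hypothetical, and `"path"` is assumed to mean the full file path.

```python
import hashlib
import os
import subprocess
from typing import Callable, Dict, Iterator


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks (1 MiB by default) so large PDFs are never read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def pdftotext_first_pages(path: str, last_page: int = 100) -> str:
    """Extract text from pages 1..last_page using poppler's pdftotext CLI.

    `-l` sets the last page to convert; an output file of "-" writes to stdout.
    Returns "" if pdftotext exits with an error (e.g. an unreadable PDF).
    """
    result = subprocess.run(
        ["pdftotext", "-l", str(last_page), path, "-"],
        capture_output=True,
        encoding="utf-8",
        errors="replace",
        check=False,
    )
    return result.stdout if result.returncode == 0 else ""


def iter_pdf_records(
    root: str, extract: Callable[[str], str] = pdftotext_first_pages
) -> Iterator[Dict[str, str]]:
    """Recursively walk `root` and yield one record per file ending in .pdf (case-insensitive)."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.lower().endswith(".pdf"):
                continue
            full_path = os.path.join(dirpath, name)
            yield {
                "sha256": sha256_of_file(full_path),
                "filename": name,
                # Assumption: "path" holds the full path, not just the directory.
                "path": full_path,
                "contents": extract(full_path),
            }
```

Usage would be something like `for rec in iter_pdf_records("/path/to/pdfs"): ...`. The `extract` parameter is there so a different extractor (another CLI or a Python PDF library) can be swapped in without touching the directory walk.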
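One possible "document store" with text search is SQLite's FTS5 full-text extension, which most Python builds of `sqlite3` include (this is an assumption worth checking on the target machine). The sketch below stores the record fields in an FTS5 virtual table, ranks matches with `bm25()`, and returns a highlighted `snippet()` of the contents. The table name `documents` and the helper names are hypothetical.

```python
import sqlite3
from typing import Dict, List, Tuple


def open_store(db_path: str = "pdf_index.db") -> sqlite3.Connection:
    conn = sqlite3.connect(db_path)
    # FTS5 indexes every column for full-text search except those marked UNINDEXED.
    conn.execute(
        """CREATE VIRTUAL TABLE IF NOT EXISTS documents USING fts5(
               sha256 UNINDEXED, filename, path UNINDEXED, contents
           )"""
    )
    return conn


def add_document(conn: sqlite3.Connection, record: Dict[str, str]) -> None:
    """Insert a record, replacing any earlier row for the same path so re-scans don't pile up."""
    with conn:  # commits on success, rolls back on error
        conn.execute("DELETE FROM documents WHERE path = ?", (record["path"],))
        conn.execute(
            "INSERT INTO documents (sha256, filename, path, contents) VALUES (?, ?, ?, ?)",
            (record["sha256"], record["filename"], record["path"], record["contents"]),
        )


def search(conn: sqlite3.Connection, query: str, limit: int = 20) -> List[Tuple[str, str, str]]:
    """Return (path, filename, snippet) tuples, best match first.

    `query` uses FTS5 query syntax; bm25() scores are lower for better matches,
    so ascending order puts the best hits first. Column 3 is `contents`.
    """
    rows = conn.execute(
        """SELECT path, filename, snippet(documents, 3, '[', ']', '...', 12)
           FROM documents
           WHERE documents MATCH ?
           ORDER BY bm25(documents)
           LIMIT ?""",
        (query, limit),
    )
    return rows.fetchall()
```

A malformed FTS5 query (for example an unbalanced quote) raises `sqlite3.OperationalError`, so a caller taking input from a search box will probably want to catch that.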
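For the local web interface with a search box, Python's standard-library `http.server` is enough for a single-user tool. In this sketch, `search` is any callable returning `(path, filename, snippet)` tuples, for instance one backed by an SQLite FTS5 query. Because `ThreadingHTTPServer` calls it from worker threads, an `sqlite3` connection used inside it must be created per call or opened with `check_same_thread=False`. The names `render_page`, `make_handler`, and `serve` are hypothetical.

```python
import html
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from typing import Callable, List, Tuple
from urllib.parse import parse_qs, urlparse

# A search function takes the query string and returns (path, filename, snippet) tuples.
SearchFn = Callable[[str], List[Tuple[str, str, str]]]


def render_page(query: str, results: List[Tuple[str, str, str]]) -> str:
    """Build the HTML page: a GET form named `q` plus a list of results, all HTML-escaped."""
    items = "".join(
        f"<li><b>{html.escape(name)}</b> <code>{html.escape(path)}</code>"
        f"<br>{html.escape(snippet)}</li>"
        for path, name, snippet in results
    )
    return (
        "<!doctype html><html><body>"
        '<form method="get" action="/">'
        f'<input name="q" value="{html.escape(query)}" autofocus> <button>Search</button>'
        f"</form><ul>{items}</ul></body></html>"
    )


def make_handler(search: SearchFn):
    class SearchHandler(BaseHTTPRequestHandler):
        def do_GET(self) -> None:
            params = parse_qs(urlparse(self.path).query)
            query = params.get("q", [""])[0].strip()
            results = search(query) if query else []
            body = render_page(query, results).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "text/html; charset=utf-8")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    return SearchHandler


def serve(search: SearchFn, port: int = 8000) -> None:
    # Bind to 127.0.0.1 so the index is reachable only from this machine.
    ThreadingHTTPServer(("127.0.0.1", port), make_handler(search)).serve_forever()
```

Binding to the loopback address keeps the tool "local" as the spec asks; exposing it on a network would call for authentication, which this sketch does not attempt.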
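For the optional notes and tags, one way to make them unique per document is to key them on the SHA-256 hash rather than the path, so annotations survive a file being moved or renamed. The sketch below assumes plain SQLite tables (`notes`, `tags`) and hypothetical helper names; each document gets at most one note, and each tag is stored at most once per document.

```python
import sqlite3
from typing import Dict, List, Optional, Union

SCHEMA = """
CREATE TABLE IF NOT EXISTS notes (
    sha256 TEXT PRIMARY KEY,   -- one note per document
    note   TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS tags (
    sha256 TEXT NOT NULL,
    tag    TEXT NOT NULL,
    PRIMARY KEY (sha256, tag)  -- a tag appears at most once per document
);
"""


def open_annotations(db_path: str = "pdf_index.db") -> sqlite3.Connection:
    conn = sqlite3.connect(db_path)
    conn.executescript(SCHEMA)
    return conn


def set_note(conn: sqlite3.Connection, sha256: str, note: str) -> None:
    """Create or overwrite the note for a document."""
    with conn:
        conn.execute("INSERT OR REPLACE INTO notes (sha256, note) VALUES (?, ?)", (sha256, note))


def _normalize(tag: str) -> str:
    # Assumption: tags are case-insensitive and ignore surrounding whitespace.
    return tag.strip().lower()


def add_tag(conn: sqlite3.Connection, sha256: str, tag: str) -> None:
    with conn:
        conn.execute("INSERT OR IGNORE INTO tags (sha256, tag) VALUES (?, ?)", (sha256, _normalize(tag)))


def remove_tag(conn: sqlite3.Connection, sha256: str, tag: str) -> None:
    with conn:
        conn.execute("DELETE FROM tags WHERE sha256 = ? AND tag = ?", (sha256, _normalize(tag)))


def annotations(conn: sqlite3.Connection, sha256: str) -> Dict[str, Union[Optional[str], List[str]]]:
    """Return the note (or None) and the sorted tag list for one document."""
    row = conn.execute("SELECT note FROM notes WHERE sha256 = ?", (sha256,)).fetchone()
    tags = [t for (t,) in conn.execute("SELECT tag FROM tags WHERE sha256 = ? ORDER BY tag", (sha256,))]
    return {"note": row[0] if row else None, "tags": tags}
```

A side effect of keying on the hash is that byte-identical copies of a PDF share the same note and tags, which fits naturally with treating such copies as duplicates.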
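For the optional duplicate flagging, the stored SHA-256 hash already identifies byte-identical files. The sketch below groups records by hash, keeps one canonical copy per group (here, arbitrarily, the lexicographically smallest path), and filters the others out of a result list. The function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def flag_duplicates(records: Iterable[Dict[str, str]]) -> Dict[str, str]:
    """Map each duplicate file's path to the canonical path kept in search results.

    Files count as duplicates only when their SHA-256 hashes match, i.e. they are
    byte-for-byte copies. The canonical copy is the smallest path in sorted order,
    an arbitrary but deterministic choice.
    """
    paths_by_hash: Dict[str, List[str]] = defaultdict(list)
    for rec in records:
        paths_by_hash[rec["sha256"]].append(rec["path"])

    duplicates: Dict[str, str] = {}
    for paths in paths_by_hash.values():
        canonical, *others = sorted(paths)
        for path in others:
            duplicates[path] = canonical
    return duplicates


def drop_duplicates(results: List[Dict[str, str]], duplicates: Dict[str, str]) -> List[Dict[str, str]]:
    """Remove flagged duplicates from a result list, preserving the original order."""
    return [r for r in results if r["path"] not in duplicates]
```

Hash equality only catches exact copies; the same document re-saved or re-scanned produces a different hash, and catching those near-duplicates would need a different technique (for example, comparing the extracted text).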