@btbytes
Created October 16, 2020 17:23
A personal desktop PDF/documents search interface


  1. Walk the disk or directories that contain the PDFs.
  2. For each file, store the following record in a "document store" of some kind that supports text search, for later retrieval:
{
  "sha256": "<sha256 hash of the file>",
  "filename": "<filename>",
  "path": "<path>",
  "contents": "<pdftotext (or similar) of the first 100 pages>"
}
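A minimal Python sketch of steps 1 and 2, under a few assumptions: the Poppler `pdftotext` command is installed and on `PATH`, only files ending in `.pdf` are of interest, and `"path"` holds the full path to the file (the record format above does not say whether it is the full path or the directory). The function names `sha256_of`, `extract_text`, and `build_records` are hypothetical.

```python
import hashlib
import subprocess
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large PDFs are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def extract_text(path: Path, max_pages: int = 100) -> str:
    """Return the text of the first `max_pages` pages, or "" if extraction fails."""
    try:
        result = subprocess.run(
            # -f/-l select the first and last page; "-" writes the text to stdout.
            ["pdftotext", "-f", "1", "-l", str(max_pages), str(path), "-"],
            capture_output=True,
            check=True,
            timeout=120,
        )
    except (OSError, subprocess.SubprocessError):
        # pdftotext missing, timed out, or could not read this file.
        return ""
    return result.stdout.decode("utf-8", errors="replace")


def build_records(root: Path, extractor=extract_text):
    """Yield one record per *.pdf file found under `root` (case-sensitive match)."""
    for path in sorted(root.rglob("*.pdf")):
        if not path.is_file():
            continue
        yield {
            "sha256": sha256_of(path),
            "filename": path.name,
            "path": str(path),  # assumption: full path, not just the directory
            "contents": extractor(path),
        }
```

Passing the extractor in as a parameter keeps the walk independent of any one tool, so `pdftotext` could be swapped for a similar extractor without touching the rest.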
  3. Provide a local web server interface with a search box for searching this data.
  4. Optionally, provide a way to attach notes and tags to each unique document.
  5. Optionally, flag duplicate files so that they are not returned in search results.
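One possible shape for the "document store", sketched with SQLite's FTS5 extension. This is an assumption, not something the notes above prescribe: any store with text search would do, and FTS5 must be compiled into the local SQLite build. The table names and the functions `open_store`, `add_record`, `annotate`, and `search` are hypothetical. Notes and tags are keyed by `sha256`, which is one reading of attaching them to a document "uniquely".

```python
import sqlite3


def open_store(db_path: str = ":memory:") -> sqlite3.Connection:
    """Create the tables if needed and return a connection."""
    conn = sqlite3.connect(db_path)
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS files (
            path TEXT PRIMARY KEY,
            filename TEXT NOT NULL,
            sha256 TEXT NOT NULL,
            is_duplicate INTEGER NOT NULL DEFAULT 0
        );
        CREATE INDEX IF NOT EXISTS files_sha256 ON files (sha256);
        CREATE TABLE IF NOT EXISTS annotations (
            sha256 TEXT PRIMARY KEY,
            notes TEXT NOT NULL DEFAULT '',
            tags TEXT NOT NULL DEFAULT ''
        );
        CREATE VIRTUAL TABLE IF NOT EXISTS contents
            USING fts5(sha256 UNINDEXED, body);
    """)
    return conn


def add_record(conn: sqlite3.Connection, record: dict) -> bool:
    """Store one record; return True if it was flagged as a duplicate."""
    # A file is a duplicate if another, non-duplicate path has the same hash.
    is_dup = conn.execute(
        "SELECT 1 FROM files WHERE sha256 = ? AND path != ? AND is_duplicate = 0",
        (record["sha256"], record["path"]),
    ).fetchone() is not None
    conn.execute(
        "INSERT OR REPLACE INTO files (path, filename, sha256, is_duplicate) "
        "VALUES (?, ?, ?, ?)",
        (record["path"], record["filename"], record["sha256"], int(is_dup)),
    )
    # Index the text once per distinct hash.
    indexed = conn.execute(
        "SELECT 1 FROM contents WHERE sha256 = ?", (record["sha256"],)
    ).fetchone()
    if indexed is None:
        conn.execute(
            "INSERT INTO contents (sha256, body) VALUES (?, ?)",
            (record["sha256"], record["contents"]),
        )
    conn.commit()
    return is_dup


def annotate(conn: sqlite3.Connection, sha256: str, notes: str = "", tags: str = "") -> None:
    """Attach notes and tags to a document, identified by its hash."""
    conn.execute(
        "INSERT OR REPLACE INTO annotations (sha256, notes, tags) VALUES (?, ?, ?)",
        (sha256, notes, tags),
    )
    conn.commit()


def search(conn: sqlite3.Connection, query: str, limit: int = 20) -> list:
    """Return (filename, path) for matching non-duplicate files, best match first.

    `query` is passed to FTS5 as-is, so FTS5 query syntax applies and
    malformed input can raise sqlite3.OperationalError.
    """
    return conn.execute(
        """
        SELECT f.filename, f.path
        FROM contents
        JOIN files AS f ON f.sha256 = contents.sha256
        WHERE contents MATCH ? AND f.is_duplicate = 0
        ORDER BY bm25(contents)
        LIMIT ?
        """,
        (query, limit),
    ).fetchall()
```

A search box on a local web page would then only need to pass the submitted text to `search` and render the returned rows.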