@ivan
Last active February 20, 2023 14:09
grab-site improvement plan
===
gs-server: add --upstream= to forward crawl activity to another gs-server
Implement a login form on the dashboard that writes a user:password to a cookie
dashboard ignore context menu: make dashboard send control messages to the server
server should reply with success/fail
fail should show highly visible error at the top of the dashboard
gs-server: handle ignore messages from dashboard: if auth OK, append ignore to the crawl's `ignores` file
gs-server needs to know where to write, so grab-site must report its working directory to gs-server. Therefore gs-server must run on the same machine as grab-site.
That's okay, we have federation via --upstream=.
gs-server: implement other control messages that ArchiveBot users expect to work: !con, !delay, etc
All of these directly manipulate the grab-site control files as well.
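
A minimal sketch of how gs-server might apply these messages once auth has passed, assuming the working directory reported by grab-site is already known. The function names are hypothetical, and the `concurrency` and `delay` file names are an assumption about the control-file layout; only the `ignores` file is named in this plan.

```python
from pathlib import Path

# Assumed mapping from control message to control file name.
CONTROL_FILES = {"concurrency", "delay"}


def append_ignore(working_dir, pattern):
    """Append one ignore pattern as a new line in the crawl's `ignores` file."""
    if not pattern or "\n" in pattern or "\r" in pattern:
        raise ValueError("ignore pattern must be a single non-empty line")
    path = Path(working_dir) / "ignores"
    existing = path.read_text(encoding="utf-8") if path.exists() else ""
    # Avoid gluing the new pattern onto an unterminated last line.
    prefix = "" if existing == "" or existing.endswith("\n") else "\n"
    with path.open("a", encoding="utf-8") as f:
        f.write(prefix + pattern + "\n")


def set_control_value(working_dir, name, value):
    """Overwrite a single-value control file such as `delay`."""
    if name not in CONTROL_FILES:
        raise ValueError(f"unknown control file: {name!r}")
    (Path(working_dir) / name).write_text(f"{value}\n", encoding="utf-8")
```

Both functions raise on bad input, so the caller can turn the exception into the fail reply that the dashboard shows.
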
gs-server federation: keep track of which server is responsible for which crawl and forward control messages to the right gs-server
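
One way the bookkeeping could look, as a sketch only: an in-memory table from crawl ID to owning server, consulted for every control message. `CrawlRouter`, the `crawl_id` message field, and the three outcomes are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class CrawlRouter:
    local_name: str                                        # this gs-server's name
    owners: Dict[str, str] = field(default_factory=dict)   # crawl ID -> server name

    def register(self, crawl_id: str, server: str) -> None:
        """Record which server is responsible for a crawl."""
        self.owners[crawl_id] = server

    def route(self, message: dict) -> Tuple[str, Optional[str]]:
        """Decide what to do with a control message.

        Returns ("handle", None) for local crawls, ("forward", server)
        for crawls owned elsewhere, and ("reject", None) for unknown crawls.
        """
        owner = self.owners.get(message.get("crawl_id"))
        if owner is None:
            return ("reject", None)
        if owner == self.local_name:
            return ("handle", None)
        return ("forward", owner)
```
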
gs-server: add support for starting a new crawl
If auth OK, cd to the right directory and run: tmux new -s grab-site-CRAWL-NAME -d "grab-site ..."
right directory: add gs-server --crawls-dir= argument
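
A sketch of how that command could be assembled without string concatenation. The grab-site arguments are left to the caller because this plan does not spell them out; the function names and the conservative character set for crawl names are assumptions.

```python
import re
import shlex
import subprocess
from typing import List

# Assumed restriction: keep names safe for use as a tmux session name.
CRAWL_NAME_RE = re.compile(r"^[A-Za-z0-9_-]+$")


def build_tmux_command(crawl_name: str, grab_site_argv: List[str]) -> List[str]:
    """Return the argv for: tmux new -s grab-site-CRAWL-NAME -d "grab-site ..." """
    if not CRAWL_NAME_RE.match(crawl_name):
        raise ValueError(f"unsafe crawl name: {crawl_name!r}")
    # tmux takes the shell command as one string, so quote each argument.
    shell_command = shlex.join(grab_site_argv)
    return ["tmux", "new", "-s", f"grab-site-{crawl_name}", "-d", shell_command]


def start_crawl(crawls_dir: str, crawl_name: str, grab_site_argv: List[str]) -> None:
    """Start the detached session with the --crawls-dir= directory as cwd."""
    subprocess.run(build_tmux_command(crawl_name, grab_site_argv),
                   cwd=crawls_dir, check=True)
```

Passing `cwd=` to `subprocess.run` stands in for the `cd` step, and `check=True` raises if tmux exits non-zero, which maps onto the fail reply.
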
dashboard: add UI for starting a new crawl
---
How to start a new crawl with many gs-servers?
- Make gs-server report its hostname to the --upstream= server
- If user wants crawl on specific server, start new crawl with `server: "hostname"` and gs-server will forward it to that machine
- If no server specified, run some ugly function that determines which server is least-busy and has enough disk and memory
- Totally dynamic based on resources available, no "slots" unless hard limits wanted
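
The selection could be sketched as follows, assuming each gs-server reports its hostname together with a few resource figures. The `ServerStatus` fields, the thresholds, and the tie-breaking rule are all hypothetical; the plan only says the choice should depend on load, disk, and memory.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ServerStatus:
    hostname: str
    active_crawls: int
    free_disk_bytes: int
    free_memory_bytes: int


def pick_server(servers: List[ServerStatus],
                requested: Optional[str] = None,
                min_disk: int = 50 * 2**30,      # assumed threshold: 50 GiB
                min_memory: int = 1 * 2**30):    # assumed threshold: 1 GiB
    """Return the hostname to start a crawl on, or None if nothing fits."""
    if requested is not None:
        # The user asked for a specific server: honor it or fail.
        return requested if any(s.hostname == requested for s in servers) else None
    eligible = [s for s in servers
                if s.free_disk_bytes >= min_disk and s.free_memory_bytes >= min_memory]
    if not eligible:
        return None
    # Fewest active crawls wins; more free disk breaks ties.
    best = min(eligible, key=lambda s: (s.active_crawls, -s.free_disk_bytes))
    return best.hostname
```
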
---
How to authenticate users?
TBD
@Asparagirl

dashboard: add UI for starting a new crawl

THIS. This is what is going to help new people make and contribute WARCs.
