Last active
February 20, 2023 14:09
grab-site improvement plan
===
gs-server --upstream= to forward crawl activity to another gs-server
Implement a login form on the dashboard that writes a user:password to a cookie
dashboard ignore context menu: make dashboard send control messages to the server
server should reply with success/fail
fail should show a highly visible error at the top of the dashboard
gs-server: handle ignore messages from dashboard: if auth OK, append ignore to the crawl's `ignores` file
gs-server needs to know where to write to, so grab-site must report its working dir to gs-server. Therefore gs-server must run on the same machine as grab-site.
That's okay, we have federation via --upstream=.
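
A minimal sketch of the ignore handler described above, assuming the auth check has already produced a boolean and that grab-site has reported the crawl's working directory. The function name `append_ignore` and the shape of the reply dict are hypothetical; only the `ignores` file name comes from this plan.

```python
from pathlib import Path


def append_ignore(crawl_dir, pattern, auth_ok):
    """Append one ignore pattern to the crawl's `ignores` file.

    Returns a small dict that the dashboard could render as success/fail.
    The reply format is an assumption, not an existing gs-server message.
    """
    if not auth_ok:
        return {"ok": False, "error": "not authorized"}

    pattern = pattern.strip()
    # One pattern per line, so reject empty or multi-line input.
    if not pattern or "\n" in pattern or "\r" in pattern:
        return {"ok": False, "error": "invalid ignore pattern"}

    crawl_path = Path(crawl_dir)
    if not crawl_path.is_dir():
        return {"ok": False, "error": "unknown crawl directory"}

    ignores_path = crawl_path / "ignores"
    # If the file exists and does not end with a newline, start a new line
    # first so the new pattern is not glued onto the previous one.
    prefix = ""
    if ignores_path.exists():
        existing = ignores_path.read_text(encoding="utf-8")
        if existing and not existing.endswith("\n"):
            prefix = "\n"

    with open(ignores_path, "a", encoding="utf-8") as f:
        f.write(prefix + pattern + "\n")
    return {"ok": True}
```

On failure, the `error` string is what the dashboard would show at the top of the page.
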
gs-server: implement other control messages that ArchiveBot users expect to work: !con, !delay, etc
All of these directly manipulate the grab-site control files as well.
gs-server federation: keep track of which server is responsible for which crawl and forward control messages to the right gs-server
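
The forwarding decision could be as small as a lookup table. This is only a sketch: the `crawl_id` message field, the `crawl_owners` mapping, and the returned action names are all hypothetical, not an existing gs-server protocol.

```python
def route_control_message(message, crawl_owners, local_hostname):
    """Decide what to do with a control message.

    crawl_owners maps a crawl id to the hostname of the gs-server that is
    responsible for it. Returns an (action, target_hostname) tuple.
    """
    owner = crawl_owners.get(message.get("crawl_id"))
    if owner is None:
        # Nobody has announced this crawl, so reply with a failure.
        return ("reject", None)
    if owner == local_hostname:
        # The crawl runs on this machine: edit its control files here.
        return ("handle_locally", None)
    # Otherwise pass the message along to the responsible gs-server.
    return ("forward", owner)
```
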
gs-server: add support for starting a new crawl
If auth OK, cd to right directory and run: tmux new -s grab-site-CRAWL-NAME -d "grab-site ..."
right directory: add gs-server --crawls-dir= argument
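
A rough sketch of how gs-server might launch the tmux session. The helper names, the crawl-name check, and the `grab_site_args` parameter are assumptions; the actual grab-site arguments are left to the caller, as in the plan.

```python
import re
import shlex
import subprocess


def build_tmux_command(crawl_name, grab_site_args):
    """Build the argv for: tmux new -s grab-site-CRAWL-NAME -d "grab-site ..."."""
    # Keep the session name to a conservative character set (an assumption);
    # this also keeps "." and ":" out of the tmux session name.
    if not re.fullmatch(r"[A-Za-z0-9_-]+", crawl_name):
        raise ValueError(f"unsafe crawl name: {crawl_name!r}")
    session = f"grab-site-{crawl_name}"
    # tmux takes the command as a single shell string, so quote each argument.
    shell_command = shlex.join(["grab-site", *grab_site_args])
    return ["tmux", "new", "-s", session, "-d", shell_command]


def start_crawl(crawls_dir, crawl_name, grab_site_args):
    """Run the tmux command with the --crawls-dir= value as working directory."""
    command = build_tmux_command(crawl_name, grab_site_args)
    subprocess.run(command, cwd=crawls_dir, check=True)
```

Passing `cwd=` to `subprocess.run` stands in for the "cd to right directory" step, and `check=True` raises if tmux fails so the server can reply with a failure.
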
dashboard: add UI for starting a new crawl
---
How to start a new crawl with many gs-servers?
- Make gs-server report its hostname to the --upstream= server
- If user wants crawl on specific server, start new crawl with `server: "hostname"` and gs-server will forward it to that machine
- If no server specified, run some ugly function that determines which server is least-busy and has enough disk and memory
- Totally dynamic based on resources available, no "slots" unless hard limits wanted
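
One possible shape for that "ugly function", as a sketch only. It assumes each gs-server reports its active crawl count, free disk, and free memory to the upstream server; the stat field names and the thresholds are hypothetical.

```python
def pick_server(servers, min_free_disk_bytes, min_free_memory_bytes, requested=None):
    """Return the hostname that should run a new crawl, or None if none fits.

    servers maps hostname -> dict with "active_crawls", "free_disk_bytes"
    and "free_memory_bytes" (all hypothetical field names).
    """
    if requested is not None:
        # The user asked for a specific server via `server: "hostname"`.
        return requested if requested in servers else None

    eligible = [
        (stats["active_crawls"], hostname)
        for hostname, stats in servers.items()
        if stats["free_disk_bytes"] >= min_free_disk_bytes
        and stats["free_memory_bytes"] >= min_free_memory_bytes
    ]
    if not eligible:
        return None
    # Fewest active crawls wins; ties are broken by hostname for determinism.
    return min(eligible)[1]
```

There are no fixed slots here: a server drops out of the running only when its reported disk or memory falls below the thresholds.
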
---
How to authenticate users?
TBD
> dashboard: add UI for starting a new crawl

THIS. This is what is going to help new people make and contribute WARCs.