grab-site improvement plan
===
gs-server --upstream= to forward crawl activity to another gs-server
Implement a login form on the dashboard that writes a user:password to a cookie
dashboard ignore context menu: make dashboard send control messages to the server
server should reply with success/fail
fail should show highly visible error at the top of the dashboard
gs-server: handle ignore messages from dashboard: if auth OK, append ignore to the crawl's `ignores` file
gs-server needs to know where to write, so grab-site must report its working directory to gs-server. Therefore gs-server must run on the same machine as grab-site.
That's okay, we have federation via --upstream=.
gs-server: implement other control messages that ArchiveBot users expect to work: !con, !delay, etc
All of these directly manipulate the grab-site control files as well.
gs-server federation: keep track of which server is responsible for which crawl and forward control messages to the right gs-server
gs-server: add support for starting a new crawl
If auth OK, cd to right directory and run: tmux new -s grab-site-CRAWL-NAME -d "grab-site ..."
right directory: add gs-server --crawls-dir= argument
dashboard: add UI for starting a new crawl
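
A rough sketch of how gs-server might handle the two control messages described above (appending an ignore and starting a crawl) is shown here. It is only an illustration under stated assumptions: the message fields (`type`, `crawl`, `pattern`, `name`, `args`), the reply shape, and all function names are hypothetical, and the arguments after `grab-site` are left to the caller because the plan does not specify them. Only the `ignores` file name and the `tmux new -s ... -d "..."` form come from the plan itself. To actually launch the crawl, the returned argv could be passed to `subprocess.run(argv, cwd=crawls_dir)`, where `crawls_dir` is the value of `--crawls-dir=`.

```python
import os
import re
import shlex

# Hypothetical restriction: keep crawl names simple so they are safe to embed
# in a tmux session name.
_SAFE_NAME = re.compile(r"[A-Za-z0-9_-]+")


def append_ignore(working_dir, pattern):
    """Append one ignore pattern to the crawl's `ignores` file."""
    if not pattern or "\n" in pattern or "\r" in pattern:
        raise ValueError("ignore pattern must be a single non-empty line")
    path = os.path.join(working_dir, "ignores")
    # If the file exists and does not end with a newline, add one first so
    # the new pattern does not get glued onto the previous line.
    needs_newline = False
    if os.path.exists(path) and os.path.getsize(path) > 0:
        with open(path, "rb") as f:
            f.seek(-1, os.SEEK_END)
            needs_newline = f.read(1) != b"\n"
    with open(path, "a", encoding="utf-8") as f:
        if needs_newline:
            f.write("\n")
        f.write(pattern + "\n")
    return path


def build_start_command(crawl_name, grab_site_args):
    """Build the argv for: tmux new -s grab-site-CRAWL-NAME -d "grab-site ..." """
    if not _SAFE_NAME.fullmatch(crawl_name):
        raise ValueError("unsafe crawl name: %r" % (crawl_name,))
    # tmux passes this single string to a shell, so quote each argument.
    inner = " ".join(shlex.quote(a) for a in ["grab-site"] + list(grab_site_args))
    return ["tmux", "new", "-s", "grab-site-" + crawl_name, "-d", inner]


def handle_control_message(msg, authorized, crawl_dirs):
    """Return a success/fail reply for one dashboard control message.

    `crawl_dirs` maps a crawl id to the working dir that grab-site reported.
    """
    if not authorized:
        return {"ok": False, "error": "not authorized"}
    try:
        if msg["type"] == "ignore":
            append_ignore(crawl_dirs[msg["crawl"]], msg["pattern"])
            return {"ok": True}
        if msg["type"] == "start":
            return {"ok": True, "argv": build_start_command(msg["name"], msg["args"])}
        return {"ok": False, "error": "unknown message type"}
    except (KeyError, ValueError, OSError) as e:
        return {"ok": False, "error": str(e)}
```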
---
How to start a new crawl with many gs-servers?
- Make gs-server report its hostname to the --upstream= server
- If user wants crawl on specific server, start new crawl with `server: "hostname"` and gs-server will forward it to that machine
- If no server specified, run some ugly function that determines which server is least-busy and has enough disk and memory
- Totally dynamic based on resources available, no "slots" unless hard limits wanted
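
The "ugly function" for picking a server could look roughly like the sketch below. Everything here is an assumption made for illustration: the `ServerStatus` fields, the minimum disk and memory thresholds, and the tie-breaking rule are not part of the plan. It honors an explicit `server: "hostname"` request, otherwise filters out servers that lack resources and picks the one with the fewest active crawls; `max_crawls` is the optional hard limit.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical thresholds; real values would depend on the deployment.
MIN_FREE_DISK_BYTES = 20 * 1024**3
MIN_FREE_MEMORY_BYTES = 1 * 1024**3


@dataclass
class ServerStatus:
    """What each gs-server is assumed to report to its --upstream= server."""
    hostname: str
    active_crawls: int
    free_disk_bytes: int
    free_memory_bytes: int


def pick_server(servers: List[ServerStatus],
                requested: Optional[str] = None,
                max_crawls: Optional[int] = None) -> str:
    """Return the hostname that should run a new crawl."""
    if requested is not None:
        # The user asked for a specific machine: forward there or fail.
        for s in servers:
            if s.hostname == requested:
                return s.hostname
        raise LookupError("unknown server: %s" % requested)

    eligible = [
        s for s in servers
        if s.free_disk_bytes >= MIN_FREE_DISK_BYTES
        and s.free_memory_bytes >= MIN_FREE_MEMORY_BYTES
        and (max_crawls is None or s.active_crawls < max_crawls)
    ]
    if not eligible:
        raise LookupError("no server has enough free resources")

    # Least busy first; among equally busy servers prefer more free disk.
    best = min(eligible, key=lambda s: (s.active_crawls, -s.free_disk_bytes))
    return best.hostname
```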
---
How to authenticate users?
TBD
@Asparagirl

commented Dec 18, 2015

> dashboard: add UI for starting a new crawl

THIS. This is what is going to help new people make and contribute WARCS.
