@moul
Created March 10, 2019 13:29
# docker-compose.yml
version: '3'
services:
  archivebox:
    build: ./ArchiveBox
    stdin_open: true
    tty: true
    environment:
      - USE_COLOR=False
      - SHOW_PROGRESS=False
    volumes:
      - /data/webarchives:/data
    command: 'tail -f /dev/null'
  nginx:
    image: 'nginx'
    restart: unless-stopped
    environment:
      - VIRTUAL_HOST=vhost.yourdomain.com
      - VIRTUAL_PORT=80
    networks:
      - default
      - service-proxy
    volumes:
      - ./ArchiveBox/etc/nginx/nginx.conf:/etc/nginx/nginx.conf
      - /data/webarchives:/var/www
networks:
  service-proxy:
    external: true

# Makefile
.PHONY: once
once: ArchiveBox
	# create data dir
	mkdir -p /data/webarchives; chmod 777 /data/webarchives
	# start ArchiveBox
	docker-compose up -d
	# Pass a list of sitemaps to the archiver
	for sitemap in `curl -s 'https://docs.google.com/spreadsheets/d/e/<YOUR_DOCUMENT_ID>/pub?output=csv&gid=<YOUR_SITEMAP_TAB_ID>'`; do \
		docker-compose exec -T archivebox /bin/archive "$$sitemap"; \
	done
	# Pass a list of single links to the archiver
	docker-compose exec -T archivebox /bin/archive 'https://docs.google.com/spreadsheets/d/e/<YOUR_DOCUMENT_ID>/pub?output=csv'

.PHONY: loop
loop:
	# run the archiver once a day; $(MAKE) passes make's flags on to the sub-make
	while true; do $(MAKE) once; sleep 86400; done

ArchiveBox:
	# if the ArchiveBox dir is missing, clone it
	git clone https://github.com/pirate/ArchiveBox