Replace every reference to 'yano' (shown as YOUR_USERNAME in the commands below) with your own username.

Setting Up VMs

I would spin up a Hetzner Cloud VM, run the following series of commands on it, and then move on to the next one.

Change the IPv6 prefix (aaaa:bbbb:cccc:dddd in the loop below) to match the first four 16-bit groups of the address that Hetzner gives you.

$ echo -e 'alias myip="curl -L"\nalias myip4="curl -L"\nalias myip6="curl -L"\n' >> ~/.bashrc && source ~/.bashrc

$ for i in {3500..3599}; do /sbin/ip -6 addr add aaaa:bbbb:cccc:dddd::$i/128 dev eth0; done
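Before adding a hundred addresses, it can help to preview the exact commands for your prefix. The following is a minimal sketch, not part of the original commands: `gen_ipv6_cmds` is a hypothetical helper name, and it only prints the `ip` lines instead of running them.

```shell
# Hypothetical dry-run helper: print (do not execute) one `ip -6 addr add`
# command per host suffix so the prefix can be checked before applying.
# Arguments: PREFIX FIRST LAST [DEVICE]   (DEVICE defaults to eth0)
gen_ipv6_cmds() (
    prefix="$1"
    i="$2"
    end="$3"
    dev="${4:-eth0}"
    while [ "$i" -le "$end" ]; do
        echo "/sbin/ip -6 addr add ${prefix}::${i}/128 dev ${dev}"
        i=$((i + 1))
    done
)
```

Running `gen_ipv6_cmds aaaa:bbbb:cccc:dddd 3500 3599` should list the same commands as the loop above; once the output looks right, it could be piped to `sh` as root to apply it.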

$ myip

$ apt update; apt -y upgrade; apt -y install apt-transport-https ca-certificates curl gnupg2 software-properties-common git-core libgnutls28-dev libgnutls30 screen lua5.1 liblua5.1-0 liblua5.1-0-dev python-dev python-pip bzip2 zlib1g-dev unzip python-setuptools build-essential flex autoconf python-gnutls atop htop rsync dnsutils; curl -fsSL | apt-key add - && add-apt-repository "deb [arch=amd64] $(lsb_release -cs) stable"; apt update; apt -y install docker-ce; systemctl restart docker && for i in {8001..8015}; do docker run --detach --env DOWNLOADER="YOUR_USERNAME" --env SELECTED_PROJECT="auto" --env CONCURRENT_ITEMS="6" --publish $i:8001 --restart always archiveteam/warrior-dockerfile; done && htop
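The last part of that one-liner starts fifteen warrior containers, each published on its own host port. As a rough sketch for checking the username and port range first, the hypothetical helper `print_warrior_cmds` below (a name invented here, not from the original) prints the `docker run` lines without starting anything. The flags and image name are copied from the command above; the default of 6 concurrent items is also taken from it.

```shell
# Hypothetical dry-run helper: print one `docker run` command per host port.
# Arguments: DOWNLOADER FIRST_PORT LAST_PORT [CONCURRENT_ITEMS]
print_warrior_cmds() (
    downloader="$1"
    port="$2"
    last="$3"
    items="${4:-6}"
    while [ "$port" -le "$last" ]; do
        # Each container listens on 8001 internally; only the host port changes.
        printf 'docker run --detach --env DOWNLOADER="%s" --env SELECTED_PROJECT="auto" --env CONCURRENT_ITEMS="%s" --publish %s:8001 --restart always archiveteam/warrior-dockerfile\n' \
            "$downloader" "$items" "$port"
        port=$((port + 1))
    done
)
```

For example, `print_warrior_cmds YOUR_USERNAME 8001 8015` should print fifteen lines matching the loop in the command above.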

Cleaning Up

This is what I ran at the end on each VM:

for c in $(docker ps -aq); do echo "DOCKER: $c"; docker exec "$c" bash -c 'for i in $(find /data -name "*.warc.gz"); do rsync -rltv --timeout=300 --contimeout=300 --progress --bwlimit 0 --sockopts=SO_SNDBUF=8388608,SO_RCVBUF=8388608 --recursive --partial --partial-dir .rsync-tmp --min-size 1 --no-compress --compress-level 0 "${i}" rsync://PUT_RSYNC_TARGET_URL_HERE/YOUR_USERNAME/ && rm "${i}"; done'; done
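One way to make the upload-then-delete step explicit is a small helper that removes a file only when the upload command exits successfully. This is a sketch under assumptions: `upload_then_remove` is a hypothetical name, and the upload command is passed in as arguments so the same logic works with `rsync` or any other tool that takes a source and a destination.

```shell
# Hypothetical helper: run "CMD... FILE DEST" and delete FILE only on success.
# Arguments: FILE DEST CMD [CMD_ARGS...]
upload_then_remove() {
    _file="$1"
    _dest="$2"
    shift 2
    if "$@" "$_file" "$_dest"; then
        rm -- "$_file"
    else
        # Keep the file so a later run can retry the upload.
        echo "upload failed, keeping $_file" >&2
        return 1
    fi
}
```

Inside a container this could be called as `upload_then_remove "$i" rsync://PUT_RSYNC_TARGET_URL_HERE/YOUR_USERNAME/ rsync -rltv --partial`, so a failed transfer leaves the `.warc.gz` file in place.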