@rileywilddog
Last active January 3, 2019 06:21
Running ArchiveTeam tumblr-grab on GCP

Guide by riley (efnet, @bad.pet)

To run other projects, see "Running ArchiveTeam Warrior on GCP". Based on this.

Create a GCP project

If necessary, create one at https://console.cloud.google.com/projectcreate
Then select the project you want to use.

Create a new instance template

https://console.cloud.google.com/compute/instanceTemplates/add
name: archiveteam-tumblr-grab
machine type: micro
boot disk: 10 GB, Debian 9
Expand "Management, security, disks, networking, sole tenancy" section
Security -> SSH Keys -> paste in your key
Networking -> Network service tier (optional; change it if you like)
Management -> Startup script (edit the variables!):

#!/bin/bash

DOWNLOADER="awoobis-unconfigured" # your handle for the scoreboard
CONCURRENCY="1" # simultaneous items to run PER INSTANCE
PIPELINE_ARGS="" # If needed, e.g. "--context-value bind_address=123.4.5.6"
SWAPSIZE="512M"

# A bit of swap for comfort
fallocate -l "$SWAPSIZE" /swap
chmod 600 /swap
mkswap /swap
echo '/swap swap swap defaults 0 0' >>/etc/fstab
swapon /swap

# Set up tumblr-grab
apt-get update
apt-get upgrade -y
apt-get install -y tmux python-pip git liblua5.1-0
pip install --upgrade seesaw
adduser --system --group --shell /bin/bash archiveteam
sudo -u archiveteam bash -c "cd /home/archiveteam;git clone https://github.com/ArchiveTeam/tumblr-grab.git"

# To build your own wget-lua
#apt-get install -y git-core autoconf libgnutls28-dev liblua5.1-0-dev flex
# (absolute path: "~" only resolves to /home/archiveteam if sudo resets HOME)
#sudo -u archiveteam bash -c "cd /home/archiveteam/tumblr-grab;./get-wget-lua.sh"

# Install GCP monitoring agent
curl -sS https://dl.google.com/cloudagents/install-monitoring-agent.sh | bash &

# tumblr-monitor
wget https://gist.github.com/JustAnotherArchivist/f4617c902626377532692a341794f273/raw/4a81f66b5dcbc18deb0d530979a443be12b1844a/tumblr-monitor -O /home/archiveteam/tumblr-monitor
chmod +x /home/archiveteam/tumblr-monitor

sudo -i -u archiveteam tmux new-session -d -s tumblr-grab \
    "cd /home/archiveteam/tumblr-grab/;run-pipeline pipeline.py --concurrent $CONCURRENCY $PIPELINE_ARGS $DOWNLOADER"
echo "@reboot tmux new-session -d -s tumblr-grab \
    'cd /home/archiveteam/tumblr-grab/;run-pipeline pipeline.py --concurrent $CONCURRENCY $PIPELINE_ARGS $DOWNLOADER'" \
    | crontab -u archiveteam -
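
Because the startup script runs unattended, a forgotten placeholder handle or a mistyped concurrency value will not be noticed until much later. The following is a minimal sketch of a sanity check that could sit right after the variable block. `check_config` is a hypothetical helper written for this guide; it is not part of tumblr-grab or seesaw.

```shell
#!/bin/bash
# Hypothetical helper (not part of tumblr-grab): refuse to continue if the
# variables at the top of the startup script were left untouched or malformed.
check_config() {
    local downloader="$1" concurrency="$2"

    # The guide's placeholder handle must be replaced before deploying.
    if [ -z "$downloader" ] || [ "$downloader" = "awoobis-unconfigured" ]; then
        echo "DOWNLOADER is not configured" >&2
        return 1
    fi

    # CONCURRENCY must consist of digits only.
    case "$concurrency" in
        ''|*[!0-9]*)
            echo "CONCURRENCY must be a positive integer" >&2
            return 1
            ;;
    esac

    # Zero items per instance would do no work at all.
    if [ "$concurrency" -lt 1 ]; then
        echo "CONCURRENCY must be at least 1" >&2
        return 1
    fi
}

# Usage inside the startup script, after the variables are set:
#   check_config "$DOWNLOADER" "$CONCURRENCY" || exit 1
```

Exiting early keeps an unconfigured instance from submitting work under the placeholder name.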

Then start deploying!

https://console.cloud.google.com/compute/instancesAdd?templateName=archiveteam-tumblr-grab
It'll take several minutes for setup to finish, but eventually you should be able to run /home/archiveteam/tumblr-monitor to check the status.
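
The console steps above (create the template, then create instances from it) could also be scripted with the gcloud CLI. This is a sketch under several assumptions that are not from the guide: "micro" is taken to mean the `f1-micro` machine type, the startup script is saved locally as `startup.sh`, the zone is `us-central1-a`, and the instances are named `tumblr-grab-N`. The `debian-9` image family matches the guide but may no longer be offered. Adding the SSH key is not covered. By default the script only prints the commands; set `DRY_RUN=0` to actually run them.

```shell
#!/bin/bash
# Sketch only: prints the gcloud commands unless DRY_RUN=0 is set.
TEMPLATE="archiveteam-tumblr-grab"   # template name used in this guide
STARTUP_SCRIPT="startup.sh"          # assumption: local copy of the startup script
ZONE="us-central1-a"                 # assumption: pick your own zone
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Equivalent of the "Create a new instance template" form.
create_template() {
    run gcloud compute instance-templates create "$TEMPLATE" \
        --machine-type=f1-micro \
        --image-family=debian-9 --image-project=debian-cloud \
        --boot-disk-size=10GB \
        --metadata-from-file=startup-script="$STARTUP_SCRIPT"
}

# Create COUNT instances named tumblr-grab-1 .. tumblr-grab-COUNT.
deploy_instances() {
    local count="$1" i=1
    while [ "$i" -le "$count" ]; do
        run gcloud compute instances create "tumblr-grab-$i" \
            --source-instance-template="$TEMPLATE" --zone="$ZONE"
        i=$((i + 1))
    done
}

create_template
deploy_instances 2
```

Once setup has finished on a VM, the status check is the same as above: log in and run /home/archiveteam/tumblr-monitor.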

To stop

For a graceful shutdown (may take hours or days), either attach to the tumblr-grab tmux session and press Ctrl+C once, or run this on each VM, then wait for the program to exit:

install -o archiveteam /dev/null /home/archiveteam/tumblr-grab/STOP
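
With more than a handful of VMs, running the command above by hand on each one gets tedious. The sketch below reads "NAME ZONE" pairs and sends the same command over `gcloud compute ssh`. It assumes your gcloud login user has sudo on the VMs and that the instances are named `tumblr-grab-*`; neither is stated in the guide. `request_stop_all` is a hypothetical helper, and by default it only prints what it would run.

```shell
#!/bin/bash
# Sketch only: prints the commands unless DRY_RUN=0 is set.
DRY_RUN="${DRY_RUN:-1}"
STOP_CMD="sudo install -o archiveteam /dev/null /home/archiveteam/tumblr-grab/STOP"

# Reads "NAME ZONE" pairs on stdin and creates the STOP file on each VM.
request_stop_all() {
    local name zone
    while read -r name zone; do
        [ -n "$name" ] || continue
        if [ "$DRY_RUN" = "1" ]; then
            echo "gcloud compute ssh $name --zone=$zone --command=\"$STOP_CMD\""
        else
            # </dev/null keeps ssh from consuming the rest of the list on stdin.
            gcloud compute ssh "$name" --zone="$zone" --command="$STOP_CMD" </dev/null
        fi
    done
}

# Usage (assumes the VMs are named tumblr-grab-*):
#   gcloud compute instances list --filter="name~^tumblr-grab" \
#       --format="value(name,zone.basename())" | request_stop_all
```

The VMs keep running after this; each pipeline only exits once its in-progress items are finished.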

To kill it immediately, either attach to the tumblr-grab tmux session and press Ctrl+C twice, or just kill the VMs from https://console.cloud.google.com/compute/instances
