imjasonh/cloud-run-migration.md

## cloud-run-migration.md

      
GCPing is a simple single-page website where you can find out the relative
latency between your browser and multiple GCP regions. GCP manages data centers
around the world, but which one is closest to you, or more importantly, your
customers? GCPing can answer that.

GCPing is a 20% project, and for the four years since its inception it was
backed by a small f1-micro GCE VM in each region, each with a public IP
address, and a static HTML and JavaScript frontend served from GCS.

Over time, new regions would come online, or basic maintenance and upgrades
would be necessary, but for the most part GCPing ran itself.

A global load-balancer was added to demonstrate how the LB can direct
traffic to the nearest regional backend. The server was containerized to run on
Container-Optimized OS for simpler auto-updating VMs.

But as features were added, the script to deploy GCPing got larger and more
complex. Because GCPing is a 20% project, there's often not a lot of time to
fix issues or simplify things. More regions also meant more VMs to run, and
while each one is inexpensive, together they started to cost $250-300 per
month!

Over the last four years, computing at GCP has changed. In addition to GCE VMs,
we now have Cloud Run, a serverless container-based platform. Deploying
containers to Cloud Run is simple, and Cloud Run services can run in nearly
every region GCP supports. [[[ and the rest are coming soon? ]]]

So in November 2020, I decided to rewrite GCPing as a serverless application on
Cloud Run.

Porting the server container from COS to Cloud Run was simple; I didn't even
really need to change my code (what little of it there was). The main change
was rewriting the deployment process. The old deployment process was a hairy
bash script that invoked `gcloud` to set up resources. If some step failed, it
could leave the site in an inconsistent state that I'd have to take time to
debug.

Cloud Run's Terraform support meant I could replace hundreds of lines of bash
with a couple dozen declarative config stanzas. Deploying the site is as easy
as running `terraform apply`, which presents a diff between the current state
of the world and the state I've declared locally, then prompts me to apply the
changes in the correct order.

Another bonus was HTTPS -- Cloud Run services are always served over HTTPS,
with a unique URL for each service. In the GCE approach, every VM had a public
IP address and served plaintext HTTP.

### Easier Deployment

### Cost Savings

### HTTPS Everywhere!

### Previous Architecture

- static HTML+JS frontend served from GCS
- `setup.sh` script:
  - ensured an f1-micro VM in each region with static IP
  - created global LB with static IP
  - generated `config.js` to list regions->IPs for JS frontend

Originally Ubuntu VMs that ran a Go binary, later a COS VM deployed using
`gcloud compute instances create-with-container` and a container image built
using `ko`.

### Cost Breakdown

### Problems

- cost! $200-250/month
- serving from GCS didn't support HTTPS unless I put it behind a load balancer ($$$)
- serving from GCE didn't support HTTPS unless I put it behind a load balancer ($$$)
- absolute lowest latency isn't really the goal -> use a CDN!
  - relative latency answers "what's closest"
- deployment script was very imperative, occasionally required hand-tuning
- `create-with-container` is a `gcloud` feature, not reproducible in APIs or
  things like Terraform
- running a full VM 24/7 presents a potential security issue

### New Architecture

- static HTML+JS frontend served by Cloud Run services
- container image built using `ko` (no change)
- Terraform config:
  - ensures the container is deployed to Cloud Run in all regions
  - ensures global LB across regional serverless network endpoint groups (NEGs)
  - global LB w/ managed SSL cert for https://global.gcping.com and https://gcping.com

### Intermediate Architecture

- global LB for each regional Cloud Run service (https://asia-east1.gcping.com)
  - unnecessary cost for vanity, Cloud Run services already get a stable domain
    name w/ Google-managed SSL certs


### Advantages

- cost! ~$20/month, depending on usage
- HTTPS everywhere!
- traffic is low and very bursty -> don't pay while nobody's using it
- declarative deployment w/ Terraform (create-or-update w/ diff + confirm)

### Future Work

- automate reconciliation w/ GitHub Actions ("GitOps")
- possibly automate turning up new regions? At least notify so I can do it
  manually...
- budget alerts in case of traffic spikes