@jason-o-matic
Created August 6, 2014 18:33
DocRaptor AWS migration
Estimated total time to execute: 2 hr, not including pre-replication
Ensure performance test make_doc script has production endpoint
Comment out deploy tests, DO NOT COMMIT
Set up SSH tunnel from AWS background-001 to Linode MySQL
Replicate Linode MySQL to AWS (3hr+) as docraptor-production
One-time copy of non-queue Redis data
Verify AWS endpoint working
./script/service_test http://aws.docraptor.com && ./script/service_test https://aws.docraptor.com
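The SSH tunnel step in this section could be sketched roughly as follows. This is only an illustration: the host name, user, and local port are placeholders that do not come from the checklist, and the function name `tunnel_command` is hypothetical. The sketch only prints the command so it can be reviewed before being run on background-001.

```shell
#!/usr/bin/env bash
# Sketch only: every host, user, and port below is a placeholder (assumption),
# not a value taken from the checklist.
LINODE_DB_HOST="${LINODE_DB_HOST:-linode-db.example.com}"  # hypothetical Linode MySQL host
TUNNEL_USER="${TUNNEL_USER:-deploy}"                        # hypothetical SSH user
LOCAL_PORT="${LOCAL_PORT:-3307}"                            # hypothetical local end of the tunnel
REMOTE_PORT="${REMOTE_PORT:-3306}"                          # MySQL default port

# Print the ssh command that forwards LOCAL_PORT on this box
# to MySQL listening on the Linode host's loopback interface.
tunnel_command() {
  printf 'ssh -f -N -o ExitOnForwardFailure=yes -L %s:127.0.0.1:%s %s@%s\n' \
    "$LOCAL_PORT" "$REMOTE_PORT" "$TUNNEL_USER" "$LINODE_DB_HOST"
}

# Only print the command; run it by hand once the values are real.
tunnel_command
```

With a tunnel like this in place, a MySQL client on background-001 would reach the Linode database at `127.0.0.1` on the chosen local port (for example `mysql -h 127.0.0.1 -P 3307`), assuming MySQL on the Linode side accepts loopback connections.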
-- Wait Till Day of Maintenance --
Tweet: Reminder: we'll be doing maintenance from 2-3pm EDT today.
Connect AWS to Linode MySQL
Fallback: (AWS BRANCH) cap production resque:stop
Verify AWS endpoint working
./script/service_test http://aws.docraptor.com && ./script/service_test https://aws.docraptor.com
Switch to old Instrumental API key in app and automation
be sure to restart instrument_server
Start continuous testing against production endpoint
Direct Linode LB to AWS ELB for 10 seconds
curl -I http://docraptor.com | grep Pass # should have one line
curl -I https://docraptor.com | grep Pass # should have one line and no ssl error
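# keep a copy of the Linode-pointing config so it can be restored below (skip if .bak already exists)
cp /opt/nginx/conf/nginx.production.lb.conf /opt/nginx/conf/nginx.production.lb.conf.bak
nano /opt/nginx/conf/nginx.production.lb.conf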
Server on port 80:
# ensure HTTPS!!!
proxy_pass https://aws;
Server on port 443:
# ensure HTTPS!!!
proxy_pass https://aws;
/etc/init.d/lb_nginx configtest
cp /opt/nginx/conf/nginx.production.lb.conf /opt/nginx/conf/nginx.production.lb.conf.new
# OPEN background worker queue interface
/etc/init.d/lb_nginx reload
curl -I http://docraptor.com | grep Pass # should be empty
curl -I https://docraptor.com | grep Pass # should be empty and no ssl error
# Verify resque jobs in web ui
cp /opt/nginx/conf/nginx.production.lb.conf.bak /opt/nginx/conf/nginx.production.lb.conf
/etc/init.d/lb_nginx configtest
/etc/init.d/lb_nginx reload
curl -I http://docraptor.com | grep Pass # should have one line
curl -I https://docraptor.com | grep Pass # should have one line and no ssl error
Verify resque draining on AWS
Fallback: Manually move some jobs from AWS -> Linode Redis
Stop continuous testing (TODO: COULD BE MOVED LATER)
Direct Linode LB to AWS ELB permanently
cp /opt/nginx/conf/nginx.production.lb.conf.new /opt/nginx/conf/nginx.production.lb.conf
/etc/init.d/lb_nginx configtest
/etc/init.d/lb_nginx reload
curl -I http://docraptor.com | grep Pass # should be empty
curl -I https://docraptor.com | grep Pass # should be empty and no ssl error
Fallback: Same as the 10-second one above
Verify production endpoint working
./script/service_test http://docraptor.com && ./script/service_test https://docraptor.com
Verify no traffic hitting Linode app servers
Verify Linode Resque is drained
Make a new branch of aws_migration
with AWS MySQL + Port
Deploy out-of-band AWS app instance pointed at AWS MySQL
# uncomment, DO NOT COMMIT web-oob enabled!!!
cap production deploy HOSTFILTER=web-oob.docraptor.com
# recomment web-oob so we do not deploy it in the next steps
# verify connection to correct MySQL:
eb ssh web-oob.docraptor.com
/data/docraptor/current/script/rails runner 'User.last; puts `lsof -i -p #{Process.pid}`'
Deploy AWS MySQL with pause to AWS instances
cap production deploy deploy:web:enable
Clear failed jobs on Linode
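The 10-second trial switch earlier in this section is a sequence of copy, configtest, reload, wait, and restore. A minimal sketch of that sequence as one function is shown below. It assumes `.new` already holds the AWS-pointing config and `.bak` holds the Linode-pointing config. The paths and the `lb_nginx` init script come from the checklist; the `run` wrapper, the `DRY_RUN` variable, and the `trial_switch` name are hypothetical. By default it only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch of the timed trial switch. DRY_RUN, run, and trial_switch are
# hypothetical names introduced for illustration.
CONF=/opt/nginx/conf/nginx.production.lb.conf
DRY_RUN="${DRY_RUN:-1}"   # default: print commands instead of executing them

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Point the LB at AWS for N seconds (default 10), then restore the old config.
trial_switch() {
  local seconds="${1:-10}"
  # Switch to the AWS-pointing config and hold it for the trial period.
  run cp "$CONF.new" "$CONF" &&
    run /etc/init.d/lb_nginx configtest &&
    run /etc/init.d/lb_nginx reload &&
    run sleep "$seconds"
  # Always attempt the restore, even if a step above failed.
  run cp "$CONF.bak" "$CONF" &&
    run /etc/init.d/lb_nginx configtest &&
    run /etc/init.d/lb_nginx reload
}
```

Running `trial_switch 10` with `DRY_RUN=1` lists the seven commands for review; setting `DRY_RUN=0` on the load balancer would execute them. The manual `curl` checks and the Resque web UI check from the checklist still need to happen during the wait.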
-- Must Be In Maintenance Window --
Enable maintenance mode in PagerDuty
Pingdom
Tweet: We're starting maintenance now, you may see intermittent errors over the next hour.
Run dc_switch cap task
cap production dc_switch
Fallback: is automatic
Continue paused deploy
If there is an issue: cap production deploy:restart
Verify no Passenger headers in responses
curl -I http://docraptor.com | grep Pass # should be empty
curl -I https://docraptor.com | grep Pass # should be empty and no ssl error
Verify production endpoint
./script/service_test http://docraptor.com && ./script/service_test https://docraptor.com
Requeue failed jobs
Stress test (10min)
./script/performance.rb old 1000000 pdf small | tee -a tmp/performance-old-pdf-small-final-switch.log
Fallback: switch Linode LB to point to Linode apps (will lose data)
Wait a safe period (1hr?)
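The `curl -I ... | grep Pass` checks used throughout this checklist expect either exactly one matching header line or none, depending on which side is serving traffic. A small helper can turn that eyeball check into a pass/fail result. This is a sketch only; the function names `pass_line_count` and `expect_pass_lines` are hypothetical, and it reads headers from stdin so it can be tried without touching production.

```shell
#!/usr/bin/env bash
# Sketch: count response-header lines containing "Pass", as the checklist's
# grep does. Function names are hypothetical.

# Read headers on stdin and print how many lines contain "Pass".
# grep -c exits non-zero when the count is 0, so that status is ignored.
pass_line_count() {
  grep -c 'Pass' || true
}

# Succeed only if the headers on stdin contain exactly the expected
# number of matching lines.
expect_pass_lines() {
  local expected="$1" actual
  actual="$(pass_line_count)"
  if [ "$actual" = "$expected" ]; then
    echo "ok: $actual line(s) matching Pass"
  else
    echo "FAIL: expected $expected line(s) matching Pass, got $actual" >&2
    return 1
  fi
}
```

For the post-switch check this could be used as `curl -sI http://docraptor.com | expect_pass_lines 0`, and with `1` for the pre-switch check. SSL errors from the HTTPS variant still have to be read from curl's own output.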
-- Maintenance Complete --
Disable maintenance mode in PagerDuty
Tweet: Maintenance complete! Please enjoy your regularly scheduled document service :)
Enable CloudFront
Move cron jobs from Linode to AWS
Remove out-of-band server and clean up cap tasks
Remove deploy pauser
Enable gitflow
Switch DNS
Wait at least 48 hours
Move any outstanding temporary storage files
Shut down Linode non-load-balancer boxes
Possibly wait more
Verify no traffic hitting Linode load balancers for 1 day
Shut down Linode
Remove Linode boxes from AWS security groups