Last active
October 5, 2017 02:41
Archive of https://etherpad.mozilla.org/mdn-cloud-infra-2015, bug 1110799
Very first discussion about potentially moving MDN to Cloud Services infrastructure in 2015
Context via cyliang:
"I'm part of the IT WebOps team, which is separate from the infra team spearheaded by Corey (cshields). jthomas and oremj are part of a completely different operations team (Service Operations is part of Cloud Services) and reports to a different VP. Right now, MDN is part of Cloud Services, but their infrastructure and operations is currently managed by the IT WebOps group."
Feb 4
Repository + stable parameter
A tag, commit, or branch wants to go to prod
* What is the value of Continuous Delivery?
* What is the risk of something breaking?
* Decisions based on the above ...
* Automated QA tests should block production pushes
Config -> environment variables
Apache -> gunicorn & nginx
Self-service PaaS?
Deis: probably never run
AWS Elastic Beanstalk?
AWS Lambda?
AWS ECS?
ACTION:
* Luke email mdn-dev re: values & risks of Continuous Delivery
* Luke email Stuart re: Intern testing
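A minimal sketch of the "Config -> environment variables" item above, in the 12-factor style. The variable names (DEBUG, DATABASE_URL, REDIS_URL) and the load_settings helper are assumptions made up for illustration; they are not taken from the MDN codebase.

```python
import os


def load_settings(environ=None):
    """Build a settings dict from environment variables.

    All keys below are hypothetical examples, not real MDN settings.
    """
    env = os.environ if environ is None else environ
    return {
        # Optional flag with a safe default.
        "DEBUG": env.get("DEBUG", "false").lower() in ("1", "true", "yes"),
        # Required: raises KeyError early if the deployment forgot to set it.
        "DATABASE_URL": env["DATABASE_URL"],
        # Optional with a local-development default.
        "REDIS_URL": env.get("REDIS_URL", "redis://localhost:6379/0"),
    }
```

With this shape, the same build can run in dev, stage, and prod, and only the environment differs between them.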
Jan 14
Cleaning up backend work; affecting deployments; want to make everyone aware of ideas for future MDN platform and how we'd like it to run on AWS
jezdez:
Mozilla's deployments in IT/WebOps have been very traditional clusters of web servers, etc.
The current deployment process, such as chief, is more of a MacGyver/band-aid solution
Big fan of stateless web apps; independent components & resources; *12 factor app* (http://12factor.net/) and Heroku; biggest advantage: clear separation of concerns for better maintainability
Technically, would entail many refactorings in both code & deployment
Maris & Jannis: we are at the decision point now before we go down any given technology path
Travis:
Differences
* How much control over the environment each person has
Use Jenkins for automation and to build environments for Dev, QA, Production. E.g., go to the Jenkins interface and push a button to deploy a hash to AWS
CloudOps always pushes the production environment
Stateful storage for app in external resources (Redis, S3, etc.)
Cloud spins up new nodes and changes DNS over
Want soft-launch techniques
How to get branch to production?
0. Config update needed; alert CloudOps
1. Some trigger of nightly build - tag, branch, something
2. Spins up environment exactly the same as production
3. Go to Jenkins - parameterized submission form: branch name or revision & puppet/ansible config repo
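The steps above could be modeled roughly as follows. This is only a sketch: DeployRequest and blockers are hypothetical names, the fields mirror the parameterized form in step 3 and the CloudOps alert in step 0, and the QA check reflects the Feb 4 note that automated QA tests should block production pushes. It does not call Jenkins or any real API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeployRequest:
    """Hypothetical stand-in for the parameterized submission form."""
    ref: str                        # branch name or revision
    config_repo: str                # puppet/ansible config repo
    config_changed: bool = False    # step 0: is a config update needed?
    cloudops_alerted: bool = False  # step 0: has CloudOps been told?
    qa_passed: bool = False         # automated QA result for this ref


def blockers(req):
    """Return the reasons this request should not go to production."""
    problems = []
    if not req.ref.strip():
        problems.append("no branch name or revision given")
    if req.config_changed and not req.cloudops_alerted:
        problems.append("config update needs a CloudOps alert first")
    if not req.qa_passed:
        problems.append("automated QA tests have not passed")
    return problems
```

An empty list from blockers() would mean the request is clear to hand to whoever pushes production, which per the notes is always CloudOps.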
Dec 3
* [DECISION] Target end of Q2 move
* Complete move
* (groovecoder) file meta bug, NetApp bug, DNS ... blocked by kuma-lib
* NetApp -> S3
* (Travis) create stage & prod S3 buckets for MDN
* (cturra) move MDN files to S3 buckets
* (mdndev) change demo & wiki code to use S3
* (mdndev) send static assets to S3
* DNS -> AWS Route53 :)
* Django web-heads -> AWS EC2
* (Travis) spin up stage & prod EC2 instances
* KumaScript nodes -> AWS EC2
* (Travis) spin up stage & prod EC2 instances
* RabbitMQ -> AWS Redis
* (Travis) spin up stage & prod instances
* celery node -> AWS EC2
* (Travis) spin up stage & prod instances
* (mdndev) change code from RabbitMQ to Redis
* ElasticSearch -> AWS EC2 instances (custom ES cluster)
* (Travis) create ES cluster
* (mdndev) re-build index on AWS ES cluster
* (mdndev) change code to use AWS ES cluster
* MySQL Database -> AWS RDS
* (mdndev) test read-only
* (IT, WebOps, mdndev, CloudOps) dump & import
* (mdndev) code testing
* Memcache -> AWS Redis
* [todo] NetOps + WebOps + CloudOps: Monitoring
* Nagios Monitoring -> drop from MOC?
* Sentry Monitoring -> Sentry node in CloudOps
* No Logging -> heka (https://mana.mozilla.org/wiki/display/CLOUDSERVICES/Logging+Standard)
* (groovecoder) schedule meeting w/ mdndev & CloudOps to discuss deployment
* (groovecoder) file bug - Product: Mozilla Services; Component: Operations; assign to ckolos for AWS dev access to mdn-dev@mozilla.com
Nov 25
* IT/WebOps still planning for 2015: questions about the benefits of shifting how Ops does deployments, and the benefit of doing this work now, or at a later time when the 2015 plan is clearer.
* WebOps is leading the discussion on this
* Q2 is preferred timing for MDN, so that we can wrap up our current project of site stability fixes (and not intermix infra changes with major code landings)
* Q2 is good for Service Ops, too
* Travis - AWS & ElasticSearch - hosted AWS or custom?
* 2 custom/raw 6-node ES clusters
* MDN currently runs 5 nodes (prod)
* (groovecoder) send summary email including Sean & Stephanie
* (groovecoder) schedule meeting for Mozlandia
* What happens to self-service deployments, CI infrastructure?
* should be no problem sticking with Travis for CI
* need to examine options for self-service deploys, MDN currently uses chief
* MDN OK with losing chief, wants to keep self-service part
* needs discussion at Mozlandia
Nov 11
* Who could work on the project?
* Sean Rich, C Liang, Stephanie Chan
* When could we do it?
* IT/WebOps: planning 2015 quarters Nov 18th-19th
* MDN: Q3 services
* CloudOps: Prefer Q1-Q2
* (groovecoder) owns the project
* (cyliang) will get a list of infra + 3rd party services together
* (groovecoder) get DB backup info from sheeri (see below)
* (groovecoder) will schedule a follow-up meeting for week of Nov 25th
Optionally discuss:
* List of infrastructure (web-heads, db servers, search nodes, celery nodes, rabbitmq, etc.)
* https://mana.mozilla.org/wiki/display/websites/developer.mozilla.org+Cluster
* (only update to above is that MDN ES is now on a separate set of clusters)
* [ no metrics ingestion of logs ]
* Any other IT integration points (i.e., storage)
* /mnt/netapp for demo & wiki uploads
* change to S3 (will need dev work)
* List of 3rd-party services (socketlabs, etc.)
* SocketLabs
* NewRelic
* Recaptcha?
* Bitly?
* Monitoring
* Nagios
* collectd / graphite (move to statsd/heka/graphite)
* errormill (sentry)
* Database Backup Retention
* Sheeri
* daily backups kept for a month
* monthly backups (1st of the month) kept since 7/1/2014
* CI/CD infrastructure
* Switch to AWS + Jenkins?
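As an illustration of the Database Backup Retention item above, the policy could be expressed as a small check. This is a sketch under assumptions: "a month" is approximated as 30 days, 7/1/2014 is read as July 1, 2014, and is_retained is a hypothetical helper, not part of any existing backup tooling.

```python
from datetime import date, timedelta

# First monthly backup kept, per the notes (7/1/2014 read as July 1, 2014).
MONTHLY_START = date(2014, 7, 1)


def is_retained(backup_day, today, daily_window_days=30):
    """Return True if a backup taken on backup_day should still exist today.

    daily_window_days=30 is an assumed approximation of "kept for a month".
    """
    if backup_day > today:
        return False  # a backup from the future cannot exist
    if today - backup_day <= timedelta(days=daily_window_days):
        return True  # still inside the daily-backup window
    # Older than the window: only 1st-of-month backups since the start date.
    return backup_day.day == 1 and backup_day >= MONTHLY_START
```

A check like this could help confirm that a move to AWS RDS preserves the same restore points.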
Notes:
* MDN will(?) be standing up new services in 2015, but they don't depend on the wiki
* new services can go to CSO, MDN can stay with IT
* probably start standing these up for public consumption in Q3
* Travis has some directive to assist with MDN Ops management
* if MDN moves, best to move the infra as well - just moving management + processes won't be any gain