@jwhitlock
Last active October 5, 2017 02:41
Very first discussion about potentially moving MDN to Cloud Services infrastructure in 2015
Context via cyliang:
"I'm part of the IT WebOps team, which is separate from the infra team spearheaded by Corey (cshields). jthomas and oremj are part of a completely different operations team (Service Operations is part of Cloud Services) and reports to a different VP. Right now, MDN is part of Cloud Services, but their infrastructure and operations is currently managed by the IT WebOps group."
Feb 4
Repository + stable parameter
Tag, commit, or branch that should go to prod
* What is the value for Continuous Delivery?
*
* What is the risk of something breaking?
*
* Decisions based on the above ...
* Automated QA tests should block production pushes
Config -> environment variables
Apache -> gunicorn & nginx
Self-serving PaaS?
Deis: probably never run
AWS Elastic Beanstalk?
AWS Lambda?
AWS ECS?
ACTION:
* Luke email mdn-dev re: values & risks of Continuous Delivery
* Luke email Stuart re: Intern testing
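The "Config -> environment variables" note above could look roughly like the sketch below. This is a minimal illustration, not MDN's actual settings module: the variable names (`DEBUG`, `DATABASE_URL`, `CACHE_URL`, `ALLOWED_HOSTS`), the defaults, and the example values are all assumptions made for the example.

```python
import os


def load_settings(environ=None):
    """Build a settings dict from environment variables.

    All variable names and defaults here are hypothetical; they only
    illustrate keeping config out of the code base.
    """
    env = os.environ if environ is None else environ
    hosts = env.get("ALLOWED_HOSTS", "")
    return {
        # Optional flag, off unless explicitly enabled.
        "DEBUG": env.get("DEBUG", "false").lower() in ("1", "true", "yes"),
        # Required value: a missing key raises KeyError at startup.
        "DATABASE_URL": env["DATABASE_URL"],
        # Optional value with a local default for development.
        "CACHE_URL": env.get("CACHE_URL", "redis://localhost:6379/1"),
        # Comma-separated list, ignoring blanks and extra whitespace.
        "ALLOWED_HOSTS": [h.strip() for h in hosts.split(",") if h.strip()],
    }


if __name__ == "__main__":
    example_env = {
        "DATABASE_URL": "mysql://user:secret@db.example.com/exampledb",
        "DEBUG": "true",
        "ALLOWED_HOSTS": "stage.example.com, www.example.com",
    }
    print(load_settings(example_env))
```

Passing a plain dict instead of `os.environ` keeps the function easy to test; the same code then runs unchanged in dev, stage, and prod with different environments.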
Jan 14
Cleaning up backend work; affecting deployments; want to make everyone aware of ideas for future MDN platform and how we'd like it to run on AWS
jezdez:
Mozilla's deployments in IT/WebOps have traditionally been clusters of web servers, etc.
The current deployment process (e.g., chief) is more of a MacGyver/band-aid fix
Big fan of stateless web apps; independent components & resources; *12 factor app* (http://12factor.net/) and Heroku; biggest advantage: clear separation of concerns for better maintainability
Technically, would entail many refactorings in both code & deployment
Maris & Jannis: we are at the decision point now before we go down any given technology path
Travis:
Differences
* How much control over the environment each person has
Use Jenkins for automation and to build environments for Dev, QA, Production. E.g., go to the Jenkins interface and push a button to deploy a hash to AWS
CloudOps always pushes the production environment
Stateful storage for app in external resources (Redis, S3, etc.)
Cloud spins up new nodes and switches DNS over to them
Want soft-launch techniques
How to get branch to production?
0. Config update needed; alert CloudOps
1. Some trigger of nightly build - tag, branch, something
2. Spins up environment exactly the same as production
3. Go to Jenkins - parameterized submission form: branch name or revision & puppet/ansible config repo
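The parameterized submission described in the steps above, combined with the rules noted elsewhere in these notes (automated QA tests block production pushes; CloudOps always pushes production), can be sketched as a small validation step. This is a hypothetical model only: the class, function, field names, and the example repo URL are invented for illustration and do not describe an actual Jenkins job.

```python
from dataclasses import dataclass

# Environments named in the notes: Dev, QA, Production.
ENVIRONMENTS = ("dev", "qa", "production")


@dataclass(frozen=True)
class DeployRequest:
    """Fields of the parameterized form (names are assumptions)."""
    ref: str          # branch name, tag, or revision
    config_repo: str  # puppet/ansible config repo
    environment: str  # one of ENVIRONMENTS


def check_deploy(request, qa_passed, pushed_by_cloudops):
    """Return (allowed, reason) for a deploy request."""
    if request.environment not in ENVIRONMENTS:
        return False, "unknown environment"
    if not request.ref.strip():
        return False, "missing branch name or revision"
    if not request.config_repo.strip():
        return False, "missing config repo"
    if request.environment == "production":
        # Automated QA tests should block production pushes.
        if not qa_passed:
            return False, "automated QA tests did not pass"
        # CloudOps always pushes the production environment.
        if not pushed_by_cloudops:
            return False, "production pushes go through CloudOps"
    return True, "ok"


if __name__ == "__main__":
    req = DeployRequest("master", "https://example.com/config.git", "production")
    print(check_deploy(req, qa_passed=True, pushed_by_cloudops=True))
    print(check_deploy(req, qa_passed=False, pushed_by_cloudops=True))
```

Keeping the gate as a pure function means the same checks could back a web form, a CLI, or a CI job without change.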
Dec 3
* [DECISION] Target end of Q2 move
* Complete move
* (groovecoder) file meta bug, NetApp bug, DNS ... blocked by kuma-lib
* NetApp -> S3
* (Travis) create stage & prod S3 buckets for MDN
* (cturra) move MDN files to S3 buckets
* (mdndev) change demo & wiki code to use S3
* (mdndev) send static assets to S3
* DNS -> AWS Route53 :)
* Django web-heads -> AWS EC2
* (Travis) spin up stage & prod EC2 instances
* KumaScript nodes -> AWS EC2
* (Travis) spin up stage & prod EC2 instances
* RabbitMQ -> AWS Redis
* (Travis) spin up stage & prod instances
* celery node -> AWS EC2
* (Travis) spin up stage & prod instances
* (mdndev) change code from RabbitMQ to Redis
* Elasticsearch -> AWS EC2 instances (custom ES cluster)
* (Travis) create ES cluster
* (mdndev) re-build index on AWS ES cluster
* (mdndev) change code to use AWS ES cluster
* MySQL Database -> AWS RDS
* (mdndev) test read-only
* (IT, WebOps, mdndev, CloudOps) dump & import
* (mdndev) code testing
* Memcache -> AWS Redis
* [todo] NetOps + WebOps + CloudOps: Monitoring
* Nagios Monitoring -> drop from MOC?
* Sentry Monitoring -> Sentry node in CloudOps
* No Logging -> heka (https://mana.mozilla.org/wiki/display/CLOUDSERVICES/Logging+Standard)
* (groovecoder) schedule meeting w/ mdndev & CloudOps to discuss deployment
* (groovecoder) file bug - Product: Mozilla Services; Component: Operations assign to ckolos for AWS dev access to mdn-dev@mozilla.com
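For the NetApp -> S3 items above, one piece of the dev work is translating existing file paths into S3 object keys. The sketch below assumes uploads live under `/mnt/netapp` (the mount point listed in the Nov 11 notes) and that the key is simply the path relative to that mount; the bucket name and the key layout are assumptions, and no AWS call is made here.

```python
from pathlib import PurePosixPath

# Mount point named in the Nov 11 notes.
NETAPP_ROOT = PurePosixPath("/mnt/netapp")


def s3_location(local_path, bucket):
    """Map a file path under the NetApp mount to (bucket, key).

    The key layout (path relative to the mount) is an assumption
    made for this sketch.
    """
    path = PurePosixPath(local_path)
    try:
        relative = path.relative_to(NETAPP_ROOT)
    except ValueError:
        raise ValueError(f"{local_path} is not under {NETAPP_ROOT}") from None
    # Reject the mount point itself and any parent-directory escapes.
    if not relative.parts or ".." in relative.parts:
        raise ValueError(f"{local_path} does not name a file under {NETAPP_ROOT}")
    return bucket, relative.as_posix()


if __name__ == "__main__":
    # "mdn-stage-example" is a hypothetical bucket name.
    print(s3_location("/mnt/netapp/uploads/demos/logo.png", "mdn-stage-example"))
```

A mapping like this would let the move of files and the code change be tested independently, since both sides agree on the key for any given file.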
Nov 25
* IT/WebOps still planning for 2015: questions about the benefits of shifting how Ops does deployments, and the benefit of doing this work now, or at a later time when the 2015 plan is clearer.
* WebOps is leading the discussion on this
* Q2 is preferred timing for MDN, so that we can wrap up our current project of site stability fixes (and not intermix infra changes with major code landings)
* Q2 is good for Service Ops, too
* Travis - AWS & ElasticSearch - hosted AWS or custom?
* 2 custom/raw 6-node ES clusters
* MDN currently runs 5 nodes (prod)
* (groovecoder) send summary email including Sean & Stephanie
* (groovecoder) schedule meeting for Mozlandia
* What happens to self-service deployments, CI infrastructure?
* should be no problem sticking with Travis for CI
* need to examine options for self-service deploys; MDN currently uses chief
* MDN OK with losing chief, wants to keep self-service part
* needs discussion at Mozlandia
Nov 11
* Who could work on the project?
* Sean Rich, C Liang, Stephanie Chan
* When could we do it?
* IT/WebOps: planning 2015 quarters Nov 18th-19th
* MDN: Q3 services
* CloudOps: Prefer Q1-Q2
* (groovecoder) owns the project
* (cyliang) will get a list of infra + 3rd party services together
* (groovecoder) get DB backup info from sheeri (see below)
* (groovecoder) will schedule a follow-up meeting for week of Nov 25th
Optionally discuss:
* List of infrastructure (web-heads, db servers, search nodes, celery nodes, rabbitmq, etc.)
* https://mana.mozilla.org/wiki/display/websites/developer.mozilla.org+Cluster
* (only update to above is that MDN ES is now on a separate set of clusters)
* [ no metrics ingestion of logs ]
* Any other IT integration points (e.g., storage)
* /mnt/netapp for demo & wiki uploads
* change to S3 (will need dev work)
* List of 3rd-party services (socketlabs, etc.)
* SocketLabs
* New Relic
* reCAPTCHA?
* Bitly?
* Monitoring
* Nagios
* collectd / graphite (move to statsd/heka/graphite)
* errormill (sentry)
* Database Backup Retention
* Sheeri
* daily backups kept for a month
* monthly backups (1st of the month) kept since 7/1/2014
* CI/CD infrastructure
* Switch to AWS + Jenkins?
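The database backup retention noted above (daily backups kept for a month; monthly backups from the 1st of the month kept since 7/1/2014) can be expressed as a small check, which may be useful when comparing against retention settings on a new platform. Two assumptions in this sketch: "a month" is treated as 30 days, and 7/1/2014 is read as July 1, 2014.

```python
from datetime import date, timedelta

DAILY_RETENTION = timedelta(days=30)  # assumption: "a month" = 30 days
MONTHLY_START = date(2014, 7, 1)      # assumption: 7/1/2014 = July 1, 2014


def is_retained(backup_day, today):
    """Return True if a backup taken on backup_day should still exist."""
    if backup_day > today:
        return False  # a backup from the future cannot exist
    # Daily backups are kept for a month.
    if today - backup_day <= DAILY_RETENTION:
        return True
    # Monthly backups (1st of the month) are kept since the start date.
    return backup_day.day == 1 and backup_day >= MONTHLY_START


if __name__ == "__main__":
    today = date(2014, 11, 11)
    for day in (date(2014, 11, 1), date(2014, 8, 15), date(2014, 7, 1), date(2014, 6, 1)):
        print(day, is_retained(day, today))
```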
Notes:
* MDN will(?) be standing up new services in 2015, but they don't depend on the wiki
* new services can go to CSO, MDN can stay with IT
* probably start standing these up for public consumption in Q3
* Travis has some directive to assist with MDN Ops management
* if MDN moves, best to move the infra as well - just moving management + processes won't be any gain