In an IDA world we need to think about heartbeats differently. An ELB heartbeat should not mark the whole app as failing just because one of the services the app needs is unreachable.
-
The health check should fail only if the full IDA is unusable.
-
Otherwise it should report problems on the health page, and the app should keep doing what it can.
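A minimal sketch of that rule, assuming each dependency exposes a simple boolean check. The function name `health_status`, the `critical` set, and the status strings are hypothetical, not taken from any existing app:

```python
from typing import Callable, Dict, Set, Tuple


def health_status(
    checks: Dict[str, Callable[[], bool]],
    critical: Set[str],
) -> Tuple[int, dict]:
    """Return (http_status, report) for a health page.

    Only a failing *critical* dependency produces a 503, which is what
    would make a load balancer pull the instance. Any other failure is
    reported as "degraded" but still answers 200.
    """
    report = {}
    for name, check in checks.items():
        try:
            ok = bool(check())
        except Exception:
            # A check that raises counts as a failing dependency.
            ok = False
        report[name] = "ok" if ok else "failing"

    failing = {name for name, state in report.items() if state == "failing"}
    if failing & critical:
        return 503, {"status": "down", "services": report}
    if failing:
        return 200, {"status": "degraded", "services": report}
    return 200, {"status": "ok", "services": report}
```

With `critical={"db"}`, a failing badges service yields a 200 with status "degraded", so the heartbeat passes while the health page still shows the problem.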
-
Will we need to monitor differently? We want to know when things are sick, not just when they're dead.
-
TEST/PLAT: How are we testing short-circuiting right now? Any integration tests?
-
Metrics and Logging
- Errors
- Sentry for error tracking - New Relic does the same sort of thing.
- Grafana + Graphite or InfluxDB <- New tools
Talk Idea: Thinking Operationally
- Operational Questions to ask when deploying Django
- Look at the questions I asked about the badging app: https://gist.github.com/feanil/0804cc058b9adce76b90
Runscope API Monitoring Tool: https://www.runscope.com/
Notes from Frank Stratton's Talk about Microservices
-
Smart Client - Service Discovery + Wrapper for Requests Library
- Service URLs -> Actual URLs
- DynamoDB + ZooKeeper
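A rough sketch of the "service URL -> actual URL" step, under stated assumptions: the `service://` scheme, the `ServiceRegistry` class, and the in-memory mapping are all hypothetical stand-ins. In the talk the lookup data lived in DynamoDB + ZooKeeper, and the resolved URL would then be handed to the requests library.

```python
from urllib.parse import urlsplit, urlunsplit


class ServiceRegistry:
    """Maps logical service names to real host:port locations.

    The dict passed in is a stand-in for whatever discovery backend
    is actually used (assumption: a plain in-memory mapping).
    """

    def __init__(self, locations):
        self._locations = dict(locations)

    def resolve(self, service_url):
        """Turn service://<name>/<path> into http://<host:port>/<path>."""
        parts = urlsplit(service_url)
        if parts.scheme != "service":
            # Already a real URL; pass it through untouched.
            return service_url
        try:
            host = self._locations[parts.netloc]
        except KeyError:
            raise LookupError(f"unknown service: {parts.netloc}")
        return urlunsplit(("http", host, parts.path, parts.query, parts.fragment))
```

A wrapper around the HTTP library would call `resolve()` on every outgoing URL, so application code only ever names services, never machines.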
-
Use an HAProxy instance on each machine, configured via "Atlas", to point to various physical locations
- HAProxy ACLs (need to learn more about HAProxy)
-
Each service built in Flask w/ Flask-RESTful
- Add health checks to Flask
- bugsnag
- runscope-daemon
- alchemist
- smart-config
- Alternative to settings.py file with all the settings on disk
-
Lessons Learned from Lots of Services
- Need lots of automation, and automation needs to be easier
- You become Language Agnostic since everything has an API
- No Shared Databases
- Make Deploying Code Easier
- 1-Click Deploys
- They built a service called Prometheus
- **For faster deploys we need to build images faster**
-
PIP and Wheels
- We can upload wheels to PyPI
- For our repos maybe we could build wheels as a part of the build - https://github.com/ogrisel/wheelhouse-uploader
- Upload to S3
- Use wheelhouse uploader to pull down artifact for a tag of the repo - Probably with Jenkins.
- Upload the code, and wheels to PyPI - Probably with Jenkins when new tags are created.
- Run the tests against the pip-installed version
- 'language: objective-c' will run Travis on an OS X machine
- We can test on multiple OSes with Travis
**Tool Idea**
- Webhooks Middle-Man
- Our Jenkins is behind a firewall, and we don't want to expose the full server just for one URL endpoint
- A middleman service could live outside the firewall to receive the webhooks
- Jenkins could then poll the middleman for the list of new hook events received since the last time it asked
- Stretch:
- A sidecar on our admin server that would connect via a websocket to the middleman
- The middleman can push to the sidecar which could hit the webhook locally.
- Check out WebhookDB: https://github.com/singingwolfboy/webhookdb
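The polling half of the middleman idea could look roughly like this. It is only a sketch: `HookBuffer` and its method names are hypothetical, storage is in memory, and the HTTP layer that would receive the webhooks and answer the polls is left out.

```python
import itertools


class HookBuffer:
    """Stores incoming webhook payloads so a firewalled client can poll for them."""

    def __init__(self):
        self._ids = itertools.count(1)  # monotonically increasing event ids
        self._events = []

    def receive(self, payload):
        """Called by the public-facing endpoint when a webhook arrives."""
        event_id = next(self._ids)
        self._events.append({"id": event_id, "payload": payload})
        return event_id

    def since(self, last_seen_id=0):
        """Return every event newer than the id the poller saw last."""
        return [event for event in self._events if event["id"] > last_seen_id]
```

The poller (Jenkins, in these notes) would remember the highest id it has processed and pass it back on the next call, so nothing outside the firewall ever needs to connect inward.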
- Unix pipe-friendly command-line tools for bringing up servers
- e.g. context --tags env=prod app=edxapp | create-instances -n 2 --image=ami-... | add-to-lb
- Seems kind of cool, kind of dumb
- each tool is a python script that does one thing well
- each call would pass the context on when it finishes, possibly narrowing or widening the scope available to the next call
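One way the context hand-off might work, assuming the context travels between tools as JSON on stdin/stdout. Everything here is hypothetical: `create_instances` is a stub that launches nothing, the instance ids are fake, and `run_stage` is just an illustrative helper.

```python
import io
import json


def create_instances(context, count, image):
    """Stub stage: records fake instance ids instead of calling a cloud API."""
    new_ids = [f"i-fake{n}" for n in range(count)]
    out = dict(context)  # leave the incoming context untouched
    out["image"] = image
    out["instances"] = list(context.get("instances", [])) + new_ids
    return out


def run_stage(stage, stdin, stdout, **kwargs):
    """Read the context from stdin, apply one stage, write the new context."""
    context = json.load(stdin)
    json.dump(stage(context, **kwargs), stdout)
    stdout.write("\n")


# Simulate one link of the pipe with in-memory streams.
incoming = io.StringIO('{"tags": {"env": "prod", "app": "edxapp"}}')
outgoing = io.StringIO()
run_stage(create_instances, incoming, outgoing, count=2, image="ami-example")
```

In a real script the streams would be `sys.stdin` and `sys.stdout`, and each tool would be its own small Python file wrapping a single stage function like this.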
We need to move to Python 3
Talks to watch later:
-
Super is super
-
Beyond PEP8
-
We need a skeleton app for new services to use as we get more IDAs