curling home / of bank-of-anthos.xyz (running v0.2.0) has 85ms latency
master / latest manifests - curling home / has 4.5s latency running v0.2.0 on the same test clster --> 27ms latency
some code change between v0.2.0 and now is causing the latency
hypothesis is: opentelemetry tracing export
run 6/17 last commit 06dbf66d61f55b95f6c9e098b4b9585059811647 --> (after logging, before tracing) 27ms latency
6/30 (PR right before Python OT tracing) dc4fa3b4ccde4df97ee2b2ae55fe17f2f22c0bd5 --> 27 ms
7/1 (Python OT tracing commit) cf756bb45b870217c478e84ec50c1ca4f45af5e3 --> 2-5 seconds
confirmed - opentelemetry is causing ~4 extra seconds of latency to the python services.
Process: now switched back to the latest master. Removed the following 3 things from all the python services at once, then redeployed + timed the "/" (login screen) endpoint.
27ms
re-toggled to "true" to test nuances ....
Yes - the exporter is outdated. Google Cloud moved their exporters to a separate repo / package.
https://google-cloud-opentelemetry.readthedocs.io/en/latest/
- Upgrade everything OT to 13b0 (including the flaskinstrumentory), switching library names to conform to latest OT package changes / where the Cloud Ops exporter now lives on Github.
- [POSSIBLY] if Flask instrumentor continues to add multiple seconds of latency, remove it from all the python services for now, and file an OT bug.
https://opentelemetry-python.readthedocs.io/en/latest/instrumentation/flask/flask.html https://opentelemetry-python.readthedocs.io/en/latest/instrumentation/sqlalchemy/sqlalchemy.html
https://github.com/GoogleCloudPlatform/opentelemetry-operations-python https://pypi.org/project/opentelemetry-exporter-google-cloud/ https://pypi.org/project/opentelemetry-tools-google-cloud/
https://google-cloud-opentelemetry.readthedocs.io/en/stable/examples/cloud_trace_exporter/README.html https://google-cloud-opentelemetry.readthedocs.io/en/stable/examples/tools/cloud_trace_propagator/README.html
Determined that it's the GCP Cloud Trace exporter causing latency. Continuing to test by just trying to hit the frontend home (login screen):
With cloud trace export setup commented out, all 3 python services -
➜ ~ hey -n 100 -c 10 http://104.196.36.38:80
Summary:
Total: 0.9168 secs
Slowest: 0.3131 secs
Fastest: 0.0224 secs
Average: 0.0865 secs
Requests/sec: 109.0697
Total data: 658900 bytes
Size/request: 6589 bytes
With cloud trace export included in all services --
➜ ~ hey -n 100 -c 10 http://104.196.36.38:80
Summary:
Total: 53.4578 secs
Slowest: 7.4205 secs
Fastest: 3.5342 secs
Average: 5.1236 secs
Requests/sec: 1.8706