title: Bugzilla 1624408 Downloading Tmx Through Curl Interrupts Mid Way
date: 2020-06-06 21:18:35 +0200
draft: false

About the bug

Curl can't download TMX files from Pontoon when the TMX file contains an enormous number of entries.

https://bugzilla.mozilla.org/show_bug.cgi?id=1624408

Mathjazz mentions:

https://help.heroku.com/18NDWDW0/debugging-h18-server-request-interrupted-errors-in-nodejs-applications

Analysis

time curl -o de.all-projects.tmx https://pontoon.mozilla.org/de/all-projects/de.all-projects.tmx
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 12.2M    0 12.2M    0     0   249k      0 --:--:--  0:00:50 --:--:--  212k
curl: (18) transfer closed with outstanding read data remaining

real	0m50,538s
user	0m1,262s
sys	0m1,905s

The average download time of that TMX file is around 50 seconds; the operation ends abruptly with the following error:

curl: (18) transfer closed with outstanding read data remaining

Notes

TMX data is being streamed through the following layers:

Curl/Wget > Heroku > Gunicorn > Pontoon backend

Heroku

Heroku implements HTTP/1.1 and supports streaming as well. Importantly, Heroku terminates the HTTP connection when an application doesn't send any bytes during a 50-second window. The timeout window resets after every sent byte. An H18 error suggests that the Pontoon backend raised an error.
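The rolling 50-second window can be simulated with a short sketch. This is only an illustration of the router's behaviour, not Heroku's actual code; the constant and function names are made up for the example.

```python
import time

IDLE_LIMIT = 50  # seconds without a single byte before the router gives up (illustrative)

def relay(chunks):
    """Simulate the router: forward chunks, abort if the app stays silent too long."""
    last_byte_at = time.monotonic()
    for chunk in chunks:
        now = time.monotonic()
        if now - last_byte_at > IDLE_LIMIT:
            raise TimeoutError("H18: Server Request Interrupted")
        last_byte_at = now  # the window resets after every byte sent
        yield chunk

# An app that streams steadily never trips the limit:
assert list(relay(iter([b"a", b"b", b"c"]))) == [b"a", b"b", b"c"]
```

The point of the sketch: as long as at least one byte arrives inside every 50-second window, the total transfer time is unbounded.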

Gunicorn

The Gunicorn documentation suggests using an asynchronous worker when your application needs to stream data. Pontoon uses the default synchronous Gunicorn worker. Hypothesis: Gunicorn aborts the HTTP request because of the worker timeout setting, since downloading a TMX file blocks a Gunicorn worker for too long.
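The two knobs in question live in Gunicorn's Python config file. The values below are illustrative, not Pontoon's actual configuration:

```python
# gunicorn.conf.py (sketch; values are examples, not Pontoon's settings)

# Asynchronous worker instead of the default "sync" worker, so one slow
# streaming response doesn't monopolize a worker process.
worker_class = "gevent"

# Seconds a worker may stay silent before the master kills it (default: 30).
timeout = 120
```

With the default sync worker and a 30-second timeout, a 50-second streaming response is killed mid-transfer, which matches the symptom.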

Pontoon

The algorithm behind the TMX download view is fairly simple: it iterates over a queryset of Translation Memory entries and returns data to the HTTP client for every TMX entry separately. Also, the application doesn't raise any visible error/exception that could be related to the TMX download view.
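The per-entry streaming described above can be sketched as a plain generator. This is a simplified stand-in, not Pontoon's actual code: the function name, the element layout, and feeding it tuples instead of a Django queryset are all assumptions for illustration.

```python
def stream_tmx(entries, source_lang="en", target_lang="de"):
    """Yield a TMX document chunk by chunk, one chunk per translation entry."""
    yield '<?xml version="1.0" encoding="utf-8"?>\n<tmx version="1.4">\n<body>\n'
    for source, target in entries:  # in Pontoon this is a queryset iteration
        yield (
            f'<tu><tuv xml:lang="{source_lang}"><seg>{source}</seg></tuv>'
            f'<tuv xml:lang="{target_lang}"><seg>{target}</seg></tuv></tu>\n'
        )
    yield "</body>\n</tmx>\n"

tmx = "".join(stream_tmx([("Hello", "Hallo"), ("World", "Welt")]))
```

Each yielded chunk is tiny, so with hundreds of thousands of entries the response consists of very many small writes, which is where a synchronous worker spends its whole timeout budget.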

Ideas

  • The problem probably isn't related to Heroku's limits, because the transfer time window resets after every sent byte. Wireshark shows that the Pontoon backend sends data constantly until the connection breaks after around 50 seconds.
  • Gunicorn currently uses a synchronous worker, which isn't the best choice for streaming-friendly applications. Switching to an asynchronous worker could potentially help.
  • What's strange: with --compressed, the download works faster and without errors (?)
  • Maybe the TMX download view produces too many bytes and they (?)

In an attempt to debug the problem, I created my own instance of Pontoon and populated the database with thousands of Translation Memory entries. The issue occurred again (which is good) and was fully reproducible. Thanks to that, I spotted the following log entry in Heroku's application logs:

2020-06-06T23:30:14.219137+00:00 app[web.1]: [2020-06-06 23:30:14 +0000] [4] [CRITICAL] WORKER TIMEOUT (pid:10)
2020-06-06T23:30:14.219184+00:00 heroku[router]: sock=backend at=error code=H18 desc="Server Request Interrupted" method=GET path="/af/test1/af.test1.tmx" host=pontoon-tmx.herokuapp.com request_id=1af27c1d-8d8c-4c4f-89c3-37af8ea1f1bb fwd="188.121.15.8" dyno=web.1 connect=0ms service=30109ms status=503 bytes= protocol=https

Solutions

  • Increase worker_timeout in the Gunicorn config
    • It helps; however, it may introduce new problems.
  • Change the Gunicorn worker type to an asynchronous one
    • Switching to Gevent didn't help
  • Implement buffering in the TMX download view
    • I implemented a basic buffering mechanism, but it didn't solve the issue
  • Regenerate static TMX files periodically and serve them via a CDN (?)
    • That's probably the least feasible idea
  • Increase the network throughput from Heroku somehow (?)
    • Not sure how to do that and I couldn't find anything on the
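The buffering experiment from the list above can be sketched as a wrapper that joins many small chunks into one before yielding. The function name and the buffer size are made up for the example; the actual mechanism I tried was equally basic.

```python
from itertools import islice

def buffered(chunks, size=1000):
    """Join `size` chunks into one before yielding, to reduce the number of writes."""
    it = iter(chunks)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield "".join(batch)

# Five chunks buffered in pairs become three writes:
assert list(buffered(["a", "b", "c", "d", "e"], size=2)) == ["ab", "cd", "e"]
```

Fewer, larger writes reduce per-chunk overhead, but they don't make the iteration itself faster, which is presumably why this didn't solve the issue.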