@dherman
Created November 10, 2018 01:22
  • We want to provide good-quality progress meters in the console for the fetch+unpack operation
  • Measuring progress smoothly requires knowing not just how many compressed bytes you have read but also how many decompressed bytes you have written
  • Reporting the percentage of decompressed bytes written requires knowing the total decompressed size
  • Knowing the total decompressed size requires reading the field of the gzip format (ISIZE) that records the decompressed size, modulo 2^32
  • That field sits at a fixed offset from the end of the gzip file: it is the last four bytes
  • In order to still get the benefit of streaming, we have to make a separate HTTP HEAD request to learn the file's content length, and then a subsequent GET request that fetches just that tiny byte range to read that one field (sketched in the code below the question list)
  • (This does add the overhead of a couple of extra HTTP requests; anecdotally that seems cheap enough not to matter, but maybe it could matter in some environments?)
  • Unfortunately for GitHub releases, the files redirect to S3 URLs, which seem to reject HEAD requests with a 403
  • As a workaround, we created a GH repo with the files checked into the repo directly instead of served from GH release URLs, which meant the HEAD requests could succeed
  • But long term, there are a few questions:
  1. Is it true that progress reporting is noticeably smoother when based on decompressed size? My experience suggests yes, but it seems worth more empirical investigation (or domain knowledge from someone who knows better than I do)
  2. Is it not worth the trade-off of the extra up-front requests (3 round trips instead of 1)?
  3. Is there any possible way to do this well for zip files on Windows? I think the answer is no, and we just have to use the compressed size
  4. Should we just back off gracefully to compressed size when the HEAD request fails? (see the second sketch below)
  5. Or is there some way to get an S3 URL to respond successfully to a HEAD request?
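For reference, here is a minimal sketch of the two-request probe described above, assuming a Rust client built on reqwest's blocking API. The function name and error handling are illustrative, not the code we actually shipped; the gzip fact it relies on (the trailer's ISIZE field holds the uncompressed size modulo 2^32, little-endian, in the last four bytes) is from RFC 1952.

```rust
use reqwest::header::{CONTENT_LENGTH, RANGE};

/// Probe a gzip archive's uncompressed size before streaming it.
fn fetch_uncompressed_size(url: &str) -> Result<u64, Box<dyn std::error::Error>> {
    let client = reqwest::blocking::Client::new();

    // Round trip 1: HEAD request to learn the compressed length.
    let head = client.head(url).send()?;
    let compressed_len: u64 = head
        .headers()
        .get(CONTENT_LENGTH)
        .ok_or("response had no Content-Length header")?
        .to_str()?
        .parse()?;

    // Round trip 2: ranged GET for the last 4 bytes, the gzip ISIZE field
    // (uncompressed size modulo 2^32, little-endian).
    let range = format!("bytes={}-{}", compressed_len - 4, compressed_len - 1);
    let tail = client.get(url).header(RANGE, range).send()?.bytes()?;
    let isize_field = u32::from_le_bytes([tail[0], tail[1], tail[2], tail[3]]);

    // Round trip 3 (not shown) is the full GET that streams and unpacks the
    // archive, reporting progress against this total.
    Ok(u64::from(isize_field))
}
```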
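And a sketch of the graceful fallback from question 4, assuming the probe above: if the probe fails (for example, S3 rejecting the HEAD with a 403), report progress against the compressed size from the main download instead. The type and function names here are hypothetical.

```rust
/// Which total the progress meter is measured against.
enum ProgressTotal {
    Decompressed(u64), // uncompressed size, read from the gzip ISIZE trailer
    Compressed(u64),   // compressed size, from the download's Content-Length
}

/// Hypothetical helper: prefer decompressed-size progress, but back off to
/// compressed-size progress whenever the trailer probe fails.
fn choose_progress_total(url: &str, compressed_len: u64) -> ProgressTotal {
    match fetch_uncompressed_size(url) {
        Ok(total) => ProgressTotal::Decompressed(total),
        Err(_) => ProgressTotal::Compressed(compressed_len),
    }
}
```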