Skip to content

Instantly share code, notes, and snippets.

@hemanth22
Forked from john-science/gzip_files_in_python.md
Created December 1, 2020 08:30
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save hemanth22/db3e293055cd142cc5f0dca346994213 to your computer and use it in GitHub Desktop.
Save hemanth22/db3e293055cd142cc5f0dca346994213 to your computer and use it in GitHub Desktop.
Reading & Writing GZIP Files Faster in Python

Reading & Writing GZIP Files in Python

I have been testing various ways to read and write text files with GZIP in Python. There were a lot of uninteresting results, but there were two I thought were worth sharing.

Writing GZIP files

If you have a big list of strings to write to a file, you might be tempted to do:

f = gzip.open(out_path, 'wb')
for line in lines:
    f.write(line)
f.close()

But, it turns out that it's (10-20%) faster to do:

f = gzip.open(out_path, 'wb')
try:
    f.writelines(lines)
finally:
    f.close()

Reading GZIP files

If you have a big GZIP file to read (text, not binary), you might be temped to read it like:

import gzip
f = gzip.open(in_path, 'rb')
for line in f.readlines():
    # do stuff
f.close()

But it turns out it can be up to 3 times faster to read it like:

import gzip
import io
gz = gzip.open(in_path, 'rb')
f = io.BufferedReader(gz)
     for line in f.readlines():
         # do stuff
gz.close()
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment