@wassname
Last active January 29, 2018 23:26
Streamed md5 calculation for large files (with a progress bar) in python.
import hashlib
import os

from tqdm import tqdm


def md5_streamed_progress(fname, chunk_size=4096):
    """Return the hex MD5 digest of a file, read in chunks with a progress bar."""
    hash_md5 = hashlib.md5()
    filesize = os.path.getsize(fname)
    with tqdm(total=filesize, unit='B', unit_scale=True, miniters=1,
              desc=os.path.basename(fname), leave=False) as t:
        with open(fname, "rb") as f:
            # iter() with a sentinel stops once read() returns b"" (end of file).
            for chunk in iter(lambda: f.read(chunk_size), b""):
                hash_md5.update(chunk)
                t.update(len(chunk))
    # The with block closes the progress bar, so no explicit t.close() is needed.
    return hash_md5.hexdigest()


if __name__ == "__main__":
    print(md5_streamed_progress(__file__))
@wassname
Author

wassname commented Aug 7, 2017

Actually, I'm not sure this works on large files: I've been getting MD5 mismatches even when the files extract OK.
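
One way to narrow down a mismatch like this is to check whether chunked reading itself changes the digest, by comparing a streamed digest with a one-shot digest of the same bytes. The following is a minimal sketch, not the gist's code: `streamed_md5` is a hypothetical helper name, the progress bar is left out, and a small temporary file stands in for a large one.

```python
import hashlib
import os
import tempfile


def streamed_md5(fname, chunk_size=4096):
    """Hypothetical helper: MD5 of a file read in fixed-size chunks."""
    h = hashlib.md5()
    with open(fname, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


# Test data whose size is deliberately not a multiple of the chunk size.
data = os.urandom(3 * 4096 + 123)
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(data)
    tmp_path = tmp.name

try:
    streamed = streamed_md5(tmp_path)
    one_shot = hashlib.md5(data).hexdigest()
    digests_match = streamed == one_shot
finally:
    os.remove(tmp_path)

print(digests_match)
```

If the two digests agree, the chunked loop is probably not the cause, and the mismatch may come from elsewhere, for example a reference checksum that was computed over different bytes.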

@Hasimir

Hasimir commented Jan 29, 2018

MD5 is cryptographically broken and deprecated, so maybe try a different hash. SHA-1 also has weaknesses that make it no longer advisable for cryptographic purposes, but it's probably okay for an integrity check like this and shouldn't massively increase the load.
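
Swapping the hash could look roughly like the sketch below, which keeps the gist's streaming loop but selects the algorithm by name through `hashlib.new()`. The function name `hash_streamed` and its `algorithm` parameter are assumptions made for illustration, and the progress bar is omitted for brevity.

```python
import hashlib
import os
import tempfile


def hash_streamed(fname, algorithm="sha1", chunk_size=4096):
    """Hypothetical variant: stream a file through any hashlib algorithm."""
    h = hashlib.new(algorithm)  # e.g. "sha1", "sha256", "md5"
    with open(fname, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


# Small demo on a temporary file so the sketch runs on its own.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello world")
    demo_path = tmp.name

try:
    sha1_hex = hash_streamed(demo_path, "sha1")
    sha256_hex = hash_streamed(demo_path, "sha256")
finally:
    os.remove(demo_path)

print(sha1_hex)
print(sha256_hex)
```

Passing the algorithm as a string means the caller can move to SHA-256 later without touching the read loop.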
