wassname/901a5e023a9641bfb5e17549c64ba428
Last active January 29, 2018 23:26
Streamed MD5 calculation for large files (with a progress bar) in Python.
```python
import hashlib
import os

from tqdm import tqdm


def md5_streamed_progress(fname, chunk_size=1024 * 1024):
    """Streamed md5 with progress bar."""
    hash_md5 = hashlib.md5()
    filesize = os.path.getsize(fname)
    with tqdm(total=filesize, unit='B', unit_scale=True, miniters=1,
              desc=os.path.basename(fname), leave=False) as t:
        with open(fname, "rb") as f:
            # Read fixed-size chunks until read() returns b"" (end of file),
            # so only one chunk is in memory at a time. Any chunk size gives
            # the same digest; larger chunks just mean fewer bar updates.
            for chunk in iter(lambda: f.read(chunk_size), b""):
                hash_md5.update(chunk)
                t.update(len(chunk))
    # Leaving the `with` block closes the bar, so no explicit t.close().
    return hash_md5.hexdigest()


if __name__ == "__main__":
    # Demo: hash this script itself and print the digest.
    print(md5_streamed_progress(__file__))
```
MD5 is horribly broken and deprecated for cryptographic use, so maybe try a different hash. SHA-1 also has weaknesses that mean it's no longer advised for cryptographic purposes, but it's probably okay for something like this without massively increasing the load. Maybe try that instead.
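Swapping the hash only means changing the constructor. Below is a minimal sketch that takes the algorithm by name via `hashlib.new()`; `hash_file_streamed` and its parameters are hypothetical names, and the progress bar is left out to keep it dependency-free (the `tqdm` wrapper from the gist could be put back around the loop unchanged):

```python
import hashlib


def hash_file_streamed(fname, algorithm="sha1", chunk_size=1024 * 1024):
    """Stream a file through a hashlib algorithm and return the hex digest.

    `algorithm` is any name accepted by hashlib.new(), e.g. "md5", "sha1",
    or "sha256". `chunk_size` only affects memory use and speed, not the result.
    """
    h = hashlib.new(algorithm)
    with open(fname, "rb") as f:
        # Read fixed-size chunks until read() returns b"" at end of file.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```

For example, `hash_file_streamed("big.iso", "sha256")` (the file name is only illustrative); passing `"md5"` gives the same digest as the gist's function.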
Actually, I'm not sure this works on large files: I've been getting MD5 mismatches even when the files extract OK.
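Chunking by itself should not change the result: the `hashlib` docs state that `m.update(a); m.update(b)` is equivalent to `m.update(a + b)`, so feeding the same bytes in any chunk size gives the same digest. A quick self-check, sketched with random in-memory data (the helper name `md5_in_chunks` and the 1 MiB + 123 byte test size are arbitrary choices):

```python
import hashlib
import os


def md5_in_chunks(data, chunk_size):
    """MD5 of `data`, fed to the hash object `chunk_size` bytes at a time."""
    h = hashlib.md5()
    for start in range(0, len(data), chunk_size):
        h.update(data[start:start + chunk_size])
    return h.hexdigest()


# Random bytes whose length is not a multiple of any chunk size below,
# so the last chunk is always a short one.
data = os.urandom(1024 * 1024 + 123)
one_shot = hashlib.md5(data).hexdigest()

for size in (7, 4096, 65536, 1024 * 1024):
    assert md5_in_chunks(data, size) == one_shot, size
print("chunked MD5 matches one-shot MD5 for every chunk size tried")
```

If this passes, the streaming loop is probably not the cause, and it may be worth checking that the file on disk is byte-identical to the one the reference checksum was computed from.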