@shashanksingh28
Created September 17, 2019 01:15
md5sum in sklearn memory profiling
(venv) shashank@precision-7510:~/scikit-learn/scikit-learn$ cat md5_mem_test.py
import sys
from hashlib import md5

import requests
from memory_profiler import profile
from sklearn.datasets import fetch_openml


@profile(precision=3)
def hash_block():
    stream = requests.get("https://www.openml.org/data/download/52667/mnist_784.arff").content
    return md5(stream).hexdigest()


@profile(precision=3)
def hash_chunked(chunk_size=512):
    md5sum = md5()
    stream = requests.get("https://www.openml.org/data/download/52667/mnist_784.arff").content
    for i in range(0, len(stream), chunk_size):
        md5sum.update(stream[i : i + chunk_size])
    return md5sum.hexdigest()


if __name__ == "__main__":
    if len(sys.argv) > 1:
        print(hash_chunked())
    else:
        print(hash_block())
(venv) shashank@precision-7510:~/scikit-learn/scikit-learn$ python -m memory_profiler md5_mem_test.py
Filename: md5_mem_test.py

Line #    Mem usage    Increment   Line Contents
================================================
    10   65.137 MiB   65.137 MiB   @profile(precision=3)
    11                             def hash_block():
    12  311.012 MiB  245.875 MiB       stream = requests.get("https://www.openml.org/data/download/52667/mnist_784.arff").content
    13  311.012 MiB    0.000 MiB       return md5(stream).hexdigest()
0298d579eb1b86163de7723944c7e495
(venv) shashank@precision-7510:~/scikit-learn/scikit-learn$ python -m memory_profiler md5_mem_test.py chunked
Filename: md5_mem_test.py

Line #    Mem usage    Increment   Line Contents
================================================
    16   65.645 MiB   65.645 MiB   @profile(precision=3)
    17                             def hash_chunked(chunk_size=512):
    18   65.645 MiB    0.000 MiB       md5sum = md5()
    19  310.742 MiB  245.098 MiB       stream = requests.get("https://www.openml.org/data/download/52667/mnist_784.arff").content
    20  310.742 MiB    0.000 MiB       for i in range(0, len(stream), chunk_size):
    21  310.742 MiB    0.000 MiB           md5sum.update(stream[i : i + chunk_size])
    22  310.742 MiB    0.000 MiB       return md5sum.hexdigest()
0298d579eb1b86163de7723944c7e495
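
Both runs above peak near 311 MiB because `.content` reads the entire response body into memory before any hashing starts, so splitting the `md5.update` calls afterwards leaves the footprint unchanged. Below is a minimal sketch of a variant that streams the download instead, assuming `requests`' `stream=True` option and `Response.iter_content`. The names `hash_streamed` and `md5_of_chunks` are hypothetical and not part of the original script, and this variant has not been profiled here.

```python
from hashlib import md5
from typing import Iterable

# Same dataset URL as in md5_mem_test.py above.
URL = "https://www.openml.org/data/download/52667/mnist_784.arff"


def md5_of_chunks(chunks: Iterable[bytes]) -> str:
    """Feed an iterable of byte chunks into a single MD5 digest."""
    digest = md5()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def hash_streamed(url: str = URL, chunk_size: int = 8192) -> str:
    """Hash a download chunk by chunk instead of via response.content."""
    # Third-party dependency, imported here so md5_of_chunks stays usable
    # without it.
    import requests

    # stream=True defers fetching the body until iter_content is consumed,
    # so only about one chunk at a time is handed to the hash.
    with requests.get(url, stream=True) as response:
        response.raise_for_status()
        return md5_of_chunks(response.iter_content(chunk_size=chunk_size))
```

Decorating `hash_streamed` with `@profile(precision=3)` and running it through `python -m memory_profiler` in the same way would show whether the increment on the download line actually shrinks. It should print the same digest as the two runs above if the file is unchanged.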