import pandas as pd

# NOTE: rate_keys and ratings_dict are not defined in this snippet; they are
# assumed to exist already (the rating values and a dict of running counts).

# Running total of memory used by the chunks, later averaged per chunk
ave_bytes = 0
# Number of chunks read (stays 0 if the file yields no chunks)
index = 0
# read_csv with chunksize yields successive DataFrames of up to 1,000,000 rows;
# enumerate numbers them starting from 1
for index, chunk in enumerate(pd.read_csv('ratings.csv', chunksize=1000000), start=1):
    # Add this chunk's total memory usage (in bytes) to ave_bytes
    ave_bytes += chunk.memory_usage().sum()
    # Loop over the rating keys only; each count comes from a vectorised
    # comparison on the chunk's 'rating' column
    for i in rate_keys:
        count = len(chunk[chunk['rating'] == i])
        ratings_dict[i] += count
print("Total number of chunks:", index)
# Guard against division by zero when no chunks were read
ave_bytes = ave_bytes / index if index else 0
print("Average bytes per chunk:", ave_bytes)
print(ratings_dict)
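
The same chunked tally can also be sketched without a predefined list of rating keys, by letting `value_counts()` discover the values in each chunk. This is only an illustrative variant, not the original method: the `count_ratings` helper, the inline `csv_text` sample, and the tiny `chunksize` are assumptions made so the sketch runs on its own without `ratings.csv`.

```python
import io
from collections import Counter

import pandas as pd

# Hypothetical stand-in for ratings.csv so the sketch is self-contained.
csv_text = (
    "userId,movieId,rating\n"
    "1,10,4.0\n"
    "1,11,3.5\n"
    "2,10,4.0\n"
    "2,12,5.0\n"
    "3,11,3.5\n"
)


def count_ratings(source, chunksize):
    """Tally each rating value while reading the CSV in chunks."""
    totals = Counter()
    n_chunks = 0
    total_bytes = 0
    for chunk in pd.read_csv(source, chunksize=chunksize):
        n_chunks += 1
        total_bytes += chunk.memory_usage().sum()
        # value_counts() counts every distinct rating in one vectorised pass
        totals.update(chunk["rating"].value_counts().to_dict())
    # Avoid dividing by zero if no chunks were read
    avg_bytes = total_bytes / n_chunks if n_chunks else 0
    return dict(totals), n_chunks, avg_bytes


counts, n_chunks, avg_bytes = count_ratings(io.StringIO(csv_text), chunksize=2)
print("Total number of chunks:", n_chunks)
print("Average bytes per chunk:", avg_bytes)
print(counts)
```

For the real file, pass `'ratings.csv'` and a larger `chunksize` (such as 1000000) instead of the in-memory sample. Each chunk is then scanned once rather than once per rating key.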