@Lawrence-Krukrubo
Created October 4, 2020 11:46
import pandas as pd

# rate_keys (the distinct rating values) and ratings_dict (a dict mapping each
# rate key to a running count, initialised to 0) are assumed to be defined earlier.

# Initialise a variable to compute average bytes per chunk
ave_bytes = 0

# enumerate pairs each chunk of 1,000,000 rows with a 1-based chunk index
for index, chunk in enumerate(pd.read_csv('ratings.csv', chunksize=1000000), start=1):
    # Add the total memory used by this chunk to ave_bytes
    ave_bytes += chunk.memory_usage().sum()
    # This inner loop iterates through the rate keys only. It performs a
    # vectorised selection on the chunk to count the rows for each rate key.
    for i in rate_keys:
        count = len(chunk[chunk['rating'] == i])
        ratings_dict[i] += count

print("Total number of chunks:", index)
ave_bytes = ave_bytes / index
print("Average bytes per chunk:", ave_bytes)
print(ratings_dict)
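The per-key counting above can also be done with a single `value_counts()` call per chunk, avoiding the inner loop over the rate keys. Below is a minimal, self-contained sketch of that variant; the small in-memory CSV stands in for the real ratings.csv, and the chunk size is shrunk accordingly.

```python
# Chunked rating counts via Series.value_counts(), assuming the same
# one-column 'rating' layout as ratings.csv (the data here is illustrative).
import io
import pandas as pd

csv_data = io.StringIO("rating\n5\n3\n5\n4\n3\n5\n")

ratings_dict = {}
for chunk in pd.read_csv(csv_data, chunksize=2):
    # value_counts() tallies every rating in the chunk in one pass,
    # so no explicit loop over the rate keys is needed.
    for key, n in chunk['rating'].value_counts().items():
        ratings_dict[key] = ratings_dict.get(key, 0) + n

print(ratings_dict)  # → {5: 3, 3: 2, 4: 1}
```

Because `value_counts()` only returns keys that actually occur in a chunk, the dict is built up with `.get(key, 0)` rather than pre-seeding every rate key.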