@AndrewFarley
Last active April 24, 2024 16:42
This simple Python script efficiently generates a massive random CSV file, streaming data to disk as it goes so it uses almost no RAM while running. The row/column counts below have been pre-calculated against the resulting file size, so you can add or remove zeroes from the "rows" variable to scale the generated file up or down.
import csv
import random

# 1000000 rows x 52 columns == roughly 1 GB (WARNING: takes a while, 30s+)
rows = 1000000
columns = 52
print_after_rows = 100000

def generate_random_row(row_index, col):
    # Build one row: the row index followed by `col` random floats,
    # wrapped in a list so it can be passed to csv.writer.writerows().
    l = [row_index]
    for j in range(col):
        l.append(random.random())
    return [l]

if __name__ == '__main__':
    f = open('sample.csv', 'w')
    w = csv.writer(f, lineterminator='\n')
    for i in range(rows):
        # Print a progress dot every print_after_rows rows.
        if i % print_after_rows == 0:
            print(".", end="", flush=True)
        w.writerows(generate_random_row(i, columns))
    f.close()
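
For reference, each random float prints as roughly 18-20 characters, so a 52-column row comes to about 1 KB, which is where the 1,000,000-rows ≈ 1 GB estimate above comes from. If you prefer to let csv.writer drive the loop, here is a minimal sketch (not part of the original gist, assuming Python 3) that streams rows from a generator so memory stays flat no matter how large the file gets; the sample.csv filename and row/column counts simply mirror the script above.

import csv
import random

rows = 1000000      # same scale as above: roughly 1 GB of output
columns = 52

def row_stream(n_rows, n_cols):
    # Yield rows one at a time: the row index, then n_cols random floats.
    for i in range(n_rows):
        yield [i] + [random.random() for _ in range(n_cols)]

with open('sample.csv', 'w', newline='') as f:
    # writerows() accepts any iterable of rows, so the generator is
    # consumed lazily and only one row is ever held in memory.
    csv.writer(f).writerows(row_stream(rows, columns))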
@jlee9595

Super useful, thank you!

@prem-chand-yadav

prem-chand-yadav commented Aug 5, 2023

Super logic, I generated around 10 GB of data without any issue. Thanks!!
