@momota
Last active March 28, 2024 14:27
Generate a large CSV file filled with random values. With the default settings (`rows = 1000000`, `columns = 52`) the script produces a file of roughly 1GB. Adjust the two parameters `rows` and `columns` to generate a file of the desired size.
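```python
import csv
import random

# 1000000 and 52 == roughly 1GB (WARNING: takes a while, 30s+)
rows = 1000000
columns = 52

def generate_random_row(index, col):
    # One CSV row: the row index followed by `col` random floats.
    row = [index]
    for _ in range(col):
        row.append(random.random())
    return row

if __name__ == '__main__':
    # newline='' is recommended for files used with the csv module.
    with open('sample.csv', 'w', newline='') as f:
        w = csv.writer(f, lineterminator='\n')
        for i in range(rows):
            # Write each row immediately, so memory use stays constant.
            w.writerow(generate_random_row(i, columns))
```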
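Because every row has the same shape, the file size grows roughly linearly with both `rows` and `columns`. The sketch below is one way to pick `rows` for a target size: it writes a sample of rows shaped like the script's output (an index followed by `columns` random floats) to an in-memory buffer, measures the average bytes per row, and divides. The helper names `estimate_bytes_per_row` and `rows_for_target_size` are hypothetical, not part of the gist, and the estimate slightly undercounts the wider index values of later rows, so treat the result as approximate.

```python
import csv
import io
import random


def estimate_bytes_per_row(columns, sample_rows=1000):
    """Estimate the average encoded size of one CSV row (hypothetical helper).

    Writes `sample_rows` rows shaped like the gist's output (an index followed
    by `columns` random floats) into an in-memory buffer and averages them.
    """
    buf = io.StringIO()
    w = csv.writer(buf, lineterminator='\n')
    for i in range(sample_rows):
        w.writerow([i] + [random.random() for _ in range(columns)])
    return len(buf.getvalue().encode('utf-8')) / sample_rows


def rows_for_target_size(target_bytes, columns, sample_rows=1000):
    """Approximate number of rows needed to reach `target_bytes` on disk."""
    return int(target_bytes / estimate_bytes_per_row(columns, sample_rows))


if __name__ == '__main__':
    # Rough row count for a ~250MB file with 52 random columns.
    print(rows_for_target_size(250 * 1024 ** 2, 52))
```

Since the sample rows have the same layout as the real output, the same estimate also lets you trade `columns` against `rows` for a fixed target size.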
@AndrewFarley

@momota Fun little script! The only issue is that you have your col and row logic backwards in the generate_random_array function: the first range needs to be row, and the second range needs to be col. :)
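The `generate_random_array` function mentioned here comes from an earlier revision of the gist and is not shown on this page, so the following is only a hypothetical reconstruction of the loop order the comment asks for: the outer loop runs over rows and the inner loop over columns. If the two ranges are swapped, each inner list holds `rows` values instead of `cols`, and the table comes out transposed.

```python
import random


def generate_random_array(rows, cols):
    """Hypothetical in-memory version: `rows` lists, each holding an index
    followed by `cols` random floats."""
    array = []
    for i in range(rows):          # outer loop: one iteration per row
        row = [i]
        for _ in range(cols):      # inner loop: one value per column
            row.append(random.random())
        array.append(row)
    return array


if __name__ == '__main__':
    table = generate_random_array(3, 5)
    print(len(table), len(table[0]))
```

Note that this version still builds the whole table in memory before anything is written.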

@AndrewFarley

AndrewFarley commented Dec 16, 2022

Also, this script isn't really written to scale: it needs at least as much RAM as the size of the file you want to write, so the moment the file outgrows your available memory you'll OOM and crash. ;) I just tested it, and it takes about 2.5GB of RAM to write a 1GB file. :)
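The comment implies that the earlier script built the entire table in a list before writing it. The sketch below contrasts that pattern with writing one row at a time, using the standard-library `tracemalloc` module to report the peak Python allocation while writing to `os.devnull` (so the output itself is discarded and occupies no memory). The function names `write_all_at_once`, `write_streaming` and `peak_memory` are hypothetical, and the small default row count keeps it fast. The absolute numbers will not match the 2.5GB figure above, but the gap between the two approaches should be obvious. One plausible reason the in-memory version needs more RAM than the file size: on 64-bit CPython each float in a list costs a pointer plus a float object (about 32 bytes), versus roughly 19 bytes as CSV text.

```python
import csv
import os
import random
import tracemalloc


def write_all_at_once(f, rows, cols):
    # Materialise the whole table as a list of lists, then write it in one call.
    data = [[i] + [random.random() for _ in range(cols)] for i in range(rows)]
    csv.writer(f, lineterminator='\n').writerows(data)


def write_streaming(f, rows, cols):
    # Generate and write one row at a time; only the current row is kept alive.
    w = csv.writer(f, lineterminator='\n')
    for i in range(rows):
        w.writerow([i] + [random.random() for _ in range(cols)])


def peak_memory(write_fn, rows=10_000, cols=52):
    """Return the peak traced Python allocation (in bytes) while write_fn runs."""
    tracemalloc.start()
    try:
        # os.devnull discards the output, so the sink itself holds no data.
        with open(os.devnull, 'w', newline='') as sink:
            write_fn(sink, rows, cols)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


if __name__ == '__main__':
    print('all at once:', peak_memory(write_all_at_once))
    print('streaming:  ', peak_memory(write_streaming))
```

The in-memory peak grows with `rows`, while the streaming peak stays roughly constant no matter how many rows are written.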

@AndrewFarley

Fixed it for you. It takes almost no memory, can write files of any size (limited only by disk space), and it actually runs a little faster:

```
root@ip-172-100-182-190:~/1tb-project# time python3 generate-random-1gb-csv-old.py
..........
real	0m39.878s
user	0m38.671s
sys	0m1.200s
root@ip-172-100-182-190:~/1tb-project# time python3 generate-random-1gb-csv.py
..........
real	0m35.726s
user	0m35.026s
sys	0m0.572s
```
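```python
import csv
import random

# 1000000 and 52 == roughly 1GB (WARNING: takes a while, 30s+)
rows = 1000000
columns = 52

def generate_random_row(index, col):
    # One CSV row: the row index followed by `col` random floats.
    row = [index]
    for _ in range(col):
        row.append(random.random())
    return row

if __name__ == '__main__':
    # newline='' is recommended for files used with the csv module.
    with open('sample.csv', 'w', newline='') as f:
        w = csv.writer(f, lineterminator='\n')
        for i in range(rows):
            # Write each row immediately, so memory use stays constant.
            w.writerow(generate_random_row(i, columns))
```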
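This version keeps memory flat because each row is written as soon as it is generated. A slightly more compact variant of the same idea is sketched below: a generator yields rows lazily and `csv.writer.writerows` consumes it one row at a time, so nothing beyond the current row is held in memory. The names `random_rows` and `write_random_csv` are hypothetical, and `newline=''` follows the csv module documentation's recommendation for files passed to `csv.writer`.

```python
import csv
import random


def random_rows(rows, columns):
    """Lazily yield `rows` rows: an index followed by `columns` random floats."""
    for i in range(rows):
        yield [i] + [random.random() for _ in range(columns)]


def write_random_csv(path, rows, columns):
    # writerows() iterates over the generator, writing each row as it arrives.
    with open(path, 'w', newline='') as f:
        csv.writer(f, lineterminator='\n').writerows(random_rows(rows, columns))


if __name__ == '__main__':
    # Same output shape as the script above: roughly 1GB with these settings.
    write_random_csv('sample.csv', 1000000, 52)
```

Keeping the row generator separate from the writer also makes it easy to reuse `random_rows` with a different sink, such as a compressed file object.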

@momota
Author

momota commented Dec 17, 2022

@AndrewFarley Thanks for your comments.
This code was the first Python code I ever wrote, and I fondly remember writing it back then because I needed a large CSV file for testing.
I completely agree with the improvements you pointed out and have fixed it.
Thank you :)
