@momota
Last active March 28, 2024 14:27
Generates a large CSV file filled with random values. With the settings below (`rows = 1000000`, `columns = 52`) the file comes out at roughly 1 GB. Adjust the two parameters `rows` and `columns` to produce a file of the size you need.
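import csv
import random

# 1000000 rows x 52 columns is roughly 1 GB (warning: takes a while, 30s+)
rows = 1000000
columns = 52


def generate_random_row(index, col):
    # One row: the row index followed by `col` random floats in [0, 1).
    row = [index]
    for _ in range(col):
        row.append(random.random())
    return row


if __name__ == '__main__':
    # newline='' leaves line endings to the csv module, as its docs recommend.
    with open('sample.csv', 'w', newline='') as f:
        w = csv.writer(f, lineterminator='\n')
        for i in range(rows):
            # Write each row as soon as it is generated so memory stays flat.
            w.writerow(generate_random_row(i, columns))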
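To pick `rows` and `columns` for a target size without generating the whole file first, one option is to write a small sample into memory and extrapolate. The sketch below assumes the same row shape as the script (an index followed by random floats). The helper names `estimate_csv_bytes`, `rows_for_target` and the `sample_rows` parameter are made up for this example and are not part of the gist.

```python
import csv
import io
import random


def estimate_csv_bytes(rows, columns, sample_rows=1000):
    """Estimate the output size in bytes from a small in-memory sample.

    Hypothetical helper for illustration. The estimate runs slightly low for
    large files because later rows have longer index values than the sample.
    """
    sample_rows = min(sample_rows, rows)
    if sample_rows == 0:
        return 0
    buf = io.StringIO()
    w = csv.writer(buf, lineterminator='\n')
    for i in range(sample_rows):
        # Same row shape as the script: index, then `columns` random floats.
        w.writerow([i] + [random.random() for _ in range(columns)])
    # The output is plain ASCII, so character count equals byte count.
    avg_row_bytes = len(buf.getvalue()) / sample_rows
    return int(avg_row_bytes * rows)


def rows_for_target(target_bytes, columns):
    """Return roughly how many rows are needed to reach `target_bytes`."""
    per_row = estimate_csv_bytes(1000, columns) / 1000
    return max(1, round(target_bytes / per_row))


if __name__ == '__main__':
    size = estimate_csv_bytes(1000000, 52)
    print(f"estimated size: {size / 1e9:.2f} GB")
    print(f"rows for ~250 MB at 52 columns: {rows_for_target(250 * 10**6, 52)}")
```

The estimate is only as good as the sample; it is meant for choosing parameters, not for exact sizing.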
@AndrewFarley

Fixed it for you. It uses almost no memory, can produce arbitrarily large files, and actually runs a little faster:

root@ip-172-100-182-190:~/1tb-project# time python3 generate-random-1gb-csv-old.py
..........
real	0m39.878s
user	0m38.671s
sys	0m1.200s
root@ip-172-100-182-190:~/1tb-project# time python3 generate-random-1gb-csv.py
..........
real	0m35.726s
user	0m35.026s
sys	0m0.572s
import csv
import random

# 1000000 and 52 == roughly 1GB (WARNING TAKES a while, 30s+)
rows = 1000000
columns = 52

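def generate_random_row(index, col):
    # One row: the row index followed by `col` random floats in [0, 1).
    row = [index]
    for _ in range(col):
        row.append(random.random())
    return row

if __name__ == '__main__':
    # newline='' leaves line endings to the csv module, as its docs recommend.
    with open('sample.csv', 'w', newline='') as f:
        w = csv.writer(f, lineterminator='\n')
        for i in range(rows):
            # Write each row as soon as it is generated so memory stays flat.
            w.writerow(generate_random_row(i, columns))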
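The memory claim can be checked with `tracemalloc` from the standard library. This is a rough sketch, not a benchmark: the older script is not shown here, so `write_all_at_once` is only an assumption about its shape (build every row in a list, then write), and all function names below are invented for the example. Output goes to `os.devnull` so that only the Python-side allocations are measured.

```python
import csv
import os
import random
import tracemalloc


def make_row(index, columns):
    # One row: the index followed by `columns` random floats.
    return [index] + [random.random() for _ in range(columns)]


def write_all_at_once(out, rows, columns):
    # Assumed shape of the older approach: hold the whole table in memory.
    table = [make_row(i, columns) for i in range(rows)]
    csv.writer(out, lineterminator='\n').writerows(table)


def write_streaming(out, rows, columns):
    # Row-at-a-time approach: each row can be freed once it is written.
    w = csv.writer(out, lineterminator='\n')
    for i in range(rows):
        w.writerow(make_row(i, columns))


def peak_bytes(func, rows, columns):
    # Peak traced Python allocations while `func` runs.
    with open(os.devnull, 'w') as sink:
        tracemalloc.start()
        func(sink, rows, columns)
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
    return peak


if __name__ == '__main__':
    for func in (write_all_at_once, write_streaming):
        print(func.__name__, peak_bytes(func, 5000, 52), "bytes at peak")
```

The all-at-once peak grows with `rows`, while the streaming peak should stay roughly constant, which is what allows the row-at-a-time version to scale to very large files.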

@momota
Author

momota commented Dec 17, 2022

@AndrewFarley Thanks for your comments.
This code was the first Python code I ever wrote, and I fondly remember writing it back then because I needed a large CSV file for testing.
I completely agree with the improvements you pointed out and have fixed it.
Thank you :)
