I ran some experiments with varying recordsizes, file sizes, and compression settings. The test files were CSVs representing a simple schema:
full_name,external_id,last_modified
'Past, Gabrielle',40605,'2006-07-09 23:17:20'
'Vachil, Corry',44277,'1996-09-05 05:12:44'
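Files of this shape can be produced with a few lines of bash. The sketch below is only illustrative, not the generator actually used here: the name pools, ID range, and timestamp window are placeholders chosen to resemble the sample rows, the output filename is a stand-in, and it assumes GNU date.

```bash
#!/usr/bin/env bash
# Illustrative generator sketch -- name pools, ID range, timestamp window,
# and filename are placeholders, not the original tooling.
set -euo pipefail

rows=${1:-75}                  # number of data rows to emit
out=${2:-sample_small.csv}     # placeholder filename

first=(Gabrielle Corry Alex Morgan Riley)
last=(Past Vachil Smith Jones Lee)

{
  echo 'full_name,external_id,last_modified'
  for ((i = 0; i < rows; i++)); do
    f=${first[RANDOM % ${#first[@]}]}
    l=${last[RANDOM % ${#last[@]}]}
    id=$((RANDOM % 90000 + 10000))
    # random epoch between roughly 1995 and 2008, formatted with GNU date
    epoch=$(( (RANDOM * 32768 + RANDOM) % 400000000 + 800000000 ))
    printf "'%s, %s',%d,'%s'\n" "$l" "$f" "$id" \
      "$(date -d "@$epoch" '+%Y-%m-%d %H:%M:%S')"
  done
} > "$out"
```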
The files were all generated on an ext4 filesystem. There were three sets of five files, containing 75, 100,000, and 1,000,000 rows respectively, resulting in the following sizes:
❯ find . -name '*small*.csv' -exec du -bc {} + | \
awk 'END {printf "%s %.2f %s\n", "Average file size:", ($1 / (NR-1) / 1024), "KiB"}'