@cccntu
Created February 8, 2021 11:58
python mmap to concatenate csv files
❯ rm out.csv
❯ cat 1.py
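from glob import glob
import mmap

# Sort inputs numerically by file stem, e.g. data/2.csv before data/10.csv.
files = glob("data/*")
files.sort(key=lambda x: int(x.split("/")[-1].split(".")[0]))

with open("out.csv", "wb") as write_f:
    for i, fname in enumerate(files):
        # Map each input read-only; a length of 0 maps the whole file.
        with open(fname, "rb") as f:
            with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
                if i == 0:
                    # Keep the header row from the first file only.
                    write_f.write(mm.readline())
                else:
                    # Skip the repeated header in later files.
                    mm.readline()
                # Copy the remaining rows.
                write_f.write(mm.read())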
❯ time python 1.py
python 1.py 0.90s user 1.12s system 99% cpu 2.022 total
❯ wc -l out.csv
10000001 out.csv
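
The script above assumes every input is non-empty and ends with a newline. Python's mmap raises ValueError when asked to map a zero-length file, and a file without a trailing newline would have its last row fused with the first row of the next file. A minimal sketch of one way to guard against both cases follows. The names `concat_csv` and `_write_with_newline` and the demo file names are hypothetical, not part of the original gist, and this variant has not been timed against the 10M-row data above.

```python
import mmap
import os
import tempfile


def _write_with_newline(out, chunk):
    """Write chunk, appending a newline if it is non-empty and lacks one."""
    out.write(chunk)
    if chunk and not chunk.endswith(b"\n"):
        out.write(b"\n")


def concat_csv(paths, out_path):
    """Concatenate CSV files, keeping the header of the first non-empty file."""
    header_written = False
    with open(out_path, "wb") as out:
        for path in paths:
            # mmap cannot map a zero-length file, so skip empty inputs.
            if os.path.getsize(path) == 0:
                continue
            with open(path, "rb") as f:
                with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
                    header = mm.readline()
                    if not header_written:
                        _write_with_newline(out, header)
                        header_written = True
                    # Everything after the header line is row data.
                    _write_with_newline(out, mm.read())


# Small demo on throwaway files (hypothetical names and contents).
with tempfile.TemporaryDirectory() as tmp:
    contents = {"0.csv": b"x,y\n1,2", "1.csv": b"", "2.csv": b"x,y\n3,4\n"}
    paths = []
    for name, data in contents.items():
        p = os.path.join(tmp, name)
        with open(p, "wb") as fh:
            fh.write(data)
        paths.append(p)
    out_path = os.path.join(tmp, "out.csv")
    concat_csv(paths, out_path)
    with open(out_path, "rb") as fh:
        result = fh.read()

print(result)  # b'x,y\n1,2\n3,4\n'
```

The size check costs one extra stat call per file. The newline check only looks at the last byte of each chunk, so the bulk copy stays a single write per input.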