@t3rmin4t0r
Created July 31, 2015 18:58
To Split and Gzip at the same time
#!/usr/bin/env -S gawk -f
BEGIN {
    id = 0;
    cmd = "gzip -c -2";      # fast compression, output to stdout
    ext = ".gz";
    file = sprintf("%04d%s", id, ext);
    print "Opening new file " file " at " NR " rows";
    count = 1000000;         # rows per output file
}
# Pipe every input row into the gzip process for the current file
{ print | cmd " > " file }
# Close the pipe every `count` rows (1,000,000) and switch to the next file
NR % count == 0 {
    close(cmd " > " file);
    id = id + 1;
    file = sprintf("%04d%s", id, ext);
    print "Opening new file " file " at " NR " rows";
}
END {
    print "Ending stream at " NR " rows"
    # gawk closes any pipes that are still open when it exits
}
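
To try the approach on a small input, the same logic can be run inline with a lowered chunk size. This is a minimal sketch, not the script above: the `count` and `prefix` variables passed with `-v`, the `split-demo` directory, and `input.txt` are all assumptions made for the demo, and plain `awk` is used on the assumption that nothing here needs gawk extensions.

```shell
#!/bin/sh
set -eu

outdir=split-demo                      # hypothetical output directory
mkdir -p "$outdir"

# 25 rows of sample input
awk 'BEGIN { for (i = 1; i <= 25; i++) print i }' > "$outdir/input.txt"

awk -v count=10 -v prefix="$outdir/" '
BEGIN { id = 0; file = sprintf("%s%04d.gz", prefix, id) }
# Same pattern as the script: pipe each row into gzip for the current chunk
{ print | ("gzip -c -2 > " file) }
# Close the pipe after every `count` rows so gzip finishes that file
NR % count == 0 {
    close("gzip -c -2 > " file)
    id++
    file = sprintf("%s%04d.gz", prefix, id)
}
' "$outdir/input.txt"

# The chunks, decompressed in order, should match the original rows
gzip -dc "$outdir"/0000.gz "$outdir"/0001.gz "$outdir"/0002.gz | cmp - "$outdir/input.txt"
echo "round trip ok"
```

With 25 rows and `count=10`, this should leave three files in `split-demo` holding 10, 10, and 5 rows. The string passed to `close()` has to match the one used in `print |` exactly, which is why both are built from the same pieces.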