Created
July 6, 2020 15:18
Extract a CSV from a gzip file, remove duplicate rows while keeping the header, then gzip the result
# Extract the CSV from the gzip file while keeping the original .gz:
#   zcat file_compressed.csv.gz
# Keep only the first two columns and save them:
#   cut -d',' -f1,2 > file_uncompressed.csv
# Note: cut splits on every comma, so this assumes no quoted fields contain commas.
zcat file_compressed.csv.gz | cut -d',' -f1,2 > file_uncompressed.csv
# Since it's a CSV with a header, emit the header line unchanged:
#   head -n1 file_uncompressed.csv
# Then take the rest of the file (+2 means start at line 2), sort it,
# and use uniq to remove duplicates:
#   tail -n +2 file_uncompressed.csv | sort | uniq
# Finally, gzip the combined output and save it to a new file:
#   gzip > file_compressed_no_dups.csv.gz
(head -n1 file_uncompressed.csv && tail -n +2 file_uncompressed.csv | sort | uniq) | gzip > file_compressed_no_dups.csv.gz
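
The same steps can also be sketched as a single pass that skips the intermediate uncompressed file. This is a variation, not the original method: it swaps `zcat` for the equivalent `gzip -dc` (on some systems, macOS for instance, `zcat` may expect a `.Z` suffix), reads the header with the shell's `read` builtin, and uses `sort -u` in place of `sort | uniq`. The sample rows and the temporary directory are made up for illustration.

```shell
#!/bin/sh
set -eu

# Hypothetical sample data: a header plus rows whose first two columns
# contain one duplicate pair ("2,bob" appears twice).
workdir=$(mktemp -d)
printf 'id,name,note\n2,bob,x\n1,alice,y\n2,bob,z\n' \
    | gzip > "$workdir/file_compressed.csv.gz"

# Decompress to stdout (the original .gz is left untouched), keep columns 1-2,
# pass the header through unchanged, then sort and de-duplicate the remaining rows.
gzip -dc "$workdir/file_compressed.csv.gz" | cut -d',' -f1,2 | {
    IFS= read -r header          # consume only the first line
    printf '%s\n' "$header"      # emit the header as-is
    sort -u                      # sort the rest and drop duplicate lines
} | gzip > "$workdir/file_compressed_no_dups.csv.gz"

# Inspect the result.
result=$(gzip -dc "$workdir/file_compressed_no_dups.csv.gz")
printf '%s\n' "$result"

rm -rf "$workdir"
```

For the sample above this prints the header `id,name` followed by `1,alice` and `2,bob`. Avoiding the intermediate file matters mostly for large inputs, where writing the uncompressed CSV to disk would cost extra space and time.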