@jhannah
Last active January 30, 2017 17:57
How does this file get about 5 times smaller with the same number of lines?
$ ls -al out.txt
-rw-r--r-- 1 jhannah staff 14340349 Jan 30 12:33 out.txt
$ wc -l out.txt
554 out.txt
$ sort out.txt | uniq > out.unique.txt
$ ls -al out.unique.txt
-rw-r--r-- 1 jhannah staff 2857273 Jan 30 12:42 out.unique.txt
$ wc -l out.unique.txt
554 out.unique.txt
WHOA, solved (sort of)... sort or uniq is chopping long lines!
e.g. a line with 33058 characters is now only 8190 characters long.
MacOS 10.12.2
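To narrow down which of the two tools is doing the chopping, one option is to run each on its own and compare byte counts going in and coming out. This is only a sketch: `bytes_through` is a hypothetical helper name, not something from the original session, and it assumes a command that merely reorders lines (as sort should) leaves the byte count unchanged.

```shell
# Hypothetical helper: print "<bytes before> <bytes after>" for a file
# piped through a command. A command that only reorders lines should
# give two equal numbers; truncation shows up as a smaller second number.
bytes_through() {
  file=$1
  shift
  before=$(wc -c < "$file" | tr -d ' ')
  after=$("$@" < "$file" | wc -c | tr -d ' ')
  echo "$before $after"
}

# Usage against the files from this gist (not run here):
#   bytes_through out.txt sort
#   bytes_through out.txt uniq
```

If `sort` alone reports two equal numbers and `uniq` alone does not, that would point at uniq; note that uniq can also legitimately shrink the output when adjacent lines repeat.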
Some of these lines are doozies
$ perl -E 'say length($_)' -n out.txt | sort -rn | head
345943
131963
131963
131963
131963
131963
131963
131963
131963
131963