@neilkod
Created June 29, 2010 18:22
20 million records, about 3 GB of data
neil-kodners-MacBook-Pro:parsed nkodner$ time cat 2010061*.txt | awk -F'\t' '{print $3}' | sort | uniq -c | sort -rg > srtd.out
real 14m45.107s
user 11m14.684s
sys 0m15.476s
neil-kodners-MacBook-Pro:parsed nkodner$ time cat 2010061*.txt|wc -l
20457105
real 0m43.154s
user 0m3.525s
sys 0m4.338s
neil-kodners-MacBook-Pro:parsed nkodner$ du -sh 2010061*.txt
697M 20100617.txt
965M 20100618.txt
1.0G 20100619.txt
neil-kodners-MacBook-Pro:parsed nkodner$ time pig -x local count_by_username.pig
.... pig output snipped .....
real 12m14.122s
user 8m49.999s
sys 0m47.608s
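
For comparison, here is a minimal sketch of a single-pass variant of the shell pipeline above. It was not part of the timed session, and the function name `count_by_field3` is hypothetical. It counts the third tab-separated field (presumably the username, going by the Pig script's name) in an awk associative array, so only the distinct values reach `sort` rather than all 20 million lines. It assumes the set of distinct values fits in memory.

```shell
# Hypothetical helper (not from the original session): count occurrences of
# the third tab-separated field across the given files, most frequent first.
count_by_field3() {
    # awk keeps one counter per distinct value of field 3, so the final
    # sort only sees one line per distinct value, not one per input line.
    awk -F'\t' '{ count[$3]++ } END { for (k in count) print count[k] "\t" k }' "$@" |
        # Sort by count (numeric, descending), then by value to break ties.
        sort -t "$(printf '\t')" -k1,1nr -k2,2
}
```

It could be invoked as `count_by_field3 2010061*.txt > srtd.out`. Note that the output is `count<TAB>value`, which differs from the space-padded format that `uniq -c` produces.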