Skip to content

Instantly share code, notes, and snippets.

@robertchong
Created April 19, 2014 13:24
Show Gist options
  • Save robertchong/11084326 to your computer and use it in GitHub Desktop.
Save robertchong/11084326 to your computer and use it in GitHub Desktop.
GROUP BY clause functionality in awk - bash
#!/bin/bash
#
# This shell script demonstrates how to implement group by & aggregate functions in awk
# See http://www.unixcl.com/2008/09/group-by-clause-functionality-in-awk.html
#
# Input: A colon (:) delimited file tabulating Continents and numbers
# Output: Continents and values returned by aggregate functions Count(*), Total, Average
#
if [ $# -ne 1 ]; then
echo "Usage: $0 <filename>"
exit 1
fi
inputFile=$1
awk 'BEGIN{FS=":"; print "continent count total avg"} NR!=1 {a[$1]++;b[$1]=b[$1]+$2}END{for (i in a) printf("%s %10.0f %10.0f %10.2f\n", i, a[i], b[i], b[i]/a[i])} ' $inputFile
@robertchong
Copy link
Author

Sample Usage:

$ cat continents_data.txt
continent:mval
SA:2345
AF:123
SA:89
OC:890
EU:24
AF:90
NA:5678
AF:345
OC:90
OC:23
SA:1234
EU:90
AF:12
SA:909

$ ./awk_groupby_count_total_avg.sh continents_data.txt
continent count total avg
OC          3       1003     334.33
EU          2        114      57.00
NA          1       5678    5678.00
AF          4        570     142.50
SA          4       4577    1144.25

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment