@mikecroucher
Forked from drj11/accounting.md
Last active July 13, 2016 13:43

Average CPU time per job (1% sample): about 4900 seconds

awk -F: '{print $35, $37}' 1percent | awk '{a+=$2};END{print a/NR}'
4852.23
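The two-stage pipeline above can run as a single awk process, because awk can sum field 37 (CPU seconds) directly and divide by `NR` at the end. This is a minimal sketch. `mean_cpu` is a hypothetical name, and the `NR > 0` guard only prevents division by zero on an empty file:

```shell
#!/bin/sh
# Hypothetical helper: mean CPU time per job (field 37, seconds) in one awk pass.
# Reads the colon-separated accounting files named as arguments, or stdin.
mean_cpu() {
    awk -F: '{ total += $37 } END { if (NR > 0) print total / NR }' "$@"
}

# Usage, assuming the 1percent sample file from above:
#   mean_cpu 1percent
```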

Average number of cores (slots) per job, weighted by CPU seconds (1% sample)

awk -F: '{print $35, $37}' 1percent | awk '{p+=$1*$2;a+=$2}; END {print p/a}'
34.9133
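In symbols, the pipeline computes a CPU-weighted mean of the slot count. Here $s_j$ is field 35 (slots) and $c_j$ is field 37 (CPU seconds) of job $j$. The worked numbers use two hypothetical jobs: one with 1 slot and 100 CPU seconds, and one with 4 slots and 300 CPU seconds. They show how the weighting pulls the figure towards the jobs that used the most CPU. The unweighted mean of the same two jobs would be 2.5.

```latex
\bar{s}_{\mathrm{cpu}} = \frac{\sum_j s_j \, c_j}{\sum_j c_j},
\qquad
\text{e.g. } \frac{1 \cdot 100 + 4 \cdot 300}{100 + 300} = \frac{1300}{400} = 3.25
```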

The same weighted average over the whole dataset:

awk -F: '{print $35, $37}' accounting | awk '{p+=$1*$2;a+=$2};END{print p/a}'
37.7884

Total CPU seconds per user (field 37 summed by owner, field 4), smallest first

awk -F: '{cpu[$4]+=$37};END{for(a in cpu){printf "%.0f %s\n",cpu[a],a}}' 1percent | sort -n

Everything submitted since 2015-01-01 (field 9 is the submission time in Unix seconds; 1420070400 is 2015-01-01 00:00 UTC)

awk -F: '$9 >= 1420070400' accounting > full-2015
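Rather than hard-coding the epoch, the cutoff can be computed from a date and passed to awk with `-v`. This is a sketch that assumes GNU `date`, since the `-d` option is not available in BSD/macOS `date`. `jobs_since` is a hypothetical helper name:

```shell
#!/bin/sh
# 2015-01-01 00:00:00 UTC as Unix seconds (GNU date syntax).
date -u -d '2015-01-01' +%s    # prints 1420070400

# Hypothetical helper: keep jobs whose submission_time (field 9) is on or
# after the given UTC date. Requires GNU date for -d.
jobs_since() {
    cutoff=$(date -u -d "$1" +%s) || return 1
    awk -F: -v t="$cutoff" '$9 >= t' "$2"
}

# Usage, equivalent to the command above:
#   jobs_since 2015-01-01 accounting > full-2015
```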

Table of (slots, slot-seconds) for jobs since 2015, where slot-seconds = wallclock (field 14) × slots (field 35)

awk -F: 'BEGIN{OFMT="%.0f";Slots=35;Wall=14};{t[$Slots]+=$Wall*$Slots};END{for(n in t) { print n, t[n] } }' full-2015 > slot-cpu
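The `t` array in that command accumulates wallclock time multiplied by slots, which is reserved slot time rather than the measured CPU time in field 37. Write $w_j$ for field 14 (ru_wallclock) and $s_j$ for field 35 (slots) of job $j$. The value printed for slot count $n$ is then:

```latex
T(n) = \sum_{j \,:\, s_j = n} s_j \, w_j
```

This will typically be larger than the summed field 37 for the same jobs whenever some of the reserved slots sit idle.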

Number of users who have only ever run 1-slot jobs (1% sample)

gawk -F: '{print $4,$35}' 1percent | sort | uniq | gawk '{slots[$1]+=$2};END{for(n in slots) { print n, slots[n] }}' | gawk '$2==1 {print $1}' | wc -l

I'm new to this, so here's my thinking. How might I do it better?

# List each distinct (user, slots) pair, i.e. every slot count each user has requested, e.g.
#foouser 1
#foouser 2
#foouser 4
gawk -F: '{print $4,$35}' 1percent | sort | uniq

Add up the distinct slot counts for each user:
gawk '{slots[$1]+=$2};END{for(n in slots) { print n, slots[n] }}'

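Pull out the users whose sum is exactly 1. Slot counts are positive integers and each distinct value is counted once, so a sum of 1 means the user only ever requested 1 slot:
gawk '$2==1 {print $1}'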

The full accounting file gives 2276 users out of a total 3024 using this method. That is, 75% of all users have never submitted a multicore job!
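The three steps can be wrapped in one reusable function that prints the count directly instead of piping through `wc`. This is a minimal sketch. `count_single_slot_users` is a hypothetical name, and `sort -u` is used as shorthand for `sort | uniq`:

```shell
#!/bin/sh
# Hypothetical wrapper around the three steps:
#   1. one line per distinct (owner, slots) pair (fields 4 and 35);
#   2. add up the distinct slot counts for each owner;
#   3. count owners whose total is exactly 1, i.e. who only ever used 1 slot.
count_single_slot_users() {
    awk -F: '{ print $4, $35 }' "$@" | sort -u |
        awk '{ sum[$1] += $2 }
             END { n = 0; for (u in sum) if (sum[u] == 1) n++; print n }'
}

# Usage, assuming the files from above:
#   count_single_slot_users 1percent
```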

Another way of doing this

# Get the maximum number of slots ever requested by each user
gawk -F: '{$35>=slots[$4]} {slots[$4]=$35};END{for(n in slots){print n, slots[n]}}' 1percent

# Only print users whose maximum is one slot
gawk -F: '{$35>=slots[$4]} {slots[$4]=$35};END{for(n in slots){if(slots[n]==1){print n, slots[n]}}}' accounting.csv | wc
2842    5684   28146

Different numbers. So, I've messed up. How?

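I put {} around the pattern. In awk, `{$35>=slots[$4]}` is an action, not a pattern. The comparison is evaluated and its result thrown away, so the next action `{slots[$4]=$35}` runs on every line and stores the *last* slot count seen for each user instead of the maximum. Without the braces, the comparison becomes a pattern guarding the assignment, and the result agrees with my first attempt: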

gawk -F: '$35>=slots[$4] {slots[$4]=$35};END{for(n in slots){if(slots[n]==1){print n, slots[n]}}}' accounting.csv | wc
    2276    4552   22632
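The pattern-versus-action mistake can be reproduced on three toy lines of whitespace-separated "user slots" data. This is a minimal sketch. `buggy_max` and `fixed_max` are hypothetical names, and the input is made up rather than real accounting data:

```shell
#!/bin/sh
# Buggy: "{ $2 >= max[$1] }" is an action. The comparison is evaluated and
# discarded, so the second action runs on every line and keeps the LAST value.
buggy_max() { awk '{ $2 >= max[$1] } { max[$1] = $2 } END { for (u in max) print u, max[u] }'; }

# Fixed: without braces the comparison is a pattern guarding the action,
# so max[] is only updated when the new value is at least as large.
fixed_max() { awk '$2 >= max[$1] { max[$1] = $2 } END { for (u in max) print u, max[u] }'; }

printf 'u 4\nu 8\nu 2\n' | buggy_max    # prints: u 2
printf 'u 4\nu 8\nu 2\n' | fixed_max    # prints: u 8
```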

How about only the jobs since 2015? The total number of users is 1172:

cut -d ':' -f 4 full-2015 | sort | uniq|wc
    1172    1172    9131

The number of users who have never submitted a multicore job is 782, about 67% of the 1172:

gawk -F: '$35>=slots[$4] {slots[$4]=$35};END{for(n in slots){if(slots[n]==1){print n, slots[n]}}}' full-2015 | wc
     782    1564    7696
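The two counts and the percentage can also come from a single pass over the file. On the figures above, 782 of 1172 is about 66.7%. This is a sketch that reuses the corrected max-slots pattern. `single_slot_summary` is a hypothetical name, and it prints "single total percent":

```shell
#!/bin/sh
# Hypothetical one-pass summary: users whose largest slot request (field 35)
# was 1, all distinct users (field 4 is the owner), and the percentage.
single_slot_summary() {
    awk -F: '
        $35 >= max[$4] { max[$4] = $35 }
        END {
            for (u in max) { total++; if (max[u] == 1) single++ }
            if (total > 0) printf "%d %d %.1f%%\n", single, total, 100 * single / total
        }' "$@"
}

# Usage, assuming full-2015 from above:
#   single_slot_summary full-2015
```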