@shitchell
Last active April 22, 2024 19:56
Show user stats in a git repo

Usage

$ git [git options] user-stats [git-log options]

Example

$ git user-stats
Email                           Commits         Files           Insertions      Deletions       Total Lines
-----                           -------         -----           ----------      ---------       -----------
john.smith@gmail.com            289             35              5361            3293            8654
joe.dirt@yahoo.com              142             17              2631            1756            4387
jack.bauer@fbi.gov              115             9               1407            1107            2514
$ git -C path/to/repo user-stats --since="1 week ago"
Email                           Commits         Files           Insertions      Deletions       Total Lines
-----                           -------         -----           ----------      ---------       -----------
joe.dirt@yahoo.com              20              3               83              634             717
john.smith@gmail.com            21              2               242             110             352

Installation

Download the script, make it executable, and put it somewhere on your PATH, e.g.:

wget -O ~/bin/git-user-stats https://gist.githubusercontent.com/shitchell/783cc8a892ed1591eca2afeb65e8720a/raw/git-user-stats
chmod +x ~/bin/git-user-stats
cd ~/path/to/repo
git user-stats --since="1 week ago"
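
git runs any executable named git-<name> that it finds on PATH as "git <name>", which is why the file has to be saved as git-user-stats. If git reports that user-stats is not a git command, the install directory is probably not on PATH. A minimal sketch of that check follows; dir_on_path is a hypothetical helper (not part of the gist), and ~/bin is just the directory used in the example above:

```shell
# dir_on_path is a hypothetical helper, not part of the gist.
# It succeeds if the given directory is one of the entries in $PATH.
dir_on_path() {
    case ":$PATH:" in
        *":$1:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

if dir_on_path "$HOME/bin"; then
    echo "$HOME/bin is on PATH"
else
    echo "$HOME/bin is not on PATH; add it in your shell profile"
fi
```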

Explanation

Basically it uses git log --format="author: %ae" --numstat (minus any empty lines or binary files) to generate output that looks like:

author: bob.smith@gmail.com
1       147     foo/bar.py
0       370     hello/world.py
author: john.smith@aol.com
7       6       foo/bar.py
author: jack.bauer@fbi.gov
1       0       super/sekrit.txt
author: john.smith@aol.com
2       1       hello/world.py

Each section that starts with author: ... is a single commit. The first column of --numstat is the number of insertions, and the second column is the number of deletions for that file.

It then walks over each line with awk. Whenever it hits a line that starts with author:, it stores the 2nd column of that line (the email address of the author for that particular commit) in the variable author and increments that user's total number of commits. For each subsequent line, it updates the number of insertions, deletions, and files for that user until it hits the next line that starts with author:. Rinse and repeat until it's done.
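
To see the walk in isolation, here is a cut-down sketch that runs the same kind of awk loop over the sample output above. It is not the script itself: sample_log and tally are hypothetical names, the file count is left out, and the result is piped through sort only to make the row order predictable. It sticks to plain awk features, so it should not need gawk.

```shell
# Sample input copied from the example output above (tabs replaced by spaces).
sample_log() {
    cat <<'EOF'
author: bob.smith@gmail.com
1 147 foo/bar.py
0 370 hello/world.py
author: john.smith@aol.com
7 6 foo/bar.py
author: jack.bauer@fbi.gov
1 0 super/sekrit.txt
author: john.smith@aol.com
2 1 hello/world.py
EOF
}

# Simplified aggregation: commits, insertions, and deletions per author.
# (The per-author file count is omitted to keep the sketch short.)
tally() {
    awk '
        $1 == "author:" { author = $2; commits[author]++; next }
        { insertions[author] += $1; deletions[author] += $2 }
        END {
            for (a in commits)
                printf("%s %d %d %d\n", a, commits[a], insertions[a], deletions[a])
        }
    ' | sort
}

sample_log | tally
# bob.smith@gmail.com 1 1 517
# jack.bauer@fbi.gov 1 1 0
# john.smith@aol.com 2 9 7
```

john.smith@aol.com shows up twice in the input, so his two commits are folded into a single row with 2 commits, 9 insertions, and 7 deletions.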

At the end, it sorts by the total line changes (insertions + deletions) and prints out all of the collected stats. If you wanted to sort by something else, you would simply replace the total array with the relevant array in the asorti(...) call. For example, to sort by number of files, you would change that line to:

n = asorti(files, sorted_emails, "@val_num_desc");
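
asorti(...) with "@val_num_desc" is a gawk feature. If only a plain awk is available, one possible workaround is to have awk print the rows unsorted and let sort(1) order them by column. The sketch below is not what the script does: rows and sort_by_column are hypothetical names, and it assumes tab-separated rows in the script's column order (email, commits, files, insertions, deletions, total).

```shell
# Hypothetical unsorted rows in the script's column order:
# email, commits, files, insertions, deletions, total (tab-separated).
rows() {
    printf '%s\t%s\t%s\t%s\t%s\t%s\n' \
        jack.bauer@fbi.gov   115 9  1407 1107 2514 \
        john.smith@gmail.com 289 35 5361 3293 8654 \
        joe.dirt@yahoo.com   142 17 2631 1756 4387
}

# sort_by_column N: sort rows by the Nth column, numerically, largest first.
sort_by_column() {
    sort -t "$(printf '\t')" -k "$1,$1nr"
}

rows | sort_by_column 6   # by total lines, like the original script
rows | sort_by_column 3   # by number of files
```

Using this in the script would also mean printing the header before the data rows are sorted and replacing the asorti loop with a plain "for (email in total)" loop; that change is not shown here.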

Note that the script passes any extra arguments straight through to git log, so custom git log options work :D

A little more detail

The git log output is run through:

  • tr '[A-Z]' '[a-z]' to normalize email addresses. My company capitalizes email addresses a la John.Smith@TheCompany.com, and depending on where / how a user is making their commit, that email might show up capitalized or all lowercase. This ensures that all instances of a particular email address are always grouped together regardless of capitalization.
  • grep -v '^$' to remove empty lines that show up by default in the log output
  • grep -v '^-' to remove the --numstat info for binary files, which looks like:
    - - foo/bar.png
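
The three filters can be tried on their own without a repo. In this sketch, raw_log is hypothetical input made up to look like raw git log --numstat output (a mixed-case email, the blank line after the author line, and a binary file), and clean_log is a hypothetical wrapper around the same filters in the same order:

```shell
# Hypothetical raw `git log --numstat` output: mixed-case email, a blank
# line after the author line, and a binary file reported as "-  -".
raw_log() {
    printf 'author: John.Smith@TheCompany.com\n'
    printf '\n'
    printf '7\t6\tfoo/bar.py\n'
    printf '%s\t%s\tfoo/bar.png\n' - -
}

# The same three filters the script applies, in the same order.
clean_log() {
    tr '[A-Z]' '[a-z]' | grep -v '^$' | grep -v '^-'
}

raw_log | clean_log
# Prints two lines (the second one is tab-separated):
#   author: john.smith@thecompany.com
#   7  6  foo/bar.py
```

One side effect worth knowing: tr lowercases the whole line, not just the email, so file paths are lowercased too, and two paths that differ only by case are counted as one file.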

git-user-stats

#!/bin/bash
#
# Show user stats (commits, files modified, insertions, deletions, and total
# lines modified) for a repo

# Any arguments are passed straight through to git log
git_log_opts=( "$@" )

git log "${git_log_opts[@]}" --format='author: %ae' --numstat \
    | tr '[A-Z]' '[a-z]' \
    | grep -v '^$' \
    | grep -v '^-' \
    | awk '
        {
            if ($1 == "author:") {
                author = $2;
                commits[author]++;
            } else {
                insertions[author] += $1;
                deletions[author] += $2;
                total[author] += $1 + $2;
                # if this is the first time seeing this file for this
                # author, increment their file count
                author_file = author ":" $3;
                if (!(author_file in seen)) {
                    seen[author_file] = 1;
                    files[author]++;
                }
            }
        }
        END {
            # Print a header
            printf("%-30s\t%-10s\t%-10s\t%-10s\t%-10s\t%-10s\n",
                   "Email", "Commits", "Files",
                   "Insertions", "Deletions", "Total Lines");
            printf("%-30s\t%-10s\t%-10s\t%-10s\t%-10s\t%-10s\n",
                   "-----", "-------", "-----",
                   "----------", "---------", "-----------");

            # Print the stats for each user, sorted by total lines
            # (asorti() is a gawk extension, so awk must be gawk here)
            n = asorti(total, sorted_emails, "@val_num_desc");
            for (i = 1; i <= n; i++) {
                email = sorted_emails[i];
                printf("%-30s\t%-10s\t%-10s\t%-10s\t%-10s\t%-10s\n",
                       email, commits[email], files[email],
                       insertions[email], deletions[email], total[email]);
            }
        }
    '


smartm13 commented Nov 4, 2022

Just for the record, this was originally posted at https://stackoverflow.com/a/73781404/7562633
It was so legendary that I answered 2 questions to increase my reputation, ONLY to upvote this!

@pavel-rossinsky

Hi @shitchell. When I'm running the bash script on Mac or Ubuntu, the error occurs:

./git-user-stats --since="1 week ago"
awk: line 38: function asorti never defined

It would be great to make the script portable.


ahatstat commented Jan 4, 2023

> Hi @shitchell. When I'm running the bash script on Mac or Ubuntu, the error occurs:
>
> ./git-user-stats --since="1 week ago"
> awk: line 38: function asorti never defined
>
> It would be great to make the script portable.

asorti is a gawk extension. Ubuntu uses mawk by default for awk. To make the script work on Ubuntu:

sudo apt-get install gawk
