Count number of code lines in git repository per user
git ls-files -z | xargs -0n1 git blame -w | perl -n -e '/^.*\((.*?)\s*[\d]{4}/; print $1,"\n"' | sort -f | uniq -c | sort -n

@ZuBB ZuBB commented Aug 7, 2016

this one does not need perl

git ls-files | xargs -n1 git blame --line-porcelain | sed -n 's/^author //p' | sort -f | uniq -ic | sort -nr

Got it here: http://www.commandlinefu.com/commands/view/3889/prints-per-line-contribution-per-author-for-a-git-repository

@kwantam kwantam commented Aug 25, 2016

You can replace sed with grep and avoid xargs entirely (Bourne-ish shell; tested in dash and bash):

git ls-files | while read f; do git blame --line-porcelain $f | grep '^author '; done | sort -f | uniq -ic | sort -n

@zapadinsky zapadinsky commented Dec 18, 2017

For authorship statistics it also makes sense to have the blame command ignore whitespace changes (-w) and follow lines moved or copied within and between files (-M, -C):
git ls-files | while read f; do git blame -w -M -C -C --line-porcelain $f | grep '^author '; done | sort -f | uniq -ic | sort -n
Note that giving -C twice matters here: the second -C also looks for copies from other files in the commit that created the file.

@harlanhaskins harlanhaskins commented Jan 23, 2018

git ls-files | while read f; do git blame -w -M -C -C --line-porcelain "$f" | grep '^author '; done | sort -f | uniq -ic | sort -n

Make sure to quote $f, otherwise this'll break on paths with spaces in them.
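
Quoting handles spaces, but a plain `read` still strips leading whitespace and treats backslashes as escapes. Here is a minimal sketch of a more defensive loop; `for_each_path` is a hypothetical helper name, not a git command, and `printf` stands in for the blame call in the demo:

```shell
# for_each_path is a hypothetical helper (not part of git).
# It runs the given command once per input line, passing the line as the last argument.
for_each_path() {
    # IFS= keeps leading/trailing whitespace; -r keeps backslashes literal.
    while IFS= read -r path; do
        "$@" "$path"
    done
}

# Demo with printf standing in for the blame command:
printf '%s\n' 'a b.txt' 'back\slash.txt' | for_each_path printf '<%s>\n'
```

With git this could be used as `git ls-files | for_each_path git blame -w --line-porcelain -- | grep -I '^author ' | sort -f | uniq -ic | sort -n`. It still assumes one path per line, so names that `git ls-files` quotes (tabs, newlines and so on) would need the `-z` form instead.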

@rgson rgson commented Feb 13, 2018

To get rid of the occasional "Binary file (standard input) matches" line, add the -I option to grep:

git ls-files | while read f; do git blame -w -M -C -C --line-porcelain "$f" | grep -I '^author '; done | sort -f | uniq -ic | sort -n

@johndevedu johndevedu commented Mar 8, 2019

To list the stats on a single file (and I like the order in reverse):

git blame -w -M -C -C --line-porcelain FILENAMEGOESHERE | grep -I '^author ' | sort -f | uniq -ic | sort -nr

@endafarrell endafarrell commented Jun 7, 2019

From time to time, a user's name may change (casing, first-last vs. last-first, etc.), and in some limited cases grouping by email rather than by name may be more indicative:

git ls-files | \
while read f; do \
    git blame -w -M -C -C --line-porcelain "$f" | \
    grep -I '^author-mail '; \
done | cut -f2 -d'<' | cut -f1 -d'>' | sort -f | uniq -ic | sort -n

The change is to look not for "^author " but for "^author-mail ", with extra cuts to make the output easier to use later.

@ezamelczyk ezamelczyk commented Jun 14, 2019

None of these solutions really worked for me so I made my own. Tested on MacOS and working. You can try it here.

@MattyMead MattyMead commented Oct 30, 2019

I've used this a few times now and it works perfectly, although I'm wondering how to refactor it so that you can query changes within a given date range. Any help would be much appreciated!
git ls-files | while read f; do git blame -w -M -C -C --line-porcelain "$f" | grep -I '^author '; done | sort -f | uniq -ic | sort -n
That is the version I'm using. Nice to see everyone improving on the previous commands in each comment!
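
One possible way to restrict the count to a date range is to filter on the `author-time` header that `--line-porcelain` prints for every blamed line. This is a rough sketch, not a tested recipe for every repository: `count_authors_between` is a hypothetical function name, and the bounds are assumed to be Unix epoch seconds (start inclusive, end exclusive).

```shell
# count_authors_between FROM TO   (hypothetical helper, not part of git)
# Reads `git blame --line-porcelain` output on stdin and counts, per author,
# the lines whose author-time satisfies FROM <= author-time < TO.
count_authors_between() {
    awk -v from="$1" -v to="$2" '
        /^author /      { author = substr($0, 8) }   # name follows "author "
        /^author-time / { when = $2 + 0 }            # epoch seconds of the blamed commit
        /^\t/           {                            # tab-prefixed line = one line of file content
            if (when >= from && when < to) count[author]++
        }
        END             { for (a in count) printf "%7d %s\n", count[a], a }
    ' | sort -n
}
```

It could be fed like this, assuming GNU `date` for the `-d` option: `git ls-files | while read f; do git blame -w --line-porcelain -- "$f"; done | count_authors_between "$(date -d 2019-01-01 +%s)" "$(date -d 2019-04-01 +%s)"`. Note that this counts lines that still exist in the current tree and were last touched in that range, not everything that was written during it.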

@YarekTyshchenko YarekTyshchenko commented Dec 29, 2019

Another iteration:

git ls-files | while read f; do git blame -w --line-porcelain -- "$f" | grep -I '^author '; done | sort -f | uniq -ic | sort -n
  • Counts current state of the repository rather than all commits in the past. (no need for -M or -C -C)
  • Avoids the Binary file (standard input) matches message
  • Works on OSX, required a -- to separate the filename

@yoderj yoderj commented May 26, 2020

Another variation: First, export author=somebody, then:
git ls-files | while read f; do git blame -w --line-porcelain -- "$f" | grep -I '^author ' | sed s_^_"$f"" "_; done | grep "$author" | awk '{print $1}' | sort -f | uniq -ic | sort -n
For a given $author, this reports the number of lines per file. Useful for exploring unusual cases in more depth.
Line counts are a nice supplement to counting the number of commits, but ALL of these must be taken with a grain of salt.

The first one won't work if there are underscores in a file name, because sed uses _ as its delimiter there. This one will:
git ls-files | while read f; do replaceEscaped=$(sed 's/[&/\]/\\&/g' <<<"$f"); git blame -w --line-porcelain -- "$f" | grep -I '^author ' | sed s/^/"$replaceEscaped"" "/; done | grep "$author" | awk '{print $1}' | sort -f | uniq -ic | sort -n

Source for the sed magic: https://stackoverflow.com/a/29613573/1048186
However, this one is reaching the limit of what could even remotely be considered one line!
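
A possible alternative that avoids the sed escaping altogether is to let awk read the `filename` header that `--line-porcelain` emits for each line. This is only a sketch under assumptions: `lines_per_file_for` is a hypothetical name, it matches the author name exactly rather than as a substring, and the `filename` header is the path in the commit the line is attributed to, which can differ from the current path after a rename.

```shell
# lines_per_file_for AUTHOR   (hypothetical helper, not part of git)
# Reads `git blame --line-porcelain` output on stdin and prints, per file,
# how many blamed lines belong to exactly AUTHOR.
lines_per_file_for() {
    awk -v want="$1" '
        /^author /   { author = substr($0, 8) }    # name follows "author "
        /^filename / { file = substr($0, 10) }     # path follows "filename "
        /^\t/        { if (author == want) count[file]++ }   # one content line per blamed line
        END          { for (f in count) printf "%7d %s\n", count[f], f }
    ' | sort -n
}
```

Usage would then be along the lines of `git ls-files | while read f; do git blame -w --line-porcelain -- "$f"; done | lines_per_file_for "$author"`.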

@remorses remorses commented Sep 2, 2020

I made a CLI to make this process easier; it shows a file tree with all the corresponding code owners.

https://github.com/remorses/codebase-owners

@SureshotM6 SureshotM6 commented Oct 16, 2020

Here's a variation on the earlier responses that parallelizes the blame. This can result in a significant speedup if you have multiple cores. This version also supports filenames that may be quoted by 'git ls-files' (tabs, newlines, backslashes, quotes, UTF-8, etc.) or that begin with a "-":

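# Each path is passed to sh as "$1" instead of being spliced into the script text,
# so spaces and quotes in file names survive.
git ls-files -z |
   xargs -0rn 1 -P "$(nproc)" sh -c 'git blame -w -M -C -C --line-porcelain -- "$1" | grep -I --line-buffered "^author "' _ |
   sort -f |
   uniq -ic |
   sort -n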

@AzazBasha AzazBasha commented Oct 27, 2021

Hello,

Is there any way to count this by month, for example how many lines of code per user in the repository from January to March?

@eigan eigan commented Nov 11, 2021

The other commands here took hours for our project. Here is a faster method:
  1. Remember to use blame.ignoreRevsFile to ignore mass-edits (like code style fixes).
  2. Use `git ls-files -- ':!*.pdf' ':!*.xml'` to filter out files (the `-x` option only affects untracked files, so it does not filter this listing).
git ls-files | while read i; do git blame "$i" | sed -e 's/^[^(]*(//' -e 's/^\([^[:digit:]]*\)[[:space:]]\+[[:digit:]].*/\1/' -e 's/[[:blank:]]*$//'; done | sort -f | uniq -ic | sort -rn

Counting only activity last two years:

git ls-files | while read i; do git blame "$i" --since 2.years | grep -v '^\^' | sed -e 's/^[^(]*(//' -e 's/^\([^[:digit:]]*\)[[:space:]]\+[[:digit:]].*/\1/' -e 's/[[:blank:]]*$//'; done | sort -f | uniq -ic | sort -rn

Solution modified from: https://stackoverflow.com/a/2788077

@HoffiMuc HoffiMuc commented Nov 12, 2021

here's my one-liner:

function gitfilecontributors() { local perfile="false" ; if [[ $1 = "-f" ]]; then perfile="true" ; shift ; fi ; if [[ $# -eq 0 ]]; then echo "no files given!" >&2 ; return 1 ; else local f ; { for f in "$@"; do echo "$f" ; git blame --show-email "$f" | sed -nE 's/^[^ ]* *.<([^>]*)>.*$/: \1/p' | sort | uniq -c | sort -r -nk1 ; done } | if [[ "$perfile" = "true" ]]; then tee /tmp/gitblamestats.txt ; else tee /tmp/gitblamestats.txt >/dev/null ; fi ; echo ; echo "total:" ; awk -v FS=' *: *' '/^ *[0-9]/{sums[$2] += $1} END { for (i in sums) printf("%7s : %s\n", sums[i], i)}' /tmp/gitblamestats.txt | sort -r -nk1 ; fi ; }

or with line breaks:

gitfilecontributors ()
{
    local perfile="false";
    if [[ $1 = "-f" ]]; then
        perfile="true";
        shift;
    fi;
    if [[ $# -eq 0 ]]; then
        echo "no files given!" 1>&2;
        return 1;
    else
        local f;
        {
            for f in "$@";
            do
                echo "$f";
                git blame --show-email "$f" | sed -nE 's/^[^ ]* *.<([^>]*)>.*$/: \1/p' | sort | uniq -c | sort -r -nk1;
            done
        } | if [[ "$perfile" = "true" ]]; then
            tee /tmp/gitblamestats.txt;
        else
            tee /tmp/gitblamestats.txt > /dev/null;
        fi;
        echo;
        echo "total:";
        awk -v FS=' *: *' '/^ *[0-9]/{sums[$2] += $1} END { for (i in sums) printf("%7s : %s\n", sums[i], i)}' /tmp/gitblamestats.txt | sort -r -nk1;
    fi
}

Usage: pass the files from the folder(s) of your choice.

Use option -f to show counts per file; otherwise only totals are shown:

$ gitfilecontributors    $(fd --type f '.*' source)
total:
    139 : somebody@somewhere.de
     29 : else.user@somewhere.de
      9 : just.another@somewhere.de
$ gitfilecontributors -f $(fd --type f '.*' source)
source/040_InitialSetup.md
     80 : somebody@somewhere.de
     29 : else.user@somewhere.de
      6 : just.another@somewhere.de
README.md
     59 : somebody@somewhere.de
      5 : whosthat@somewhere.de
      3 : just.another@somewhere.de

total:
    139 : somebody@somewhere.de
     29 : else.user@somewhere.de
      9 : just.another@somewhere.de
      5 : whosthat@somewhere.de