@textarcana
Last active July 24, 2024 16:30
Convert Git logs to JSON. The first script (git-log2json.sh) is all you need; the other two files contain only optional bonus features 😀
THIS GIST NOW HAS A FULL GIT REPO: https://github.com/context-driven-testing-toolkit/git-log2json
#!/usr/bin/env bash
# Use this one-liner to produce a JSON literal from the Git log.
# %f is the sanitized subject line, so the message contains no quotes to escape.
git log \
--pretty=format:'{%n "commit": "%H",%n "author": "%aN <%aE>",%n "date": "%ad",%n "message": "%f"%n},' \
"$@" | \
perl -pe 'BEGIN{print "["}; END{print "]\n"}' | \
perl -pe 's/},]/}]/'
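The bracket-wrapping step of the one-liner above depends on the final `},` and the closing `]` landing on the same line, which holds because `--pretty=format:` omits the newline after the last entry. As a rough sketch only, the same step could be written so that it also tolerates a trailing newline. The function name `wrap_json_array` is hypothetical and not part of the gist, and `sed` stands in here for the two `perl` stages.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the gist): read comma-terminated JSON
# objects on stdin and print them as a single JSON array.
wrap_json_array() {
    printf '['
    # Drop the comma (and any trailing blanks) that ends the last line only.
    sed '$ s/,[[:space:]]*$//'
    printf ']\n'
}

# Possible usage inside a repository:
#   git log --pretty=format:'{"commit": "%H"},' | wrap_json_array
```

Only the last line is touched, so commas between entries are left in place.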
#!/usr/bin/env bash
# OPTIONAL: use this stand-alone shell script to produce a JSON object
# with information similar to git log --stat.
#
# You can then easily cross-reference or merge this with the JSON git
# log, since both are keyed on the commit hash, which is unique.
git log \
--numstat \
--format='%H' \
"$@" | \
perl -lawne '
if (defined $F[1]) {
print qq#{"insertions": "$F[0]", "deletions": "$F[1]", "path": "$F[2]"},#
} elsif (defined $F[0]) {
print qq#],\n"$F[0]": [#
};
END{print qq#],#}' | \
tail -n +2 | \
perl -wpe 'BEGIN{print "{"}; END{print "}"}' | \
tr '\n' ' ' | \
perl -wpe 's#(]|}),\s*(]|})#$1$2#g' | \
perl -wpe 's#,\s*?}$#}#'
/*
* OPTIONAL: use this Node.js expression to merge the data structures
* created by the two shell scripts above
*/
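var gitLog, lstat;
// A relative path is needed; a bare name makes require() search node_modules.
gitLog = require('./git-log.json');
lstat = require('./git-stat.json');
// Attach each commit's numstat entries to its log entry, in place.
gitLog.forEach(function(o){
o.paths = lstat[o.commit];
});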
#!/usr/bin/env bash
# OPTIONAL: Use jq to merge the two JSON files.
jq --slurp '.[1] as $logstat | .[0] | map(.paths = $logstat[.commit])' git-log.json git-stat.json
@stephencmorton

I achieved a good level of double-quote escaping by doing the following:

git --no-pager log --pretty=format:'{%n  111555commit666222: 111555%H666222,%n  111555author666222: 111555%an <%ae>666222,%n  111555date666222: 111555%ad666222,%n  111555message666222: 111555%s %n %b666222},' "$@" | sed 's/"/\\"/g' | sed 's/111555/"/g' | sed 's/666222/"/g' | perl -pe 'BEGIN{print "["}; END{print "]\n"}' | perl -pe 's/},]/}]/'

It's not super pretty but it escaped the double quotes in the commit messages.

I'm not sure why different special character sequences are used for "opening double quote" and "closing double quote". I don't think that provides any benefit. Just use 111555 (or whatever) for both.

(I know the original post I'm commenting on is very old, but it and my comment on it are both still relevant today.)
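The single-marker variant suggested above might look like the following sketch. The function name `quote_with_marker` and the variable `QUOTE_MARK` are hypothetical, and the backslash-escaping step is an addition that is not in the comment. It assumes the marker string never occurs in real commit data.

```shell
#!/usr/bin/env bash
# Hypothetical marker; any string that cannot occur in the log would do.
QUOTE_MARK='111555'

# Read text on stdin in which every structural JSON quote is written as
# $QUOTE_MARK. Real backslashes and double quotes can then only come from
# commit data, so they are escaped first; the markers become quotes last.
quote_with_marker() {
    sed -e 's/\\/\\\\/g' \
        -e 's/"/\\"/g' \
        -e "s/${QUOTE_MARK}/\"/g"
}

# Possible usage inside a repository:
#   git --no-pager log --pretty=format:'{111555message111555: 111555%s111555},' \
#       | quote_with_marker
```

Literal newlines from `%b` would still need separate handling, since a JSON string cannot contain a raw line break.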

@l0b0

l0b0 commented Jul 24, 2024

You can use jq itself to escape everything:

while IFS=$'\t' read -d '' -r -u 3 commit_hash author_date subject body; do
    jq --null-input \
        --arg commit_hash "$commit_hash" \
        --arg author_date "$author_date" \
        --arg subject "$subject" \
        --arg body "$body" \
        '{
            "commit_hash":(if $commit_hash == "" then null else $commit_hash end),
            "author_date":(if $author_date == "" then null else $author_date end),
            "subject":(if $subject == "" then null else $subject end),
            "body":(if $body == "" then null else $body end),
        }'
done 3< <(git log --expand-tabs --pretty='tformat:%H%x09%aI%x09%s%x09%b' -z)

It's a bit clunky, but it should handle any weird formatting. Basically, format the log as NUL-terminated records with tab-separated columns, then pass each record into jq for escaping, treating all empty strings as null.
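A possible variation on the loop above, offered as a sketch rather than a replacement: it reads records from stdin so it can be tried without a repository, and it gathers the per-commit objects into one array with `jq --slurp`. The function name `records_to_json` is hypothetical. It also assumes the ASCII unit separator (0x1F) as the column separator instead of a tab, because tab counts as IFS whitespace and `read` collapses consecutive tabs, which would shift fields whenever one is empty.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read NUL-terminated records whose four fields are
# separated by 0x1F and print one JSON array. Requires jq on the PATH.
records_to_json() {
    local commit_hash author_date subject body
    while IFS=$'\x1f' read -r -d '' commit_hash author_date subject body; do
        # jq does all of the string escaping; empty strings become null.
        jq --null-input --compact-output \
            --arg commit_hash "$commit_hash" \
            --arg author_date "$author_date" \
            --arg subject "$subject" \
            --arg body "$body" \
            '{commit_hash: $commit_hash, author_date: $author_date,
              subject: $subject, body: $body}
             | map_values(if . == "" then null else . end)'
    done | jq --slurp --compact-output '.'
}

# Possible usage inside a repository (separator changed from %x09 to %x1f):
#   git log -z --pretty='tformat:%H%x1f%aI%x1f%s%x1f%b' | records_to_json
```

Running one `jq` process per commit keeps the escaping simple but will be slow on long histories.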
