@varemenos
Last active April 21, 2024 23:21
Git log in JSON format

Get Git log in JSON format

git log --pretty=format:'{%n  "commit": "%H",%n  "abbreviated_commit": "%h",%n  "tree": "%T",%n  "abbreviated_tree": "%t",%n  "parent": "%P",%n  "abbreviated_parent": "%p",%n  "refs": "%D",%n  "encoding": "%e",%n  "subject": "%s",%n  "sanitized_subject_line": "%f",%n  "body": "%b",%n  "commit_notes": "%N",%n  "verification_flag": "%G?",%n  "signer": "%GS",%n  "signer_key": "%GK",%n  "author": {%n    "name": "%aN",%n    "email": "%aE",%n    "date": "%aD"%n  },%n  "commiter": {%n    "name": "%cN",%n    "email": "%cE",%n    "date": "%cD"%n  }%n},'

The only placeholders that aren't fetched are:

  • %B: raw body (unwrapped subject and body)
  • %GG: raw verification message from GPG for a signed commit

The format is applied to each commit, so once you have the whole output, you need to remove the trailing comma and wrap the objects in an array.

git log pretty format source: http://git-scm.com/docs/pretty-formats

Here is an example in JavaScript, based on a package I'm working on for Atom:

var format = '{%n  "commit": "%H",%n  "abbreviated_commit": "%h",%n  "tree": "%T",%n  "abbreviated_tree": "%t",%n  "parent": "%P",%n  "abbreviated_parent": "%p",%n  "refs": "%D",%n  "encoding": "%e",%n  "subject": "%s",%n  "sanitized_subject_line": "%f",%n  "body": "%b",%n  "commit_notes": "%N",%n  "verification_flag": "%G?",%n  "signer": "%GS",%n  "signer_key": "%GK",%n  "author": {%n    "name": "%aN",%n    "email": "%aE",%n    "date": "%aD"%n  },%n  "commiter": {%n    "name": "%cN",%n    "email": "%cE",%n    "date": "%cD"%n  }%n},';

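// BufferedProcess is part of Atom's API and is only available inside Atom packages.
var BufferedProcess = require('atom').BufferedProcess;

var commits = ''; // accumulates the raw stdout text

new BufferedProcess({
    command: 'git',
    args: [
        'log',
        '--pretty=format:' + format
    ],
    stdout: function (chunk) { commits += chunk; },
    exit: function (code) {
        if (code === 0) {
            // Drop the trailing comma and wrap the objects in an array.
            var result = JSON.parse('[' + commits.slice(0, -1) + ']');

            console.log(result); // parsed array of commit objects
        }
    }
});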
@daenney

daenney commented Jul 12, 2015

Pretty neat. Breaks rather quickly once you pipe it into jq though but that's expected since this doesn't do any form of escaping for double quotes and other things.
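
One way around the escaping problem is to not hand-write JSON in the format string at all. Instead, have git emit the raw values separated by control characters and let JSON.stringify do the escaping. The following is a minimal sketch under stated assumptions: the FIELDS map and the function names are illustrative (they are not part of the gist), and it assumes the bytes 0x1f and 0x1e never occur in commit metadata.

```javascript
// Unit separator between fields, record separator after each commit.
// Assumption: neither byte appears in any commit message or author name.
const FIELD_SEP = '\x1f';
const RECORD_SEP = '\x1e';

// Hypothetical field list: JSON key -> git pretty-format placeholder.
const FIELDS = { commit: '%H', subject: '%s', body: '%b', author_name: '%aN' };

// Builds the value to pass as `--pretty=format:<value>`.
// %x1f and %x1e are git's hex-byte placeholders for the two separators.
function buildFormat(fields) {
  return Object.values(fields).join('%x1f') + '%x1e';
}

// Turns raw `git log` output into an array of plain objects.
function parseLog(raw, fields) {
  const keys = Object.keys(fields);
  return raw
    .split(RECORD_SEP)
    // `format:` puts a newline between commits; drop it from the record start.
    .map((record) => record.replace(/^\n/, ''))
    .filter((record) => record.length > 0)
    .map((record) => {
      const values = record.split(FIELD_SEP);
      const entry = {};
      keys.forEach((key, i) => {
        entry[key] = values[i] !== undefined ? values[i] : '';
      });
      return entry;
    });
}

// Possible usage inside a repository (not executed here):
//   const { execFileSync } = require('child_process');
//   const raw = execFileSync('git', ['log', '--pretty=format:' + buildFormat(FIELDS)],
//     { encoding: 'utf8' });
//   console.log(JSON.stringify(parseLog(raw, FIELDS), null, 2));
```

Because the values never pass through a hand-built JSON string, quotes, backslashes and newlines in subjects or bodies are escaped by JSON.stringify rather than by sed.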

@plrthink

Oh, this could be used to do some visualization stuff.

@aclewis182

This just made my life MUCH easier. I've been looking for tools to do this; it never dawned on me to use the formatting to create valid JSON.

@dreamyguy

dreamyguy commented May 22, 2016

A very interesting javascript take you took there, I didn't consider it before! 🏆

For me it was very important to parse the commit stats as well, but the most straight-forward way to output stats (through --shortstat) is cumbersome as it gets printed on a new line. As some commits don't have any stats, the new line pattern breaks, making it difficult to port the data reliably.

It was also important for me to output the git log of many repositories into a single JSON, so the script got a bit more complex than originally planned, as I had to deal with some error handling. After many failed attempts I got it sorted with the help of two scripts, one to spit out the commits output in one line each, and another to parse the chewed output to JSON.

It's all here https://github.com/dreamyguy/gitlogg

Some of Gitlogg's features are:

  • Parse the git log of multiple repositories into one JSON file. ✨
  • Introduced repository key/value.
  • Introduced files changed, insertions and deletions keys/values.
  • Introduced impact key/value, which represents the cumulative changes for the commit (insertions - deletions).
  • Sanitise double quotes " by converting them to single quotes ' on values that allow user input, like subject.
  • Nearly all the pretty=format: placeholders are featured.
  • Easily include / exclude which keys/values will be parsed to JSON by commenting out/uncommenting the available ones.
  • Thoroughly commented code.
  • Script execution feedback on console.
  • Error handling (since path to repositories needs to be set correctly).

@adrianlzt

Making valid JSON:

  • remove the comma from the last line
  • change \r\n (newline) to \n
  • add [ and ] to make a JSON array
git log --pretty=format:'{%n  "commit": "%H",%n  "abbreviated_commit": "%h",%n  "tree": "%T",%n  "abbreviated_tree": "%t",%n  "parent": "%P",%n  "abbreviated_parent": "%p",%n  "refs": "%D",%n  "encoding": "%e",%n  "subject": "%s",%n  "sanitized_subject_line": "%f",%n  "body": "%b",%n  "commit_notes": "%N",%n  "verification_flag": "%G?",%n  "signer": "%GS",%n  "signer_key": "%GK",%n  "author": {%n    "name": "%aN",%n    "email": "%aE",%n    "date": "%aD"%n  },%n  "commiter": {%n    "name": "%cN",%n    "email": "%cE",%n    "date": "%cD"%n  }%n},' | sed "$ s/,$//" | sed ':a;N;$!ba;s/\r\n\([^{]\)/\\n\1/g'| awk 'BEGIN { print("[") } { print($0) } END { print("]") }'

@ORESoftware

ORESoftware commented Dec 19, 2017

What is BufferedProcess? An npm package?

I am having trouble parsing the JSON that comes back from the command.

@sergey-shpak

@varemenos hey, I have a different approach to this:
https://gist.github.com/sergey-shpak/40fe8d2534c5e5941b9db9e28132ca0b
It is more stable with special characters (like double quotes, etc.).
Your feedback is highly appreciated.

@maxschremser

Fucking amazing hack!

👍

@cekstam

cekstam commented Jan 31, 2019

If you need something that can package multiline commit messages into JSON: https://gist.github.com/cekstam/a7758b8f315835d479f379715eebd0c3

@alegag

alegag commented Mar 7, 2019

For those having problems with double-quotes escaping, a little hack:

git log --pretty=format:'{^^^^date^^^^:^^^^%ci^^^^,^^^^abbreviated_commit^^^^:^^^^%h^^^^,^^^^subject^^^^:^^^^%s^^^^,^^^^body^^^^:^^^^%b^^^^},' | sed 's/"/\\"/g' | sed 's/\^^^^/"/g'

So the idea is to first build the "JSON" with ^^^^ instead of double quotes (it seems improbable that a log message contains this pattern). Then, with sed, first escape the real double quotes (which at this point can only come from the subject/body), then replace each ^^^^ with a double quote, giving valid JSON with escaped quotes.

@anandsuresh

@alegag That is awesome!!! Just the thing I was looking for!

@machsix

machsix commented Mar 31, 2019

For those who encountered the problem of escaping the double quote with child_process.execSync, I created the following one
https://gist.github.com/machsix/45fd59bc7d5a16931b534dd8c38be93f
It's more generic.

@mrVanDalo

mrVanDalo commented Jul 20, 2019

I wrote a small Python script which solved all the issues I had (unicode, newlines, ...). I'm not a Python programmer though, but maybe that helps other non-Python people to understand it better :D
https://gist.github.com/mrVanDalo/6a1d1aed4bd613fbdf1fa751fca47c6a

@jjxtra

jjxtra commented Feb 24, 2020

How would you add additions/deletions to the JSON?

@nsisodiya

nsisodiya commented Apr 18, 2020

I am getting newlines inside strings.

Solved by adding:

sed ':a;N;$!ba;s/\n/ /g'

@NGenetzky

I finally posted my one-liner as a gist. It uses jq along with the double-quote hack to create a relatively robust solution.

Git log as JSON array is attached below


README.md

git log \
  --pretty=format:'{^^^^date^^^^:^^^^%ci^^^^,^^^^abbreviated_commit^^^^:^^^^%h^^^^,^^^^subject^^^^:^^^^%s^^^^,^^^^body^^^^:^^^^%b^^^^}' \
  | sed 's/"/\\"/g' \
  | sed 's/\^^^^/"/g' \
  | jq -s '.'

The format is applied to each commit to create one JSON object per commit. Then jq --slurp is used to "slurp" up the objects into a valid JSON array.

git log pretty format source: http://git-scm.com/docs/pretty-formats

Credit:


git_log_as_json_array.sh

#!/bin/sh

# [Git log as JSON array](https://gist.github.com/NGenetzky/46621b35d60fed8f036311c6f1637e48)
git log --pretty=format:'{^^^^date^^^^:^^^^%ci^^^^,^^^^abbreviated_commit^^^^:^^^^%h^^^^,^^^^subject^^^^:^^^^%s^^^^,^^^^body^^^^:^^^^%b^^^^}' | sed 's/"/\\"/g' | sed 's/\^^^^/"/g' | jq -s '.'

@patrickdevivo

Using gitqlite is also a possibility: gitqlite "SELECT * FROM commits" --format json. It is a bit more succinct (but of course requires installing another CLI). It would be quite nice if one of the pretty formats git log supported was JSON...

(Full disclosure I'm a creator/maintainer of gitqlite)

@nsisodiya

I tried all sorts of commands, but most of them failed. After merging several of them and trying many combinations, this one finally works:

git log --pretty=format:'{%n  ^^^^commit^^^^: ^^^^%H^^^^,%n  ^^^^abbreviated_commit^^^^: ^^^^%h^^^^,%n  ^^^^tree^^^^: ^^^^%T^^^^,%n  ^^^^abbreviated_tree^^^^: ^^^^%t^^^^,%n  ^^^^parent^^^^: ^^^^%P^^^^,%n  ^^^^abbreviated_parent^^^^: ^^^^%p^^^^,%n  ^^^^refs^^^^: ^^^^%D^^^^,%n  ^^^^encoding^^^^: ^^^^%e^^^^,%n  ^^^^subject^^^^: ^^^^%s^^^^,%n  ^^^^sanitized_subject_line^^^^: ^^^^%f^^^^,%n  ^^^^commit_notes^^^^: ^^^^%N^^^^,%n  ^^^^verification_flag^^^^: ^^^^%G?^^^^,%n  ^^^^signer^^^^: ^^^^%GS^^^^,%n  ^^^^signer_key^^^^: ^^^^%GK^^^^,%n  ^^^^author^^^^: {%n    ^^^^name^^^^: ^^^^%aN^^^^,%n    ^^^^email^^^^: ^^^^%aE^^^^,%n    ^^^^date^^^^: ^^^^%aD^^^^%n  },%n  ^^^^commiter^^^^: {%n    ^^^^name^^^^: ^^^^%cN^^^^,%n    ^^^^email^^^^: ^^^^%cE^^^^,%n    ^^^^date^^^^: ^^^^%cD^^^^%n  }%n},' | sed 's/"/\\"/g' | sed 's/\^^^^/"/g' | sed "$ s/,$//" | sed -e ':a' -e 'N' -e '$!ba' -e 's/\n/ /g'  | awk 'BEGIN { print("[") } { print($0) } END { print("]") }' > git-log.json

Hope it saves many developers some time.

@terezka

terezka commented Jan 8, 2021

This blob was just what I was looking for- thank you!!

@yaronaharony

Thank you!!!
Small question:
is there a way to get the plus/minus numbers as well (changed, inserted, deleted) for each of the entries?

@panta82

panta82 commented Aug 22, 2021

@nsisodiya commiter -> committer

@overengineer

overengineer commented Sep 30, 2021

What about this: https://gist.github.com/overengineer/b69e578f5cf7457dc7d4ff8c3b7850bc
Tested with jq on https://github.com/tj/git-extras repository with 1605 commits. No crashes 👍

Sanitizes commit messages and author names.
(escape newline, escape quotes, escape backslash, escape tabs, fix double escapes, delete invalid characters)
I encountered null characters, control characters, quotes in git log. Finally sanitized everything (I hope).

Edit: I tested it on Blender repo. Well, it has some failing edge cases.

@april-dbx

@nsisodiya's recommendation above is the only thing that truly worked for me. Thanks so much, saving this to a bash function for the rest of my life. :)

@nsisodiya

@april-dbx Most Welcome.

@april

april commented Nov 23, 2022

Don't mean to add another to the pile, but here we go: https://gist.github.com/april/ee2e104b1435f3113e67663d8875bbef

This builds upon what @nsisodiya, @varemenos, and @overengineer put together, but with:

  • made into a convenient shell function that can be run against current directory or file/folder
  • uses traditional JSON camelCase
  • includes every major field that git log can output, including the body
  • proper sections for author, committer, and signature
  • multiple date formats (one for reading, ISO8601 for parsing)
  • should properly handle (most? all?) body values, even those that contain newlines, tabs, quotation marks and escaped characters
  • outputs as minimized JSON, can be piped to jq for pretty printing: git-log-json | jq -r '.[] | .subject'

I tested it against a git repository with over a million commits without issues, but there certainly might be some.

@nsisodiya

@april - Wow, that is awesome.

@H-zk

H-zk commented Mar 28, 2023

Finally, I got the JSON like this without BufferedProcess. I use #'# as a placeholder for the double quote "; real double quotes in the log are converted to single quotes before the placeholder is swapped in.

const path = require("path")
const fs = require("fs")
const { execSync } = require('child_process');

const startCommitHash = 'f9f457bca14228bad1e00c55d903a3c9f3738fc8'
const endCommitHash = 'HEAD'

// get log rawText
const logTxts = execSync(
  `git log --pretty=format:"{%n  #'#commit#'#: #'#%H#'#,%n  #'#abbreviated_commit#'#: #'#%h#'#,%n  #'#tree#'#: #'#%T#'#,%n  #'#abbreviated_tree#'#: #'#%t#'#,%n  #'#parent#'#: #'#%P#'#,%n  #'#abbreviated_parent#'#: #'#%p#'#,%n  #'#refs#'#: #'#%D#'#,%n  #'#encoding#'#: #'#%e#'#,%n  #'#subject#'#: #'#%s#'#,%n  #'#sanitized_subject_line#'#: #'#%f#'#,%n  #'#body#'#: #'#%b#'#,%n  #'#commit_notes#'#: #'#%N#'#,%n  #'#verification_flag#'#: #'#%G?#'#,%n  #'#signer#'#: #'#%GS#'#,%n  #'#signer_key#'#: #'#%GK#'#,%n  #'#author#'#: {%n    #'#name#'#: #'#%aN#'#,%n    #'#email#'#: #'#%aE#'#,%n    #'#date#'#: #'#%aD#'#%n  },%n  #'#commiter#'#: {%n    #'#name#'#: #'#%cN#'#,%n    #'#email#'#: #'#%cE#'#,%n    #'#date#'#: #'#%cD#'#%n  }%n}," ${startCommitHash}..${endCommitHash}`,
  { encoding: 'utf8' },
).toString();

// transform json
const logJson = `[${logTxts.slice(0, -1).replace(/"/g, "'").replace(/#'#/g, '"')}]`;

fs.writeFileSync(path.join(__dirname, 'log.json'), logJson);

// parse json and do what you need
const subjectLines = JSON.parse(logJson).map(item => item["sanitized_subject_line"])
console.info('subjectLines', subjectLines)

@gribok

gribok commented Oct 25, 2023

The following commit message breaks every approach in this thread:

Example Commit Message:
Added logic, but it doesn't look nice yet \_o_/

@nsisodiya Any hints?

Validated by jq:

$ bash utilities/git-log-json.sh  | jq
parse error: Invalid escape at line 2, column 22785
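
The failure comes from the backslash in the message: after the quotes are swapped in, \_ ends up inside a JSON string, and that is not a valid JSON escape sequence. A sketch of the placeholder hack that escapes backslashes before anything else is shown below. It assumes the ^^^^ placeholder from the earlier comments and one commit per line with no raw newlines inside values; placeholderLineToJson is a hypothetical name.

```javascript
const PLACEHOLDER = '^^^^';

// Converts one placeholder-quoted line into a JSON text.
// Order matters: backslashes must be doubled first, otherwise the
// backslashes added while escaping quotes would be doubled as well.
function placeholderLineToJson(line) {
  return line
    .replace(/\\/g, '\\\\') // \  ->  \\
    .replace(/"/g, '\\"')   // "  ->  \"
    .replace(/\t/g, '\\t')  // raw tab -> \t
    .split(PLACEHOLDER)
    .join('"');             // finally swap the placeholder for real quotes
}

const line =
  "{^^^^subject^^^^:^^^^Added logic, but it doesn't look nice yet \\_o_/^^^^}";

// Parses without error; the subject keeps its original backslash.
console.log(JSON.parse(placeholderLineToJson(line)).subject);
```

In a sed pipeline the equivalent step would be an extra substitution that doubles backslashes, placed before the one that escapes double quotes.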

@floriankraemer

floriankraemer commented Apr 20, 2024

Here is another idea, written in PHP. Put it in a PHP file in the root of a repository and execute it. It runs git log once per placeholder, across all commits.

Yes, I know that this is inefficient, but it allows each value to be read separately, without parsing one giant string in which we need to consider every possible case that could break our parser. So the trade-off is to either get good values more slowly, or to aim for a parsing solution that will probably never be perfect just to get all of the values in one loop. A language that supports parallel execution could probably do it faster. For a repository with 13205 commits it runs for just a few seconds on my machine, generating ~15 MB of JSON. I run this on an NVMe SSD.

❗ This is just a quick draft, feel free to provide critique or improve it. 😃

<?php

// https://git-scm.com/docs/pretty-formats/2.21.0

$placeholders = [
    'H' => 'hash',
    'h' => 'abbreviated_hash',
    'P' => 'parent_hash',
    'p' => 'abbreviated_parent_hash',
    'T' => 'tree_hash',
    't' => 'abbreviated_tree_hash',
    'an' => 'author_name',
    'ae' => 'author_email',
    'aD' => 'author_date',
    'at' => 'author_unix_timestamp',
    'cn' => 'committer_name',
    'ce' => 'committer_email',
    'cD' => 'committer_date',
    'ct' => 'committer_unix_timestamp',
    's' => 'subject',
    'b' => 'body',
    'B' => 'raw_body',
    'N' => 'notes',
    'D' => 'branch',
    'd' => 'commit_tag',
    'gD' => 'reflog_selector',
    'gs' => 'reflog_subject',
    'gn' => 'reflog_name',
    'e' => 'encoding',
    'f' => 'sanitized_subject_line',
];

$commits = [];

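foreach ($placeholders as $placeholder => $name) {
    // Prefix every value with the full commit hash and a marker so that
    // multi-line values (body, notes) can be assigned to the right commit.
    $gitCommand = 'git log --format=\'%H2>>>>> %' . $placeholder . '\'';

    $output = (string) shell_exec($gitCommand);
    $commitId = null;
    foreach (explode("\n", rtrim($output, "\n")) as $line) {
        // Assumes 40-character SHA-1 hashes.
        if (preg_match('/^([0-9a-f]{40})2>>>>> (.*)$/', $line, $matches)) {
            // First line of a value: hash, marker, then the value itself.
            $commitId = $matches[1];
            $commits[$commitId][$name] = $matches[2];
        } elseif ($commitId !== null) {
            // Continuation of a multi-line value: keep the line break.
            $commits[$commitId][$name] .= "\n" . $line;
        }
    }
}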

file_put_contents('commits.json', json_encode($commits, JSON_THROW_ON_ERROR | JSON_PRETTY_PRINT | JSON_INVALID_UTF8_SUBSTITUTE));
