Last active April 21, 2024 23:21
Git log in JSON format

Get Git log in JSON format

git log --pretty=format:'{%n  "commit": "%H",%n  "abbreviated_commit": "%h",%n  "tree": "%T",%n  "abbreviated_tree": "%t",%n  "parent": "%P",%n  "abbreviated_parent": "%p",%n  "refs": "%D",%n  "encoding": "%e",%n  "subject": "%s",%n  "sanitized_subject_line": "%f",%n  "body": "%b",%n  "commit_notes": "%N",%n  "verification_flag": "%G?",%n  "signer": "%GS",%n  "signer_key": "%GK",%n  "author": {%n    "name": "%aN",%n    "email": "%aE",%n    "date": "%aD"%n  },%n  "committer": {%n    "name": "%cN",%n    "email": "%cE",%n    "date": "%cD"%n  }%n},'

The only placeholders that aren't included are:

  • %B: raw body (unwrapped subject and body)
  • %GG: raw verification message from GPG for a signed commit

The format is applied to each commit, so once you have all the output, you need to remove the trailing comma and wrap the objects in an array.

git log pretty format source:

Here is an example in JavaScript based on a package I'm working on for Atom:

var BufferedProcess = require('atom').BufferedProcess;

var format = '{%n  "commit": "%H",%n  "abbreviated_commit": "%h",%n  "tree": "%T",%n  "abbreviated_tree": "%t",%n  "parent": "%P",%n  "abbreviated_parent": "%p",%n  "refs": "%D",%n  "encoding": "%e",%n  "subject": "%s",%n  "sanitized_subject_line": "%f",%n  "body": "%b",%n  "commit_notes": "%N",%n  "verification_flag": "%G?",%n  "signer": "%GS",%n  "signer_key": "%GK",%n  "author": {%n    "name": "%aN",%n    "email": "%aE",%n    "date": "%aD"%n  },%n  "committer": {%n    "name": "%cN",%n    "email": "%cE",%n    "date": "%cD"%n  }%n},';

// Accumulate stdout as a string; the output arrives in chunks.
var commits = '';

new BufferedProcess({
    command: 'git',
    args: [
        'log',
        '--pretty=format:' + format
    ],
    stdout: function (chunk) { commits += chunk; },
    exit: function (code) {
        if (code === 0) {
            // Drop the trailing comma and wrap the objects in an array.
            var result = JSON.parse('[' + commits.slice(0, -1) + ']');

            console.log(result); // parsed array of commit objects
        }
    }
});

daenney commented Jul 12, 2015

Pretty neat. Breaks rather quickly once you pipe it into jq though but that's expected since this doesn't do any form of escaping for double quotes and other things.
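
One way around the escaping problem is to avoid hand-building JSON at all: have git emit fields separated by control bytes (the %x1f and %x1e placeholders print a byte from a hex code) and let JSON.stringify do the escaping. The following is only a minimal sketch under stated assumptions: the field list, the helper names (buildFormat, parseLog, gitLogJson) and the choice of separator bytes are illustrative, and it assumes no commit field contains those bytes.

```javascript
// Sketch: let JSON.stringify do the escaping instead of hand-building JSON.
// Assumption: field values never contain the ASCII control bytes 0x1e / 0x1f.
const { execFileSync } = require('child_process');

const FIELD_SEP = '\x1f';  // unit separator, emitted by git via %x1f
const RECORD_SEP = '\x1e'; // record separator, emitted by git via %x1e

// Hypothetical field list: JSON key -> git pretty-format placeholder.
const FIELDS = { commit: '%H', subject: '%s', body: '%b', author: '%aN' };

// Build the --pretty=format: string, e.g. "%H%x1f%s%x1f%b%x1f%aN%x1e".
function buildFormat(fields) {
  return Object.values(fields).join('%x1f') + '%x1e';
}

// Turn raw git output into an array of plain objects (pure function).
function parseLog(raw, fields) {
  const keys = Object.keys(fields);
  return raw
    .split(RECORD_SEP)
    .map((record) => record.replace(/^\n/, '')) // newline git puts between commits
    .filter((record) => record.length > 0)
    .map((record) => {
      const values = record.split(FIELD_SEP);
      return Object.fromEntries(keys.map((key, i) => [key, values[i] ?? '']));
    });
}

// Runs git in the current directory; only call this inside a repository.
function gitLogJson(fields = FIELDS) {
  const raw = execFileSync('git', ['log', '--pretty=format:' + buildFormat(fields)], {
    encoding: 'utf8',
    maxBuffer: 1024 * 1024 * 256,
  });
  return JSON.stringify(parseLog(raw, fields), null, 2);
}
```

Calling gitLogJson() inside a repository should return a JSON array string that jq can read, because quotes, backslashes and newlines in commit messages are escaped by JSON.stringify rather than by the format string.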

Copy link

Oh, this could be used to do some visualization stuff.

Copy link

This just made my life MUCH easier. I've been looking for tools to do this; it never dawned on me to use the formatting to create valid JSON.


dreamyguy commented May 22, 2016

A very interesting javascript take you took there, I didn't consider it before! 🏆

For me it was very important to parse the commit stats as well, but the most straight-forward way to output stats (through --shortstat) is cumbersome as it gets printed on a new line. As some commits don't have any stats, the new line pattern breaks, making it difficult to port the data reliably.

It was also important for me to output the git log of many repositories into a single JSON, so the script got a bit more complex than originally planned, as I had to deal with some error handling. After many failed attempts I got it sorted with the help of two scripts, one to spit out the commits output in one line each, and another to parse the chewed output to JSON.

It's all here

Some of Gitlogg's features are:

  • Parse the git log of multiple repositories into one JSON file. ✨
  • Introduced repository key/value.
  • Introduced files changed, insertions and deletions keys/values.
  • Introduced impact key/value, which represents the cumulative changes for the commit (insertions - deletions).
  • Sanitise double quotes " by converting them to single quotes ' on values that allow user input, like subject.
  • Nearly all the pretty=format: placeholders are featured.
  • Easily include / exclude which keys/values will be parsed to JSON by commenting out/uncommenting the available ones.
  • Thoroughly commented code.
  • Script execution feedback on console.
  • Error handling (since path to repositories needs to be set correctly).

Copy link

Making valid JSON:

  • remove the comma from the last line
  • replace \r\n line breaks (except before a new object) with an escaped \n
  • add [ and ] to make a JSON array
git log --pretty=format:'{%n  "commit": "%H",%n  "abbreviated_commit": "%h",%n  "tree": "%T",%n  "abbreviated_tree": "%t",%n  "parent": "%P",%n  "abbreviated_parent": "%p",%n  "refs": "%D",%n  "encoding": "%e",%n  "subject": "%s",%n  "sanitized_subject_line": "%f",%n  "body": "%b",%n  "commit_notes": "%N",%n  "verification_flag": "%G?",%n  "signer": "%GS",%n  "signer_key": "%GK",%n  "author": {%n    "name": "%aN",%n    "email": "%aE",%n    "date": "%aD"%n  },%n  "commiter": {%n    "name": "%cN",%n    "email": "%cE",%n    "date": "%cD"%n  }%n},' | sed "$ s/,$//" | sed ':a;N;$!ba;s/\r\n\([^{]\)/\\n\1/g'| awk 'BEGIN { print("[") } { print($0) } END { print("]") }'


ORESoftware commented Dec 19, 2017

what is BufferedProcess? an NPM package?

I am having trouble parsing the JSON that comes back from the command

Copy link

@varemenos hey, I have a different approach to this.
It is more stable with special characters (like double quotes etc.).
Your feedback is highly appreciated.

Copy link

Fucking amazing hack!



cekstam commented Jan 31, 2019

If you need something that can package multiline commit messages into json;


alegag commented Mar 7, 2019

For those having problems with double-quotes escaping, a little hack:

git log --pretty=format:'{^^^^date^^^^:^^^^%ci^^^^,^^^^abbreviated_commit^^^^:^^^^%h^^^^,^^^^subject^^^^:^^^^%s^^^^,^^^^body^^^^:^^^^%b^^^^},' | sed 's/"/\\"/g' | sed 's/\^^^^/"/g'

So the idea is to first build the "JSON" with ^^^^ instead of double quotes (it seems improbable that a log message contains this pattern). Then, with sed, first escape the quotes (which at this point can only come from the subject or body), then replace each ^^^^ with a quote, giving valid JSON with escaped quotes.
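
The same two-step substitution can be sketched in JavaScript as a pure function, which makes the order of operations easy to see. This is a minimal illustration, not part of the original comment; placeholdersToJson is a hypothetical name, and it assumes no commit message contains the literal text ^^^^.

```javascript
// Sketch of the placeholder trick as a pure function (no git needed).
// Assumption: commit messages never contain the literal text "^^^^".
const PLACEHOLDER = '^^^^';

function placeholdersToJson(raw) {
  return raw
    .replace(/"/g, '\\"')   // 1. escape quotes that came from commit text
    .split(PLACEHOLDER)     // 2. turn the structural placeholders ...
    .join('"');             //    ... into real double quotes
}

// Example input shaped like the git output before the sed steps.
const sampleLine = '{^^^^subject^^^^:^^^^Fix "foo" bug^^^^}';
const sampleJson = placeholdersToJson(sampleLine);
```

Like the sed version, this only deals with double quotes; backslashes and raw line breaks inside a value would still produce invalid JSON.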

Copy link

@alegag That is awesome!!! Just the thing I was looking for!


machsix commented Mar 31, 2019

For those who ran into the problem of escaping the double quote with child_process.execSync, I created the following one.
It's more generic.


mrVanDalo commented Jul 20, 2019

I wrote a small python script which solved all issues I had (unicode, newlines, ... ) . I'm not a python programmer though, but maybe that helps other non-python-people to understand it better :D


jjxtra commented Feb 24, 2020

How would you add additions/deletions to the json?
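
One possible route is git log --numstat, which prints one "added, deleted, path" line per changed file after each commit. The sketch below is an assumption-laden illustration rather than an answer from the thread: parseNumstat is a hypothetical helper, the %x1e marker byte is chosen only for this example, and the assumed invocation is git log --numstat --pretty=format:%x1e%H.

```javascript
// Sketch: parse `git log --numstat` output into per-commit stats.
// Assumed invocation (marker byte chosen for this example):
//   git log --numstat --pretty=format:%x1e%H
function parseNumstat(raw) {
  return raw
    .split('\x1e')
    .filter((chunk) => chunk.trim().length > 0)
    .map((chunk) => {
      const [commit, ...lines] = chunk.split('\n');
      const stats = { commit, filesChanged: 0, insertions: 0, deletions: 0 };
      for (const line of lines) {
        // numstat lines look like "<added>\t<deleted>\t<path>"; binary files use "-".
        const match = /^(\d+|-)\t(\d+|-)\t/.exec(line);
        if (!match) continue; // blank separator lines
        stats.filesChanged += 1;
        if (match[1] !== '-') stats.insertions += Number(match[1]);
        if (match[2] !== '-') stats.deletions += Number(match[2]);
      }
      return stats;
    });
}
```

The resulting objects can be merged with the other commit fields by hash. Commits without any file changes simply end up with zero counts, which sidesteps the missing-line problem that --shortstat has.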


nsisodiya commented Apr 18, 2020

I was getting newlines in strings.

Solved by adding:

sed ':a;N;$!ba;s/\n/ /g'

Copy link

I finally posted my one-liner as a gist that uses jq along with the double-quote hack to create a relatively robust solution.

Git log as JSON array is attached below

git log \
  --pretty=format:'{^^^^date^^^^:^^^^%ci^^^^,^^^^abbreviated_commit^^^^:^^^^%h^^^^,^^^^subject^^^^:^^^^%s^^^^,^^^^body^^^^:^^^^%b^^^^}' \
  | sed 's/"/\\"/g' \
  | sed 's/\^^^^/"/g' \
  | jq -s '.'

The format is applied to each commit to create one JSON object per commit. Then jq --slurp is used to "slurp" up the objects into a valid JSON array.
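
For readers without jq, the slurp step can be approximated in Node. This is only a minimal sketch: slurpLines is a hypothetical name, and it assumes every JSON object sits on its own line, which holds for the format above only when no commit body contains a line break.

```javascript
// Sketch: a rough stand-in for `jq -s '.'` on line-delimited JSON objects.
// Assumption: exactly one complete JSON object per non-empty line.
function slurpLines(text) {
  return text
    .split('\n')
    .filter((line) => line.trim() !== '') // ignore blank lines
    .map((line) => JSON.parse(line));     // throws on the first invalid line
}

// Example input shaped like the output of the pipeline above.
const slurpInput = '{"abbreviated_commit":"a1b2c3d","subject":"first"}\n' +
  '{"abbreviated_commit":"e4f5a6b","subject":"second"}\n';
const slurped = slurpLines(slurpInput);
```

Unlike jq, which reads a stream of JSON values regardless of line layout, this version depends on the one-object-per-line assumption.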

git log pretty format source:



# Git log as JSON array
git log --pretty=format:'{^^^^date^^^^:^^^^%ci^^^^,^^^^abbreviated_commit^^^^:^^^^%h^^^^,^^^^subject^^^^:^^^^%s^^^^,^^^^body^^^^:^^^^%b^^^^}' | sed 's/"/\\"/g' | sed 's/\^^^^/"/g' | jq -s '.'

Copy link

Using gitqlite is also a possibility: gitqlite "SELECT * FROM commits" --format json. It is a bit more succinct (but of course requires installing another CLI). It would be quite nice if one of the pretty formats git log supported was JSON...

(Full disclosure I'm a creator/maintainer of gitqlite)

Copy link

I tried all sorts of commands, but most of them failed. I finally merged several of them, tried many combinations, and this is what works:

git log --pretty=format:'{%n  ^^^^commit^^^^: ^^^^%H^^^^,%n  ^^^^abbreviated_commit^^^^: ^^^^%h^^^^,%n  ^^^^tree^^^^: ^^^^%T^^^^,%n  ^^^^abbreviated_tree^^^^: ^^^^%t^^^^,%n  ^^^^parent^^^^: ^^^^%P^^^^,%n  ^^^^abbreviated_parent^^^^: ^^^^%p^^^^,%n  ^^^^refs^^^^: ^^^^%D^^^^,%n  ^^^^encoding^^^^: ^^^^%e^^^^,%n  ^^^^subject^^^^: ^^^^%s^^^^,%n  ^^^^sanitized_subject_line^^^^: ^^^^%f^^^^,%n  ^^^^commit_notes^^^^: ^^^^%N^^^^,%n  ^^^^verification_flag^^^^: ^^^^%G?^^^^,%n  ^^^^signer^^^^: ^^^^%GS^^^^,%n  ^^^^signer_key^^^^: ^^^^%GK^^^^,%n  ^^^^author^^^^: {%n    ^^^^name^^^^: ^^^^%aN^^^^,%n    ^^^^email^^^^: ^^^^%aE^^^^,%n    ^^^^date^^^^: ^^^^%aD^^^^%n  },%n  ^^^^commiter^^^^: {%n    ^^^^name^^^^: ^^^^%cN^^^^,%n    ^^^^email^^^^: ^^^^%cE^^^^,%n    ^^^^date^^^^: ^^^^%cD^^^^%n  }%n},' | sed 's/"/\\"/g' | sed 's/\^^^^/"/g' | sed "$ s/,$//" | sed -e ':a' -e 'N' -e '$!ba' -e 's/\n/ /g'  | awk 'BEGIN { print("[") } { print($0) } END { print("]") }' > git-log.json

Hope it saves many developers some time.


terezka commented Jan 8, 2021

This blob was just what I was looking for, thank you!!

Copy link

Thank you!!!
Small question:
is there a way to get the plus/minus numbers as well (files changed, insertions, deletions) for each of the entries?


panta82 commented Aug 22, 2021

@nsisodiya commiter -> committer


overengineer commented Sep 30, 2021

What about this:
Tested with jq on repository with 1605 commits. No crashes 👍

Sanitizes commit messages and author names.
(escape newline, escape quotes, escape backslash, escape tabs, fix double escapes, delete invalid characters)
I encountered null characters, control characters, quotes in git log. Finally sanitized everything (I hope).

Edit: I tested it on Blender repo. Well, it has some failing edge cases.

Copy link

@nsisodiya's recommendation above is the only thing that truly worked for me. Thanks so much, saving this to a bash function for the rest of my life. :)

Copy link

@april-dbx Most Welcome.


april commented Nov 23, 2022

Don't mean to add another to the pile, but here we go:

This builds upon what @nsisodiya, @varemenos, and @overengineer put together, but with:

  • made into a convenient shell function that can be run against current directory or file/folder
  • uses traditional JSON camelCase
  • includes every major field that git log can output, including the body
  • proper sections for author, committer, and signature
  • multiple date formats (one for reading, ISO8601 for parsing)
  • should properly handle (most? all?) body values, even those that contain newlines, tabs, quotation marks and escaped characters
  • outputs as minimized JSON, can be piped to jq for pretty printing: git-log-json | jq -r '.[] | .subject'

I tested it against a git repository with over a million commits without issues, but there certainly might be some.

Copy link

@april - Wow, that is awesome.


H-zk commented Mar 28, 2023

Finally, I got the JSON like this without BufferedProcess, but I use #'# as a special marker in place of the double quote " and convert it afterwards.

const path = require("path")
const fs = require("fs")
const { execSync } = require('child_process');

const startCommitHash = 'f9f457bca14228bad1e00c55d903a3c9f3738fc8'
const endCommitHash = 'HEAD'

// get log rawText
const logTxts = execSync(
  `git log --pretty=format:"{%n  #'#commit#'#: #'#%H#'#,%n  #'#abbreviated_commit#'#: #'#%h#'#,%n  #'#tree#'#: #'#%T#'#,%n  #'#abbreviated_tree#'#: #'#%t#'#,%n  #'#parent#'#: #'#%P#'#,%n  #'#abbreviated_parent#'#: #'#%p#'#,%n  #'#refs#'#: #'#%D#'#,%n  #'#encoding#'#: #'#%e#'#,%n  #'#subject#'#: #'#%s#'#,%n  #'#sanitized_subject_line#'#: #'#%f#'#,%n  #'#body#'#: #'#%b#'#,%n  #'#commit_notes#'#: #'#%N#'#,%n  #'#verification_flag#'#: #'#%G?#'#,%n  #'#signer#'#: #'#%GS#'#,%n  #'#signer_key#'#: #'#%GK#'#,%n  #'#author#'#: {%n    #'#name#'#: #'#%aN#'#,%n    #'#email#'#: #'#%aE#'#,%n    #'#date#'#: #'#%aD#'#%n  },%n  #'#commiter#'#: {%n    #'#name#'#: #'#%cN#'#,%n    #'#email#'#: #'#%cE#'#,%n    #'#date#'#: #'#%cD#'#%n  }%n}," ${startCommitHash}..${endCommitHash}`,
  { encoding: 'utf8' }
);

// transform json
const logJson = `[${logTxts.slice(0, -1).replace(/"/g, "'").replace(/#'#/g, '"')}]`;

fs.writeFileSync(path.join(__dirname, 'log.json'), logJson);

// parse json and do what you need
const subjectLines = JSON.parse(logJson).map(item => item["sanitized_subject_line"]);
console.log('subjectLines', subjectLines);


gribok commented Oct 25, 2023

The following commit message breaks every approach in this thread:

Example Commit Message:
Added logic, but it doesn't look nice yet \_o_/

@nsisodiya Any hints?

Validated by jq:

$ bash utilities/  | jq
parse error: Invalid escape at line 2, column 22785
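
The error is consistent with how JSON treats backslashes: inside a string, a backslash may only be followed by ", \, /, b, f, n, r, t or uXXXX, so a raw \_ is an invalid escape. A minimal sketch of one way to handle it, assuming the value can be post-processed in Node before being placed in the JSON text (toJsonStringBody is a hypothetical helper name):

```javascript
// Sketch: escape a raw commit field so it is safe inside a JSON string.
function toJsonStringBody(value) {
  // JSON.stringify escapes backslashes, quotes and control characters;
  // slice off the surrounding quotes to keep only the escaped body.
  return JSON.stringify(value).slice(1, -1);
}

// The commit message from the comment above (backslash written as \\ in JS source).
const subject = "Added logic, but it doesn't look nice yet \\_o_/";

// Hand-built JSON with the raw value: contains the invalid escape "\_".
const broken = '{"subject": "' + subject + '"}';

// Same structure with the value escaped first: "\_" becomes "\\_".
const fixed = '{"subject": "' + toJsonStringBody(subject) + '"}';
```

Here JSON.parse(broken) throws, while JSON.parse(fixed).subject gives back the original message. In a sed pipeline the equivalent step would be escaping backslashes before escaping quotes.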


floriankraemer commented Apr 20, 2024

Here is another idea, written in PHP. Put it in a PHP file in the root of a repository and execute it. It runs git log once per placeholder and collects each value for all commits separately.

Yes, I know that this is inefficient, but it lets us get each value on its own without parsing a giant string in which we would need to consider every possible case that could break the parser. So the trade-off is to either get good values more slowly or chase a probably never perfect parsing solution just to get all of the values in one pass. A language that supports parallel execution could probably do it faster. For a repository with 13205 commits it runs in just a few seconds on my machine, generating ~15 MB of JSON. I run this on an NVMe SSD.

❗ This is just a quick draft, feel free to critique or improve it. 😃



<?php

// Map of git pretty-format placeholders to JSON keys.
$placeholders = [
    'H' => 'hash',
    'h' => 'abbreviated_hash',
    'P' => 'parent_hash',
    'p' => 'abbreviated_parent_hash',
    'T' => 'tree_hash',
    't' => 'abbreviated_tree_hash',
    'an' => 'author_name',
    'ae' => 'author_email',
    'aD' => 'author_date',
    'at' => 'author_unix_timestamp',
    'cn' => 'committer_name',
    'ce' => 'committer_email',
    'cD' => 'committer_date',
    'ct' => 'committer_unix_timestamp',
    's' => 'subject',
    'b' => 'body',
    'B' => 'raw_body',
    'N' => 'notes',
    'D' => 'branch',
    'd' => 'commit_tag',
    'gD' => 'reflog_selector',
    'gs' => 'reflog_subject',
    'gn' => 'reflog_name',
    'e' => 'encoding',
    'f' => 'sanitized_subject_line',
];

$commits = [];

foreach ($placeholders as $placeholder => $name) {
    // One git call per placeholder; each value starts with "<hash>2>>>>> ".
    $gitCommand = 'git log --format=\'%H2>>>>> %' . $placeholder . '\'';

    $output = shell_exec($gitCommand);
    $lines = explode("\n", rtrim((string) $output, "\n"));
    $commitId = null;
    foreach ($lines as $line) {
        if (preg_match('/^[0-9a-f]{40}2>>>>> /', $line)) {
            // First line of a value: 40-character hash, then the 7-character marker.
            $commitId = substr($line, 0, 40);
            $commits[$commitId][$name] = substr($line, 47);
        } elseif ($commitId !== null) {
            // Continuation line of a multi-line value (e.g. the body).
            $commits[$commitId][$name] .= "\n" . $line;
        }
    }
}

file_put_contents('commits.json', json_encode($commits, JSON_THROW_ON_ERROR | JSON_PRETTY_PRINT | JSON_INVALID_UTF8_SUBSTITUTE));
