@varemenos
Last active April 21, 2024 23:21
Git log in JSON format

Get Git log in JSON format

git log --pretty=format:'{%n  "commit": "%H",%n  "abbreviated_commit": "%h",%n  "tree": "%T",%n  "abbreviated_tree": "%t",%n  "parent": "%P",%n  "abbreviated_parent": "%p",%n  "refs": "%D",%n  "encoding": "%e",%n  "subject": "%s",%n  "sanitized_subject_line": "%f",%n  "body": "%b",%n  "commit_notes": "%N",%n  "verification_flag": "%G?",%n  "signer": "%GS",%n  "signer_key": "%GK",%n  "author": {%n    "name": "%aN",%n    "email": "%aE",%n    "date": "%aD"%n  },%n  "committer": {%n    "name": "%cN",%n    "email": "%cE",%n    "date": "%cD"%n  }%n},'

The only placeholders that aren't fetched are:

  • %B: raw body (unwrapped subject and body)
  • %GG: raw verification message from GPG for a signed commit

The format is applied to each commit, so once you have the full output, you need to remove the trailing comma (,) and wrap everything in an array.
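
The clean-up step described above can be sketched in JavaScript as follows. This is a minimal sketch, assuming the raw output is already available as a string and that no field contains characters that need JSON escaping; the helper name wrapAsJsonArray is hypothetical and not part of the gist.

```javascript
// Hypothetical helper: turns the comma-terminated objects printed by the
// format above into a parsed JavaScript array.
function wrapAsJsonArray(rawOutput) {
  // Drop surrounding whitespace and the final "," left after the last commit.
  const withoutTrailingComma = rawOutput.trim().replace(/,$/, '');
  // Wrap the remaining comma-separated objects in brackets and parse them.
  return JSON.parse('[' + withoutTrailingComma + ']');
}

// Two hand-written records standing in for real git output.
const sample = '{\n  "commit": "abc"\n},\n{\n  "commit": "def"\n},';
console.log(wrapAsJsonArray(sample));
```

JSON.parse throws as soon as a subject or body contains an unescaped double quote, backslash, or newline, so this only covers the simple case.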

git log pretty format source: http://git-scm.com/docs/pretty-formats

Here is an example in JavaScript based on a package I'm working on for Atom:

// BufferedProcess is exported by Atom's built-in 'atom' module.
var BufferedProcess = require('atom').BufferedProcess;

var format = '{%n  "commit": "%H",%n  "abbreviated_commit": "%h",%n  "tree": "%T",%n  "abbreviated_tree": "%t",%n  "parent": "%P",%n  "abbreviated_parent": "%p",%n  "refs": "%D",%n  "encoding": "%e",%n  "subject": "%s",%n  "sanitized_subject_line": "%f",%n  "body": "%b",%n  "commit_notes": "%N",%n  "verification_flag": "%G?",%n  "signer": "%GS",%n  "signer_key": "%GK",%n  "author": {%n    "name": "%aN",%n    "email": "%aE",%n    "date": "%aD"%n  },%n  "committer": {%n    "name": "%cN",%n    "email": "%cE",%n    "date": "%cD"%n  }%n},';

// Accumulate stdout in a string (the original array only worked through implicit coercion).
var commits = '';

new BufferedProcess({
    command: 'git',
    args: [
        'log',
        '--pretty=format:' + format
    ],
    stdout: function (chunk) { commits += chunk; },
    exit: function (code) {
        if (code === 0) {
            // Drop the trailing comma and wrap the objects in an array.
            var result = JSON.parse('[' + commits.slice(0, -1) + ']');

            console.log(result); // parsed array of commit objects
        }
    }
});
@nsisodiya

nsisodiya commented Apr 18, 2020

I am getting newlines inside the strings.

Solved by adding:

sed ':a;N;$!ba;s/\n/ /g'

@NGenetzky

I finally posted my one-liner as a gist; it uses jq along with a double-quote hack to create a relatively robust solution.

Git log as JSON array is attached below


README.md

git log \
  --pretty=format:'{^^^^date^^^^:^^^^%ci^^^^,^^^^abbreviated_commit^^^^:^^^^%h^^^^,^^^^subject^^^^:^^^^%s^^^^,^^^^body^^^^:^^^^%b^^^^}' \
  | sed 's/"/\\"/g' \
  | sed 's/\^^^^/"/g' \
  | jq -s '.'

The format is applied to each commit to create one JSON object per commit. Then jq --slurp is used to "slurp" up the objects into a valid JSON array.
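
For readers without sed or jq at hand, the same three steps (escape existing double quotes, turn the ^^^^ placeholders into quotes, collect the objects into an array) can be sketched in JavaScript. This is a rough equivalent rather than the author's method: the function name is hypothetical, and it assumes each object sits on its own line, so no value may contain a newline or a backslash.

```javascript
// Hypothetical JavaScript counterpart of the sed + jq pipeline above.
function caretPlaceholdersToArray(rawOutput) {
  return rawOutput
    .split('\n')
    .filter((line) => line.trim().length > 0) // ignore empty lines
    .map((line) =>
      JSON.parse(
        line
          .replace(/"/g, '\\"') // like: sed 's/"/\\"/g'
          .replace(/\^\^\^\^/g, '"') // like: sed 's/\^^^^/"/g'
      )
    ); // collecting the parsed objects plays the role of jq -s '.'
}

// Hand-written line standing in for real git output.
const line = '{^^^^subject^^^^:^^^^say "hi"^^^^}';
console.log(caretPlaceholdersToArray(line));
```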

git log pretty format source: http://git-scm.com/docs/pretty-formats

Credit:


git_log_as_json_array.sh

#!/bin/sh

# [Git log as JSON array](https://gist.github.com/NGenetzky/46621b35d60fed8f036311c6f1637e48)
git log --pretty=format:'{^^^^date^^^^:^^^^%ci^^^^,^^^^abbreviated_commit^^^^:^^^^%h^^^^,^^^^subject^^^^:^^^^%s^^^^,^^^^body^^^^:^^^^%b^^^^}' | sed 's/"/\\"/g' | sed 's/\^^^^/"/g' | jq -s '.'

@patrickdevivo

Using gitqlite is also a possibility: gitqlite "SELECT * FROM commits" --format json. It is a bit more succinct (but of course requires installing another CLI). It would be quite nice if one of the pretty formats git log supported was JSON...

(Full disclosure I'm a creator/maintainer of gitqlite)

@nsisodiya

I tried all sorts of commands, but most of them failed. After merging several approaches and trying many combinations, this one finally works.

git log --pretty=format:'{%n  ^^^^commit^^^^: ^^^^%H^^^^,%n  ^^^^abbreviated_commit^^^^: ^^^^%h^^^^,%n  ^^^^tree^^^^: ^^^^%T^^^^,%n  ^^^^abbreviated_tree^^^^: ^^^^%t^^^^,%n  ^^^^parent^^^^: ^^^^%P^^^^,%n  ^^^^abbreviated_parent^^^^: ^^^^%p^^^^,%n  ^^^^refs^^^^: ^^^^%D^^^^,%n  ^^^^encoding^^^^: ^^^^%e^^^^,%n  ^^^^subject^^^^: ^^^^%s^^^^,%n  ^^^^sanitized_subject_line^^^^: ^^^^%f^^^^,%n  ^^^^commit_notes^^^^: ^^^^%N^^^^,%n  ^^^^verification_flag^^^^: ^^^^%G?^^^^,%n  ^^^^signer^^^^: ^^^^%GS^^^^,%n  ^^^^signer_key^^^^: ^^^^%GK^^^^,%n  ^^^^author^^^^: {%n    ^^^^name^^^^: ^^^^%aN^^^^,%n    ^^^^email^^^^: ^^^^%aE^^^^,%n    ^^^^date^^^^: ^^^^%aD^^^^%n  },%n  ^^^^commiter^^^^: {%n    ^^^^name^^^^: ^^^^%cN^^^^,%n    ^^^^email^^^^: ^^^^%cE^^^^,%n    ^^^^date^^^^: ^^^^%cD^^^^%n  }%n},' | sed 's/"/\\"/g' | sed 's/\^^^^/"/g' | sed "$ s/,$//" | sed -e ':a' -e 'N' -e '$!ba' -e 's/\n/ /g'  | awk 'BEGIN { print("[") } { print($0) } END { print("]") }' > git-log.json

I hope it saves many developers some time.

@terezka

terezka commented Jan 8, 2021

This blob was just what I was looking for- thank you!!

@yaronaharony

Thank you!!!
Small question:
is there a way to get the plus/minus numbers as well (changed, inserted, deleted) for each of the entries?
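
One possible route to those numbers is git log --numstat, which prints a tab-separated "added, deleted, path" line for every changed file of a commit. A minimal sketch of parsing that output, assuming it was produced with git log --pretty=format:%H --numstat on a repository with 40-character SHA-1 hashes; the function name parseNumstat and the shape of the result are illustrative choices.

```javascript
// Hypothetical parser for the output of:
//   git log --pretty=format:%H --numstat
function parseNumstat(rawOutput) {
  const commits = [];
  let current = null;
  for (const line of rawOutput.split('\n')) {
    if (/^[0-9a-f]{40}$/.test(line)) {
      // A bare 40-character hash starts a new commit record.
      current = { commit: line, insertions: 0, deletions: 0, files: [] };
      commits.push(current);
      continue;
    }
    // numstat lines look like "<added>\t<deleted>\t<path>"; binary files use "-".
    const match = /^(\d+|-)\t(\d+|-)\t(.+)$/.exec(line);
    if (match && current) {
      const added = match[1] === '-' ? 0 : Number(match[1]);
      const deleted = match[2] === '-' ? 0 : Number(match[2]);
      current.insertions += added;
      current.deletions += deleted;
      current.files.push({ path: match[3], added, deleted });
    }
  }
  return commits;
}
```

Blank lines and anything else that is neither a hash nor a numstat line are skipped, so the per-commit totals can be merged into the JSON objects by matching on the commit hash.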

@panta82

panta82 commented Aug 22, 2021

@nsisodiya commiter -> committer

@overengineer

overengineer commented Sep 30, 2021

What about this: https://gist.github.com/overengineer/b69e578f5cf7457dc7d4ff8c3b7850bc
Tested with jq on https://github.com/tj/git-extras repository with 1605 commits. No crashes 👍

Sanitizes commit messages and author names.
(escape newline, escape quotes, escape backslash, escape tabs, fix double escapes, delete invalid characters)
I encountered null characters, control characters, quotes in git log. Finally sanitized everything (I hope).

Edit: I tested it on Blender repo. Well, it has some failing edge cases.

@april-dbx

@nsisodiya's recommendation above is the only thing that truly worked for me. Thanks so much, saving this to a bash function for the rest of my life. :)

@nsisodiya

@april-dbx Most Welcome.

@april

april commented Nov 23, 2022

Don't mean to add another to the pile, but here we go: https://gist.github.com/april/ee2e104b1435f3113e67663d8875bbef

This builds upon what @nsisodiya, @varemenos, and @overengineer put together, but with:

  • made into a convenient shell function that can be run against current directory or file/folder
  • uses traditional JSON camelCase
  • includes every major field that git log can output, including the body
  • proper sections for author, committer, and signature
  • multiple date formats (one for reading, ISO8601 for parsing)
  • should properly handle (most? all?) body values, even those that contain newlines, tabs, quotation marks and escaped characters
  • outputs as minimized JSON, can be piped to jq for pretty printing: git-log-json | jq -r '.[] | .subject'

I tested it against a git repository with over a million commits without issues, but there certainly might be some.

@nsisodiya

@april - Wow, that is awesome.

@H-zk

H-zk commented Mar 28, 2023

Finally, I got the JSON like this without BufferedProcess, but I use #'# as a special placeholder that is replaced with a double quote (") afterwards.

const path = require("path")
const fs = require("fs")
const { execSync } = require('child_process');

const startCommitHash = 'f9f457bca14228bad1e00c55d903a3c9f3738fc8'
const endCommitHash = 'HEAD'

// get log rawText
const logTxts = execSync(
  `git log --pretty=format:"{%n  #'#commit#'#: #'#%H#'#,%n  #'#abbreviated_commit#'#: #'#%h#'#,%n  #'#tree#'#: #'#%T#'#,%n  #'#abbreviated_tree#'#: #'#%t#'#,%n  #'#parent#'#: #'#%P#'#,%n  #'#abbreviated_parent#'#: #'#%p#'#,%n  #'#refs#'#: #'#%D#'#,%n  #'#encoding#'#: #'#%e#'#,%n  #'#subject#'#: #'#%s#'#,%n  #'#sanitized_subject_line#'#: #'#%f#'#,%n  #'#body#'#: #'#%b#'#,%n  #'#commit_notes#'#: #'#%N#'#,%n  #'#verification_flag#'#: #'#%G?#'#,%n  #'#signer#'#: #'#%GS#'#,%n  #'#signer_key#'#: #'#%GK#'#,%n  #'#author#'#: {%n    #'#name#'#: #'#%aN#'#,%n    #'#email#'#: #'#%aE#'#,%n    #'#date#'#: #'#%aD#'#%n  },%n  #'#commiter#'#: {%n    #'#name#'#: #'#%cN#'#,%n    #'#email#'#: #'#%cE#'#,%n    #'#date#'#: #'#%cD#'#%n  }%n}," ${startCommitHash}..${endCommitHash}`,
  { encoding: 'utf8' },
).toString();

// transform json
const logJson = `[${logTxts.slice(0, -1).replace(/"/g, "'").replace(/#'#/g, '"')}]`;

fs.writeFileSync(path.join(__dirname, 'log.json'), logJson);

// parse json and do what you need
const subjectLines = JSON.parse(logJson).map(item => item["sanitized_subject_line"])
console.info('subjectLines', subjectLines)

@gribok

gribok commented Oct 25, 2023

The following commit message breaks every approach in this thread:

Example Commit Message:
Added logic, but it doesn't look nice yet \_o_/

@nsisodiya Any hints?

Validated by jq:

$ bash utilities/git-log-json.sh  | jq
parse error: Invalid escape at line 2, column 22785
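
The error is consistent with the backslash being copied into a JSON string unescaped: \_ is not a valid JSON escape sequence. One way to avoid hand-escaping altogether is to let git separate fields with control bytes (pretty formats accept %xNN hex escapes) and leave the escaping to a JSON serializer. A minimal sketch under that assumption; the field list, the choice of separators, and the function names are illustrative and not something proposed elsewhere in this thread.

```javascript
const { execFileSync } = require('child_process');

// Separators chosen for this sketch: the ASCII unit (0x1f) and record (0x1e)
// separators, which are unlikely to appear in commit metadata.
const FIELD_SEP = '\x1f';
const RECORD_SEP = '\x1e';
const FIELDS = { commit: '%H', subject: '%s', body: '%b' };

// Builds the --pretty=format string, here "%H%x1f%s%x1f%b%x1e".
function buildFormat() {
  return Object.values(FIELDS).join('%x1f') + '%x1e';
}

// Splits raw git output into plain objects; no manual escaping is involved.
function parseRecords(rawOutput) {
  const names = Object.keys(FIELDS);
  return rawOutput
    .split(RECORD_SEP)
    .map((record) => record.replace(/^\n/, '')) // newline git prints between commits
    .filter((record) => record.length > 0)
    .map((record) => {
      const values = record.split(FIELD_SEP);
      return Object.fromEntries(names.map((name, i) => [name, values[i] ?? '']));
    });
}

// Runs git in the current directory and returns the log as JSON text.
function gitLogAsJson() {
  const raw = execFileSync('git', ['log', '--pretty=format:' + buildFormat()], {
    encoding: 'utf8',
    maxBuffer: 256 * 1024 * 1024,
  });
  return JSON.stringify(parseRecords(raw), null, 2);
}
```

Because JSON.stringify escapes backslashes, quotes, and newlines itself, a subject such as the one above should survive a round trip; the remaining weak spot is a commit message that contains the separator bytes themselves.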

@floriankraemer

floriankraemer commented Apr 20, 2024

Here is another idea, written in PHP. Put it in a PHP file in the root of a repository and execute it. It runs git log once per placeholder, covering all commits each time.

Yes, I know that this is inefficient, but it retrieves each value separately, without parsing one giant string in which we would have to consider every possible case that could break our parser. So the trade-off is either to get good values more slowly or to chase a probably never perfect parsing solution just to get all of the values in one pass. A language that supports parallel execution could probably do it faster. For a repository with 13205 commits it runs in just a few seconds on my machine, generating ~15 MB of JSON. I ran this on an NVMe SSD.

❗ This is just a quick draft, feel free to critique or improve it. 😃

<?php

// https://git-scm.com/docs/pretty-formats/2.21.0

$placeholders = [
    'H' => 'hash',
    'h' => 'abbreviated_hash',
    'P' => 'parent_hash',
    'p' => 'abbreviated_parent_hash',
    'T' => 'tree_hash',
    't' => 'abbreviated_tree_hash',
    'an' => 'author_name',
    'ae' => 'author_email',
    'aD' => 'author_date',
    'at' => 'author_unix_timestamp',
    'cn' => 'committer_name',
    'ce' => 'committer_email',
    'cD' => 'committer_date',
    'ct' => 'committer_unix_timestamp',
    's' => 'subject',
    'b' => 'body',
    'B' => 'raw_body',
    'N' => 'notes',
    'D' => 'branch',
    'd' => 'commit_tag',
    'gD' => 'reflog_selector',
    'gs' => 'reflog_subject',
    'gn' => 'reflog_name',
    'e' => 'encoding',
    'f' => 'sanitized_subject_line',
];

$commits = [];

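// Marker that separates the commit hash from the placeholder value on each record's first line.
$separator = '>>>>> ';

foreach ($placeholders as $placeholder => $name) {
    $gitCommand = 'git log --format=\'%H' . $separator . '%' . $placeholder . '\'';

    $output = rtrim((string) shell_exec($gitCommand), "\n");
    $commitId = null;

    foreach (explode("\n", $output) as $line) {
        // A new record starts with a full 40-character (SHA-1) hash followed by the separator.
        if (preg_match('/^([0-9a-f]{40})' . preg_quote($separator, '/') . '(.*)$/', $line, $matches)) {
            $commitId = $matches[1];
            $commits[$commitId][$name] = $matches[2];
        } elseif ($commitId !== null) {
            // Continuation line of a multi-line value (e.g. body or notes); keep the line break.
            $commits[$commitId][$name] .= "\n" . $line;
        }
    }
}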

file_put_contents('commits.json', json_encode($commits, JSON_THROW_ON_ERROR | JSON_PRETTY_PRINT | JSON_INVALID_UTF8_SUBSTITUTE));
