@ofou
Last active February 7, 2024 23:54
Git Commit History Exporter with Detailed Diffs in JSONL Format
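#!/usr/bin/env bash
# Export every commit reachable from HEAD, oldest first, as one JSON object per line.

# Escape backslashes and double quotes so a value can sit inside a JSON string.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Git's empty tree: stands in as the parent of a root commit, which has none.
empty_tree=$(git hash-object -t tree /dev/null)

# tformat ends every line with a newline, so `read` also sees the last commit.
git log --reverse --pretty=tformat:'%H' | while read -r commit_hash
do
  # Extract required commit information
  commit_author=$(git show -s --format='%an' "$commit_hash" 2>/dev/null)
  if [ -z "$commit_author" ]; then
    echo "Skipping invalid commit hash: $commit_hash" >&2
    continue
  fi

  commit_author=$(json_escape "$commit_author")
  commit_author_email=$(json_escape "$(git show -s --format='%ae' "$commit_hash")")
  commit_date=$(git show -s --format='%cI' "$commit_hash")  # Committer date, strict ISO 8601
  commit_title=$(json_escape "$(git show -s --format='%s' "$commit_hash")")  # First line of commit message as title

  # Diff against the first parent, or against the empty tree for a root commit
  parent=$(git rev-parse --verify --quiet "$commit_hash^" 2>/dev/null) || parent=$empty_tree

  # Get diffs and encode in base64
  diffs=$(git diff "$parent" "$commit_hash" 2>/dev/null | base64 | tr -d '\n')  # Remove newlines after encoding
  if [ -z "$diffs" ]; then
    echo "No diffs found for commit: $commit_hash" >&2
    continue
  fi

  # Construct JSON object
  json_object="{\"hash\": \"$commit_hash\", \"title\": \"$commit_title\", \"date\": \"$commit_date\", \"author\": \"$commit_author\", \"mail\": \"$commit_author_email\", \"diffs\": \"$diffs\"}"

  # Append the JSON object to a file
  printf '%s\n' "$json_object" >> changes.jsonl
done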

This version of the script encodes each diff in base64, so the entire diff can be stored without escaping newlines, quotes, or backslashes. Because the encoded diff is a single-line string, every record stays on one line and the output remains valid JSON Lines.
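As a small illustration of that point, the sketch below pushes a made-up multi-line diff through the same `base64 | tr -d '\n'` pipeline and decodes it again. The variable names and the sample diff text are assumptions chosen for the example, not part of the script above.

```shell
# A made-up diff containing newlines and double quotes
sample_diff=$(printf '%s\n' '--- a/file.txt' '+++ b/file.txt' '-say "hi"' '+say "bye"')

# Same pipeline as the exporter: encode, then strip the line wrapping base64 adds
encoded=$(printf '%s\n' "$sample_diff" | base64 | tr -d '\n')

# The encoded value occupies exactly one line
line_count=$(printf '%s\n' "$encoded" | wc -l | tr -d ' ')

# Decoding restores the original text, newlines and quotes included
decoded=$(printf '%s' "$encoded" | base64 --decode)

echo "lines in encoded value: $line_count"
printf '%s\n' "$decoded"
```

The base64 alphabet contains no quotes, backslashes, or control characters, which is why the encoded value can be dropped into a JSON string as-is.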

To Decode the Diff: To view or process the diffs after extracting them from the JSON, you'll need to decode them from base64. You can do this with command-line tools or programmatically in most programming languages. For example, using the base64 command-line tool:

echo "encoded_diff" | base64 --decode

Replace "encoded_diff" with the actual base64-encoded diff string you've extracted from your JSONL file.
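To avoid copying the encoded string by hand, something like the following could pull the "diffs" field out of a given line of the JSONL file and decode it. This is a minimal sketch: `decode_diff` is a hypothetical helper name, and it assumes the field layout the script writes (`"diffs": "..."`) and a `base64` that accepts `--decode`.

```shell
# Hypothetical helper: print the decoded diff stored on line $2 of JSONL file $1.
decode_diff() {
  sed -n "${2}p" "$1" |
    # Keep only the value of the "diffs" field (base64 never contains a quote)
    sed -n 's|.*"diffs": "\([^"]*\)".*|\1|p' |
    base64 --decode
}

# Example usage, assuming changes.jsonl exists in the current directory:
#   decode_diff changes.jsonl 1
```

A JSON-aware tool is the more robust choice if one is available; the `sed` pattern here only relies on the exact spacing the exporter uses.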

Note: Base64 encoding increases the size of the diff data in the JSONL file by roughly one third, since every 3 bytes of input become 4 output characters. In exchange, newlines and other special characters in diffs are preserved without breaking the JSON Lines format.
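The overhead can be estimated before exporting. The sketch below assumes padded base64 with line breaks removed, as produced by `base64 | tr -d '\n'`; `base64_size` is a hypothetical helper and the sample string is made up.

```shell
# Hypothetical helper: size in bytes of the base64 encoding of $1 input bytes,
# with padding and without line breaks (4 output characters per 3-byte group).
base64_size() {
  echo $(( ($1 + 2) / 3 * 4 ))
}

# Compare the estimate with what the base64 tool actually produces.
sample='diff --git a/f b/f'
raw_bytes=$(printf '%s' "$sample" | wc -c | tr -d ' ')
actual=$(printf '%s' "$sample" | base64 | tr -d '\n' | wc -c | tr -d ' ')
echo "raw: $raw_bytes, estimated: $(base64_size "$raw_bytes"), actual: $actual"
```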
