@mostafa-drz
Created February 5, 2025 03:00
Bash script to extract repository context for AI tools like ChatGPT. Collects README, CHANGELOG, Git history, file structure (respecting .gitignore), dependency files, and additional files via a file map.
#!/bin/bash
# extract_repo_info.sh
# ================================================
# Description:
# This script extracts key repository details to provide context
# for AI tools like ChatGPT about the project I'm working on.
# It collects information such as README, CHANGELOG, Git commit
# history, directory structures, dependency files, and additional
# highlighted files specified in a fileMap JSON file.
#
# Use this script to quickly refresh context when interacting
# with AI assistants, making it easier to maintain continuity.
#
# Usage:
# ./extract_repo_info.sh [--mainDirectory <dir>] [--maxLength <number_of_lines>] [--fileMap <json_file>]
#
# Flags:
# -m, --mainDirectory Specify a main directory (e.g., src) to show its file structure.
# -l, --maxLength Limit the number of lines output from text files (e.g., README, CHANGELOG).
# -f, --fileMap Provide a JSON file mapping with structure:
# [
# {
# "path": "filePath",
# "description": "The description for the file"
# }
# ]
#
# Ensure the script has execute permissions: chmod +x extract_repo_info.sh
# ================================================
# -------------------------
# Parse command-line flags
# -------------------------
MAIN_DIRECTORY=""
MAX_LINES=0 # 0 means no limit
FILE_MAP_FILE=""
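while [[ $# -gt 0 ]]; do
  key="$1"
  case $key in
    -m|--mainDirectory)
      MAIN_DIRECTORY="$2"
      shift # past flag
      shift # past value
      ;;
    -l|--maxLength)
      MAX_LINES="$2"
      shift
      shift
      ;;
    -f|--fileMap)
      FILE_MAP_FILE="$2"
      shift
      shift
      ;;
    *)
      # Unknown option: warn on stderr so piped output stays clean.
      echo "Warning: ignoring unknown option '$key'" >&2
      shift
      ;;
  esac
done
# A missing or non-numeric --maxLength would break the integer tests below,
# so fall back to 0 (no limit).
if ! [[ "$MAX_LINES" =~ ^[0-9]+$ ]]; then
  echo "Warning: --maxLength expects a non-negative integer; using no limit." >&2
  MAX_LINES=0
fi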
# -------------------------
# Set up IGNORED_DIRS for file structure display.
# Initially ignoring common directories.
IGNORED_DIRS=".git|node_modules|dist|build"
# Extend IGNORED_DIRS with entries from .gitignore, if present.
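if [ -f ".gitignore" ]; then
  # Exclude comments, empty lines, and negation patterns ('!'), then strip
  # leading and trailing slashes (how tree -I treats slash-bearing patterns
  # varies between tree versions) and join the patterns with '|'.
  # Patterns containing an inner '/' may still not match.
  extra=$(grep -vE '^[[:space:]]*(#|!|$)' .gitignore | sed -e 's#^/##' -e 's#/$##' | tr '\n' '|' | sed 's/|$//')
  if [ -n "$extra" ]; then
    IGNORED_DIRS="$IGNORED_DIRS|$extra"
  fi
fi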
# -------------------------
# Function: display_file
# Displays the file content (limited by MAX_LINES if set) with a header.
# -------------------------
display_file() {
  local file="$1"
  local header="$2"
  echo "===== $header: $file ====="
  if [ -f "$file" ]; then
    if [ "$MAX_LINES" -gt 0 ]; then
      head -n "$MAX_LINES" "$file"
      echo "... (limited to $MAX_LINES lines)"
    else
      cat "$file"
    fi
    echo -e "\n"
  else
    echo "$file not found"
    echo ""
  fi
}
# -------------------------
# 1. Display README file (check common names).
# -------------------------
if [ -f "README.md" ]; then
  display_file "README.md" "Project Overview (README.md)"
elif [ -f "README.txt" ]; then
  display_file "README.txt" "Project Overview (README.txt)"
elif [ -f "README" ]; then
  display_file "README" "Project Overview (README)"
else
  echo "No README file found."
fi
# -------------------------
# 2. Display CHANGELOG file.
# -------------------------
if [ -f "CHANGELOG.md" ]; then
  display_file "CHANGELOG.md" "Changelog (CHANGELOG.md)"
elif [ -f "CHANGELOG" ]; then
  display_file "CHANGELOG" "Changelog (CHANGELOG)"
else
  echo "No CHANGELOG file found."
fi
# -------------------------
# 3. Display Git commit history (latest 10 commits).
# -------------------------
echo "===== Git Commit History (Latest 10 commits) ====="
if git rev-parse --is-inside-work-tree > /dev/null 2>&1; then
  git log --oneline -n 10 || echo "Error retrieving git log."
else
  echo "Not a Git repository."
fi
echo ""
# -------------------------
# 4. Display Directory Structure for the main directory (if provided).
# -------------------------
if [ -n "$MAIN_DIRECTORY" ]; then
  echo "===== File Structure for Main Directory: $MAIN_DIRECTORY ====="
  if [ -d "$MAIN_DIRECTORY" ]; then
    if command -v tree >/dev/null 2>&1; then
      # Use tree with ignored directories (-I)
      tree -L 2 -I "$IGNORED_DIRS" "$MAIN_DIRECTORY" || echo "Error running tree command."
    else
      # Fallback: list directories and files using find, ignoring common directories.
      find "$MAIN_DIRECTORY" -maxdepth 2 \
        \( -path "$MAIN_DIRECTORY/.git" -o -path "$MAIN_DIRECTORY/node_modules" \
        -o -path "$MAIN_DIRECTORY/dist" -o -path "$MAIN_DIRECTORY/build" \) \
        -prune -o -print | sort || echo "Error running find command."
    fi
  else
    echo "Directory '$MAIN_DIRECTORY' not found."
  fi
  echo ""
fi
# -------------------------
# 5. Display Directory Structure for the repository root.
# -------------------------
echo "===== Repository Root File Structure (Depth: 2) ====="
if command -v tree >/dev/null 2>&1; then
  tree -L 2 -I "$IGNORED_DIRS" . || echo "Error running tree command."
else
  # Fallback: list files and directories using find, ignoring specified directories.
  find . -maxdepth 2 \
    \( -path "./.git" -o -path "./node_modules" -o -path "./dist" -o -path "./build" \) \
    -prune -o -print | sort || echo "Error running find command."
fi
echo ""
# -------------------------
# 6. Display Dependency/Configuration Files.
# -------------------------
echo "===== Dependency/Configuration Files ====="
display_file "package.json" "Node.js Dependencies (package.json)"
display_file "requirements.txt" "Python Dependencies (requirements.txt)"
# -------------------------
# 7. Process File Map for Highlighted Files (if provided)
# -------------------------
if [ -n "$FILE_MAP_FILE" ]; then
  echo "===== Highlighted Files from File Map ====="
  if [ ! -f "$FILE_MAP_FILE" ]; then
    echo "File map '$FILE_MAP_FILE' not found."
  else
    # Check if jq is available for JSON parsing.
    if ! command -v jq &> /dev/null; then
      echo "Error: 'jq' is required to parse the file map. Please install jq."
    else
      # Read the JSON array and iterate over its objects.
      num_items=$(jq length "$FILE_MAP_FILE")
      for (( i=0; i<num_items; i++ )); do
        file_path=$(jq -r ".[$i].path" "$FILE_MAP_FILE")
        description=$(jq -r ".[$i].description" "$FILE_MAP_FILE")
        echo "----- $description: $file_path -----"
        if [ -f "$file_path" ]; then
          if [ "$MAX_LINES" -gt 0 ]; then
            head -n "$MAX_LINES" "$file_path"
            echo "... (limited to $MAX_LINES lines)"
          else
            cat "$file_path"
          fi
        else
          echo "File '$file_path' not found."
        fi
        echo ""
      done
    fi
  fi
fi
echo "Repository extraction complete."
mostafa-drz commented Feb 5, 2025

Repository Extraction Script

This script was created to extract key repository context for AI tools like ChatGPT. It gathers important details such as:

  • README/CHANGELOG content: Provides an overview of your project.
  • Latest Git commit history: Shows a snapshot of recent changes.
  • File structure: Displays the layout (two levels deep) of both your main directory (if specified) and the repository root. When tree is installed, entries listed in .gitignore are ignored as well; the find fallback skips only .git, node_modules, dist, and build.
  • Dependency/Configuration files: Prints package.json and requirements.txt, noting any that are missing.
  • Additional files via fileMap: Lets you highlight specific files with custom descriptions.
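
If you need the file listing to follow .gitignore exactly, one option is to let Git do the filtering instead of translating patterns for tree or find. The sketch below is not part of the script; list_repo_files is a hypothetical helper name, and it assumes the directory is inside a Git work tree.

```shell
#!/bin/bash
# Hypothetical helper (not part of extract_repo_info.sh): list files the way
# Git sees them, so every .gitignore rule is applied by Git itself.
list_repo_files() {
  local dir="${1:-.}"
  # --cached: tracked files; --others: untracked files;
  # --exclude-standard: apply .gitignore, .git/info/exclude, and global excludes.
  git -C "$dir" ls-files --cached --others --exclude-standard | sort
}
```

Calling list_repo_files ./app prints paths relative to ./app, one per line. Unlike tree, the output is a flat list with no depth limit.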

Why I Created This Script

I built this script to help me quickly refresh and share project context with AI tools when discussing or debugging my code. With token limits and long context histories, it became essential to extract and summarize key points of a repository. This script ensures that ChatGPT (and similar tools) gets the necessary background information to provide relevant assistance.

How I Use It

Before engaging with AI assistants, I run this script to produce a structured summary of my repository. This summary includes everything from the project overview to specific configuration files, making it easier for the AI to understand the context without overwhelming it with unnecessary details.
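
To check whether a summary is likely to fit in a prompt, a rough size estimate can help. The sketch below is not part of the script; estimate_tokens is a hypothetical helper, and dividing the byte count by 4 is only a rule-of-thumb assumption for English text, not the output of a real tokenizer.

```shell
#!/bin/bash
# Hypothetical helper (not part of extract_repo_info.sh): estimate tokens from
# stdin as bytes / 4. The divisor is a rough assumption, not a tokenizer.
estimate_tokens() {
  local bytes
  bytes=$(wc -c)
  echo $(( bytes / 4 ))
}

# Example: ./extract_repo_info.sh -l 50 | estimate_tokens
```

If the estimate is too high, lowering --maxLength or trimming the file map are the two knobs the script offers.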

Example Project

For an example of how this script works in a real-world scenario, check out the Art Wise AI project. This project demonstrates the kind of detailed context I aim to provide.

The fileMap Feature

The fileMap option allows you to specify additional files that are critical to your project. Use it if you have important files that aren’t captured by the standard extraction. Parsing the file map requires jq. The JSON file should follow this format:

[
  {
    "path": "path/to/important/file.ext",
    "description": "Brief explanation of why this file is important."
  }
]
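
The script reads the file map without checking its shape first, so a missing key shows up as "null" in the output. The sketch below is one way to validate a map before running the script and to read it with a single jq call. Both function names are hypothetical and not part of extract_repo_info.sh.

```shell
#!/bin/bash
# Hypothetical helper: succeed only if the file is a JSON array whose entries
# are objects with string "path" and "description" fields.
validate_file_map() {
  jq -e 'type == "array" and all(.[]; type == "object"
         and (.path | type) == "string"
         and (.description | type) == "string")' "$1" >/dev/null 2>&1
}

# Hypothetical helper: print "path<TAB>description" for each entry.
list_file_map() {
  jq -r '.[] | [.path, .description] | @tsv' "$1"
}

# Example:
# validate_file_map forAIFileMap.json && list_file_map forAIFileMap.json |
#   while IFS=$'\t' read -r path description; do echo "$description -> $path"; done
```

Note that @tsv escapes tabs and newlines inside values, so descriptions containing those characters come back with literal \t and \n sequences.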

Usage Examples

Here are some example commands to help you get started with the extract_repo_info.sh script:

# 1. Extract basic repository context
# This command gathers the README, CHANGELOG, Git history, root file structure, and dependency files.
./extract_repo_info.sh
# 2. Focus on a specific directory
# This command focuses on the './app' directory to display its structure while still including general project context.
./extract_repo_info.sh -m './app'
# 3. Highlight additional important files
# This command highlights specific files listed in 'forAIFileMap.json' with custom descriptions.
./extract_repo_info.sh -f './forAIFileMap.json'
# 4. Extract full context and copy to clipboard (macOS)
# This command extracts full context, including the './app' structure and highlighted files, then copies the output to the clipboard.
./extract_repo_info.sh -m './app' -f './forAIFileMap.json' | pbcopy
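
Example 4 relies on pbcopy, which exists only on macOS. As a sketch for other platforms, the hypothetical helper below (not part of the script) prints the first clipboard command it finds on PATH and falls back to cat, so the output simply goes to the terminal.

```shell
#!/bin/bash
# Hypothetical helper (not part of extract_repo_info.sh): print the first
# clipboard command found on PATH, or "cat" so output falls through to stdout.
pick_clipboard_cmd() {
  if command -v pbcopy >/dev/null 2>&1; then
    echo "pbcopy"                       # macOS
  elif command -v wl-copy >/dev/null 2>&1; then
    echo "wl-copy"                      # Wayland (wl-clipboard package)
  elif command -v xclip >/dev/null 2>&1; then
    echo "xclip -selection clipboard"   # X11
  elif command -v clip.exe >/dev/null 2>&1; then
    echo "clip.exe"                     # Windows Subsystem for Linux
  else
    echo "cat"
  fi
}

# Example (the substitution is left unquoted on purpose so that
# "xclip -selection clipboard" splits into a command and its arguments):
# ./extract_repo_info.sh -m './app' -f './forAIFileMap.json' | $(pick_clipboard_cmd)
```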

Example of forAIFileMap.json

The forAIFileMap.json file allows you to highlight specific files that are important for providing additional context to AI tools. Here’s an example of how it might look:

[
  {
    "path": "tailwind.config.ts",
    "description": "Tailwind CSS configuration file, important for understanding the project's design system."
  },
  {
    "path": "app/context/GlobalState.tsx",
    "description": "Handles the global state management logic of the app."
  },
  {
    "path": "aiSummaryHighlights.txt",
    "description": "A text file with project highlights, goals, and key context for AI tools."
  }
]

Using aiSummaryHighlights.txt (or a Similar File) for Project Insights

Consider maintaining a simple text file, like aiSummaryHighlights.txt, to capture key insights, project goals, design decisions, or even challenges that aren’t directly reflected in the code. This can help AI tools understand the why behind your project, not just the how.

Example aiSummaryHighlights.txt:

# Project Goals
- Build an AI-powered art companion for art enthusiasts and educators.
- Focus on real-time voice interactions using WebRTC and OpenAI APIs.

# Key Considerations
- Ensure accessibility for multilingual users.
- Prioritize responsive design for mobile and desktop.

# Challenges to Solve
- Optimize audio generation latency for better user experience.
- Improve state management in the chat component to handle edge cases.

# Future Ideas
- Add AR features for interactive art galleries.
- Integrate blockchain to verify artwork authenticity.
