
@johnnymo87
Last active May 11, 2024 21:14
Concatenates code files from a directory and its subdirectories into a single output file.
#!/usr/bin/env bash
: <<'END'
Script Name: code_concatenator.bash
Purpose:
This script concatenates all code files within a specified directory and its
subdirectories into a single output, which it writes to standard output. For each
file, the output contains the file's path followed by its contents, wrapped in
delimiters (```). This is particularly useful for preparing code for analysis or
processing by other tools or services that require a single file as input.
Usage:
./code_concatenator.bash <CODE_DIR>
Arguments:
- CODE_DIR: The directory containing the code files to be concatenated.
Features:
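- If the current working directory is inside a Git repository, includes only files
  tracked by Git, so files matched by .gitignore (and untracked files) are skipped.
- Otherwise, recursively traverses the specified directory and its subdirectories
  (hidden files and directories are not included).
- Does not filter by extension: every file found is included, whatever the
  language (e.g., .js, .py, .java, .cpp).
- Copies file contents verbatim, so backticks or other special characters in a file
  cannot break the script itself (a ``` line inside a file will, however, end that
  file's fenced block early in the output).
- Writes the concatenated output to standard output (redirect it to save it to a file).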
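To see why ignored files are skipped in Git mode, the following sketch builds a
throwaway repository and prints what `git ls-files` reports. It assumes Git is
installed; the helper name list_tracked_demo and the file names are made up for
illustration and are not part of the script.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical demo helper: build a throwaway repository and list its files.
# The body runs in a subshell ( ... ) so the cd and the trap stay local.
list_tracked_demo() (
  demo_dir=$(mktemp -d)
  trap 'rm -rf "$demo_dir"' EXIT
  cd "$demo_dir"
  git init -q .
  printf '*.log\n' > .gitignore
  printf 'print("hi")\n' > app.py
  printf 'noise\n' > debug.log   # matched by .gitignore, so never staged
  git add .                      # stages .gitignore and app.py only
  printf 'todo\n' > notes.txt    # created after `git add`: untracked
  git ls-files                   # lists the index, not the working tree
)

list_tracked_demo
# .gitignore
# app.py
```

Neither the ignored debug.log nor the untracked notes.txt appears, which is why the
script picks up only tracked files when run inside a repository.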
Background:
The script combines Bash built-in commands with a few standard utilities. If the
current working directory is inside a Git work tree, it lists files with the
`git ls-files` command, which reports only tracked files and therefore respects
.gitignore. Otherwise, the script falls back to the recursive traversal method.
Rather than writing ``` directly, the script surrounds each file with a custom
intermediate delimiter (###) and converts the delimiters to ``` in a single sed
pass at the very end. Because that pass runs over the whole output, a line in a
source file that matches the same sed pattern is converted as well, and ``` lines
that already appear in a file are passed through unchanged.
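The delimiter handling described above can be sketched as a standalone function.
This is a simplified illustration rather than the script's own code: the name
wrap_files is hypothetical, it takes file paths as arguments instead of walking a
directory, and its sed pass converts only lines that are exactly ###.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print each file given as an argument as
# path / ``` / contents / ```, using ### as the intermediate delimiter.
wrap_files() {
  local tmp f
  tmp=$(mktemp)
  for f in "$@"; do
    printf '%s\n###\n' "$f" >> "$tmp"
    cat "$f" >> "$tmp"
    # Keep the closing delimiter on its own line when the file does not
    # end with a newline ($(...) strips a trailing newline to "").
    if [ -n "$(tail -c 1 "$f")" ]; then
      printf '\n' >> "$tmp"
    fi
    printf '###\n\n' >> "$tmp"
  done
  # Final pass: every line that is exactly ### becomes ```.
  LC_ALL=C sed 's/^###$/```/' "$tmp"
  rm -f "$tmp"
}

# Usage: a file without a trailing newline still gets a well-formed fence.
demo_dir=$(mktemp -d)
printf 'x = 1' > "$demo_dir/a.py"
wrap_files "$demo_dir/a.py"
rm -rf "$demo_dir"
```

Because the sed pass cannot tell delimiter lines from file contents, a file that
itself contains a line consisting only of ### would have that line rewritten to ```
as well.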
Dependencies:
- Bash: The script is written for Bash shell environments found in Linux and macOS
systems.
- Git: The script requires Git to be installed if the current working directory is
a Git repository.
Note:
While this script is designed to handle a wide range of code files, it may not work
as expected for very large files, binary files (which are included as-is), or files
with unusual encodings. It's recommended to review the output and adjust the script
as needed for your specific use case.
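One possible adjustment, sketched under assumptions, is to skip binary files before
appending them. The helper is_text_file is hypothetical and not part of the script;
it relies on grep's -I option (available in GNU and BSD grep, but not required by
POSIX), which makes grep treat files it detects as binary as containing no matches.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: succeed if "$1" looks like text, fail if grep
# classifies it as binary (for example, because it contains NUL bytes).
# Empty files count as text. The empty pattern '' matches every line, so a
# non-empty text file always produces a match.
is_text_file() {
  [ ! -s "$1" ] || LC_ALL=C grep -Iq '' "$1"
}

# Usage: classify a text file and a file containing a NUL byte.
demo_dir=$(mktemp -d)
printf 'hello\n' > "$demo_dir/notes.txt"
printf 'a\0b' > "$demo_dir/blob.bin"
for f in "$demo_dir/notes.txt" "$demo_dir/blob.bin"; do
  if is_text_file "$f"; then echo "text:   $f"; else echo "binary: $f"; fi
done
rm -rf "$demo_dir"
```

In either loop of the script, a check such as `is_text_file "$path" || continue`
(using that loop's path variable) placed just before a file is appended would skip
binary files.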
Note:
This script was written with the assistance of the "claude-3-opus-20240229" model
developed by Anthropic.
Author: Jonathan Mohrbacher (github.com/johnnymo87)
Date: 2024-04-13
END
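set -euo pipefail
# Append one file to the output: its path, an opening ### delimiter, its
# contents, and a closing ### delimiter followed by a blank line.
append_file() {
  local path="$1"
  local output_file="$2"
  printf '%s\n###\n' "$path" >> "$output_file"
  # Skip unreadable files instead of letting `set -e` abort the whole run.
  cat "$path" >> "$output_file" 2>/dev/null || true
  # Put the closing delimiter on its own line even if the file has no
  # trailing newline (otherwise it would be glued to the last line).
  if [ -n "$(tail -c 1 "$path" 2>/dev/null)" ]; then
    printf '\n' >> "$output_file"
  fi
  printf '###\n\n' >> "$output_file"
}
# Function to concatenate files
concatenate_files() {
  local dir="$1"
  local output_file="$2"
  local repo_root file file_path
  # Check if the current working directory is inside a Git work tree
  if git rev-parse --is-inside-work-tree >/dev/null 2>&1; then
    # Get the root directory of the Git repository
    repo_root=$(git rev-parse --show-toplevel)
    # List tracked files only (this is what skips .gitignore'd files).
    # --full-name prints paths relative to the repository root, matching
    # "$repo_root/$file" below; -z NUL-terminates them so that names with
    # spaces or special characters are neither trimmed nor quoted.
    git ls-files -z --full-name -- "$dir" | while IFS= read -r -d '' file; do
      file_path="$repo_root/$file"
      # Skip entries that are not regular files in the working tree, such
      # as submodules or tracked files that have since been deleted.
      [ -f "$file_path" ] || continue
      append_file "$file_path" "$output_file"
    done
  else
    # Traverse the directory recursively (the glob skips hidden entries)
    for file in "$dir"/*; do
      if [ -d "$file" ]; then
        concatenate_files "$file" "$output_file"
      elif [ -f "$file" ]; then
        append_file "$file" "$output_file"
      fi
    done
  fi
}
# Check if a directory is provided. "${1:-}" keeps `set -u` from aborting
# with an "unbound variable" error before the usage message is shown.
if [ -z "${1:-}" ]; then
  echo "Usage: $0 <CODE_DIR>" >&2
  exit 1
fi
# Create a temporary file for output; the trap removes it on exit, even if
# a command fails part-way through.
output_file=$(mktemp)
trap 'rm -f "$output_file"' EXIT
# Call the recursive function
concatenate_files "$1" "$output_file"
# Print the output file, turning each line that is exactly ### into ```.
LC_ALL=C sed 's/^###$/```/' "$output_file"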