@robstryker
Created April 26, 2017 19:06
#!/bin/bash
# Make a new folder, put this file inside it,
# then execute it.
#
# This script rewrites history for each of the four repositories,
# specifically, to make ALL commits of the repos believe they were
# always in a subfolder of the repository.
# Examples:
# - pom.xml from webtools.common repo is rewritten to believe it always
# lived inside webtools.common.all repo at webtools.common/pom.xml
# (ie rewrote all history so pom.xml always was webtools.common/pom.xml)
# - pom.xml from webtools.common.tests repo is rewritten to believe it always
# lived inside webtools.common.all repo at webtools.common.tests/pom.xml
# (ie rewrote all history so pom.xml always was webtools.common.tests/pom.xml)
# etc etc etc.
#
# This changes all hashes of all previous commits, tags, branches, etc
# but keeps them otherwise identical.
#
# The script then backs up these 4 rewritten repos to a temporary folder
# named 'intermediate' for use later in the script.
#
# The script then rebuilds all branches and tags for a merged repo.
# This is a bit of a challenge, because the different repos may share,
# or not share, a given tag or branch name. In the event two repos
# share a branch or tag id, we should merge those two together.
#
# It would be near impossible, short of going commit-by-commit,
# to re-create all 4 repos at the time of a given tag that exists
# only in one repo, so for such branches or tags, we keep just 1
# of the repos in that given tag.
#
# However, this would cause the root of the repo to change repeatedly
# from having subfolders to not having subfolders, and this would be
# disruptive or confusing to the user.
# Ex:
# - repo at master: 'ls' returns webtools.common webtools.common.fproj webtools.common.snippets webtools.common.tests
# - user does git checkout for branch name 'TEMP_M4'
# - 'ls' now returns 'plugins' only.
#
# This is a problem, because the user won't know (without inspection)
# which pre-merge repo this tag or branch existed in... basically,
# they won't know what they're browsing. The main directory structure
# with 4 subdirs should be maintained in all tags and branches ideally.
# So this script accomplishes that goal, at the expense of
# changing hashes, and rewriting history to make the files
# think they always lived in subfolders.
#
# I hope this isn't too much of a drawback. It will most likely
# invalidate a lot of PRs that haven't been accepted yet :(
# So that's definitely a problem.
#
# The script takes about a half hour +/- 10min to run.
START_TIME=`date +%s`
rm -rf workingDirectory
mkdir workingDirectory
cd workingDirectory || exit 1
mkdir testoutput
mkdir intermediate
mkdir webtools.common.all
# Abort if the cd fails: later steps run 'rm -rf *' in the current directory
cd webtools.common.all || exit 1
# Do a fresh clone of the four repos
git clone http://git.eclipse.org/gitroot/webtools-common/webtools.common.fproj.git
git clone http://git.eclipse.org/gitroot/webtools-common/webtools.common.git
git clone http://git.eclipse.org/gitroot/webtools-common/webtools.common.snippets.git
git clone http://git.eclipse.org/gitroot/webtools-common/webtools.common.tests.git
# Build and rewrite history for webtools.common
cd webtools.common
echo ""
echo "Rewriting history for webtools.common so it thinks it was always in a subfolder"
git filter-branch --index-filter \
'git ls-files -s | sed "s-\t-&webtools.common/-" |
GIT_INDEX_FILE=$GIT_INDEX_FILE.new \
git update-index --index-info &&
mv $GIT_INDEX_FILE.new $GIT_INDEX_FILE' \
--tag-name-filter 'cat' \
-- --all
cd ../
# Build and rewrite history for webtools.common.fproj
cd webtools.common.fproj
echo ""
echo "Rewriting history for webtools.common.fproj so it thinks it was always in a subfolder"
git filter-branch --index-filter \
'git ls-files -s | sed "s-\t-&webtools.common.fproj/-" |
GIT_INDEX_FILE=$GIT_INDEX_FILE.new \
git update-index --index-info &&
mv $GIT_INDEX_FILE.new $GIT_INDEX_FILE' \
--tag-name-filter 'cat' \
-- --all
cd ../
# Build and rewrite history for webtools.common.tests
cd webtools.common.tests
echo ""
echo "Rewriting history for webtools.common.tests so it thinks it was always in a subfolder"
git filter-branch --index-filter \
'git ls-files -s | sed "s-\t-&webtools.common.tests/-" |
GIT_INDEX_FILE=$GIT_INDEX_FILE.new \
git update-index --index-info &&
mv $GIT_INDEX_FILE.new $GIT_INDEX_FILE' \
--tag-name-filter 'cat' \
-- --all
cd ../
# Build and rewrite history for webtools.common.snippets
cd webtools.common.snippets
echo ""
echo "Rewriting history for webtools.common.snippets so it thinks it was always in a subfolder"
git filter-branch --index-filter \
'git ls-files -s | sed "s-\t-&webtools.common.snippets/-" |
GIT_INDEX_FILE=$GIT_INDEX_FILE.new \
git update-index --index-info &&
mv $GIT_INDEX_FILE.new $GIT_INDEX_FILE' \
--tag-name-filter 'cat' \
-- --all
cd ../
# Record all tags and branches (refs) of the rewritten repos
echo ""
echo "Back up tags and branches with rewritten history to use later"
cd webtools.common.snippets/
git for-each-ref > ../../testoutput/webtools.common.snippets.txt
cd ../webtools.common.tests/
git for-each-ref > ../../testoutput/webtools.common.tests.txt
cd ../webtools.common.fproj/
git for-each-ref > ../../testoutput/webtools.common.fproj.txt
cd ../webtools.common/
git for-each-ref > ../../testoutput/webtools.common.txt
cd ../
# Back up the four repos with rewritten history
echo ""
echo "Back up the re-written repos for use later"
cp -R * ../intermediate
# Merge them into one repo
# The resulting repo will look as if five trees
# were merged together. Four of those trees are
# the individual subfolders. The fifth
# is a README.md commit.
echo ""
echo "Initialize new container repo"
git init
echo "# webtools.common.all
webtools commons in one repository" > README.md
git add README.md
git commit -a -m "Initial Commit for unified repository"
INITIALHASH=`git rev-parse HEAD`
echo $INITIALHASH
echo ""
echo "Merge in the four rewritten projects, with generic commit messages"
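# Git 2.9 and later refuse to merge unrelated histories unless told otherwise
git pull --no-edit --allow-unrelated-histories webtools.common.fproj
git pull --no-edit --allow-unrelated-histories webtools.common
git pull --no-edit --allow-unrelated-histories webtools.common.tests
git pull --no-edit --allow-unrelated-histories webtools.common.snippets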
FINALHASH=`git rev-parse HEAD`
echo $FINALHASH
# Let's save the tags for each folder
echo ""
echo "Building a model of existing tags to rebuild them later"
echo "If some tags are the same in more than one repo, we will need"
echo "to build a merge commit to store as the tag."
sleep 5
# Map of unique tags
declare -A ALL_UNIQUE_TAGS
# fproj
cd webtools.common.fproj
FPROJ_TAGS=`git tag -l`
FPROJ_TAG_ARR=($FPROJ_TAGS)
declare -A FPROJ_TAG_TO_COMMIT_ID
for var in "${FPROJ_TAG_ARR[@]}"
do
# Resolve the tag (annotated or lightweight) to the commit it points at
FPROJ_TAG_TO_COMMIT_ID[${var}]=`git rev-parse "refs/tags/${var}^{commit}"`
ALL_UNIQUE_TAGS[${var}]="x"
done
cd ../
# common
cd webtools.common
COMMON_TAGS=`git tag -l`
COMMON_TAG_ARR=($COMMON_TAGS)
declare -A COMMON_TAG_TO_COMMIT_ID
for var in "${COMMON_TAG_ARR[@]}"
do
# Resolve the tag (annotated or lightweight) to the commit it points at
COMMON_TAG_TO_COMMIT_ID[${var}]=`git rev-parse "refs/tags/${var}^{commit}"`
ALL_UNIQUE_TAGS[${var}]="x"
done
cd ../
# tests
cd webtools.common.tests
TESTS_TAGS=`git tag -l`
TESTS_TAG_ARR=($TESTS_TAGS)
declare -A TESTS_TAG_TO_COMMIT_ID
for var in "${TESTS_TAG_ARR[@]}"
do
# Resolve the tag (annotated or lightweight) to the commit it points at
TESTS_TAG_TO_COMMIT_ID[${var}]=`git rev-parse "refs/tags/${var}^{commit}"`
ALL_UNIQUE_TAGS[${var}]="x"
done
cd ../
# snippets
cd webtools.common.snippets
SNIPPETS_TAGS=`git tag -l`
SNIPPETS_TAG_ARR=($SNIPPETS_TAGS)
declare -A SNIPPETS_TAG_TO_COMMIT_ID
for var in "${SNIPPETS_TAG_ARR[@]}"
do
# Resolve the tag (annotated or lightweight) to the commit it points at
SNIPPETS_TAG_TO_COMMIT_ID[${var}]=`git rev-parse "refs/tags/${var}^{commit}"`
ALL_UNIQUE_TAGS[${var}]="x"
done
cd ../
echo ""
echo "Starting to rebuild all the tags."
echo "Tags that exist in only one repo will, when checked out, include only"
echo "the one subfolder, since we can't know what the others looked like at"
echo "that time. Tags existing in two or more repos will include only those subfolders."
echo "number of tags: " ${#ALL_UNIQUE_TAGS[@]}
sleep 10
# set up a working branch
git checkout -b test
rm -rf *
git reset --hard $INITIALHASH
# For each unique tag: start from the README-only commit, merge in the
# matching commit from every repo that has the tag, then tag the result
for key in "${!ALL_UNIQUE_TAGS[@]}"; do
rm -rf *
git reset --hard $INITIALHASH
SNIPPET_ID=${SNIPPETS_TAG_TO_COMMIT_ID[${key}]}
TESTS_ID=${TESTS_TAG_TO_COMMIT_ID[${key}]}
COMMON_ID=${COMMON_TAG_TO_COMMIT_ID[${key}]}
FPROJ_ID=${FPROJ_TAG_TO_COMMIT_ID[${key}]}
git reset --hard $INITIALHASH
if [ -z "$SNIPPET_ID" ]
then
echo "Tag " $key " does not exist in snippets"
else
echo "Tag " $key " does exist in snippets and has commit $SNIPPET_ID"
cd ../intermediate/webtools.common.snippets/
git checkout $SNIPPET_ID
cd ../../webtools.common.all/
git pull --no-edit --allow-unrelated-histories ../intermediate/webtools.common.snippets
fi
if [ -z "$TESTS_ID" ]
then
echo "Tag " $key " does not exist in tests"
else
echo "Tag " $key " does exist in tests and has commit $TESTS_ID"
cd ../intermediate/webtools.common.tests/
git checkout $TESTS_ID
cd ../../webtools.common.all/
git pull --no-edit --allow-unrelated-histories ../intermediate/webtools.common.tests
fi
if [ -z "$COMMON_ID" ]
then
echo "Tag " $key " does not exist in common"
else
echo "Tag " $key " does exist in common and has commit $COMMON_ID"
cd ../intermediate/webtools.common/
git checkout $COMMON_ID
cd ../../webtools.common.all/
git pull --no-edit --allow-unrelated-histories ../intermediate/webtools.common
fi
if [ -z "$FPROJ_ID" ]
then
echo "Tag " $key " does not exist in fproj"
else
echo "Tag " $key " does exist in fproj and has commit $FPROJ_ID"
cd ../intermediate/webtools.common.fproj/
git checkout $FPROJ_ID
cd ../../webtools.common.all/
git pull --no-edit --allow-unrelated-histories ../intermediate/webtools.common.fproj
fi
git tag $key
# sleep 5
done
#
# All tags are saved, let's move on to branches
#
cd ../intermediate/
UNIQ_BRANCHES=`ls -1 | awk '{ print "cd " $0 "; git branch -a | grep -v HEAD | grep -v master | cut -f 3 -d \"/\"; cd ../";}' | sh | sort | uniq`
UNIQ_BRANCHES_ARR=($UNIQ_BRANCHES)
cd ../webtools.common.all/
for var in "${UNIQ_BRANCHES_ARR[@]}"
do
git checkout -b $var
git reset --hard $INITIALHASH
rm -rf webtools.common webtools.common.fproj webtools.common.snippets webtools.common.tests
cd ../intermediate
#start in intermediate folder
cd webtools.common
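# Exact ref match; a plain grep would also match longer branch names
COMMON_HAS=`git for-each-ref "refs/heads/$var" "refs/remotes/origin/$var"`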
if [ -z "$COMMON_HAS" ]
then
echo "Do nothing"
cd ..
else
git checkout $var
cd ../../webtools.common.all/
git pull --no-edit --allow-unrelated-histories ../intermediate/webtools.common
cd ../intermediate/
fi
cd webtools.common.fproj
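# Exact ref match; a plain grep would also match longer branch names
FPROJ_HAS=`git for-each-ref "refs/heads/$var" "refs/remotes/origin/$var"`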
if [ -z "$FPROJ_HAS" ]
then
echo "Do nothing"
cd ..
else
git checkout $var
cd ../../webtools.common.all/
git pull --no-edit --allow-unrelated-histories ../intermediate/webtools.common.fproj
cd ../intermediate/
fi
cd webtools.common.tests
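# Exact ref match; a plain grep would also match longer branch names
TESTS_HAS=`git for-each-ref "refs/heads/$var" "refs/remotes/origin/$var"`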
if [ -z "$TESTS_HAS" ]
then
echo "Do nothing"
cd ..
else
git checkout $var
cd ../../webtools.common.all/
git pull --no-edit --allow-unrelated-histories ../intermediate/webtools.common.tests
cd ../intermediate/
fi
cd webtools.common.snippets
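# Exact ref match; a plain grep would also match longer branch names
SNIPS_HAS=`git for-each-ref "refs/heads/$var" "refs/remotes/origin/$var"`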
if [ -z "$SNIPS_HAS" ]
then
echo "Do nothing"
cd ..
else
git checkout $var
cd ../../webtools.common.all/
git pull --no-edit --allow-unrelated-histories ../intermediate/webtools.common.snippets
cd ../intermediate/
fi
cd ../webtools.common.all/
done
# Clean up the 'test' branch, which was only needed to build the tags
git branch -D test
# Push to my repo for backup.
#repo=mine
#repourl=git@github.com:robstryker/webtools.common.all.git
#git remote add $repo $repourl
#git checkout master
#git push --force $repo master
# push all branches
#for var in "${UNIQ_BRANCHES_ARR[@]}"
#do
# git checkout $var
# git push $repo $var
#done
#git push $repo --tags
END_TIME=`date +%s`
EXEC_TIME=$((END_TIME-START_TIME))
echo "$EXEC_TIME seconds execution time"
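
The four filter-branch blocks in the script differ only in the subfolder name that sed inserts after the tab in each `git ls-files -s` line. Below is a minimal sketch of that one text transformation, pulled out so it can be tried without touching a repository. The function name `prefix_index_paths` is hypothetical and not part of the script. The optional-quote handling (`\"*`) is an addition that, as far as I recall, mirrors the example in the git filter-branch documentation, since git quotes paths containing unusual characters. `printf` supplies the literal tab so the sketch does not depend on GNU sed's `\t`.

```shell
# Hypothetical helper, not part of the script above.
# Reads `git ls-files -s` style lines on stdin:
#   <mode> <object-id> <stage><TAB><path>
# and writes them back with <subdir>/ prepended to each path.
prefix_index_paths() {
    subdir="$1"
    tab=$(printf '\t')
    # Match the first TAB plus any opening quote (git quotes unusual paths),
    # then re-emit the match (&) followed by the subfolder name.
    sed "s|${tab}\"*|&${subdir}/|"
}

# Example with a made-up object id:
printf '100644 0123456789abcdef 0\tpom.xml\n' | prefix_index_paths webtools.common
```

The example line comes back with `webtools.common/pom.xml` as its path, which is the form `git update-index --index-info` then reads to build the rewritten index.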