

@rmg
Last active July 26, 2023 19:34
Dump current directory and all contents to a git branch

git-snapshot

Out-of-band syncing to a local git branch.

This started as an experiment to see how commits could be created without modifying the working directory, the index, or the state of HEAD. The end result is like a cross between rsync and git stash using a specified branch.
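A minimal sketch of the idea, run on a throwaway repository: blobs are written with `git hash-object -w`, staged into a private index (`GIT_INDEX_FILE`), turned into a tree and then a commit, and only at the end is a branch ref moved. The branch name `snap`, the file names, and the demo identity variables are assumptions for illustration. Mode 100644 is hard-coded only because both demo files are plain files.

```shell
#!/bin/bash
# Sketch: commit the working directory to branch "snap" (hypothetical name)
# using only plumbing, without touching HEAD or the real index.
set -e

# Throwaway repo with one committed file and one untracked "build product"
repo=$(mktemp -d)
cd "$repo"
git init -q
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
echo "source" > app.js
git add app.js
git commit -q -m "initial"
echo "artifact" > build.out
orig_head=$(git symbolic-ref HEAD)

# Stage everything in a private index; .git/index is never read or written
export GIT_INDEX_FILE="$repo/.git/snap-index"
for f in app.js build.out; do
  blob=$(git hash-object -w "$f")                         # store content as a blob
  git update-index --add --cacheinfo 100644 "$blob" "$f"  # record path + mode
done
tree=$(git write-tree)
unset GIT_INDEX_FILE

# Wrap the tree in a commit whose parent is HEAD, then move only the branch ref
commit=$(git commit-tree -p HEAD -m "snapshot" "$tree")
git update-ref refs/heads/snap "$commit"
```

Afterwards HEAD still points at the original branch and `build.out` is still untracked in the working copy, yet `snap` contains it.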

Benefits

This allows you to deploy your current working directory:

  • without polluting your master branch with build artifacts
  • without modifying your current checkout at all:
    • no branch switching
    • no index clobbering
    • no complaining about clobbering unstaged changes
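The "no index clobbering" point rests on `GIT_INDEX_FILE`: a git command run with it set reads and writes that file instead of `.git/index`. The sketch below (throwaway repo, made-up file names) leaves a commit half-staged, builds a tree of the working copy in a private index, and then checks that the staged set is unchanged.

```shell
#!/bin/bash
# Sketch: snapshot the working copy while a commit is half-staged,
# without disturbing the real index.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
echo "one" > a.txt
git add a.txt
git commit -q -m "initial"

echo "two" > a.txt      # unstaged modification
echo "new" > b.txt
git add b.txt           # staged addition: a commit in progress

before=$(git diff --cached --name-only)

# Prefix assignment: only these two commands see the private index
tmp=$(mktemp -d)
GIT_INDEX_FILE="$tmp/index" git update-index --add a.txt b.txt
snapshot_tree=$(GIT_INDEX_FILE="$tmp/index" git write-tree)
rm -rf "$tmp"

after=$(git diff --cached --name-only)
```

The snapshot tree holds the working-copy content of `a.txt`, while the real index still has only `b.txt` staged and `a.txt` still shows as an unstaged change.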

Example

Here's how you would use it to push your current working directory to a git-push-based hosting provider (like Heroku):

```
$ git-snapshot production
Creating branch 'production' if it doesn't exist...
Creating index... (this can take a while)
Creating tree from index...
Created tree 1e035b51027ba785c74155025cd616bbd20d8d17
Creating commit...
Created commit f2527eb28622d7a17403f723d36584027e595723
$ git push hosting production:master
```


TODO

  • Option to create tag instead of branch
  • Specify commit message
  • Don't create empty commits; an unchanged tree means the branch is already up to date
  • Find portable replacement for stat
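For the last TODO item, one possible portable approach (an assumption, not something the gist adopted) is to skip `stat` entirely: git only records a few modes for index entries (100644, 100755, 120000, and 160000 for submodules), so file tests are enough to pick one. `git_mode` is a hypothetical helper name. Note that `-x` tests executability for the current user, which only approximates git's own rule of looking at the owner's execute bit.

```shell
#!/bin/bash
# git_mode FILE  (hypothetical helper)
# Print the git index mode for FILE using only POSIX test operators,
# as a portable alternative to the BSD-only `stat -f %p`.
git_mode() {
  if [ -L "$1" ]; then
    echo 120000   # symbolic link (test -L does not follow the link)
  elif [ -x "$1" ]; then
    echo 100755   # executable regular file
  else
    echo 100644   # non-executable regular file
  fi
}
```

Inside the script's loop, `MODE=$(git_mode "$f")` would then stand in for the `stat -f %p` line. Submodules (160000) are not handled by this sketch.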
```bash
#!/bin/bash
# Fail fast, we're messing with git internals!
set -e
BRANCH=${1:-deploy}
# Don't interfere with the real index, which means we are immune to "dirty" working copies
# (INDEX_DIR rather than TMPDIR, which mktemp and other tools read from the environment)
INDEX_DIR=$(mktemp -d -t deploy-indexXXXXXX)
export GIT_INDEX_FILE=$INDEX_DIR/index
# This is where we would want to "Do stuff" like `npm install --production`
echo "Creating branch '$BRANCH' if it doesn't exist..."
git branch "$BRANCH" 2> /dev/null || echo "using existing"
PARENT=$(git rev-parse "refs/heads/$BRANCH")
# Start building our new index
echo "Creating index... (this can take a while)"
# The temporary index starts out empty, so --others lists every file under $CWD
# (tracked, untracked, and ignored alike). IFS= and -r keep odd file names intact.
git ls-files --cached --others | while IFS= read -r f; do
  # Create a new git blob object, writing it to .git/objects
  if test -L "$f"; then
    # Symlinks are hashes of the path pointed to; -n because git stores
    # the target without a trailing newline
    OBJ=$(readlink -n "$f" | git hash-object -w --stdin)
  else
    # Everything else is a hash of the file contents
    OBJ=$(git hash-object -w "$f")
  fi
  # BSD/OS X stat syntax; GNU stat on Linux does not accept -f %p
  MODE=$(stat -f %p "$f")
  # Add the object to the index as a file with a name and mode
  git update-index --add --replace --cacheinfo "$MODE" "$OBJ" "$f"
done
# At this point we've got a giant index representing the current state of every file under $CWD
echo "Creating tree from index..."
TREE=$(git write-tree --missing-ok)
echo "Created tree $TREE"
# Create a new commit object descending from the $BRANCH branch containing our new tree
echo "Creating commit..."
SHA1=$(git commit-tree -p "$PARENT" -m "Deploy snapshot $(date)" "$TREE")
echo "Created commit $SHA1"
# We now have a commit, but nothing is pointing to it
# Update our "$BRANCH" branch to point to the new commit, which is a child of the $BRANCH branch head
git update-ref "refs/heads/$BRANCH" "$SHA1"
# Clean up after ourselves, or list our temporary index if $DEBUG is set
if test -n "$DEBUG"; then
  ls -l "$GIT_INDEX_FILE"
else
  rm -rf "$INDEX_DIR"
fi
```
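The "don't do empty commits" TODO could be handled by comparing the new tree with the tree of the branch head before calling `commit-tree`. Below is a hedged sketch as a hypothetical `snapshot_commit` function taking the script's `$BRANCH` and `$TREE` (a full hash, as printed by `write-tree`). It also passes the old value to `git update-ref`, so the ref is only moved if the branch has not changed in the meantime.

```shell
#!/bin/bash
# snapshot_commit BRANCH TREE  (hypothetical helper)
# Prints the new commit's SHA-1, or nothing if BRANCH already has TREE.
snapshot_commit() {
  local branch=$1 tree=$2 parent sha1
  parent=$(git rev-parse --verify -q "refs/heads/$branch") || return 1
  # Same tree as the branch head: nothing changed, so skip the commit
  if [ "$(git rev-parse "$parent^{tree}")" = "$tree" ]; then
    echo "Branch '$branch' is already up to date" >&2
    return 0
  fi
  sha1=$(git commit-tree -p "$parent" -m "Deploy snapshot $(date)" "$tree")
  # Old value as third argument: update-ref refuses if the branch moved meanwhile
  git update-ref "refs/heads/$branch" "$sha1" "$parent"
  echo "$sha1"
}
```

The commit-and-update steps at the end of the script would then reduce to `SHA1=$(snapshot_commit "$BRANCH" "$TREE")`.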
@sam-github

If I read this correctly, this combines commit-head-onto-deploy and commit-build-products-to-deploy, which I didn't want to do; I want two commits.

You're just showing me code that does something different. What I'm interested in is what your design goals are, and why you think different is better. This might be better done in a hangout; I do want feedback, but I'm not sure what to do with this. I could have written --onto and --commit as a single atomic operation; I didn't do that on purpose.

  1. It doesn't facilitate composability. The existing commands mostly (--install is an exception) do one thing, so you can choose to use the whole set of commands, or just pick a few you find useful, for example:

    slc build --onto deploy
    some custom build process
    slc build --commit
    
  2. It doesn't facilitate running the steps one by one so you can see what each did, which I think is helpful to people having a "wtf does this slc build command do, and why would I need it" moment: they can run each command in sequence and see the result. This is also why I log every command run during the build to the console. I want it to be very clear what the tool is doing, so people can look at the output and think "aha, I get it" without reading the docs (well, knowledgeable people can; the others will have to read the docs).

  3. It melds development source and build products into a single giant commit, whereas my version always added two commits to the deploy branch: the first was a literal copy of the exact source tree, trivially verifiable to be correct via git diff or by examining the tree object hash, and the second was a commit of build products, so you can see exactly what the build artifacts were. You can also run git status between --install and --commit: you should observe no change in status, because all build products are ignored and the build process did not modify any of your source files.
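A rough sketch of that two-commit layout using plumbing on a throwaway repo. It is not taken from strong-build's source; the `deploy` branch, the `.gitignore` entry, and the fake build step writing `build/out.js` (standing in for `npm install` and friends) are all assumptions. The first commit reuses HEAD's tree object verbatim, so `deploy~1^{tree}` and `HEAD^{tree}` resolve to the same hash, and because the build output is ignored, `git status` stays clean.

```shell
#!/bin/bash
# Two-commit deploy layout, sketched with plumbing on a throwaway repo.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
echo "src" > index.js
echo "build/" > .gitignore            # build products are ignored
git add index.js .gitignore
git commit -q -m "source"
git branch deploy

# Commit 1: HEAD's source tree, reused as-is, on top of deploy
src_tree=$(git rev-parse 'HEAD^{tree}')
c1=$(git commit-tree -p "$(git rev-parse deploy)" -m "source snapshot" "$src_tree")

# Hypothetical build step producing an ignored artifact
mkdir -p build
echo "built" > build/out.js

# Commit 2: source tree plus build products, staged in a private index
export GIT_INDEX_FILE="$(mktemp -d)/index"
git read-tree "$src_tree"             # seed the private index with commit 1's tree
git add --force build                 # --force because build/ is ignored
c2=$(git commit-tree -p "$c1" -m "build products" "$(git write-tree)")
unset GIT_INDEX_FILE

git update-ref refs/heads/deploy "$c2"
```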

Note how a bunch of people who didn't know about git commit-tree suggested I do this:

```shell
# crappy version of --onto
git checkout deploy
git merge master
# tedious and incomplete version of --install
npm install # etc.
# --commit is just a tiny convenience over the below
git add --force -A .
git commit
```

This sequence, when done repeatedly over time on a deploy branch, evolves a state that is a munge of all history, whereas the commit-tree based approach throws away all previous source and build product state.
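To make the contrast concrete, here is a sketch on a throwaway repo (branch and file names invented): a stale file committed on `deploy` survives the checkout-and-merge sequence, but not a commit made with `git commit-tree`, which records exactly the tree it is given and uses the parent only to link history.

```shell
#!/bin/bash
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
echo "v1" > app.js
git add app.js
git commit -q -m "v1"
main=$(git symbolic-ref --short HEAD)

# An old deploy branch carrying a build product that no longer exists
git checkout -q -b deploy
echo "old" > stale.txt
git add stale.txt
git commit -q -m "old build product"
git checkout -q "$main"
echo "v2" > app.js
git commit -q -am "v2"

# Merge-based update (on a scratch branch): old deploy content is carried along
git checkout -q -b merged deploy
git merge -q --no-edit "$main"
git checkout -q "$main"

# commit-tree-based update: the new deploy commit holds exactly main's tree
c=$(git commit-tree -p "$(git rev-parse deploy)" -m "snapshot of $main" "$(git rev-parse "$main^{tree}")")
git update-ref refs/heads/deploy "$c"
```

Both branches keep the old deploy commit in their history, but only the merged one still contains `stale.txt`.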

Note that the above also leaves you on the deploy branch, which is why I think it's reasonable that slc build --onto deploy --install --commit also leaves you on the deploy branch.

@rmg (author) commented Jul 25, 2014

It does not leave you on the deploy branch. In fact, it doesn't rely on your working copy being clean, and doesn't even look at your current index if you happen to be mid-commit. This script syncs the specified branch to the current contents of $CWD using completely out-of-band means. It's like rsync, if rsync allowed a branch as a destination.

The point was to demonstrate that you can do this kind of stuff without checking out the target branch or affecting the state of a working copy at all.

@sam-github

Sorry, you are clearly not compelled by the git support in slc build, but I'm not really seeing why.

  • "without polluting your master branch with build artifacts"

The entire build process (npm install, etc.) in your gist modifies the working copy and leaves the artifacts there; I don't see the difference.

"without modifying your current checkout at all":

Except all the build products?

And possibly a modified package.json file and a new .npmignore file (if you use --bundle).

  • "no branch switching"

I do that deliberately, with an explicit git checkout deploy at https://github.com/strongloop/strong-build/blob/master/lib/git.js#L90, because without it you can't run "slc deploy". Not switching seems like an anti-feature; see below.

  • "no index clobbering"
  • "no complaining about clobbering unstaged changes"

You use a different index file, allowing the working copy to have a partially constructed index. That seems like questionable practice: do you generally do deployment builds with partially constructed indexes?

Still, robustness is important, and adding a custom GIT_INDEX_FILE to slc build would make it more robust and let it work better if it is called during a commit (though libuv/node's lack of a cross-platform mkstemp makes that surprisingly hard; I'd probably need a binary dep). I could do that; it doesn't involve the wholesale rewrite you have above.

In summary, the main differences I can see between my command and yours above (other than yours being an OS X-specific shell script :-), since stat doesn't work like that on Linux) are:

  • you restore the original branch

I could do that too, it's easy, but it means you can't run slc deploy anymore.

Actually, it's MUCH worse than that: you can in fact run slc deploy, and probably will, as Chandrika did in her walk-through, but you will be running it on the WRONG branch, which will give every appearance of success... except that you will push an un-built branch, and npm install will download every dependency on the server, meaning you used slc build but got zero value from it. Ouch.

  • you put all the source and build products into one commit

I said why I don't like this; you haven't said why forcing this is a good idea. I could add it.

  • you don't allow custom build steps, forcing all builds to be done using npm scripts

You are clearly not compelled by slc build, and I'd like you, or at least people like you, to be compelled. But so far, all I really see is that you don't like that it leaves you on the deploy branch and doesn't use its own index (the latter is easily fixed), and that yours is less flexible and can't have its pieces composed into an existing build process.
