@mitcom
Last active October 19, 2019 20:49

Local git repository without porcelain commands

How to create a local git repository without git init?

mkdir -p .git/objects .git/refs
echo "ref: refs/heads/master" > .git/HEAD  # 'master' can be replaced with any branch name

How to verify if it is enough?

git status  # or, more directly:
git rev-parse --is-inside-work-tree

What more does git init do? It also writes a config and a description file, creates a few more empty directories (such as refs/heads and refs/tags), and installs sample hook scripts.
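
The two shell commands above can also be sketched in Python. This is only a minimal sketch of the same steps, not what git init does internally; the function name `init_minimal_repo` and its parameters are made up for illustration.

```python
import tempfile
from pathlib import Path


def init_minimal_repo(worktree: Path, branch: str = "master") -> Path:
    """Create the bare minimum layout described above (hypothetical helper)."""
    git_dir = worktree / ".git"
    # Same as: mkdir -p .git/objects .git/refs
    (git_dir / "objects").mkdir(parents=True, exist_ok=True)
    (git_dir / "refs").mkdir(parents=True, exist_ok=True)
    # Same as: echo "ref: refs/heads/<branch>" > .git/HEAD
    (git_dir / "HEAD").write_text(f"ref: refs/heads/{branch}\n")
    return git_dir


if __name__ == "__main__":
    # Try it in a throwaway directory so nothing in the current one is touched.
    with tempfile.TemporaryDirectory() as tmp:
        git_dir = init_minimal_repo(Path(tmp))
        print(sorted(p.name for p in git_dir.iterdir()))  # ['HEAD', 'objects', 'refs']
```

Running git status inside a directory prepared this way is still the real check that git accepts the layout.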

How to create files in a git repository without git commands?

Git is built around hash objects. File contents are stored as blob objects: zlib-compressed files that hold only the content, without the file name or any other properties. To create a blob object for the following file (ending with \n)

print('Hello World!')
  • we need to know its size (in bytes)

    $ printf "print('Hello World!')\n" | wc -c
    22
  • compute the SHA-1 sum of its content preceded by a short header: the hash object type (blob), a single space, the content size, and a NUL byte (ASCII code 0, \0). The file content follows the header.

    blob 22\0print('Hello World!')\n
    #\    \ \ \_ file content
    # \    \ \_ ASCII 0 char
    #  \    \_ content size
    #   \_ hash object type
    $ printf "blob 22\0print('Hello World!')\n" | shasum
    73fb7c3fbdbf4258a2d08f15fa7d4cd8556d0b67  -

    This hash will be used as the hash object's filename.

  • and compress this string (header plus file content) with zlib. We will use the following Python script

    # git_compress.py
    import sys
    import zlib  # part of the Python standard library
    
    
    content = sys.stdin.buffer.read()  # read raw bytes so the script works in a pipe
    length = str(len(content)).encode()  # the size must be written as ASCII digits, not as an integer
    
    sys.stdout.buffer.write(
        zlib.compress(
            b'blob ' + length + b'\0' + content,  # the header is built here, so we pipe in only the file content
            1,  # Z_BEST_SPEED, as used by git; Python's default is Z_DEFAULT_COMPRESSION (-1)
        ),  # see https://docs.python.org/3.8/library/zlib.html#zlib.compress
    )

    I save the script as git_compress.py in the working directory. It is cleaner to keep it in another directory so that it does not pollute the local git working directory; in that case adjust the path in the following snippet.

    $ printf "print('Hello World!')\n" | python ./git_compress.py
    xK��OR02b((��+�P�H����/�IQT���	7%

    This is the actual hash object content we want to store. We will store it in the objects directory created in the first part. The last requirement is that the filename (the SHA-1 sum we got earlier) has to be split into its first two characters and the rest. The first two become the directory name and the rest the actual filename. This works around a limitation of some file systems: it prevents storing too many files in one directory.

    $ mkdir .git/objects/73
    $ printf "print('Hello World!')\n" | python ./git_compress.py \
        > .git/objects/73/fb7c3fbdbf4258a2d08f15fa7d4cd8556d0b67
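
The steps above (size, header, SHA-1, zlib compression, split filename) can be put together in one short Python sketch. It is an illustration only, and the helper names `blob_bytes`, `blob_hash` and `write_blob` are hypothetical, not part of git or of the script above.

```python
import hashlib
import zlib
from pathlib import Path


def blob_bytes(content: bytes) -> bytes:
    """Prefix the content with the header: 'blob', a space, the size, a NUL byte."""
    return b"blob " + str(len(content)).encode() + b"\0" + content


def blob_hash(content: bytes) -> str:
    """SHA-1 of header plus content, as 40 hex characters."""
    return hashlib.sha1(blob_bytes(content)).hexdigest()


def write_blob(git_dir: Path, content: bytes) -> str:
    """Store the content as a loose blob object and return its hash."""
    digest = blob_hash(content)
    # The first two hex characters name the directory, the other 38 the file.
    path = git_dir / "objects" / digest[:2] / digest[2:]
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(zlib.compress(blob_bytes(content), 1))  # level 1 = Z_BEST_SPEED
    return digest
```

For example, `blob_hash(b"")` returns e69de29bb2d1d6434b8b29ae775ad8c2e48c5391, the well-known hash of the empty blob, and `write_blob(Path(".git"), b"print('Hello World!')\n")` would store the example file from this section.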

How to verify that we created a valid git hash object?

  • We can inspect the hash object's content with the git cat-file command

    $ git cat-file -p 73fb7c3fbdbf4258a2d08f15fa7d4cd8556d0b67
    print('Hello World!')
    # the trailing \n is printed and rendered by the terminal as a line break
  • and if we just want to check the hash object type

    $ git cat-file -t 73fb  # in many places the first 4 characters of the hash are enough, as long as they are unambiguous
    blob
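
The same check can be sketched without git: decompress the file, split the header from the content at the first NUL byte, and compare the declared size with the real one. The names `parse_object` and `read_object` are hypothetical, and the sketch assumes a loose (unpacked) object addressed by its full 40-character hash.

```python
import zlib
from pathlib import Path
from typing import Tuple


def parse_object(raw: bytes) -> Tuple[str, bytes]:
    """Decompress a loose object and return (type, content)."""
    data = zlib.decompress(raw)
    header, _, content = data.partition(b"\0")  # header ends at the first NUL
    obj_type, size = header.split(b" ")
    if int(size) != len(content):
        raise ValueError("size in header does not match content length")
    return obj_type.decode(), content


def read_object(git_dir: Path, digest: str) -> Tuple[str, bytes]:
    """Load a loose object from <git_dir>/objects/<2 chars>/<38 chars>."""
    path = git_dir / "objects" / digest[:2] / digest[2:]
    return parse_object(path.read_bytes())
```

Here the first element of the result plays the role of git cat-file -t and the second that of git cat-file -p for blobs. Unlike git, this sketch does not resolve abbreviated hashes.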
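# the same with plumbing: compute the hash of an empty blob
printf "" | git hash-object --stdin
> e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
# -w additionally writes the object into the object store, at this path
printf "" | git hash-object --stdin -w
> .git/objects/e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391
# note: echo "" emits a newline, so this compresses the 1-byte blob "blob 1\0\n", not the empty blob
echo "" | python -c "import sys; import zlib; stdin=sys.stdin.buffer.read(); length=str.encode(str(len(stdin))); sys.stdout.buffer.write(zlib.compress(b'blob ' + length + b'\0' + stdin, 1))" | xxd -p
> 78014bcac94f523064e002000bad01fb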
# read blob object
cat .git/objects/e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391
> xK��OR0` ��%
xxd -p .git/objects/e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391
> 78014bcac94f523060000009b001f0
cat .git/objects/e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391 | python -c "import zlib; import sys; print(zlib.decompress(sys.stdin.buffer.read()).decode('utf-8'))"
> blob 0
git update-index --add --cacheinfo 100644 <blob hash> <filename>
git write-tree # creates a tree object from the current index
git commit-tree <tree hash> -m "<commit message>"
mkdir -p .git/refs/heads # needed if the repository was created by hand as above
echo <commit hash> > .git/refs/heads/master
git checkout HEAD -- <filename>
or
https://git-scm.com/docs/git-checkout-index
git ls-tree -r <tree hash>
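
git write-tree and git ls-tree both deal with tree objects. As a rough sketch of what such an object contains: each entry is the mode, a space, the file name, a NUL byte, and the 20 raw bytes of the entry's hash, and the whole body gets a `tree <size>\0` header just like a blob. The function names are hypothetical, and the sorting below is a simplification that is only correct for trees without subdirectories.

```python
import hashlib
from typing import Iterable, Tuple

Entry = Tuple[str, str, str]  # (mode, name, hex hash), e.g. ("100644", "hello.py", "73fb...")


def tree_bytes(entries: Iterable[Entry]) -> bytes:
    """Serialise entries into a tree object (header plus body)."""
    body = b""
    # Assumption: files only, so a plain byte-wise sort by name is sufficient.
    for mode, name, hex_hash in sorted(entries, key=lambda e: e[1].encode()):
        body += mode.encode() + b" " + name.encode() + b"\0" + bytes.fromhex(hex_hash)
    return b"tree " + str(len(body)).encode() + b"\0" + body


def tree_hash(entries: Iterable[Entry]) -> str:
    """SHA-1 of the serialised tree, as 40 hex characters."""
    return hashlib.sha1(tree_bytes(entries)).hexdigest()
```

A quick sanity check is `tree_hash([])`, which gives 4b825dc642cb6eb9a060e54bf8d69288fbee4904, the well-known hash of the empty tree. To store the object, compress `tree_bytes(...)` with zlib and write it under .git/objects the same way as a blob.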
git show HEAD~10:package.json | diff -u package.json - | colordiff
git reflog
git ls-files --stage
https://github.com/git/git/blob/master/Documentation/technical/index-format.txt
https://mincong-h.github.io/2018/04/28/git-index/
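
According to the index-format document linked above, .git/index starts with a 12-byte header: the signature DIRC, a 4-byte version number, and a 32-bit entry count, all in network byte order. A small sketch that reads just this header (the function name `parse_index_header` is made up for illustration, and the entries themselves are not parsed):

```python
import struct
from typing import Tuple


def parse_index_header(data: bytes) -> Tuple[int, int]:
    """Return (version, entry_count) from the first 12 bytes of an index file."""
    # ">4sII" = big-endian: 4-byte signature, two unsigned 32-bit integers.
    signature, version, entry_count = struct.unpack(">4sII", data[:12])
    if signature != b"DIRC":
        raise ValueError("not a git index file")
    return version, entry_count
```

Passing it the bytes of .git/index should report the same number of entries as the number of lines printed by git ls-files --stage.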
git update-ref refs/heads/master <commit-hash>
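
For loose refs, the effect on disk is a small text file under .git/refs/heads holding the commit hash, which is what the echo command earlier writes by hand. The sketch below writes such a file and follows HEAD to it. Both function names are hypothetical, and the sketch deliberately ignores things the real git update-ref takes care of, such as the reflog, locking and packed refs.

```python
from pathlib import Path
from typing import Optional


def update_ref(git_dir: Path, ref: str, commit_hash: str) -> None:
    """Point a ref such as 'refs/heads/master' at a commit (loose ref only)."""
    path = git_dir / ref
    path.parent.mkdir(parents=True, exist_ok=True)  # refs/heads may not exist yet
    path.write_text(commit_hash + "\n")


def resolve_head(git_dir: Path) -> Optional[str]:
    """Follow HEAD to a commit hash; None if the branch has no commit yet."""
    head = (git_dir / "HEAD").read_text().strip()
    if not head.startswith("ref: "):
        return head  # detached HEAD: the file holds the hash itself
    ref_path = git_dir / head[len("ref: "):]
    return ref_path.read_text().strip() if ref_path.exists() else None
```

In a repository created by hand as in the first section, `resolve_head(Path(".git"))` returns None until a ref file for the branch named in HEAD exists.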
git count-objects
git verify-pack -v .git/objects//pack