@ssp
Created January 23, 2012 13:21
Extract a single file from a git repository

How to extract a single file with its history from a git repository

These steps use two less common git commands to extract a single file, together with its history, from a subfolder of a git repository. They rewrite history and reduce the repository to just the desired files, so they should be performed on a copy of the original repository (1.).

First, the repository is reduced to just the subfolder containing the files in question using git filter-branch --subdirectory-filter (2.). This is a useful step by itself if only a subfolder needs to be extracted. It moves the desired files to the top level of the repository.

Finally, all remaining files are listed using git ls-files, the files to keep are filtered out of that list using grep -v, and the resulting list is passed to git rm, which is invoked by git filter-branch --index-filter (3.). A bit convoluted, but it does the trick.
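The filtering part of step 3 can be tried without touching a repository. This is only a sketch: the file list is made up and stands in for real git ls-files output, and it swaps the unanchored grep -v pattern for grep -vxF with one -e per file, so that only exact names are kept.

```shell
# Hypothetical file list standing in for real `git ls-files` output.
files='bibs.pz
check-pazpar2.sh
old-tmarc.xsl.bak
tmarc.xsl'

# -v inverts the match, -x requires the whole line to match, and -F treats
# the patterns as fixed strings, so only the two exact names are spared.
to_remove=$(printf '%s\n' "$files" | grep -vxF -e 'tmarc.xsl' -e 'check-pazpar2.sh')

# Everything printed here is what would be handed to `git rm --cached`.
printf '%s\n' "$to_remove"
```

With the unanchored pattern used in step 3, old-tmarc.xsl.bak would also be kept, because its name contains tmarc.xsl.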

1. copy the repository to extract the file from and go to the desired branch

➜  /tmp  git clone git@github.com:ssp/pazpar2.git pazpar2g
Cloning into pazpar2g...
remote: Counting objects: 14950, done.
remote: Compressing objects: 100% (4092/4092), done.
remote: Total 14950 (delta 10938), reused 14719 (delta 10707)
Receiving objects: 100% (14950/14950), 3.30 MiB | 1.60 MiB/s, done.
Resolving deltas: 100% (10938/10938), done.
➜  /tmp  cd pazpar2g
➜  pazpar2g git:(master) git checkout ssp
Branch ssp set up to track remote branch ssp from origin.
Switched to a new branch 'ssp'

2. reduce the repository to just the subfolder »etc« which contains the interesting file(s)

➜  pazpar2g git:(ssp) git filter-branch --prune-empty --subdirectory-filter etc -- --all
Rewrite b3d4f2a89fdee662fb43122990fc28aa2c08bee5 (558/558)
Ref 'refs/heads/master' was rewritten
Ref 'refs/heads/ssp' was rewritten
Ref 'refs/remotes/origin/master' was rewritten
WARNING: Ref 'refs/remotes/origin/master' is unchanged
Ref 'refs/remotes/origin/ssp' was rewritten
Ref 'refs/tags/wildcard-matching' was rewritten
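
To see what the subdirectory filter does before running it on real data, it can be tried on a throwaway repository. A minimal sketch, assuming git is installed; the directory layout, file contents and identity settings are made up for illustration. FILTER_BRANCH_SQUELCH_WARNING only silences the deprecation notice that newer git versions print for filter-branch.

```shell
# Throwaway repository; all names below are made up for illustration.
export FILTER_BRANCH_SQUELCH_WARNING=1   # silence the notice in newer git
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .
git config user.name 'Example'
git config user.email 'example@example.invalid'

mkdir etc src
echo 'keep' > etc/tmarc.xsl
echo 'drop' > src/main.c
git add .
git commit -q -m 'initial commit'

# Rewrite history so that the contents of etc/ become the repository root.
git filter-branch --prune-empty --subdirectory-filter etc -- --all >/dev/null 2>&1

# Only the former contents of etc/ should still be tracked.
tracked=$(git ls-files)
printf '%s\n' "$tracked"
```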

3. remove all files other than the ones you want to keep (tmarc.xsl, check-pazpar2.sh)

➜  pazpar2g git:(ssp)  git filter-branch -f --prune-empty --index-filter 'git rm --cached --ignore-unmatch $(git ls-files | grep -v "tmarc.xsl\|check-pazpar2.sh")'
Rewrite f06a533323ad8257efa9e52c45ad2e22e2b09b1c (1/558)rm 'bibs.pz'
… [lengthy output omitted]
Ref 'refs/heads/ssp' was rewritten
➜  pazpar2g git:(ssp) ls -l
total 48
-rwxrwxr-x 1 ssp ssp  1359 2012-01-23 14:03 check-pazpar2.sh
-rw-rw-r-- 1 ssp ssp 41078 2012-01-23 14:03 tmarc.xsl
@evandrocoan

@FichteFoll Update: With some help I got the following together which also included the paren files.

$ git filter-branch -f --prune-empty --index-filter 'git ls-files -z | grep -zv "the.whitelist" | xargs -0 git rm --cached --ignore-unmatch'

Thanks! This worked 100%. Step 3 of the original post (remove all files other than the ones you want to keep) did not remove all the files.
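
The -z and -0 flags make every stage of that pipeline use NUL bytes instead of newlines as separators, so file names containing spaces survive intact. A small sketch, assuming GNU grep and xargs; the names are made up and stand in for git ls-files -z output, and printf takes the place of git rm so nothing is deleted.

```shell
# Hypothetical NUL-separated names standing in for `git ls-files -z` output.
# GNU grep (-z) and xargs (-0) are assumed.
survivors=$(printf '%s\0' 'notes file.txt' 'the.whitelist' 'src/main.c' \
  | grep -zv 'the\.whitelist' \
  | xargs -0 printf '<%s>\n')

# Each remaining name arrives as exactly one argument, spaces included.
printf '%s\n' "$survivors"
```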

@Werkov

Werkov commented Apr 4, 2021

@rsalmei If you don't need the actual commits in the new repository but still want the complete history of the file, you can link the repos using grafts:

git remote add oldrepo $OLDREPO
git fetch oldrepo
git replace --graft $OLDEST_COMMIT_IN_NEW_REPO $NEWEST_COMMIT_IN_OLD_REPO
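
The graft can be tried on two throwaway repositories first. This is a sketch under assumptions: the repository names, commit messages and identity settings are made up, and the two variables are filled in with the root commit of the new repository and the tip of the old one.

```shell
# Two throwaway repositories; every name here is made up for illustration.
work=$(mktemp -d)
cd "$work" || exit 1

make_repo() {
  # $1: directory name, $2: commit message
  git init -q "$1"
  git -C "$1" config user.name 'Example'
  git -C "$1" config user.email 'example@example.invalid'
  git -C "$1" commit -q --allow-empty -m "$2"
}
make_repo old 'last commit of the old history'
make_repo new 'first commit of the extracted history'

cd new || exit 1
git remote add oldrepo ../old
git fetch -q oldrepo

OLDEST_COMMIT_IN_NEW_REPO=$(git rev-list --max-parents=0 HEAD)
NEWEST_COMMIT_IN_OLD_REPO=$(git -C ../old rev-parse HEAD)
git replace --graft "$OLDEST_COMMIT_IN_NEW_REPO" "$NEWEST_COMMIT_IN_OLD_REPO"

# With the graft in place, history walks from the new commits into the old ones.
commit_count=$(git rev-list --count HEAD)
printf 'commits reachable from HEAD: %s\n' "$commit_count"
```

The replacement is stored under refs/replace in the local repository, so a fresh clone will not see it unless those refs are fetched as well.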

(This is a necropost since this gist still comes up high in search results.)

@codemedic

@lkraav

Thanks for that; the -r made me sane again!

git filter-branch -f --prune-empty --index-filter 'git ls-files -z | grep -zv "the.whitelist" | xargs -r -0 git rm --cached --ignore-unmatch'

That worked on Linux.
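
What -r changes can be checked in isolation. A small sketch, assuming GNU xargs, with echo standing in for git rm: when grep -v leaves nothing to remove, xargs without -r still runs the command once with no file arguments, which presumably is what makes git rm (and with it the index filter) fail.

```shell
# GNU xargs assumed: without -r the command runs once even on empty input.
without_r=$(printf '' | xargs echo ran)
# With -r (--no-run-if-empty) the command is skipped when there is no input.
with_r=$(printf '' | xargs -r echo ran)

printf 'without -r: [%s]\n' "$without_r"
printf 'with -r:    [%s]\n' "$with_r"
```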

@evandrocoan

I am not sure why, but the command which worked for me 2 years ago no longer works. This is the new version:

git filter-branch -f --prune-empty --index-filter \
    'git ls-files -z \
     | grep -zv "$(cat "/absolute/path/the.allowed.list.txt")" \
     | xargs -0 -r -n 10 git rm -r --cached --ignore-unmatch --'
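
Reading the patterns from a file can also be done with grep's own -f option instead of $(cat …). A sketch assuming GNU grep; the temporary file stands in for the.allowed.list.txt, the names are made up, and -F is added so that each line of the list is treated as a fixed string rather than a regular expression.

```shell
# Hypothetical allow list; one fixed-string pattern per line.
allow_list=$(mktemp)
printf '%s\n' 'tmarc.xsl' 'check-pazpar2.sh' > "$allow_list"

# -F: fixed strings, -f: read patterns from the file, -z: NUL-separated input.
# tr turns the NULs back into newlines for display only.
doomed=$(printf '%s\0' 'bibs.pz' 'tmarc.xsl' 'check-pazpar2.sh' 'marc21.xsl' \
  | grep -zvF -f "$allow_list" \
  | tr '\0' '\n')

# These are the names that would be passed on to `git rm --cached`.
printf '%s\n' "$doomed"
rm -f "$allow_list"
```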
