Extract a single file from a git repository

How to extract a single file with its history from a git repository

These steps show two less common interactions with git to extract a single file, located inside a subfolder, from a git repository together with its history. They reduce the repository to just the desired files and should be performed on a copy of the original repository (1.).

First, the repository is reduced to just the subfolder containing the files in question using git filter-branch --subdirectory-filter (2.), which is a useful step by itself if only a subfolder needs to be extracted. This step moves the desired files to the top level of the repository.

Finally, all remaining files are listed using git ls-files, the files to keep are removed from that list using grep -v, and the resulting list is passed to git rm, which is invoked by git filter-branch --index-filter (3.). A bit convoluted, but it does the trick.

1. copy the repository to extract the file from and go to the desired branch

➜  /tmp  git clone git@github.com:ssp/pazpar2.git pazpar2g
Cloning into pazpar2g...
remote: Counting objects: 14950, done.
remote: Compressing objects: 100% (4092/4092), done.
remote: Total 14950 (delta 10938), reused 14719 (delta 10707)
Receiving objects: 100% (14950/14950), 3.30 MiB | 1.60 MiB/s, done.
Resolving deltas: 100% (10938/10938), done.
➜  /tmp  cd pazpar2g
➜  pazpar2g git:(master) git checkout ssp
Branch ssp set up to track remote branch ssp from origin.
Switched to a new branch 'ssp'

2. reduce the repository to just the subfolder »etc« which contains the interesting file(s)

➜  pazpar2g git:(ssp) git filter-branch --prune-empty --subdirectory-filter etc -- --all
Rewrite b3d4f2a89fdee662fb43122990fc28aa2c08bee5 (558/558)
Ref 'refs/heads/master' was rewritten
Ref 'refs/heads/ssp' was rewritten
Ref 'refs/remotes/origin/master' was rewritten
WARNING: Ref 'refs/remotes/origin/master' is unchanged
Ref 'refs/remotes/origin/ssp' was rewritten
Ref 'refs/tags/wildcard-matching' was rewritten

3. remove all files other than the ones you want to keep (tmarc.xsl, check-pazpar2.sh)

➜  pazpar2g git:(ssp)  git filter-branch -f --prune-empty --index-filter 'git rm --cached --ignore-unmatch $(git ls-files | grep -v "tmarc.xsl\|check-pazpar2.sh")'
Rewrite f06a533323ad8257efa9e52c45ad2e22e2b09b1c (1/558)rm 'bibs.pz'
… [lengthy output omitted]
Ref 'refs/heads/ssp' was rewritten
➜  pazpar2g git:(ssp) ls -l
total 48
-rwxrwxr-x 1 ssp ssp  1359 2012-01-23 14:03 check-pazpar2.sh
-rw-rw-r-- 1 ssp ssp 41078 2012-01-23 14:03 tmarc.xsl
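
Steps 2 and 3 can be tried out safely on a throwaway repository before touching a real one. The following is only a sketch under assumptions: the repository layout, file contents and commit messages are made up for illustration, GNU grep is assumed for the `\|` alternation, and FILTER_BRANCH_SQUELCH_WARNING is set only to silence the deprecation notice that newer git versions print for filter-branch.

```shell
set -eu
export FILTER_BRANCH_SQUELCH_WARNING=1   # newer git: skip the filter-branch warning

# Build a small, hypothetical repository in a temporary directory.
work=$(mktemp -d)
cd "$work"
git init -q .
git config user.name demo
git config user.email demo@example.com

mkdir etc
echo a > etc/tmarc.xsl
echo b > etc/check-pazpar2.sh
echo c > etc/bibs.pz
echo d > README
git add .
git commit -q -m "initial import"
echo more >> etc/tmarc.xsl
git commit -q -am "edit tmarc.xsl"

# Step 2: keep only the history of etc/ and move its files to the top level.
git filter-branch --prune-empty --subdirectory-filter etc -- --all

# Step 3: drop every file except the two to keep (-f overwrites the backup
# refs that step 2 left in refs/original).
git filter-branch -f --prune-empty --index-filter \
  'git rm -q --cached --ignore-unmatch $(git ls-files | grep -v "tmarc.xsl\|check-pazpar2.sh")'

remaining=$(git ls-files)            # tracked files after the rewrite
commits=$(git rev-list --count HEAD) # commits that survived
echo "$remaining"
echo "$commits"
```

In this demo both commits touch the kept files, so both survive; in a real repository --prune-empty removes the commits that become empty.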

KTamas Jul 8, 2012

Thank you, thank you, thank you. This was extremely useful for me.

ssp (Owner) Jul 13, 2012

You’re welcome. I’m glad I’m not the only one to find this rather non-obvious.

FichteFoll May 2, 2013

Thank you, really helped me.

However, it somehow didn't remove files with parentheses in their names, like "Default (Windows).sublime-keymap", and I couldn't get it to work. Running the $() part on its own produced the expected output, but the command as a whole didn't work. I ended up removing the files manually by replacing the $() with each file name in turn, but maybe someone knows how to fix the command.

FichteFoll May 2, 2013

Update: With some help I put together the following, which also handles the files with parentheses.

$ git filter-branch -f --prune-empty --index-filter 'git ls-files -z | grep -zv "the.whitelist" | xargs -0 git rm --cached --ignore-unmatch'
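
A likely explanation for the difference, offered as an assumption and not something confirmed in the thread: the unquoted $(...) is split on whitespace, so a name such as "Default (Windows).sublime-keymap" reaches git rm as two arguments that match nothing, and --ignore-unmatch hides the failure. The NUL-separated pipeline passes each name through intact. The sketch below reproduces this on a throwaway index; the file names are hypothetical, and grep -z and xargs -0 are assumed to be available (they are in the GNU tools).

```shell
set -eu
demo=$(mktemp -d)
cd "$demo"
git init -q .
touch keep.txt "Default (Windows).sublime-keymap"
git add .

# Unquoted substitution: the name is split at the space into "Default" and
# "(Windows).sublime-keymap"; neither matches, so nothing is removed.
git rm -q --cached --ignore-unmatch $(git ls-files | grep -v "keep.txt")
after_unquoted=$(git ls-files | wc -l | tr -d ' ')

# NUL-separated pipeline: each file name arrives as exactly one argument.
git ls-files -z | grep -zv "keep.txt" | xargs -0 git rm -q --cached --ignore-unmatch
after_nul=$(git ls-files)

echo "$after_unquoted"
echo "$after_nul"
```

One thing to watch with the pipeline: if grep filters out every name, xargs may still run git rm once with no paths, which git rejects. GNU xargs has -r to skip the run in that case.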

bittracker Aug 25, 2015

I think this is a much faster way:

git archive --remote=http://bittracker.org/someproject.git HEAD:<path/to/directory/or/file> <filename> | tar -x

bcipolli Dec 25, 2015

@bittracker would that preserve the git history?
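
A quick local check suggests the answer is no: git archive exports the files of one tree as a tarball, so only a snapshot comes out. The sketch below uses a throwaway local repository instead of --remote; the layout and contents are made up for illustration.

```shell
set -eu
src=$(mktemp -d)
out=$(mktemp -d)

# Hypothetical source repository with two commits touching the same file.
git init -q "$src"
git -C "$src" config user.name demo
git -C "$src" config user.email demo@example.com
mkdir "$src/etc"
echo one > "$src/etc/tmarc.xsl"
git -C "$src" add .
git -C "$src" commit -q -m "first"
echo two > "$src/etc/tmarc.xsl"
git -C "$src" commit -q -am "second"

# Export a single file from the etc/ tree of HEAD and unpack it elsewhere.
git -C "$src" archive HEAD:etc tmarc.xsl | tar -xf - -C "$out"

content=$(cat "$out/tmarc.xsl")   # the latest version of the file
if [ -e "$out/.git" ]; then has_git=yes; else has_git=no; fi
echo "$content $has_git"
```

The extracted directory holds the current file contents but no .git directory, so none of the earlier commits come along.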

moritz Jul 14, 2016

If you have a lot of files to delete (like if you want to preserve a file from a top-level directory), you can add a -q option to the git rm call, which will make it way faster.

Background: without the -q (quiet) option, git rm prints the name of all the deleted files, which the terminal needs to handle, and which is slower than the actual git operations.
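
The difference in output is easy to see on a throwaway index. This is a minimal sketch with hypothetical file names; it only shows what git rm prints, not how much time is saved.

```shell
set -eu
demo=$(mktemp -d)
cd "$demo"
git init -q .
touch a.txt b.txt
git add .

# Without -q, git rm reports every path it removes from the index.
loud=$(git rm --cached --ignore-unmatch a.txt)
# With -q the same operation prints nothing.
quiet=$(git rm -q --cached --ignore-unmatch b.txt)

echo "loud:  [$loud]"
echo "quiet: [$quiet]"
```

In the filter from step 3 this means writing git rm -q --cached --ignore-unmatch in place of git rm --cached --ignore-unmatch.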

rsalmei Apr 19, 2017

Please, what if I can't filter a single directory, as the file has been moved around, nor grep -v its name, as it has been renamed? 😞
With git log --follow --all -- the_file.py I can see all commits that touched it... How could I extract it to another repo? Steps 2 and 3 can't be performed as is... Thank you!
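
One possible approach, not part of the original gist and only sketched here: collect every path the file has ever had with git log --follow --name-only, skip step 2, and let the index filter keep exactly those paths. The repository, paths and the names-to-keep file below are hypothetical, and grep -z and xargs -0 are assumed to be available (they are in the GNU tools).

```shell
set -eu
export FILTER_BRANCH_SQUELCH_WARNING=1   # newer git: skip the filter-branch warning

# Hypothetical repository in which a file is moved and renamed.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name demo
git config user.email demo@example.com
mkdir old new
echo v1 > old/a.py
echo x > other.txt
git add .
git commit -q -m "add"
git mv old/a.py new/the_file.py
git commit -q -m "move and rename"
echo v2 >> new/the_file.py
git commit -q -am "edit"

# Collect every path the file had, one per line, outside the work tree.
names="$repo/.git/names-to-keep"
git log --follow --name-only --pretty=format: -- new/the_file.py \
  | sed '/^$/d' | sort -u > "$names"

# Remove everything whose full path is not in that list (-x: whole-line
# match, -F: fixed strings, -f: read the patterns from the file).
git filter-branch -f --prune-empty --index-filter \
  "git ls-files -z | grep -zvxF -f '$names' | xargs -0 git rm -q --cached --ignore-unmatch" \
  -- --all

kept=$(git ls-files)
commits=$(git rev-list --count HEAD)
echo "$kept ($commits commits)"
```

The file keeps its old path in the older commits, so git log --follow still works in the reduced repository. As with the pipeline above, a commit in which nothing has to be removed can make xargs call git rm without paths; GNU xargs -r avoids that.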

ghost Jan 25, 2018

How would I keep up with changes to the needed file? When doing a git pull, it replaces the local repository with the original directory tree.

mys721tx Jun 15, 2018

@ghost a post-merge hook maybe?

designerzim Sep 17, 2018

Quoting @bittracker:

I think this is a much faster way:

git archive --remote=http://bittracker.org/someproject.git HEAD:<path/to/directory/or/file> <filename> | tar -x

It's taking longer to debug the completely useless "fatal: Operation not supported by protocol." message than to just use the OP method.
