@sualeh
Last active July 7, 2021 09:40
How to Use git to Find Modified Files
# Remove all cached files from the git index
git rm -r --cached .
# Add all files (and files in sub-directories) to the index
# but do not commit
git update-index --info-only --add **/*
git update-index --info-only --add `ls -p`
# Find changes
git ls-files . -d -m -o --exclude-standard --full-name -v

Introduction

I keep a number of personal files on my computer, organized in folders. These could be photos, financial information, and so on. As I work with these files, I add to them, sometimes modify them to edit a photo, or add to notes, and move or rename them in various ways to reorganize. I wanted a good way to keep track of these changes.

My first thought was to write a Python program to scan the folders and print the MD5 checksum of each file in a readable way. That way, I could save the old "index" and compare it with a new one using a standard diff tool. My attempt at this program is sualeh/diff-name-only. My disclaimer, if you look at the code, is that I am still teaching myself Python and have not reached the heights of Pythonic Zen.

As I was writing this code, I was struck by how much of what I needed was already done by a standard source control system such as git. I could simply use git, and solve my problems. However, git keeps track of objects, and in my case these could be pretty sizable. In effect, I would have the files in my working tree, as well as copies in git's object store (.git/objects). This would roughly double the storage that I needed, when all I wanted was some bookkeeping on the metadata for these files. I was not interested in the actual diffs. To explore, I first set out to write user stories that would clarify exactly what I was experimenting with.

User Stories

The operations that I needed to worry about would normally fall into the CRUD (create/ read/ update/ delete) category. However, for files, the acronym might more accurately be CUMD (create/ update/ move (or rename)/ delete). And for each operation, there would be two cases to cover: files in the root directory, and files in a sub-directory.

I started with two random images, which I named 1.jpg and 2.jpg. I used images since I wanted binary files, and I wanted data of some heft that could be easily noticed in .git/objects. My goal, of course, was that I did not want git to track the objects.

For setup of my test cases, I first created a top-level folder, and copied the two images into it. I opened a command shell into that folder, and then I did the following:

# Create the test project
mkdir test-project
cd test-project

# Initialize a local git repository to track changes
git init

At this point, I checked the git status. The most common way to check the status is git status, but since I planned to use only the git index, I used the following command instead:

# Find changes
git ls-files . -d -m -o --exclude-standard --full-name -v

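This command reports deleted (-d), modified (-m), and new or otherwise untracked (-o) files, while still honoring the standard git excludes (--exclude-standard). The -v switch prefixes each path with a one-letter status code, and --full-name prints paths relative to the top of the repository.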
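As a sketch, the switches can also be wrapped in a small shell function so that each kind of change can be queried on its own. The name find_changes and its arguments (deleted, modified, new, all) are my own invention for illustration, not part of git.

```shell
#!/bin/sh
# find_changes (hypothetical helper): report changes relative to the git index.
# Usage: find_changes [deleted|modified|new|all]   (defaults to "all")
find_changes() {
  case "${1:-all}" in
    deleted)  git ls-files -d --exclude-standard --full-name -v ;;
    modified) git ls-files -m --exclude-standard --full-name -v ;;
    new)      git ls-files -o --exclude-standard --full-name -v ;;
    all)      git ls-files -d -m -o --exclude-standard --full-name -v ;;
    *)        echo "usage: find_changes [deleted|modified|new|all]" >&2
              return 2 ;;
  esac
}
```

Running find_changes with no argument is equivalent to the full command shown earlier.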

User Story "C" - Create New Files

The first user story would probably read something like this:

Feature: Keep track of file changes in a folder

Given a blank folder

When I create new files

Then I can find which files have been created

If you followed the steps above, you are already in the "Given" state of the user story. Next, let us "create" some files like this:

# Create some new files, and some new files in a folder
cp ../*.jpg .
mkdir fldr1
cp *.jpg fldr1 

After this, if you run the command above to find changes, you will see the following output:

$ git ls-files . -d -m -o --exclude-standard --full-name -v
? 1.jpg
? 2.jpg
? fldr1/1.jpg
? fldr1/2.jpg

This satisfies the then portion of the user story, since the output shows the files we created as new files. If you look at the git ls-files documentation, you will see what each code means. In our case, ? signifies new files.

Code Description
H cached
S skip-worktree
M unmerged
R removed/deleted
C modified/changed
K to be killed
? other
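If you would rather read descriptions than one-letter codes, the table can be turned into a small filter. This is only a sketch: annotate_changes is a name I made up, and it assumes each input line has the form "CODE path", as in the output above.

```shell
#!/bin/sh
# annotate_changes (hypothetical helper): read "CODE path" lines, as printed by
# git ls-files -v, and print each path with the description from the table.
annotate_changes() {
  while IFS= read -r line; do
    code=${line%% *}   # everything before the first space
    path=${line#* }    # everything after the first space
    case "$code" in
      H)   desc="cached" ;;
      S)   desc="skip-worktree" ;;
      M)   desc="unmerged" ;;
      R)   desc="removed/deleted" ;;
      C)   desc="modified/changed" ;;
      K)   desc="to be killed" ;;
      '?') desc="other" ;;
      *)   desc="unknown code $code" ;;
    esac
    printf '%s: %s\n' "$desc" "$path"
  done
}

# Example use:
#   git ls-files . -d -m -o --exclude-standard --full-name -v | annotate_changes
```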

User Story to Reset Tracking

Ok, so now we know that these files have been added, and we want to baseline these changes. This allows us to continue to track new changes to the files from this point onwards. The user story might read thus:

Feature: Keep track of file changes in a folder

Given a folder with some files

When I want to baseline some changes

Then I reset tracking changes

After running through the previous user story, you can baseline the changes using the following commands.

# Remove all cached files from the git index
git rm -r --cached .
# Add all files (and files in sub-directories) to the index
# but do not commit
git update-index --info-only --add **/*
git update-index --info-only --add `ls -p`

(HINT: Save this as baseline.sh)
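The glob and ls forms above depend on how your shell expands **/* and on file names that contain no spaces. A more defensive variant, offered only as a sketch, lets git itself list the files and passes them along NUL-separated. The function name baseline is my own. The sketch also adds -f and --ignore-unmatch to git rm, on the assumption (from my reading of the git rm documentation) that the up-to-date check can otherwise reject files modified since they were indexed, and that an empty index would otherwise produce an error.

```shell
#!/bin/sh
# baseline (hypothetical helper): make the index match the working tree
# without storing any file contents in .git/objects.
baseline() {
  # Empty the index. --cached leaves the working tree files alone,
  # -f overrides the up-to-date check, and --ignore-unmatch keeps
  # the command from failing when the index is already empty.
  git rm -r -q -f --cached --ignore-unmatch . || return 1

  # With an empty index, every non-ignored file counts as "other" (-o).
  # Record each one in the index by object ID only (--info-only).
  # Note that --stdin has to be the last option to git update-index.
  git ls-files -z -o --exclude-standard |
    git update-index --add --info-only -z --stdin
}
```

Because the list comes from git ls-files with --exclude-standard, ignored files stay out of the baseline, and sub-directories of any depth are covered.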

Then, to check that the changes have been baselined, run:

$ git ls-files . -d -m -o --exclude-standard --full-name -v

and you will get no output, indicating that nothing has changed since the last baseline. (HINT: Save this as find-changes.sh)

User Story "U" - Update Files

Our next user story involves modifying some files.

Feature: Keep track of file changes in a folder

Given a folder with some files

When I modify some files

Then I can find which files have been modified

Please follow the user stories in the sequence above, so that the given portions can be satisfied. Then, let us modify some files like this:

# Modify some files
cp 2.jpg 1.jpg
cp fldr1/2.jpg fldr1/1.jpg

Then, when we check to see which files have been modified, we get the following:

$ git ls-files . -d -m -o --exclude-standard --full-name -v
C 1.jpg
C fldr1/1.jpg

"C", if you recall from the table above, signifies modified or changed files.

After you do the update, please remember to baseline the changes again, using the commands from the reset user story.

User Story "M" - Move (or Rename) Files

Our next user story involves moving and renaming some files.

Feature: Keep track of file changes in a folder

Given a folder with some files

When I move or rename some files

Then I can find which files have been moved and renamed

Please follow the user stories in the sequence above, so that the given portions can be satisfied. Then, let us move and rename some files like this:

# Rename some files, including one in a sub-directory
mv 1.jpg 3.jpg
mv fldr1/1.jpg fldr1/3.jpg

Then, you can find the moved and renamed files like this:

$ git ls-files . -d -m -o --exclude-standard --full-name -v
? 3.jpg
? fldr1/3.jpg
R 1.jpg
C 1.jpg
R fldr1/1.jpg
C fldr1/1.jpg

git does not really shine here. It shows each move or rename as a combination of a new file (?), a removed file (R), and a changed file (C). The C line appears because -m lists deleted files as modified too. You can play with the -d, -m and -o command-line switches to get the output you desire. However, this full output may actually be useful if you use it to script some further changes.
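One way to script this further, as a rough sketch: the index still holds the object ID of each removed file, and git hash-object can compute the object ID of a new file without storing it. Where the two IDs match, the new file is probably the old one under a new name. The function name find_renames is my own, and the sketch assumes it is run from the top of the repository, with file names that contain no newlines or characters that git would quote.

```shell
#!/bin/sh
# find_renames (hypothetical helper): pair each deleted path with any new file
# whose content hashes to the object ID recorded in the index.
find_renames() {
  git ls-files -d | while IFS= read -r old; do
    # git ls-files -s prints "mode objectid stage<TAB>path"
    old_id=$(git ls-files -s -- "$old" | awk '{ print $2 }')
    git ls-files -o --exclude-standard | while IFS= read -r new; do
      # hash-object only computes the ID; without -w nothing is stored
      if [ "$(git hash-object -- "$new")" = "$old_id" ]; then
        printf '%s -> %s\n' "$old" "$new"
      fi
    done
  done
}
```

Matching on content alone has a limitation: if several files are byte-for-byte identical, as 1.jpg and 2.jpg are after the update story, every matching pair is printed and you have to pick the right one yourself. The nested loop also re-hashes each new file once per deleted file, which is fine for a handful of files but slow for large folders.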

After you do the move, please remember to baseline the changes again, using the commands from the reset user story.

User Story "D" - Delete Files

Our final user story involves deleting files.

Feature: Keep track of file changes in a folder

Given a folder with some files

When I delete some files

Then I can find which files have been deleted

Please follow the user stories in the sequence above, so that the given portions can be satisfied. Then, let us delete some files like this:

# Delete some files, including some in a sub-directory
rm 2.jpg
rm fldr1/2.jpg

Then, you can find the deleted files like this:

$ git ls-files . -d -m -o --exclude-standard --full-name -v
R 2.jpg
C 2.jpg
R fldr1/2.jpg
C fldr1/2.jpg
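Each deleted file shows up twice here, once as R and once as C. If that is noise for your purposes, the output can be piped through a small awk filter that drops the C line for any path that also has an R line. This is a sketch, and collapse_deleted is a name I made up.

```shell
#!/bin/sh
# collapse_deleted (hypothetical helper): read "CODE path" lines and drop the
# "C" line for every path that is also reported as "R" (removed).
collapse_deleted() {
  awk '
    {
      code[NR] = substr($0, 1, 1)   # one-letter status code
      path[NR] = substr($0, 3)      # path, after the code and one space
      if (code[NR] == "R") removed[path[NR]] = 1
    }
    END {
      for (i = 1; i <= NR; i++) {
        if (code[i] == "C" && (path[i] in removed)) continue
        print code[i] " " path[i]
      }
    }'
}

# Example use:
#   git ls-files . -d -m -o --exclude-standard --full-name -v | collapse_deleted
```

Files that were genuinely modified, and not removed, keep their C line.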

Conclusion

These user stories show that we can use the git index to keep track of changes to our files, even if they are in sub-folders. Since we are using just the git index without keeping track of objects, you will notice that we do not commit anything. Since there are no commits, there is nothing to tag, branch, or push. Our "commit" really consists of clearing the git index, and re-adding all files to the index.
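To check for yourself that no objects are being stored, one option is git count-objects, which reports how many loose and packed objects the repository holds. The helper below is a sketch, and stored_objects is a name of my own choosing.

```shell
#!/bin/sh
# stored_objects (hypothetical helper): print the number of objects that git
# has stored, loose ("count") plus packed ("in-pack").
stored_objects() {
  git count-objects -v |
    awk '$1 == "count:" || $1 == "in-pack:" { n += $2 } END { print n + 0 }'
}
```

After a baseline made with --info-only, this should print 0. If you add a file without --info-only, the number goes up, because git then writes the file contents into .git/objects.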
