@gitaarik
Last active October 8, 2024 10:13
Git Submodules basic explanation

Why submodules?

In Git you can add a submodule to a repository. A submodule is a separate repository embedded in your main repository. Two common use cases:

  • Separate big codebases into multiple repositories.

    Useful if you have a big project that contains multiple subprojects. You can make every subproject a submodule. This way you'll have a cleaner Git log, because the commits are specific to a certain submodule.

  • Re-use the submodule in multiple parent repositories.

    Useful if you have multiple repositories that share a common component. With this approach you can easily update that shared component in all the repositories that added it as a submodule. This is a lot more convenient than copy-pasting the code into the repositories.

Basics

When you add a submodule in Git, you don't add the submodule's code to the main repository. You only add information about the submodule: its path and URL (stored in the .gitmodules file) and the commit the submodule points at. Because that commit is fixed, the submodule's code won't automatically be updated when the submodule's repository is updated. This is good, because your main repository might not work with the latest commit of the submodule; it prevents unexpected behaviour.

Adding a submodule

You can add a submodule to a repository like this:

git submodule add git@github.com:path_to/submodule.git path-to-submodule

With default configuration, this clones the submodule.git repository into the path-to-submodule directory and checks out its code. It also adds information about the submodule to the main repository, including the commit the submodule points to. That will be the current commit of the submodule's default branch (usually main or master) at the time the command is executed.

After this operation, if you do a git status you'll see two entries in the Changes to be committed list: the .gitmodules file and the path to the submodule. When you commit and push these, the main repository on the origin records the submodule's URL, path and commit. The submodule's own code stays in the submodule's repository.
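As a rough illustration of what gets recorded, here is a minimal sketch that runs entirely in a throwaway temp directory. The names lib, main and vendor/lib and the helper g are made up for the example. The helper sets a commit identity and passes protocol.file.allow=always, which recent Git versions need before they will clone a submodule from a local path; with a real remote URL you would not need it.

```shell
#!/bin/sh
set -e

# Hypothetical sandbox: everything lives in a throwaway temp directory.
work=$(mktemp -d)
cd "$work"

# Helper (an assumption of this sketch): run git with a fixed identity and
# allow local-path submodule clones, which recent Git restricts by default.
g() {
    git -c user.name="Demo" -c user.email="demo@example.com" \
        -c protocol.file.allow=always "$@"
}

# 1. A stand-in for the shared component's repository.
g init -q lib
(cd lib && echo "hello" > lib.txt && g add lib.txt && g commit -q -m "Add lib.txt")

# 2. The main repository that will embed it.
g init -q main
cd main
g commit -q --allow-empty -m "Initial commit"

# 3. Add the submodule under the (hypothetical) path vendor/lib.
g submodule --quiet add "$work/lib" vendor/lib

# Both .gitmodules and vendor/lib are now staged.
g status --short

# 4. Record the submodule in the main repository's history.
g commit -q -m "Add lib as a submodule"

# The main repository stores a "gitlink" entry (mode 160000) that holds the
# submodule's commit hash, not the submodule's files.
g ls-tree HEAD vendor/lib
```

The last command is the interesting one: the entry for vendor/lib is a commit hash, which is the "pointer" the rest of this explanation talks about.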

Getting the submodule's code

If a new submodule is added by one person, the other people in the team need to initialize it. First you have to get the information about the submodule, which is retrieved by a normal git pull. If there are new submodules, the .gitmodules file will show up among the changed files in the output of git pull. Then initialize and fetch them with:

git submodule update --init

This is shorthand for git submodule init followed by git submodule update. The init step only registers the submodule's URL in your local .git/config; it is the update step that clones the submodule and checks out the recorded commit in the directory it's configured for.

If you've cloned a repository that makes use of submodules, you also need to run git submodule update --init to get the submodules' code, because git clone does not do this automatically. However, if you pass the --recurse-submodules flag to git clone, it will.
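To see the difference between registering a submodule and actually fetching it, here is a small sketch under the same assumptions as before: a temp directory, made-up names (lib, main, teammate, vendor/lib), and a helper g that sets an identity and allows local-path submodule clones.

```shell
#!/bin/sh
set -e

work=$(mktemp -d)
cd "$work"

# Helper (an assumption of this sketch), see the comments in the body.
g() {
    git -c user.name="Demo" -c user.email="demo@example.com" \
        -c protocol.file.allow=always "$@"
}

# Set-up: a library, and a main repository that uses it as a submodule.
g init -q lib
(cd lib && echo "hello" > lib.txt && g add lib.txt && g commit -q -m "Add lib.txt")
g init -q main
(cd main && g submodule --quiet add "$work/lib" vendor/lib \
         && g commit -q -m "Add lib as a submodule")

# A teammate clones the main repository without --recurse-submodules.
g clone -q "$work/main" teammate
cd teammate
ls -A vendor/lib    # prints nothing: the directory exists but is empty

# "init" alone only copies the submodule's URL into .git/config ...
g submodule --quiet init
ls -A vendor/lib    # still prints nothing

# ... "update" is the step that clones the submodule and checks out the
# commit recorded in the main repository.
g submodule --quiet update
ls -A vendor/lib    # now lists the submodule's files
```

In day-to-day use the two steps are usually combined as git submodule update --init.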

Pushing updates in the submodule

The submodule is just a separate repository. If you want to make changes to it, make them in its repository and push them as in any regular Git repository: just execute the git commands in the submodule's directory. However, you also have to tell the main repository to use the new commit, because after you make new commits inside a submodule the main repository still points to the old one.

If the submodule is checked out on a different commit than the one the main repository points to, and you do a git status in the main repository, then the submodule will be in the Changes not staged for commit list with the text (new commits) behind it. (The text (modified content) means something else: the submodule has uncommitted changes.) To make the main repository point to the new commit, you should stage the submodule path and create another commit in the main repository.

The next sections describe different scenarios on doing this.

Make changes inside a submodule

  • cd inside the submodule directory.
  • Make the desired changes.
  • git commit the new changes.
  • git push the new commit.
  • cd back to the main repository.
  • In git status you'll see that the submodule directory is modified.
  • In git diff you'll see the old and new commit pointers.
  • When you git add the submodule directory and git commit in the main repository, it will update the pointer.
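The steps above can be sketched end to end as follows. This is only an illustration in a temp directory: lib.git is a bare repository standing in for the submodule's remote, and the names seed, main, vendor/lib and the helper g are assumptions of the example.

```shell
#!/bin/sh
set -e

work=$(mktemp -d)
cd "$work"

# Helper (an assumption of this sketch): fixed identity, and allow
# local-path submodule clones, which recent Git restricts by default.
g() {
    git -c user.name="Demo" -c user.email="demo@example.com" \
        -c protocol.file.allow=always "$@"
}

# Set-up: a bare "remote" for the library, seeded with one commit.
g init -q --bare lib.git
g clone -q lib.git seed 2>/dev/null
(cd seed && echo "v1" > lib.txt && g add lib.txt && g commit -q -m "v1" \
         && g push -q origin HEAD)

# Set-up: the main repository with the library as a submodule.
g init -q main
cd main
g submodule --quiet add "$work/lib.git" vendor/lib
g commit -q -m "Add lib as a submodule"
old=$(g rev-parse HEAD:vendor/lib)

# Change, commit and push inside the submodule.
cd vendor/lib
echo "v2" > lib.txt
g commit -q -am "v2"
g push -q origin HEAD
cd ../..

# Back in the main repository the submodule shows up as modified, and the
# diff shows the old and new "Subproject commit" hashes.
g status --short
g --no-pager diff vendor/lib

# Stage the submodule path and commit to move the pointer.
g add vendor/lib
g commit -q -m "Update lib to v2"
new=$(g rev-parse HEAD:vendor/lib)
echo "pointer moved from $old to $new"
```

Pushing the submodule before committing the new pointer matters: otherwise teammates would receive a pointer to a commit they cannot fetch.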

Update the submodule pointer to a different commit

  • cd inside the submodule directory.
  • git checkout the branch/commit you want to point to.
  • cd back to the main repository.
  • In git status you'll see that the submodule directory is modified.
  • In git diff you'll see the old and new commit pointers.
  • When you git add the submodule directory and git commit in the main repository, it will update the pointer.
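As a sketch of this scenario, the example below pins the submodule to an older tagged commit. It runs in a temp directory; the tag v1, the names lib, main, vendor/lib and the helper g are made up for the illustration.

```shell
#!/bin/sh
set -e

work=$(mktemp -d)
cd "$work"

# Helper (an assumption of this sketch): fixed identity, and allow
# local-path submodule clones, which recent Git restricts by default.
g() {
    git -c user.name="Demo" -c user.email="demo@example.com" \
        -c protocol.file.allow=always "$@"
}

# Set-up: a library with two commits; the first one is tagged v1.
g init -q lib
(cd lib && echo "v1" > lib.txt && g add lib.txt && g commit -q -m "v1" \
        && g tag v1 \
        && echo "v2" > lib.txt && g commit -q -am "v2")

# Set-up: the main repository, whose pointer starts at the latest commit.
g init -q main
cd main
g submodule --quiet add "$work/lib" vendor/lib
g commit -q -m "Add lib as a submodule"

# Check out the commit we want to point to inside the submodule.
(cd vendor/lib && g checkout -q v1)

# The main repository now sees a different commit in vendor/lib.
g --no-pager diff vendor/lib

# Stage the submodule path and commit to record the new pointer.
g add vendor/lib
g commit -q -m "Pin lib to v1"
g ls-tree HEAD vendor/lib
```

If the commit you want is not available locally yet, run git fetch inside the submodule before the checkout.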

If someone else updated the submodule pointer

If someone updated a submodule, the other team-members should update the code of their submodules. This is not automatically done by git pull, because with git pull it only retrieves the information that the submodule is pointing to another commit, but doesn't update the submodule's code. To update the code of your submodules, you should run:

git submodule update

If a submodule is not initialized yet, add the --init flag. If any submodule has submodules itself, you can add the --recursive flag to recursively init and update submodules.

What happens if you don't run this command?

If you don't run this command, your submodule's code stays checked out at the old commit. When you do git status in the main repository, you will see the submodule in the Changes not staged for commit list with the text (new commits) behind it. This is not because you changed the submodule's code, but because its code is checked out at a different commit than the one recorded in the main repository. So in the main repo, Git sees this as a change, but actually you just didn't update the submodule. Be careful here: if you git add the submodule or run git commit -a in this state, you will move the pointer back to the old commit. Note that a git status inside the submodule will say HEAD detached at <commit-hash>; this is normal, because git submodule update checks out a specific commit rather than a branch. So if you're working with submodules, don't forget to keep your submodules up-to-date.
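The stale state can be reproduced with a sketch like the one below. It assumes a temp directory, made-up names (lib, main, teammate, vendor/lib) and a helper g that, besides setting an identity and allowing local-path submodule clones, forces submodule.recurse=false so that the demo behaves like a default Git configuration.

```shell
#!/bin/sh
set -e
export LC_ALL=C   # keep Git's messages in English for this demo

work=$(mktemp -d)
cd "$work"

# Helper (an assumption of this sketch), see the lead-in for what it sets.
g() {
    git -c user.name="Demo" -c user.email="demo@example.com" \
        -c protocol.file.allow=always -c submodule.recurse=false "$@"
}

# Set-up: a library, a main repository using it, and a teammate's clone.
g init -q lib
(cd lib && echo "v1" > lib.txt && g add lib.txt && g commit -q -m "v1")
g init -q main
(cd main && g submodule --quiet add "$work/lib" vendor/lib \
         && g commit -q -m "Add lib as a submodule")
g clone -q --recurse-submodules "$work/main" teammate

# The library moves on, and someone updates the pointer in main.
(cd lib && echo "v2" > lib.txt && g commit -q -am "v2")
(cd main/vendor/lib && g pull -q)
(cd main && g add vendor/lib && g commit -q -m "Update lib to v2")

# The teammate pulls the main repository but forgets the submodule.
cd teammate
g pull -q
stale=$(g status)
echo "$stale"       # vendor/lib is listed as modified (new commits)

# Updating the submodule checks out the recorded commit and clears the state.
g submodule --quiet update
g status --short    # prints nothing once the submodule is up to date
```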

Making it easier for everyone

It is easy to forget to initialize and update your submodules. Fortunately, there are some tricks to make it easier:

git clone --recurse-submodules

This will clone a repository and also init / update any possible submodules the repository has.

git pull --recurse-submodules

This will pull the main repository and also update its submodules.

And you can make it easier with aliases:

git config --global alias.clone-all 'clone --recurse-submodules'
git config --global alias.pull-all 'pull --recurse-submodules'
@gitaarik
Author

Indeed @zhl355, thanks for the feedback :)

@reponemec

@gitaarik: Are all @hsaih reviews correct? If so, the blog should be rewritten, thank you.

@gitaarik
Author

gitaarik commented Jun 15, 2022

Most of what @hsaih wrote is not entirely correct. The code is never added to the main repository, only a pointer to the commit of the submodule. The code will be checked out on your file system, but it does not appear in the actual main git repository.

Also git pull does a git fetch, and most people typically just use git pull. And advanced users, that use git fetch, will know that git pull does a git fetch and will receive new information about the repository.

The other points seem incorrect or might be correct for a particular git config, which a particular user that has set these configs will know for themselves, so not worth mentioning in a basic introduction. In any case, I haven't heard any other complaints about it so it seems to me that only this user had experienced issues with this.

@adebayo10k

Nice explanation @gitaarik. Submodules can get tricky if you're not yet able to visualise what's going on. It's best to avoid a submodules-of-submodules case until you've had some practice. Also, when a repository and a submodule both contain the same third repository as their submodule, I think it's best to make sure only one copy of the third submodule is available in the repository. Been there. It wasn't pretty.

I'm still learning, but my solution to that specific case has been the cautious approach, so rather than using:
git clone --recurse-submodules ...

I clone, initialise and fetch changes from the submodule repositories one at a time, starting at the main project repository root.

So first just:
git clone <repository url>.git

Then in the main project root directory:
git submodule init && git submodule update

Then, in a submodule directory, if you don't need its submodule directory contents populated, just finish with:
git fetch && git merge origin/main

Thanks.

@Omernn23

Omernn23 commented Nov 8, 2022

If I added a submodule for logs in my repo and I want to use its compiled build,
how do I push it to git so other developers can also use the compiled build?
In more detail, in the submodule directory I did:

mkdir build && cd build
cmake .. && make -j
sudo make install

and in my cmake in the main repo I use that build.
So how can I add the build to git?

Another issue is that I want to get the submodule's code from another remote that is checked out to the same branch.
I did as you suggested:

git fetch 
git pull
git submodule init 

this was the output:
"Submodule '3rd_party/spdlog' (https://github.com/gabime/spdlog.git) registered for path '3rd_party/spdlog'"

but when I entered the subdirectory with the submodule, I didn't see the files of the repo.
It was empty, so I did:

git submodule update --init --force --remote

this was the output :
"Cloning into '/home/nx1/Documents/odrive_integration/3rd_party/spdlog'..."
Submodule path './': checked out 'dea6bb1085466370ed6d629b4d462f299db75958'
but not all the files and directory were there.
What is wrong??

@gitaarik
Author

gitaarik commented Nov 8, 2022

Hi @adebayo10k, so you have a setup like this?

main repo
    sub repo1
    sub repo2
        sub repo1

Then I can understand it's confusing to have repo1 in 2 places, and that it causes problems with --recurse-submodules. But yeah, it's certainly possible to work like that if you know what you're doing. Although I would advise keeping it simple if possible.

@gitaarik
Author

gitaarik commented Nov 8, 2022

Hi @Omernn23, typically build files are not added to git repositories, because they are too system-specific: a build created on one system won't necessarily work on another system. So typically the process for the developers is to pull the code, then create the build themselves. Also, when you do make install, it installs the build to your system, which is outside your git repository. So to install the software to your system, you would have to execute this command anyway. So you can't make it work out-of-the-box with git.

@Omernn23

Omernn23 commented Nov 8, 2022

First of all, thanks for the response @gitaarik.
OK, I understand, so I'll run all the build commands per system.
But I still don't know how I can pull all the files of the submodule on another system.
As I mentioned in the second issue, it won't bring the files.
What am I doing wrong?
It's exactly as you wrote.

@gitaarik
Author

gitaarik commented Nov 9, 2022

@Omernn23 Maybe you should do the git pull --recurse-submodules command from the main repository directory? It's in the explanation.


ghost commented Dec 6, 2022

great works!

@guest271314

Is this possible to do on the general GitHub repository or on https://github.dev/?

@Asadrana123

Can I access the submodule if the project is deleted from my local machine? Please tell me, it is urgent! My hard work of 2 months is wasted.

@venkatrahul-software-development

Very informative documentation, thanks to @gitaarik but this needs to be updated too. Please see @hsaih message/comment.

@glaserf

glaserf commented Sep 21, 2023

So let me get this straight, this official guide still contains errors? Can someone please clear this up? It comes up as top hit on Google when people search for how to use git submodules. Thanks.

@gitaarik
Author

gitaarik commented Sep 23, 2023

@glaserf This is not an official guide, just a quick reference for git submodules that I created for my colleagues at the time. I shared it to the public by putting it here on gist. It looks like a lot of people find the guide useful. I'm still maintaining it. @venkatrahul-software-development The comments from @hsaih have already been discussed. In my view there are no errors in this guide.

@Gztabo21

thank you for content

@morgankar

Helpful guide, thanks!

@ionutzp

ionutzp commented Jul 3, 2024

🔥

@khaledkhlifi-deca

This is the first time I've learned about Git submodules. Very clear article, thank you!
Maybe it would be more complete if it explained the advantages of using Git submodules over other alternatives, for example a multi-module parent project using Maven (or any build automation tool).

@gabrieleolmi

This should clarify the questions about .gitignore, source: https://stackoverflow.com/a/7912101/11689625

#### Should I add the submodule directory to .gitignore?

No, you don't need to add your submodule to your .gitignore: what the
parent will see from your submodule is a gitlink. That means: any change
directly made in a submodule needs to be followed by a commit in the
parent directory.
