@tfidfwastaken
Last active April 11, 2021 10:16
GSoC Proposal for Git

Table of Contents

  1. Personal Details
  2. Background
  3. Me and Git
    1. Current knowledge of Git
  4. The Project: Finish converting git submodule to builtin
  5. Prior work
  6. General implementation strategy
  7. Timeline (using the format dd/mm)
  8. Beyond GSoC
  9. Blogging
  10. Final Remarks: A little more about me

Personal Details

Name : Atharva Raykar
Major : Computer Science and Engineering
Email : raykar.ath@gmail.com
IRC nick : atharvaraykar on #git and #git-devel
Address : redacted
Postal Code : redacted
Time Zone : IST (UTC+5:30)
GitHub : github.com/tfidfwastaken

Background

I am Atharva Raykar, currently in my third year of studying Computer Science and Engineering at PES University, Bangalore. I have enjoyed programming since a young age, but my deep appreciation for good program design and creating the right abstractions came from exploring the various rabbit holes of knowledge that originate in communities around the internet. I have particularly enjoyed learning about Functional Programming, Database Architecture and Operating Systems, and my interests keep expanding as I explore more of this field.

I owe my appreciation of this rich field to these communities, and I have always wanted to give back. To that end, I restarted the PES Open Source community on our campus to create spaces where members could share knowledge, much in the same spirit as the communities that kickstarted my journey in Computer Science. I learnt a lot about collaborating in the open, maintainership, and reviewing code. While I have made many small contributions to projects in the past, I am hoping GSoC will help me make the leap to a larger and more substantial contribution to one of my favourite projects, one that made my journey with Open Source possible.

Me and Git

Here are the various forms of contributions that I have made to Git:

I intend to continue helping people out on the mailing list and IRC and tending to patches wherever possible in the meantime.

Current knowledge of Git

I use Git almost daily in some form, and I am fairly comfortable with it. I have already read and understood the chapters from the Git Book about submodules along with the one on objects, references, packfiles and the refspec.

The Project: Finish converting git submodule to builtin

Git has historically had many components implemented in the form of shell scripts. This was less than ideal for several reasons:

  • Portability: Non-POSIX systems like Windows don’t play nice with shell script commands like grep, cd and printf, to name a few, and these commands have to be reimplemented for the system. There are also POSIX to Windows path conversion issues.
  • No direct access to plumbing: Shell scripts do not have direct access to the low-level Git API, and a separate shell is spawned just to carry out their operations.
  • Performance: Shell scripts tend to create a lot of child processes which slows down the functioning of these commands, especially with large repositories.

Over the years, many GSoC students have converted the shell versions of these commands to C. Git submodule is the last of these to be converted.

Prior work

I will be taking advantage of the knowledge that was gained in the process of converting the previous scripts, and avoiding the gotchas that were discovered along the way. There may be a number of useful helper functions in the previous patches that can be reused as well (more investigation is needed to determine what exactly is reusable).

Currently the only commands left to be completed for submodule are add and update. Work for add has already been started by a previous GSoCer, Shourya Shukla, and needs to be picked up from there. update has had some of its functionality moved over to submodule--helper.c, where Stefan Beller added the helper functions update-clone, update-module-mode, remote-branch and more.

References:
gitgitgadget/git#541 (comment)
https://github.com/git/git/commit/4d6d6ef1fc
https://github.com/git/git/commit/48308681b072a1d32e1361c255347324a8ad151e
https://github.com/git/git/commit/ee69b2a90c5031bffb3341c5e50653a6ecca89ac
https://github.com/git/git/commit/92bbe7ccf1fedac825f2c6ab4c8de91dc5370fd2

I’ll have these as my references when I am working on the project:
Shourya’s blog about his progress:
https://shouryashukla.blogspot.com/2020/08/the-final-report.html (more has been implemented since)
Shourya’s latest patch for submodule add:
https://lore.kernel.org/git/20201007074538.25891-1-shouryashukla.oo@gmail.com/

For the most part, the implementation looks fairly complete, but there seems to be a segfault occurring, along with a few changes suggested by the reviewers. It will be helpful to contact Shourya to fully understand what needs to be done.

Prathamesh’s previous conversion work:
https://lore.kernel.org/git/20170724203454.13947-1-pc44800@gmail.com/#t

The ultimate goal would be to get rid of git-submodule.sh altogether, which will complete the porting of submodule to C.

General implementation strategy

The way to port the shell to C code for submodule will largely remain the same. There already exists the builtin submodule--helper.c which contains most of the previous commands’ ports. All that the shell script for git-submodule.sh is doing for the previously completed ports is parsing the flags and then calling the helper, which does all the business logic.

So I will be moving out all the business logic that the shell script is performing to submodule--helper.c. Any reusable functionality that is introduced during the port will be added to submodule.c in the top level.

For example: The general strategy for converting cmd_update() would be to have an invocation of submodule--helper update <flags> in the shell script, which maps to a C function that I would create, named module_update(). This would perform the work that the shell script does after the flags have been parsed, and make the necessary call to update_clone().
update_clone() takes care of cloning all the submodules and returns their SHA1, whether the module was just cloned, and the path to the submodule. For each cloned module, it uses the information in those entries to find out the update mode through module_update_mode(), and run the appropriate operation according to that mode (like a rebase, if that was the update mode). The SHA1 from update_clone() helps us determine whether we need to update the submodules to match what the superproject expects.
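The per-submodule decision described above could be sketched roughly as follows. This is only an illustration under stated assumptions: every `sketch_` name is hypothetical and is not taken from Git's source, the current commit of the submodule is passed in as a plain string, and a just-cloned module is assumed to always need an update.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical update modes; the names used in Git's source may differ. */
enum sketch_update_mode {
	SKETCH_UPDATE_CHECKOUT,
	SKETCH_UPDATE_REBASE,
	SKETCH_UPDATE_MERGE,
	SKETCH_UPDATE_NONE
};

/* One entry as described for update_clone(): SHA1, just-cloned flag, path. */
struct sketch_clone_entry {
	const char *expected_sha1; /* commit the superproject expects */
	bool just_cloned;
	const char *path;
};

/* Decide whether the submodule has to be moved to the expected commit. */
bool sketch_needs_update(const struct sketch_clone_entry *entry,
			 const char *current_sha1, bool force)
{
	/* Assumption: a freshly cloned module has no usable current commit. */
	if (entry->just_cloned || !current_sha1)
		return true;
	return force || strcmp(entry->expected_sha1, current_sha1) != 0;
}

/* Map the update mode to the operation that would run in the submodule. */
const char *sketch_update_operation(enum sketch_update_mode mode)
{
	switch (mode) {
	case SKETCH_UPDATE_CHECKOUT:
		return "checkout";
	case SKETCH_UPDATE_REBASE:
		return "rebase";
	case SKETCH_UPDATE_MERGE:
		return "merge";
	case SKETCH_UPDATE_NONE:
		return NULL;
	}
	return NULL;
}

/* Return the operation to run for this entry, or NULL if nothing to do. */
const char *sketch_plan_update(const struct sketch_clone_entry *entry,
			       const char *current_sha1,
			       enum sketch_update_mode mode, bool force)
{
	if (!sketch_needs_update(entry, current_sha1, force))
		return NULL;
	return sketch_update_operation(mode);
}
```

In this sketch a caller would loop over the entries, call sketch_plan_update() for each one, and skip the submodule whenever NULL comes back. A real module_update() would run the operation inside the submodule rather than return its name.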

One possible way this work can be broken up into multiple patches is by moving over the shell code into C in a bottom-up manner.

For example: The shell part which retrieves the latest revision in the remote (if --remote is specified) can be wrapped into an invocation like git submodule--helper update-remote ${nofetch:+--nofetch} <sm_path>. This would return the remote name and SHA1 for the remote tracked by the submodule. Then we can move the part where we run the update method (i.e. the case on line 611 onwards) into a C function that is invoked by something that looks like git submodule--helper run-update-operation $update-module. This will run the update function, i.e. either checkout, merge or rebase, depending on the flags passed or the configuration set up by the end user.

Eventually, the shell part will just look like a bunch of invocations of submodule--helper, at which point the whole thing can be encapsulated in a single command called git submodule--helper update <flags> (Bonus: Move the whole functionality to C, including the parsing of flags, to work towards getting rid of git-submodule.sh).

I believe this is a fairly non-destructive and incremental way to work, and the porting efforts by Stefan seem to follow this same kind of philosophy. I will most likely end up tuning the size of these increments when I get around to planning in the first phase of the project.

What I have mentioned above is just illustrating what my workflow might look like, and the details are subject to change as I will probably discover nicer ways to get to the end goal of moving everything to submodule--helper. What will remain unchanged though, is my high level workflow, which can be summarized to these four steps:

  1. Identify parts in git-submodule.sh that have cohesive functionality
  2. Rewrite that functionality in C, so that it can be invoked from git submodule--helper
  3. Remove the shell code and replace it with the above invocation. This could be sent as one patch, making it easier to review. Steps 1 to 3 are repeated until the shell code is reduced to a bunch of calls to submodule--helper
  4. Once the shell code is reduced to only a bunch of calls to submodule--helper, wrap all of that into one call that looks like git submodule--helper update <flags> that encapsulates all the functionality done by the other helper function calls.

After this process, I will be adding the add and update commands to the commands array in submodule--helper.c. And since these two commands are the last bits of functionality left to convert in submodule, an extended goal can be to get rid of the shell script altogether, and make the helper into the actual builtin [1].

[1] https://lore.kernel.org/git/nycvar.QRO.7.76.6.2011191327320.56@tvgsbejvaqbjf.bet/
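To illustrate what registering add and update in a commands array amounts to, here is a simplified, self-contained sketch of a subcommand dispatch table. The struct, the handler signatures and all `sketch_` names are assumptions made for this example and do not reproduce the definitions in submodule--helper.c. The handlers are stubs that only check their argument count.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical handler type: receives the arguments after the subcommand. */
typedef int (*sketch_cmd_fn)(int argc, const char **argv);

struct sketch_cmd {
	const char *name;
	sketch_cmd_fn fn;
};

/* Stub for "add": expects at least one argument (the repository). */
int sketch_module_add(int argc, const char **argv)
{
	(void)argv;
	return argc > 0 ? 0 : 1;
}

/* Stub for "update": accepts any arguments. */
int sketch_module_update(int argc, const char **argv)
{
	(void)argc;
	(void)argv;
	return 0;
}

/* The table that new subcommands get appended to. */
static const struct sketch_cmd sketch_commands[] = {
	{ "add", sketch_module_add },
	{ "update", sketch_module_update },
};

/* Look up argv[0] in the table and run its handler; -1 if unknown. */
int sketch_dispatch(int argc, const char **argv)
{
	size_t i;

	if (argc < 1)
		return -1;
	for (i = 0; i < sizeof(sketch_commands) / sizeof(sketch_commands[0]); i++) {
		if (!strcmp(argv[0], sketch_commands[i].name))
			return sketch_commands[i].fn(argc - 1, argv + 1);
	}
	return -1;
}
```

With a table like this, exposing a newly ported subcommand is a one-line change, which is part of why the incremental approach keeps individual patches small.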

Timeline (using the format dd/mm)

Periods of limited availability (read: hectic chaos):

  • From 13/04 to 20/04 I will be having project evaluations and lab assessments for five of my courses.
  • From 20/04 to 01/05 I have my in-semester exams.
  • For a period of two weeks in the range of 08/05 to 29/05 I will be having my end-semester exams.

My commitment: I will still have time during my finals to help people out on the mailing list, get acquainted with the community and its processes, and even review patches if I can. This is because we get holidays between each exam, and my grades are good enough that I can prioritise Git over my studies ;-)

And on the safe side, I will still engage with the community from now till 07/06 so that the community bonding period is not compromised in any way.

Periods of abundant availability: After 29/05 all the way to the first week of August, I will be having my summer break, so I can dedicate myself to Git full-time :-)

I will also have finished all my core courses by then, so I will have enough time to give back to Git even after my GSoC period ends.

Phase 1: 07/06 to 14/06 – Investigate and devise a strategy to port the submodule functions

  • This phase will be more diagrams in my notebook than code in my editor – I will go through all the methods used to port the other submodule functions and see how to do the same for what is left.
  • I will find the C equivalents of all the shell invocations in git-submodule.sh, and see what invocations have no equivalent and need to be created as helpers in C (Eg: What is the equivalent to the ensure-core-worktree invocation in C?). For all the helpers and new functionality that I do introduce, I will need to create the testing strategy for the same.
  • I will go through all the work done by Shourya in his patch, and try to understand it properly. I will also see the mistakes that were caught in all the reviews for previous submodule conversion patches and try to learn from them before I jump to the code.
  • Deliverable: I will create a checklist for all the work that needs to be done with as much detail as I can with the help of inputs from my mentor and all the knowledge I have gained in the process.

Phase 2: 14/06 to 28/06 – Convert add to builtin in C

  • I will work on completing git submodule add. One strategy would be to reimplement the whole thing using what was learnt in Shourya’s attempt, but it is probably wiser to just take his patch and modify it. I will know which to choose by the time I reach this phase.
  • I will also add tests for this functionality: unit tests for the helpers introduced, and tests for the integration of add with the other commands. I will also document my changes where required.
  • Deliverable: Completely port add to C!

Phase 3: 28/06 to 16/08 – Convert update to builtin

  • Some work has already been done by Stefan Beller that moves some of the functionality of update to submodule--helper.c: https://github.com/git/git/commit/48308681b072a1d32e1361c255347324a8ad151e, but a lot of the business logic of going into the submodule and checking out, merging or rebasing still needs to be converted. Plenty to do here.
  • As with add, all of the appropriate tests need to be written and the changes documented. As I have learnt from the Pro Git Book, there are a lot of subtleties with how update does its work that I need to watch out for.
  • Deliverable: Completely port update to C!

Bonus Phase: If I am ahead of time – Remove the need for a submodule--helper, and make it a proper C builtin.

  • Once all the submodule functionality is ported, the shell script is not really doing much more than parsing the arguments and passing them to the helper. We won’t need the shell script anymore once the argument parsing is also done in C.
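As a rough picture of what moving the flag parsing into C involves, the sketch below hand-rolls parsing for three flags of update. It is a stand-in for illustration only: a real port would most likely use Git's own option-parsing facilities instead, and the struct and function names here are hypothetical.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical container for a few of the flags accepted by update. */
struct sketch_update_opts {
	bool remote;
	bool no_fetch;
	bool force;
};

/*
 * Parse leading flags from argv into opts.
 * Returns the index of the first non-flag argument (the first path),
 * or -1 if an unknown flag is encountered.
 */
int sketch_parse_update_flags(int argc, const char **argv,
			      struct sketch_update_opts *opts)
{
	int i;

	memset(opts, 0, sizeof(*opts));
	for (i = 0; i < argc; i++) {
		if (!strcmp(argv[i], "--remote"))
			opts->remote = true;
		else if (!strcmp(argv[i], "--no-fetch") || !strcmp(argv[i], "-N"))
			opts->no_fetch = true;
		else if (!strcmp(argv[i], "--force") || !strcmp(argv[i], "-f"))
			opts->force = true;
		else if (!strcmp(argv[i], "--"))
			return i + 1; /* everything after "--" is a path */
		else if (argv[i][0] == '-')
			return -1; /* unknown flag */
		else
			break; /* first path argument */
	}
	return i;
}
```

The returned index tells the caller where the submodule paths begin, which mirrors what the shell script does today when it shifts past the flags it has consumed.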

Beyond GSoC

I love the process of working as a community more than anything else, and I already felt very welcomed by the Git community the moment I started sending in my microproject patch series. Whether I am selected or not, I will continue giving back to Git wherever I can. Since my final year is light on coursework, I will be able to mentor people and help expand the Git developer community through all the ways I can (be it code review, helping people find the right resources or evangelism of Git).

Blogging

I will be blogging about my progress on a weekly basis and posting it on my website at https://atharvaraykar.me (I will probably tuck it away in a /gsoc path). Technical blogging is not particularly new to me, and I hope my posts can help future contributors to Git.

Final Remarks: A little more about me

These are some of my core values that I believe will be important to pull off this project and make the most of my time in GSoC:

  • Hard problems don’t frustrate me, rather they excite me. Bugs make my brain perk up. I love the process of learning.
  • I am pro-transparency. If I am having some trouble, I will be open about it. I don’t hesitate to ask questions and dig deep if I need to.
  • At the same time, when I ask a question, I only do so after I have struggled with the problem for enough time and done my due diligence in trying to solve it. Clear communication is very important to make this work.
  • I am also very comfortable with learning things all on my own (I have barely known any other way), and working in a remote, asynchronous setting.

I hope to make the world better in my own small way by contributing to a tool that everyone uses and I like. It’s more rewarding than any internship that my peers are doing this year. I look forward to learning more.
