
@araknast
Last active May 23, 2023 08:03

Git Vision 2023

Introduction

The greatest strength of inlang is that adoption is nearly frictionless. By integrating with existing tools, creating pull requests ourselves, and by providing simple graphical tools, we reduce the number of obstacles users face when adopting our software.

Consider the analogy of a rocket ship. If we want the ship to fly further, we can either increase thrust or reduce drag. "Increasing thrust" can be thought of as adding new features, marketing, community outreach, etc. On the other hand, "reducing drag" can be thought of in terms of UI/UX/DX, i.e. those things that make inlang easier to use and adopt.

Now consider the most extreme case of our rocket ship analogy: The rocket has a completely flat nose. In this case, we can increase the thrust all we want, but the effects are essentially nullified by the shape of the nose. The important thing to note is that this principle always remains in effect even as we make our rocket more aerodynamic. That is, we will never be able to see the full benefits of adding thrust until we eliminate drag.

As previously stated, I am of the opinion that in its current state, the inlang software suite is extremely "aerodynamic" as is. I believe that this is the reason we are able to see such significant adoption despite the product being in its early days of development. Furthermore, I believe that it is by focusing on this aspect -- on the reduction of "drag" rather than the increase of "thrust" -- that we will see the largest improvement in the quality of our product. This will be the guiding philosophy of inlang-git. The application of this philosophy throughout the rest of inlang's software suite is up to the discretion of the rest of the team, of course.

The Purpose of inlang-git

The purpose of inlang-git is simple. The editor is built on Git. By making a better Git, we can make a better editor. By making a more accessible Git, we can make a more accessible editor.

There are two classes of limitations which hold back the improvement of UX in the editor:

  1. The limitations in isomorphic-git
  2. The limitations in the established Git flow

Item (1) has been addressed previously in the git-sdk proposal. The main issues have not changed, nor should our approach to them. This document mainly concerns itself with item (2), the issues inherent in the established Git flow. In addressing this, I hope to round out the ideas initially expressed in my git-sdk proposal, and provide a more concrete image of what "the next Git" could look like.

Using Git More Effectively in the Editor

Before we begin outlining the functioning of "the next Git", I think it is important to consider how we can better use the Git features we have available to us at the moment. One salient aspect is the editor workflow, which I believe can be made faster and more efficient with some small changes.

The current editor workflow is as follows:

  1. Clone the repository
  2. Fork the repository
  3. Edit a string
  4. Commit your edit
  5. Edit a string
  6. Commit your edit
  7. Edit a string
  8. etc.
  n. Push your changes

It is clear that this is based on the standard Git workflow of clone, add, commit, push. There is one significant issue with this workflow: it does not scale to large repos. With potentially hundreds of strings to update, clicking "commit" each time we make an edit becomes extremely tedious.

Furthermore, the implementation of this flow in the editor is inefficient. Every time a user clicks on the "commit" button we:

  • Run statusMatrix to see which files changed (very slow)
  • Filter the statusMatrix output to create a list of only the files we are interested in
  • For each file in this list, run git.add on the file
  • Commit the changes (slow)

This is already inefficient, since the "commit" button only updates a single line in a single file each time it is clicked, but it is made much worse by the fact that statusMatrix is extremely slow (only in the browser, for reasons I have not yet clearly identified). If a repository is large enough, this leads to multi-second wait times each time a file is committed to the repository.

The solution to this problem would be to delay committing changes until the user is ready to push them. Then we run git.add on the entire repository, and commit+push as one operation. For the user, the flow then becomes:

  1. Clone the repository
  2. Fork the repository
  3. Edit multiple strings
  4. Push your changes

There are two main benefits to this. First, there is no need to invoke statusMatrix, which causes a massive slowdown in committing changes to large repos. Second, the flow is simplified in that a user no longer has to click "commit" each time a change is added.
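In plain Git terms, the batched flow collapses the per-edit commits into a single add/commit/push at the end of a session. The sketch below is illustrative only: the file names are hypothetical and a local bare repository stands in for the user's fork; the editor would perform the equivalent steps through isomorphic-git rather than the CLI.

```shell
# Sketch of the batched editor flow in plain Git terms (hypothetical files;
# a local bare repo stands in for the user's fork on a Git host).
set -e
work=$(mktemp -d)

# Stand-in for the user's fork.
git init -q --bare "$work/fork.git"

# 1-2. Clone the (forked) repository.
git clone -q "$work/fork.git" "$work/checkout" 2>/dev/null
cd "$work/checkout"
git config user.email "translator@example.com"
git config user.name "Translator"

# 3. Edit multiple strings -- no commit per edit.
echo '{"title": "Hallo"}'   > de.json
echo '{"title": "Bonjour"}' > fr.json

# 4. Push your changes: stage everything, commit once, push once.
git add -A
git commit -qm "Update translations"
git push -q origin HEAD
```

One network round trip and one object walk per session, instead of one per edited string.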

Thus in one change, we improve both performance and UX in the editor. Returning to the rocket analogy, we have reduced the "drag" of the editor in two aspects.

A New SDK - Recap

This section serves to restate ideas previously outlined in the git-sdk proposal. As a quick recap, the planned features are:

  • Lazy loading
  • Object-level authentication
  • Improved support for binary files

These items have not changed significantly since the introduction of the git-sdk. The following section serves to outline a new flow, i.e. a new paradigm for interacting with Git, and one should assume these features to be already included.

A New Flow

Motivation

Fundamentally, Git is a good piece of software with a terrible UI. This has been addressed time and time again. There exist solutions such as gitless, legit, and easygit which all seek to solve this issue. The common theme among projects of this sort is that they all introduce new (supposedly simpler) concepts and functionality to replace the existing concepts and functionality of Git. The problem with this is the switching cost involved. While these flows may indeed be easier for users, the cost of learning a new interface instead of Git simply seems to be "not worth it" in the long run.

The Solution

So what we need is a tool that is simpler than Git, but not alienating to existing users. How do we accomplish this? The answer is to take a page out of the Plan 9 playbook:

"Everything is a [branch], and we have a lot of tools to work with [branches]."

"Everything is a branch" is the fundamental design philosophy behind this new git flow. Thus, the concepts of "workdir", "index", "stash", and "remote" disappear completely.

Remote branches are just regular branches which are kept in sync with some Git remote. Thus, we no longer "push" changes, instead we commit our changes to the remote branch. Furthermore, we no longer "stage" files at all, and the "worktree" is now simply a branch which is synced to the filesystem. Thus, an "undo" operation is as simple as stepping back in the branch history.
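In plain Git terms, such an "undo" could look like the following sketch (a throwaway demo repository; the file name and commit messages are hypothetical):

```shell
# Sketch: with "everything is a branch", undo is just moving the worktree
# branch back one entry in its history. Demo in plain Git:
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "dev@example.com"
git config user.name "Dev"

echo "v1" > file.txt; git add -A; git commit -qm "v1"
echo "v2" > file.txt; git add -A; git commit -qm "v2"

# "Undo": step back one commit on the worktree branch.
git reset -q --hard HEAD~1
cat file.txt   # back to "v1"
```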

To better illustrate this new flow, we describe a hypothetical tool iv (for inlang versioner). Consider the typical "Hello world" example, commonly used in setting up Git repositories. First, the traditional flow:

git init
git remote add origin https://github.com/user/repo
echo "Hello world!" > file.txt
git add file.txt
git commit -m "First Commit"
git push -u origin main

and the same process using iv:

iv init
iv branch main https://github.com/user/repo
echo "Hello world!" > file.txt
iv commit -m "First commit"

The benefit of the new flow is not in the 2 fewer commands. Rather it is the reduction in complexity of one's mental model of the Git repo. In the Git example above we are:

  1. Initializing Git
  2. Implicitly creating a new branch called "main"
  3. Creating a remote with the name "origin"
  4. Creating a file in the worktree
  5. Adding the same file to the index
  6. Committing the file to our branch called "main"
  7. Pushing our branch called "main" to our remote called "origin"
  8. Adding an upstream tracking reference for our remote called "origin"

With iv, the mental model is considerably simpler. In this case we are:

  1. Initializing iv
  2. Creating a new branch called "main" which is synced with a remote branch
  3. Creating a file
  4. Committing the file to our branch called "main"

The importance is in the number of new concepts introduced at each step. With Git there are 7 concepts to understand in order to execute a basic version control operation. With iv there are 2.

The hypothetical iv tool is described in further detail here. What is important to note is that all of this can be built on top of existing low level Git functionality. An iv-like tool could be built using isomorphic-git, libgit2, or even Git itself (as a set of shell scripts).
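As a sketch of the "shell scripts over Git" option, iv commit could be a thin wrapper that stages, commits, and syncs the remote-backed branch in one step. The iv_commit name and its behavior are assumptions drawn from this document, not an existing tool:

```shell
# Hypothetical sketch of "iv commit" as a wrapper over plain Git.
# A branch created with "iv branch <name> <url>" is assumed to have a
# configured remote; committing to it then also syncs that remote.
iv_commit() {
  msg="$1"
  git add -A                 # no user-facing index: stage everything
  git commit -q -m "$msg"    # record the change on the current branch
  # If the branch is backed by a remote, keep it in sync; otherwise stay local.
  if git remote get-url origin >/dev/null 2>&1; then
    git push -q origin HEAD
  fi
}
```

The point of the sketch is that no new plumbing is required: the simplified mental model is achievable as a porcelain layer over any existing Git implementation.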

A New Server

Along with the previously mentioned object-level authentication, a new server can also provide performance improvements for our "constant sync" workflow.

The Git server protocol is based on a simple principle: the client requests objects from the server, and the server provides them in Packfile format. For the existing Git flow, this is perfectly adequate. For a new flow based around keeping remote branches synchronized, the constant need to query for new refs quickly becomes a performance issue.

The Git pack protocols, both v1 and v2, are fundamentally designed around the concept of infrequent updates to widely divergent object stores. The solution here is to create a streaming Git server, i.e. one in which clients can subscribe to updates about branches they are interested in.

When a client successfully updates a remote ref, the new value of the ref, as well as any relevant objects should be sent to all subscribed clients. For a lazy-loaded branch, this would be the location of HEAD and the tree object it points to. For a fully checked-out branch, this would be the location of HEAD, the tree object it points to, and any new objects the tree contains.

Conclusion

Now, let us collect all of the previously stated points into a concrete description of the next Git. It consists of two aspects: a client and a server.

The client implements the previously described "new flow" to increase accessibility. It lazily fetches files as they are needed to make it easier to work with large repositories. It is file-agnostic. Through various diff providers it is capable of logically representing diffs between binary file types. These diff providers also reduce the occurrence of merge conflicts through semantic diffing. Merge conflicts are further kept to a minimum through constant sync with remote branches.

The server works with the client to implement the new flow. It implements file-level authentication for added security when working with large shared repositories.

Roadmap

All of these features should combine to make both the editor, and Git, more accessible to developers and non-developers alike. To return to our analogy, the goal of this project is in reducing "drag" not only in the editor, but in Git itself. Thus, priority will be placed on features which enable us to simplify the Git workflow, or to support a previously unsupported common use case.

The roadmap then, is as follows:

  1. Implement protocol v2 into isomorphic-git
  2. Use protocol v2 to implement sparse checkout into isomorphic-git
  3. Use sparse checkout to improve performance in the editor
  4. If performance is still inadequate, look towards improving the performance of remote packfile indexing (specifically delta resolution) in isomorphic-git.
  5. If performance is still inadequate, look towards improving the algorithm for checkout in isomorphic-git. Specifically, the algorithm must run in a single pass while still handling file dependencies appropriately.
  6. Create a proof of concept, evaluative, internal-use tool implementing the "new Git flow" outlined above.

After the tool is created, its development will have its own roadmap:

  1. Implement partial cloning into isomorphic-git
  2. Use partial cloning to implement a lazy loading filesystem
  3. Experiment with semantic diff providers for both binary and text files
  4. Create the new Git server with ref streaming and file-level authentication
  5. Use what we learn with this tool to create the true "next Git"
  6. Release it to the world
  7. Wait

Besides the obvious additions, one notable change is that the lazy filesystem has been moved much further down the roadmap, and has been replaced with sparse checkout. The reasoning for this is simply that GitHub does not support the filter option to git fetch-pack, and thus lazy cloning from a GitHub repo would not be possible. Sparse checkout should provide similar performance improvements while remaining more widely supported both by Git hosts and Git itself.
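To make the sparse checkout step concrete, here is a sketch of the flow using CLI Git (requires Git >= 2.25; the repository layout is hypothetical, and a local repository stands in for the remote host):

```shell
# Sketch of a sparse checkout: only the directory we care about is
# materialized in the worktree (hypothetical layout, local stand-in remote).
set -e
tmp=$(mktemp -d)

# Build a sample "remote": a messages/ dir we care about, plus unrelated assets/.
git init -q "$tmp/origin"
cd "$tmp/origin"
git config user.email "dev@example.com"
git config user.name "Dev"
mkdir -p messages assets
echo '{"hello": "world"}' > messages/en.json
echo "large unrelated payload" > assets/blob.txt
git add -A
git commit -qm "initial"

# Clone with a sparse (top-level only) checkout, then select just messages/.
git clone -q --sparse "$tmp/origin" "$tmp/sparse"
cd "$tmp/sparse"
git sparse-checkout set messages

ls   # the worktree now contains messages/ but not assets/
```

Unlike partial clone, this needs no filter support from the server side, which is why it remains the more widely supported option.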

@samuelstroschein

Summary

First things first, your rocket analogy seems spot on. The main advantage of our git-based architecture appears to be lower friction to adopt our software, but it could be any software:

  1. We take existing data (files) that can be opened with apps. This sounds low-tech, like 1990, but that seems to be a great thing. No (dedicated) servers, databases, auth systems, sync, integrations, or whatnot are required to get started.

  2. Different teams (devs and translators) can collaborate in their respective tools on a "single source of truth", namely files. The barrier to cross-collaborate on files seems simpler than alternatives, likely due to the simplicity of 1.

Discussion points

I created issues for each discussion point to streamline the discussion and to be able to reference the discussions later. All issues are tracked in a PR which is supposed to lead to a design document.

  1. How to "reach" the next git? opral/monorepo#825

Your proposal is a concrete "everything is a branch" suggestion. While this sounds good, we have an ingredient superior to any other git improvement of the previous decade: the inlang editor targets (mostly) non-developers. We are free to experiment with a concrete use case that entails requirements.

  2. Should the server be a superset of the client instead of being a separate application? opral/monorepo#822

  3. Should we drop direct support for "legacy" git hosts and instead build a proxy feature into inlang-git/server? opral/monorepo#823

  4. Should we rename inlang-git to a brandable name? opral/monorepo#824

  5. The tech stack for the server opral/monorepo#827


Replies that don't seem to require a dedicated issue

As previously stated, I am of the opinion that in its current state, the inlang software suite is extremely "aerodynamic" as is. I believe that this is the reason we are able to see such significant adoption despite the product being in its early days of development.

Agree on the aerodynamic part, disagree on "significant adoption". 

We seem to be able to acquire "low incentive to adopt localization tooling" projects through our git-based architecture. That is a good sign given that higher adoption friction (of other architectures) would likely lead to no adoption. But, those projects are not yielding recurring usage yet. The bigger test is to acquire the top 1% of GitHub projects and private companies that have a high(er) incentive to localize. Will they adopt inlang because of the git-based architecture? We are blocked on that question because inlang-git needs lazy cloning of repos first.

On our design approach

Furthermore, I believe that it is by focusing on this aspect -- on the reduction of "drag" rather than the increase of "thrust" -- that we will see the largest improvement in the quality of our product. This will be the guiding philosophy of inlang-git.

Yep, focusing on drag rather than thrust is what we should, and all seem to, follow. I have to plug a quote from @jannesblobel:

German original: "Ich glaube um X [entfernt] muss man sich keine sorgen machen, es wirkt so, als würden die wie ein kopfloses huhn rumlaufen und einfach nur alles raushauen was geht anstatt zu überlegen, wie man wirklich das alles einfacher macht"

English translation: "I don't think we need to worry about X [redacted]. It seems like they are running around like a headless chicken, pushing out everything they can think of instead of thinking about how to make all of this easier."

On the performance of statusMatrix

[Referring to the current git commit flow] This is already inefficient, since the "commit" button only updates a single line in a single file each time it is clicked, but is made much worse by the fact that statusMatrix is extremely slow

Regardless of the git flow implementation in an app, the underlying inlang-git implementation should be fast.

Calling statusMatrix only on files that change should speed things up by a wide margin. However, the filesystem implementation needs to "communicate" to inlang-git what files changed. Even better maybe, statusMatrix works as a stream under the hood and automatically knows what changed and what not without traversing the entire filesystem (basically making filter obsolete).
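For comparison, CLI Git already supports this shape of optimization via pathspec-limited status, and isomorphic-git's statusMatrix exposes a filepaths option for the same purpose. A sketch with a hypothetical repository layout:

```shell
# Sketch: path-limited status. Only the paths the app already knows were
# touched are examined, instead of walking the whole worktree.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "dev@example.com"
git config user.name "Dev"
mkdir -p messages other
echo "a" > messages/en.json
echo "b" > other/huge-asset.bin
git add -A
git commit -qm "initial"

echo "changed" > messages/en.json

git status --porcelain -- messages/   # limits the scan to messages/
```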

PS Changing the flow to eliminate message by message commits has been discussed several times and will eventually be implemented.

@NilsJacobsen

NilsJacobsen commented May 23, 2023

Great Vision '23 👏.

Generally speaking, I think you are definitely on the right path. It was great to read that you are already responding to the use cases, like git flows and subscriptions, that Niklas and I suggested for the editor vision.

I loved reading through the analogy. I think it is pretty accurate. I have to admit that @samuelstroschein was right:

those projects are not yielding recurring usage yet

But I think that is also partly due to the fact that everybody hates i18n and it is still a huge pain. It was great to see that people who are looking for a solution while building a new tool are more than thankful for such easy-to-use, easy-to-setup tooling. I guess by working on multiple trigger strategies like the automation/release of the IDE extension, our adoption will get even better.

The Solution

The solution sounds great. Everything as a branch takes out a lot of complexity. I think we have to make sure not to cause confusion when git users are facing our new flow. This can be solved by wording, I guess. Usually people are not concerned about pushing something, because most of the time in open source projects it is pushed to your own fork. But in our case it would mean that people are interacting with the common remote, so that needs to be communicated correctly.

A New Server

This was the biggest unknown spot for me. I knew that we need to manage subscriptions, maybe faster syncs or even real time, and lazy loading. I also knew that the GitHub servers are not really built for that, so I was trying to find a solution in my mind as well, and I also came across the server idea. If we can architect the server, we can make these requirements happen. But the downside would be that we need engineering, maintenance, and also user management for that server. This would introduce a small threshold. Ultimately, though, this could be the first step towards being able to make the new git profitable somehow. So maybe that is the right way.

Now I read about Samuel's idea of having the server live distributed on the clients. Actually, it would be great to have something like that. I mean, device power is getting better and better, so that shouldn't be a problem in the future, I guess. But I'm not sure how to make that reliable and a fit for our requirements. Maybe @samuelstroschein you can elaborate a bit more on that.

@NiklasBuchfink

Very well written 👏

I feel that this proposal contains the updates needed to bring today's requirements for building software into a Git architecture. The current flow is hard to learn and regularly confuses developers. The simplified mental model will fix this, make it easier to teach to non-technical people, and make it easier to hide behind the frontend of applications.

The idea of 'everything is a branch', where we just stream a remote branch to the client and sync the changes back, sounds fantastic. It will lead to great UX/DX with far fewer conflicts that need additional attention. On the other hand, there is still the possibility of implementing the classic asynchronous workflow, which is suitable for bundling, comparing, and merging changes according to the appropriate criteria (review, test, role of a user, etc.). These workflows can be applied to all conceivable software use cases and seem easy to understand.

@samuelstroschein

Let's close the discussion here and use opral/monorepo#820.
