The greatest strength of inlang is that adoption is nearly frictionless. By integrating with existing tools, creating pull requests ourselves, and providing simple graphical tools, we reduce the number of obstacles users face when adopting our software.
Consider the analogy of a rocket ship. If we want the ship to fly further, we can either increase thrust or reduce drag. "Increasing thrust" can be thought of as adding new features, marketing, community outreach, etc. On the other hand, "reducing drag" can be thought of in terms of UI/UX/DX, i.e., those things that make inlang easier to use and adopt.
Now consider the most extreme case of our rocket ship analogy: the rocket has a completely flat nose. In this case, we can increase the thrust all we want, but the effects are essentially nullified by the shape of the nose. The important thing to note is that this principle always remains in effect, even as we make our rocket more aerodynamic. That is, we will never be able to see the full benefits of adding thrust until we eliminate drag.
As previously stated, I am of the opinion that in its current state, the inlang software suite is extremely "aerodynamic" as is. I believe that this is the reason we are able to see such significant adoption despite the product being in its early days of development. Furthermore, I believe that it is by focusing on this aspect -- on the reduction of "drag" rather than the increase of "thrust" -- that we will see the largest improvement in the quality of our product. This will be the guiding philosophy of inlang-git. The application of this philosophy throughout the rest of inlang's software suite is up to the discretion of the rest of the team, of course.
The purpose of inlang-git is simple. The editor is built on Git. By making a better Git, we can make a better editor. By making a more accessible Git, we can make a more accessible editor.
There are two classes of limitations which hold back the improvement of UX in the editor:
1. The limitations in isomorphic-git
2. The limitations in the established Git flow
Item (1) has been addressed previously in the git-sdk proposal. The main issues have not changed, nor should our approach to them. This document mainly concerns itself with item (2), the issues inherent within the established Git flow. In addressing this, I hope to round out the ideas initially expressed in my Git-sdk proposal, and provide a more concrete image of what "the next Git" could look like.
Before we begin outlining how "the next Git" would function, I think it is important to consider how we can better use the Git features we have available to us at the moment. One salient aspect is the editor workflow, which I believe can be made faster and more efficient with some small changes.
The current editor workflow is as follows:
- Clone the repository
- Fork the repository
- Edit a string
- Commit your edit
- Edit a string
- Commit your edit
- Edit a string
- etc...
- Push your changes
It is clear that this is based on the standard Git workflow of clone, add, commit, push. There is one significant issue with this workflow in that it does not scale to large repos. With potentially hundreds of strings to update, clicking "commit" each time we make an edit becomes extremely tedious.
Furthermore, the implementation of this flow in the editor is inefficient. Every time a user clicks on the "commit" button we:
- Run `statusMatrix` to see which files changed (very slow)
- Filter the `statusMatrix` output to create a list of only the files we are interested in
- For each file in this list, run `git.add` on the file
- Commit the changes (slow)
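The per-click work listed above can be sketched in TypeScript as follows. This is a simplified model and not the editor's actual code: `GitLike` is a hypothetical, synchronous stand-in for the isomorphic-git calls named above (the real calls are asynchronous and take an options object), `commitOnClick` and `isInteresting` are illustrative names, and the row layout `[filepath, head, workdir, stage]` follows isomorphic-git's documented `statusMatrix` output, where an unchanged file is reported as `1, 1, 1`.

```typescript
// One row of a statusMatrix-style result: [filepath, head, workdir, stage].
type StatusRow = [string, number, number, number];

// Hypothetical synchronous stand-in for the Git calls the editor makes.
interface GitLike {
  statusMatrix(): StatusRow[];
  add(filepath: string): void;
  commit(message: string): void;
}

// What a single click on "commit" does in the flow described above.
function commitOnClick(
  git: GitLike,
  isInteresting: (filepath: string) => boolean,
  message: string
): string[] {
  // 1. Walk the whole repository to find out what changed (the slow part).
  const rows = git.statusMatrix();
  // 2. Drop unchanged files (1, 1, 1) and files we are not interested in.
  const changed = rows
    .filter((row) => !(row[1] === 1 && row[2] === 1 && row[3] === 1))
    .map((row) => row[0])
    .filter(isInteresting);
  // 3. Stage each remaining file individually.
  changed.forEach((filepath) => git.add(filepath));
  // 4. Create the commit.
  git.commit(message);
  return changed;
}
```

Note that every step runs in full on each click, even when only one line in one file has changed.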
This is already inefficient, since the "commit" button only updates a single line in a single file each time it is clicked, but is made much worse by the fact that `statusMatrix` is extremely slow (only in the browser, for reasons I have not yet clearly identified). If a repository is large enough, this leads to multi-second long wait times each time a file is committed to the repository.
The solution to this problem would be to delay committing changes until the user is ready to push them. Then we run `git.add` on the entire repository, and commit+push as one operation. For the user, the flow then becomes:
- Clone the repository
- Fork the repository
- Edit multiple strings
- Push your changes
There are two main benefits to this. First, there is no need to invoke `statusMatrix`, which causes a massive slowdown in committing changes to large repos. Second, the flow is simplified in that a user no longer has to click "commit" each time a change is added.
Thus in one change, we improve both performance and UX in the editor. Returning to the rocket analogy, we have reduced the "drag" of the editor in two aspects.
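A minimal TypeScript sketch of the deferred flow, under stated assumptions: `BatchGit` and `EditSession` are hypothetical names, `addAll` stands for running `git.add` on the entire repository, and the calls are modeled as synchronous for brevity although the real Git operations are asynchronous.

```typescript
// Hypothetical stand-in for the Git calls needed at push time.
interface BatchGit {
  addAll(): void; // stage the entire repository in one call
  commit(message: string): void;
  push(): void;
}

// Records edits as they happen and touches Git only once, at push time.
class EditSession {
  private git: BatchGit;
  private write: (key: string, value: string) => void;
  private edited: string[] = [];

  constructor(git: BatchGit, write: (key: string, value: string) => void) {
    this.git = git;
    this.write = write;
  }

  // Editing a string writes it out but does not stage or commit anything.
  edit(key: string, value: string): void {
    this.write(key, value);
    if (this.edited.indexOf(key) === -1) this.edited.push(key);
  }

  // One add, one commit, and one push for all pending edits.
  push(): number {
    const count = this.edited.length;
    if (count === 0) return 0;
    this.git.addAll();
    this.git.commit("Update " + count + " string(s)");
    this.git.push();
    this.edited = [];
    return count;
  }
}
```

In this model no status walk is needed at all, because staging the whole repository makes the list of changed files irrelevant.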
This section serves to restate ideas previously outlined in the git-sdk proposal. As a quick recap, the planned features are:
- Lazy loading
- Object-level authentication
- Improved support for binary files
These items have not changed significantly since the introduction of the git-sdk. The following section serves to outline a new flow, i.e. a new paradigm for interacting with Git, and one should assume these features to be already included.
Fundamentally, Git is a good piece of software with a terrible UI. This has been addressed time and time again. There exist solutions such as gitless, legit, and easygit which all seek to solve this issue. The common theme among projects of this sort is that they all introduce new (supposedly simpler) concepts and functionality to replace the existing concepts and functionality of Git. The problem with this is the switching cost involved. While these flows may indeed be easier for users, the cost of learning a new interface instead of Git simply seems to be "not worth it" in the long run.
So what we need is a tool that is simpler than Git, but not alienating to existing users. How do we accomplish this? The answer is to take a page out of the Plan 9 playbook:
"Everything is a [branch], and we have a lot of tools to work with [branches]."
"Everything is a branch" is the fundamental design philosophy behind this new git flow. Thus, the concepts of "workdir", "index", "stash", and "remote" disappear completely.
Remote branches are just regular branches which are kept in sync with some Git remote. Thus, we no longer "push" changes; instead, we commit our changes to the remote branch. Furthermore, we no longer "stage" files at all, and the "worktree" is now simply a branch which is synced to the filesystem. Thus, an "undo" operation is as simple as stepping back in the branch history.
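As a rough illustration of this model, the TypeScript sketch below treats a branch as nothing more than a commit history with an optional remote to stay in sync with. All names here (`Branch`, `Snapshot`, `undo`, `isSynced`) are hypothetical and chosen for this example only; actual synchronization with a remote or the filesystem is left out.

```typescript
// A snapshot of file contents, keyed by path.
type Snapshot = { [path: string]: string };

interface Commit {
  message: string;
  files: Snapshot;
}

// Hypothetical model: a branch is just a history of commits, optionally
// kept in sync with a remote. There is no index, stash, or workdir.
class Branch {
  name: string;
  remoteUrl: string | null;
  private history: Commit[] = [];

  constructor(name: string, remoteUrl: string | null = null) {
    this.name = name;
    this.remoteUrl = remoteUrl;
  }

  // Committing is the only way to change a branch.
  commit(message: string, files: Snapshot): void {
    this.history.push({ message: message, files: Object.assign({}, files) });
  }

  head(): Commit | null {
    const last = this.history[this.history.length - 1];
    return last === undefined ? null : last;
  }

  // "Undo" is simply a step back in the branch history.
  undo(): Commit | null {
    this.history.pop();
    return this.head();
  }

  // A "remote branch" is a regular branch that has something to sync with.
  isSynced(): boolean {
    return this.remoteUrl !== null;
  }
}
```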
To better illustrate this new flow, we describe a hypothetical tool `iv` (for inlang versioner). Consider the typical "Hello world" example, commonly used in setting up Git repositories. First, the traditional flow:
git init
git remote add origin https://github.com/user/repo
echo "Hello world!" > file.txt
git add file.txt
git commit -m "First Commit"
git push -u origin main
and the same process using `iv`:
iv init
iv branch main https://github.com/user/repo
echo "Hello world!" > file.txt
iv commit -m "First commit"
The benefit of the new flow is not in the 2 fewer commands. Rather it is the reduction in complexity of one's mental model of the Git repo. In the Git example above we are:
- Initializing Git
- Implicitly creating a new branch called "main"
- Creating a remote with the name "origin"
- Creating a file in the worktree
- Adding the same file to the index
- Committing the file to our branch called "main"
- Pushing our branch called "main" to our remote called "origin"
- Adding an upstream tracking reference for our remote called "origin"
With `iv`, the mental model is considerably simpler. In this case we are:
- Initializing iv
- Creating a new branch called "main" which is synced with a remote branch
- Creating a file
- Committing the file to our branch called "main"
The importance is in the new concepts introduced, highlighted in italics. With Git there are 7 concepts to understand in order to execute a basic version control operation. With `iv` there are 2.
The hypothetical `iv` tool is described in further detail here. What is important to note is that all of this can be built on top of existing low level Git functionality. An `iv`-like tool could be built using isomorphic-git, libgit2, or even Git itself (as a set of shell scripts).
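To make the "built on top of Git" point concrete, here is one possible mapping from the three `iv` commands in the example to the Git commands of the traditional flow, sketched in TypeScript. The mapping is an assumption made for illustration, not a specification of `iv`: `IvCommand` and `ivToGit` are hypothetical names, the remote is always called "origin", and the local branch is assumed to already carry the name given to `iv branch`.

```typescript
// The three iv commands used in the "Hello world" example.
type IvCommand =
  | { kind: "init" }
  | { kind: "branch"; name: string; url: string }
  | { kind: "commit"; message: string };

// Translate one iv command into argv-style Git commands.
function ivToGit(cmd: IvCommand, currentBranch: string = "main"): string[][] {
  if (cmd.kind === "init") {
    return [["git", "init"]];
  }
  if (cmd.kind === "branch") {
    // Assumption: the local branch already has the name cmd.name, so only
    // the remote needs to be registered.
    return [["git", "remote", "add", "origin", cmd.url]];
  }
  // A commit stages everything, commits, and syncs with the remote at once.
  return [
    ["git", "add", "-A"],
    ["git", "commit", "-m", cmd.message],
    ["git", "push", "-u", "origin", currentBranch],
  ];
}
```

Each inner array is a single command line, kept as separate arguments so that messages containing spaces need no shell quoting.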
Along with the previously mentioned object level authentication, a new server can also provide performance improvements for our "constant sync" workflow.
The Git server protocol is based on a simple principle: the client requests objects from the server, and the server provides them in Packfile format. For the existing Git flow, this is perfectly adequate. For a new flow based around keeping remote branches synchronized, the constant need to query for new refs quickly becomes a performance issue.
The Git pack protocols, both v1 and v2, are fundamentally designed around the concept of infrequent updates to widely divergent object stores. The solution here is to create a streaming Git server, i.e., one in which clients can subscribe to updates about branches they are interested in.
When a client successfully updates a remote ref, the new value of the ref, as well as any relevant objects should be sent to all subscribed clients. For a lazy-loaded branch, this would be the location of HEAD and the tree object it points to. For a fully checked-out branch, this would be the location of HEAD, the tree object it points to, and any new objects the tree contains.
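The subscription behavior described above could look roughly like the following TypeScript sketch. It models only the fan-out logic in memory. `RefStream`, `RefUpdate`, and `RefNotification` are hypothetical names, object ids are plain strings, and no real network transport or Git protocol is involved.

```typescript
// How a subscriber has checked out the branch.
type Mode = "lazy" | "full";

// Describes a successful update to a remote ref.
interface RefUpdate {
  branch: string;
  head: string; // new location of HEAD
  tree: string; // id of the tree object HEAD points to
  newObjects: string[]; // ids of new objects contained in that tree
}

// What a subscribed client receives.
interface RefNotification {
  branch: string;
  head: string;
  objects: string[];
}

interface Subscriber {
  branch: string;
  mode: Mode;
  send: (n: RefNotification) => void;
}

class RefStream {
  private subscribers: Subscriber[] = [];

  subscribe(branch: string, mode: Mode, send: (n: RefNotification) => void): void {
    this.subscribers.push({ branch: branch, mode: mode, send: send });
  }

  // Notify every client subscribed to the updated branch.
  publish(update: RefUpdate): number {
    let delivered = 0;
    this.subscribers.forEach((s) => {
      if (s.branch !== update.branch) return;
      // Lazy clients get only the tree; full clients also get new objects.
      const objects =
        s.mode === "lazy" ? [update.tree] : [update.tree].concat(update.newObjects);
      s.send({ branch: update.branch, head: update.head, objects: objects });
      delivered += 1;
    });
    return delivered;
  }
}
```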
Now, let us collect all of the previously stated points into a concrete description of the next Git. It consists of two aspects: a client and a server.
The client implements the previously described "new flow" to increase accessibility. It lazily fetches files as they are needed to make it easier to work with large repositories. It is file-agnostic. Through various diff providers it is capable of logically representing diffs between binary file types. These diff providers also reduce the occurrence of merge conflicts through semantic diffing. Merge conflicts are further kept to a minimum through constant sync with remote branches.
The server works with the client to implement the new flow. It implements file-level authentication for added security when working with large shared repositories.
All of these features should combine to make both the editor and Git more accessible to developers and non-developers alike. To return to our analogy, the goal of this project is to reduce "drag" not only in the editor, but in Git itself. Thus, priority will be placed on features which enable us to simplify the Git workflow, or to support a previously unsupported common use case.
The roadmap then, is as follows:
- Implement protocol v2 into isomorphic-git
- Use protocol v2 to implement sparse checkout into isomorphic-git
- Use sparse checkout to improve performance in the editor
- If performance is still inadequate, look towards improving the performance of remote packfile indexing (specifically delta resolution) in isomorphic-git.
- If performance is still inadequate, look towards improving the algorithm for `checkout` in isomorphic-git. Specifically, the algorithm must run in a single pass while still handling file dependencies appropriately.
- Create a proof of concept, evaluative, internal-use tool implementing the "new Git flow" outlined above.
After the tool is created, its development will have its own roadmap:
- Implement partial cloning into isomorphic-git
- Use partial cloning to implement a lazy loading filesystem
- Experiment with semantic diff providers for both binary and text files
- Create the new Git server with ref streaming and file-level authentication
- Use what we learn with this tool to create the true "next Git"
- Release it to the world
- Wait
Besides the obvious additions, one notable change is that the lazy filesystem has been moved much further down the roadmap, and has been replaced with sparse checkout. The reasoning for this is simply that GitHub does not support the `filter` option to `git fetch-pack`, and thus lazy cloning from a GitHub repo would not be possible. Sparse checkout should provide similar performance improvements while remaining more widely supported both by Git hosts and Git itself.
Summary
First things first, your rocket analogy seems spot on. The main advantage of our git-based architecture appears to be lower friction to adopt our software, but it could be any software:
1. We take existing data (files) that can be opened with apps. This sounds low-tech like 1990, but that seems to be a great thing. No (dedicated) servers, database, auth system, sync, integrations, and whatnot are required to get started.
2. Different teams (devs and translators) can collaborate in their respective tools on a "single source of truth", namely files. The barrier to cross-collaborate on files seems simpler than alternatives, likely due to the simplicity of 1.
Discussion points
I created issues for each discussion point to streamline the discussion and to be able to reference the discussions. All issues are tracked in a PR which is supposed to lead to a design document.
1. Your proposal is a concrete "everything is a branch" suggestion. While this sounds good, we have a superior ingredient to any other git improvement over the previous decade: The inlang editor is targeting (mostly) non-developers. We are free to experiment with a concrete use case that entails requirements.
2. Should the server be a superset of the client instead of being a separate application? (opral/monorepo#822)
3. Should we drop direct support for "legacy" git hosts and instead build a proxy feature into inlang-git/server? (opral/monorepo#823)
4. Should we rename inlang-git to a brandable name? (opral/monorepo#824)
5. The tech stack for the server (opral/monorepo#827)
Replies that don't seem to require a dedicated issue
Agree on the aerodynamic part, disagree on "significant adoption".
We seem to be able to acquire "low incentive to adopt localization tooling" projects through our git-based architecture. That is a good sign given that higher adoption friction (of other architectures) would likely lead to no adoption. But, those projects are not yielding recurring usage yet. The bigger test is to acquire the top 1% of GitHub projects and private companies that have a high(er) incentive to localize. Will they adopt inlang because of the git-based architecture? We are blocked on that question because inlang-git needs lazy cloning of repos first.
On our design approach
Yep, focusing on drag rather than thrust is what we should follow, and what we all seem to follow. I have to plug a quote from @jannesblobel:
On the performance of statusMatrix
Regardless of the git flow implementation in an app, the underlying inlang-git implementation should be fast.
Calling `statusMatrix` only on files that change should speed things up by a wide margin. However, the filesystem implementation needs to "communicate" to inlang-git what files changed. Even better, maybe `statusMatrix` works as a stream under the hood and automatically knows what changed and what did not, without traversing the entire filesystem (basically making `filter` obsolete).
PS: Changing the flow to eliminate message-by-message commits has been discussed several times and will eventually be implemented.
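One way the filesystem could "communicate" changes is to record every written path, so that a status check only needs to look at those paths. The TypeScript sketch below is an assumption about how that might work, not existing inlang code; `DirtyTracker` and `takeDirtyPaths` are hypothetical names, and file storage is reduced to an in-memory map.

```typescript
// Hypothetical filesystem wrapper that remembers which paths were written.
class DirtyTracker {
  private files: { [path: string]: string } = {};
  private dirty: string[] = [];

  // Store the content and mark the path as changed (once per path).
  writeFile(path: string, content: string): void {
    this.files[path] = content;
    if (this.dirty.indexOf(path) === -1) this.dirty.push(path);
  }

  // Hand the changed paths to the caller and start tracking afresh.
  takeDirtyPaths(): string[] {
    const paths = this.dirty;
    this.dirty = [];
    return paths;
  }
}
```

The returned list could then be passed as the `filepaths` option of isomorphic-git's `statusMatrix`, which limits the query to the given paths instead of walking the whole repository.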