
@araknast
Last active May 23, 2023 08:03

Git Vision 2023

Introduction

The greatest strength of inlang is that adoption is nearly frictionless. By integrating with existing tools, creating pull requests ourselves, and by providing simple graphical tools, we reduce the number of obstacles users face when adopting our software.

Consider the analogy of a rocket ship. If we want the ship to fly further, we can either increase thrust or reduce drag. "Increasing thrust" can be thought of as adding new features, marketing, community outreach, etc. On the other hand, "reducing drag" can be thought of in terms of UI/UX/DX, i.e. those things that make inlang easier to use and adopt.

Now consider the most extreme case of our rocket ship analogy: The rocket has a completely flat nose. In this case, we can increase the thrust all we want, but the effects are essentially nullified by the shape of the nose. The important thing to note is that this principle always remains in effect even as we make our rocket more aerodynamic. That is, we will never be able to see the full benefits of adding thrust until we eliminate drag.

As previously stated, I am of the opinion that in its current state, the inlang software suite is extremely "aerodynamic" as is. I believe that this is the reason we are able to see such significant adoption despite the product being in its early days of development. Furthermore, I believe that it is by focusing on this aspect -- on the reduction of "drag" rather than the increase of "thrust" -- that we will see the largest improvement in the quality of our product. This will be the guiding philosophy of inlang-git. The application of this philosophy throughout the rest of inlang's software suite is up to the discretion of the rest of the team, of course.

The Purpose of inlang-git

The purpose of inlang-git is simple. The editor is built on Git. By making a better Git, we can make a better editor. By making a more accessible Git, we can make a more accessible editor.

There are two classes of limitations which hold back the improvement of UX in the editor:

  1. The limitations in isomorphic-git
  2. The limitations in the established Git flow

Item (1) has been addressed previously in the git-sdk proposal. The main issues have not changed, nor should our approach to them. This document mainly concerns itself with item (2), the issues inherent in the established Git flow. In addressing this, I hope to round out the ideas initially expressed in my git-sdk proposal, and provide a more concrete image of what "the next Git" could look like.

Using Git More Effectively in the Editor

Before we begin outlining the functioning of "the next Git", I think it is important to consider how we can better use the Git features we have available to us at the moment. One salient aspect is the editor workflow, which I believe can be made faster and more efficient with some small changes.

The current editor workflow is as follows:

  1. Clone the repository
  2. Fork the repository
  3. Edit a string
  4. Commit your edit
  5. Edit a string
  6. Commit your edit
  7. Edit a string
  8. etc.
  n. Push your changes

It is clear that this is based on the standard Git workflow of clone, add, commit, push. There is one significant issue with this workflow: it does not scale to large repos. With potentially hundreds of strings to update, clicking "commit" each time we make an edit becomes extremely tedious.

Furthermore, the implementation of this flow in the editor is inefficient. Every time a user clicks on the "commit" button we:

  • Run statusMatrix to see which files changed (very slow)
  • Filter the statusMatrix output to create a list of only the files we are interested in
  • For each file in this list, run git.add on the file
  • Commit the changes (slow)

This is already inefficient, since the "commit" button only updates a single line in a single file each time it is clicked, but it is made much worse by the fact that statusMatrix is extremely slow (only in the browser, for reasons I have not yet clearly identified). If a repository is large enough, this leads to multi-second wait times each time a file is committed to the repository.

The solution to this problem would be to delay committing changes until the user is ready to push them. Then we run git.add on the entire repository, and commit+push as one operation. For the user, the flow then becomes:

  1. Clone the repository
  2. Fork the repository
  3. Edit multiple strings
  4. Push your changes

There are two main benefits to this. First, there is no need to invoke statusMatrix, which causes a massive slowdown in committing changes to large repos. Second, the flow is simplified in that a user no longer has to click "commit" each time a change is added.
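In plain Git terms, the batched flow collapses the per-edit commits into a single add/commit/push at the end of a session. The sketch below is illustrative only: the file names are hypothetical and a local bare repository stands in for the user's fork; the editor would perform the equivalent steps through isomorphic-git rather than the CLI.

```shell
# Sketch of the batched editor flow in plain Git terms (hypothetical files;
# a local bare repo stands in for the user's fork on a Git host).
set -e
work=$(mktemp -d)

# Stand-in for the user's fork.
git init -q --bare "$work/fork.git"

# 1-2. Clone the (forked) repository.
git clone -q "$work/fork.git" "$work/checkout" 2>/dev/null
cd "$work/checkout"
git config user.email "translator@example.com"
git config user.name "Translator"

# 3. Edit multiple strings -- no commit per edit.
echo '{"title": "Hallo"}'   > de.json
echo '{"title": "Bonjour"}' > fr.json

# 4. Push your changes: stage everything, commit once, push once.
git add -A
git commit -qm "Update translations"
git push -q origin HEAD
```

One network round trip and one object walk per session, instead of one per edited string.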

Thus in one change, we improve both performance and UX in the editor. Returning to the rocket analogy, we have reduced the "drag" of the editor in two aspects.

A New SDK - Recap

This section serves to restate ideas previously outlined in the git-sdk proposal. As a quick recap, the planned features are:

  • Lazy loading
  • Object-level authentication
  • Improved support for binary files

These items have not changed significantly since the introduction of the git-sdk. The following section serves to outline a new flow, i.e. a new paradigm for interacting with Git, and one should assume these features to be already included.

A New Flow

Motivation

Fundamentally, Git is a good piece of software with a terrible UI. This has been addressed time and time again. There exist solutions such as gitless, legit, and easygit which all seek to solve this issue. The common theme among projects of this sort is that they all introduce new (supposedly simpler) concepts and functionality to replace the existing concepts and functionality of Git. The problem with this is the switching cost involved. While these flows may indeed be easier for users, the cost of learning a new interface instead of Git simply seems to be "not worth it" in the long run.

The Solution

So what we need is a tool that is simpler than Git, but not alienating to existing users. How do we accomplish this? The answer is to take a page out of the Plan 9 playbook:

"Everything is a [branch], and we have a lot of tools to work with [branches]."

"Everything is a branch" is the fundamental design philosophy behind this new git flow. Thus, the concepts of "workdir", "index", "stash", and "remote" disappear completely.

Remote branches are just regular branches which are kept in sync with some Git remote. Thus, we no longer "push" changes, instead we commit our changes to the remote branch. Furthermore, we no longer "stage" files at all, and the "worktree" is now simply a branch which is synced to the filesystem. Thus, an "undo" operation is as simple as stepping back in the branch history.
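In plain Git terms, such an "undo" could look like the following sketch (a throwaway demo repository; the file name and commit messages are hypothetical):

```shell
# Sketch: with "everything is a branch", undo is just moving the worktree
# branch back one entry in its history. Demo in plain Git:
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "dev@example.com"
git config user.name "Dev"

echo "v1" > file.txt; git add -A; git commit -qm "v1"
echo "v2" > file.txt; git add -A; git commit -qm "v2"

# "Undo": step back one commit on the worktree branch.
git reset -q --hard HEAD~1
cat file.txt   # back to "v1"
```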

To better illustrate this new flow, we describe a hypothetical tool iv (for inlang versioner). Consider the typical "Hello world" example, commonly used in setting up Git repositories. First, the traditional flow:

git init
git remote add origin https://github.com/user/repo
echo "Hello world!" > file.txt
git add file.txt
git commit -m "First Commit"
git push -u origin main

and the same process using iv:

iv init
iv branch main https://github.com/user/repo
echo "Hello world!" > file.txt
iv commit -m "First commit"

The benefit of the new flow is not in the 2 fewer commands. Rather it is the reduction in complexity of one's mental model of the Git repo. In the Git example above we are:

  1. Initializing Git
  2. Implicitly creating a new branch called "main"
  3. Creating a remote with the name "origin"
  4. Creating a file in the worktree
  5. Adding the same file to the index
  6. Committing the file to our branch called "main"
  7. Pushing our branch called "main" to our remote called "origin"
  8. Adding an upstream tracking reference for our remote called "origin"

With iv, the mental model is considerably simpler. In this case we are:

  1. Initializing iv
  2. Creating a new branch called "main" which is synced with a remote branch
  3. Creating a file
  4. Committing the file to our branch called "main"

The importance is in the number of new concepts introduced at each step. With Git there are 7 concepts to understand in order to execute a basic version control operation. With iv there are 2.

The hypothetical iv tool is described in further detail here. What is important to note is that all of this can be built on top of existing low level Git functionality. An iv-like tool could be built using isomorphic-git, libgit2, or even Git itself (as a set of shell scripts).
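As a sketch of the "shell scripts over Git" option, iv commit could be a thin wrapper that stages, commits, and syncs the remote-backed branch in one step. The iv_commit name and its behavior are assumptions drawn from this document, not an existing tool:

```shell
# Hypothetical sketch of "iv commit" as a wrapper over plain Git.
# A branch created with "iv branch <name> <url>" is assumed to have a
# configured remote; committing to it then also syncs that remote.
iv_commit() {
  msg="$1"
  git add -A                 # no user-facing index: stage everything
  git commit -q -m "$msg"    # record the change on the current branch
  # If the branch is backed by a remote, keep it in sync; otherwise stay local.
  if git remote get-url origin >/dev/null 2>&1; then
    git push -q origin HEAD
  fi
}
```

The point of the sketch is that no new plumbing is required: the simplified mental model is achievable as a porcelain layer over any existing Git implementation.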

A New Server

Along with the previously mentioned object-level authentication, a new server can also provide performance improvements for our "constant sync" workflow.

The Git server protocol is based on a simple principle: the client requests objects from the server, and the server provides them in Packfile format. For the existing Git flow, this is perfectly adequate. For a new flow based around keeping remote branches synchronized, the constant need to query for new refs quickly becomes a performance issue.

The Git pack protocols, both v1 and v2, are fundamentally designed around the concept of infrequent updates to widely divergent object stores. The solution here is to create a streaming Git server, i.e. one in which clients can subscribe to updates about branches they are interested in.

When a client successfully updates a remote ref, the new value of the ref, as well as any relevant objects should be sent to all subscribed clients. For a lazy-loaded branch, this would be the location of HEAD and the tree object it points to. For a fully checked-out branch, this would be the location of HEAD, the tree object it points to, and any new objects the tree contains.

Conclusion

Now, let us collect all of the previously stated points into a concrete description of the next Git. It consists of two aspects: a client and a server.

The client implements the previously described "new flow" to increase accessibility. It lazily fetches files as they are needed to make it easier to work with large repositories. It is file-agnostic. Through various diff providers it is capable of logically representing diffs between binary file types. These diff providers also reduce the occurrence of merge conflicts through semantic diffing. Merge conflicts are further kept to a minimum through constant sync with remote branches.

The server works with the client to implement the new flow. It implements file-level authentication for added security when working with large shared repositories.

Roadmap

All of these features should combine to make both the editor, and Git, more accessible to developers and non-developers alike. To return to our analogy, the goal of this project is in reducing "drag" not only in the editor, but in Git itself. Thus, priority will be placed on features which enable us to simplify the Git workflow, or to support a previously unsupported common use case.

The roadmap then, is as follows:

  1. Implement protocol v2 into isomorphic-git
  2. Use protocol v2 to implement sparse checkout into isomorphic-git
  3. Use sparse checkout to improve performance in the editor
  4. If performance is still inadequate, look towards improving the performance of remote packfile indexing (specifically delta resolution) in isomorphic-git.
  5. If performance is still inadequate, look towards improving the algorithm for checkout in isomorphic-git. Specifically, the algorithm must run in a single pass while still handling file dependencies appropriately.
  6. Create a proof of concept, evaluative, internal-use tool implementing the "new Git flow" outlined above.

After the tool is created, its development will have its own roadmap:

  1. Implement partial cloning into isomorphic-git
  2. Use partial cloning to implement a lazy loading filesystem
  3. Experiment with semantic diff providers for both binary and text files
  4. Create the new Git server with ref streaming and file-level authentication
  5. Use what we learn with this tool to create the true "next Git"
  6. Release it to the world
  7. Wait

Besides the obvious additions, one notable change is that the lazy filesystem has been moved much further down the roadmap, and has been replaced with sparse checkout. The reasoning for this is simply that GitHub does not support the filter option to git fetch-pack, and thus lazy cloning from a GitHub repo would not be possible. Sparse checkout should provide similar performance improvements while remaining more widely supported both by Git hosts and Git itself.
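To make the sparse checkout step concrete, here is a sketch of the flow using CLI Git (requires Git >= 2.25; the repository layout is hypothetical, and a local repository stands in for the remote host):

```shell
# Sketch of a sparse checkout: only the directory we care about is
# materialized in the worktree (hypothetical layout, local stand-in remote).
set -e
tmp=$(mktemp -d)

# Build a sample "remote": a messages/ dir we care about, plus unrelated assets/.
git init -q "$tmp/origin"
cd "$tmp/origin"
git config user.email "dev@example.com"
git config user.name "Dev"
mkdir -p messages assets
echo '{"hello": "world"}' > messages/en.json
echo "large unrelated payload" > assets/blob.txt
git add -A
git commit -qm "initial"

# Clone with a sparse (top-level only) checkout, then select just messages/.
git clone -q --sparse "$tmp/origin" "$tmp/sparse"
cd "$tmp/sparse"
git sparse-checkout set messages

ls   # the worktree now contains messages/ but not assets/
```

Unlike partial clone, this needs no filter support from the server side, which is why it remains the more widely supported option.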

@samuelstroschein

Summary

First things first, your rocket analogy seems spot on. The main advantage of our git-based architecture appears to be lower friction to adopt our software, but it could be any software:

  1. We take existing data (files) that can be opened with apps. This sounds low-tech, like 1990, but that seems to be a great thing. No (dedicated) servers, databases, auth systems, sync, integrations, or whatnot are required to get started.

  2. Different teams (devs and translators) can collaborate in their respective tools on a "single source of truth", namely files. The barrier to cross-collaborate on files seems simpler than alternatives, likely due to the simplicity of 1.

Discussion points

I created issues for each discussion point to streamline the discussion and to be able to reference the discussions later. All issues are tracked in a PR which is supposed to lead to a design document.

  1. How to "reach" the next git? opral/monorepo#825

Your proposal is a concrete "everything is a branch" suggestion. While this sounds good, we have an ingredient superior to any other git improvement of the previous decade: the inlang editor targets (mostly) non-developers. We are free to experiment with a concrete use case that entails requirements.

  2. Should the server be a superset of the client instead of being a separate application? opral/monorepo#822

  3. Should we drop direct support for "legacy" git hosts and instead build a proxy feature into inlang-git/server? opral/monorepo#823

  4. Should we rename inlang-git to a brandable name? opral/monorepo#824

  5. The tech stack for the server opral/monorepo#827


Replies that don't seem to require a dedicated issue

As previously stated, I am of the opinion that in its current state, the inlang software suite is extremely "aerodynamic" as is. I believe that this is the reason we are able to see such significant adoption despite the product being in its early days of development.

Agree on the aerodynamic part, disagree on "significant adoption". 

We seem to be able to acquire "low incentive to adopt localization tooling" projects through our git-based architecture. That is a good sign given that higher adoption friction (of other architectures) would likely lead to no adoption. But, those projects are not yielding recurring usage yet. The bigger test is to acquire the top 1% of GitHub projects and private companies that have a high(er) incentive to localize. Will they adopt inlang because of the git-based architecture? We are blocked on that question because inlang-git needs lazy cloning of repos first.

On our design approach

Furthermore, I believe that it is by focusing on this aspect -- on the reduction of "drag" rather than the increase of "thrust" -- that we will see the largest improvement in the quality of our product. This will be the guiding philosophy of inlang-git.

Yep, focusing on drag rather than thrust is what we should, and all seem to, follow. I have to plug a quote from @jannesblobel:

German original: "Ich glaube um X [entfernt] muss man sich keine sorgen machen, es wirkt so, als würden die wie ein kopfloses huhn rumlaufen und einfach nur alles raushauen was geht anstatt zu überlegen, wie man wirklich das alles einfacher macht"

English translation: "I don't think we need to worry about X [redacted]. It seems like they are running around like a headless chicken, pushing out everything they can think of instead of thinking about how to make all of this easier."

On the performance of statusMatrix

[Referring to the current git commit flow] This is already inefficient, since the "commit" button only updates a single line in a single file each time it is clicked, but is made much worse by the fact that statusMatrix is extremely slow

Regardless of the git flow implementation in an app, the underlying inlang-git implementation should be fast.

Calling statusMatrix only on files that change should speed things up by a wide margin. However, the filesystem implementation needs to "communicate" to inlang-git what files changed. Even better maybe, statusMatrix works as a stream under the hood and automatically knows what changed and what not without traversing the entire filesystem (basically making filter obsolete).
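For comparison, CLI Git already supports this shape of optimization via pathspec-limited status, and isomorphic-git's statusMatrix exposes a filepaths option for the same purpose. A sketch with a hypothetical repository layout:

```shell
# Sketch: path-limited status. Only the paths the app already knows were
# touched are examined, instead of walking the whole worktree.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "dev@example.com"
git config user.name "Dev"
mkdir -p messages other
echo "a" > messages/en.json
echo "b" > other/huge-asset.bin
git add -A
git commit -qm "initial"

echo "changed" > messages/en.json

git status --porcelain -- messages/   # limits the scan to messages/
```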

PS Changing the flow to eliminate message by message commits has been discussed several times and will eventually be implemented.

@NilsJacobsen

NilsJacobsen commented May 23, 2023

Great Vision '23 👏.

Generally speaking, I think you are definitely on the right path. It was great to read that you are already responding to the use cases, like git flows and subscriptions, that Niklas and I suggested for the editor vision.

I loved reading through the analogy. I think it is pretty accurate. I have to admit that @samuelstroschein was right:

those projects are not yielding recurring usage yet

But I think that is also partly due to the fact that everybody hates i18n and it is still a huge pain. It was great to see that people who are looking for a solution while building a new tool are more than thankful for such easy-to-use, easy-to-setup tooling. I guess by working on multiple trigger strategies like the automation/release of the IDE extension, our adoption will get even better.

The Solution

The solution sounds great. Everything as a branch takes out a lot of complexity. I think we have to make sure not to cause confusion when git users are facing our new flow. This can be solved by wording, I guess. Usually people are not concerned about pushing something, because most of the time in open source projects it is pushed to your own fork. But in our case it would mean that people are interacting with the common remote, so that needs to be communicated correctly.

A New Server

This was the biggest unknown spot for me. I knew that we need to manage subscriptions, maybe faster syncs or even real time, and lazy loading. I also knew that the GitHub servers are not really built for that, so I was trying to find a solution in my mind as well, and I also came across the server idea. If we can architect the server, we can make these requirements happen. But the downside would be that we need engineering, maintenance, and also user management for that server. This would introduce a small threshold. Ultimately, though, this could be the first step towards being able to make the new git profitable somehow. So maybe that is the right way.

Now I read about Samuel's idea of having the server live distributed on the clients. Actually, it would be great to have something like that. I mean, device power is getting better and better, so that shouldn't be a problem in the future, I guess. But I'm not sure how to make that reliable and a fit for our requirements. Maybe @samuelstroschein you can elaborate a bit more on that.

@NiklasBuchfink

Very well written 👏

I feel that this proposal contains the updates needed to bring today's requirements for building software into a Git architecture. The current flow is hard to learn and regularly confuses developers. The simplified mental model will fix this, make it easier to teach to non-technical people, and make it easier to hide behind the frontend of applications.

The idea of 'everything is a branch', where we just stream a remote branch to the client and sync the changes back, sounds fantastic. It will lead to great UX/DX with far fewer conflicts that need additional attention. On the other hand, there is still the possibility of implementing the classic asynchronous workflow, which is suitable for bundling, comparing, and merging changes according to the appropriate criteria (review, test, role of a user, etc.). These workflows can be applied to all conceivable software use cases and seem easy to understand.

@samuelstroschein

Let's close the discussion here and use opral/monorepo#820.
