@gorenje
Created June 9, 2012 09:20
Subversion v. Git - Thinking in patches

Introduction

This post attempts to give Subversion developers a new perspective on Git and how Git differs from
Subversion, without using the usual “git is distributed development” or “git is peer-to-peer versioning
management” arguments, which tend not to give an existing Subversion project a reason to switch to Git.

Instead, I will attempt to provide a historical background to the development of the first versioning
tools and how these led to the development of Git. It is more than probable that certain historical events
mentioned here are completely and utterly wrong; this is not intended. Corrections and improvements are very
welcome!

TL;DR: this entire post can be summed up by my personal mantra “Git is a patch management tool” and not
a software versioning system (although it can do that also).

Subversion v. Git

Svn and Git do the same things; they have the same purpose in life. Their purpose is to manage software
changes to make it easy to see who did what, what changed when and which change potentially broke something.
Svn and Git differ in how they do this. The reason they differ is that they have different views on
how to achieve the goal, but also because they came from two different development processes.

Subversion takes the point of view that each change is somehow related to the previous change, i.e.
revision 23 has an implicit dependency on the existence of revision 22, and so on. The reason for this
is clear: a change can only be done to a particular state of the code base.

Git takes the view that each change is isolated and stands on its own two feet. Hence the revision numbers
in Git are not linear, rather they appear to be random (which they’re not). More precisely, git is naming
the patches so that they can be referenced later. So all that git is doing is managing a bunch of patches
that can be applied to anything.
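
To see why the names look random but are not, here is a minimal sketch of how Git names the content of a
single file (a "blob"). It assumes the default SHA-1 object format; the function name git_blob_id is my own
and not part of Git. Commits are named in a similar way, by hashing the text of the commit object, which is
not reproduced here.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the name Git would give to this file content (a blob object)."""
    # Git hashes a short header, "blob <size in bytes>\0", followed by the raw bytes.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# The same content always gets the same name, on any machine.
print(git_blob_id(b"hello world\n"))
# Changing a single character gives a completely different name.
print(git_blob_id(b"hello World\n"))
```

The first value should match what `git hash-object` reports for a file containing the same bytes, so the
name is a fingerprint of the content rather than a counter.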

Subversion does not explicitly take this view; however, each revision is, essentially, also a patch.
Unfortunately svn does not provide any tools for allowing the extra flexibility of patches. Git does
and does it with explicit support for patching (e.g. git cherry-pick, git am, git apply).

What are patches?

Back in the beginnings of the open source community, when a developer found a bug in a piece of open
source software, they would create a fix, make a patch and email that patch to the maintainer. Now if
the maintainer had not done any further development, they could apply the patch to the code base and
the world was rosy again.

If on the other hand, the maintainer had done some more development and made changes that made applying
the patch non-trivial, the original contributor had to upgrade their codebase and redo their patch (unless
the maintainer was feeling particularly benevolent and did that for the contributor).
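
For readers who have never made a patch by hand, the following is a small sketch of what the contributor
would have emailed. It uses Python's standard difflib module in place of the diff command; the file name
greet.py, its contents and the helper make_patch are invented for illustration.

```python
import difflib

# The file as the contributor found it, and the file after the fix.
original = ["def greet(name):\n", "    print('Helo ' + name)\n"]
fixed = ["def greet(name):\n", "    print('Hello ' + name)\n"]


def make_patch(old_lines, new_lines, filename):
    """Return a unified diff: the text a contributor would send to the maintainer."""
    diff = difflib.unified_diff(
        old_lines,
        new_lines,
        fromfile="a/" + filename,
        tofile="b/" + filename,
    )
    return "".join(diff)


patch_text = make_patch(original, fixed, "greet.py")
print(patch_text)
```

The patch records only the lines that changed plus a little surrounding context, which is why it applies
cleanly only while the maintainer's copy still contains those lines.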

Patches became the number one way of getting your changes/fixes/improvements into an open source project
and they became a type of change management. However, they became impractical for larger projects, particularly
ones with many contributors and few maintainers. Out of this, RCS (Revision Control System) was born.

RCS had the intention of “marking” the state of a code base so that patches could be applied more easily;
if a patch was made from a particular revision of a file, then it could be applied to that exact revision.
Originally, RCS was just a bunch of scripts around the patch command to make their management painless.
RCS was also document based, meaning it managed documents individually and not an entire project.

What went wrong?

(For the sake of brevity, branching and merging have been ignored.)

Eventually CVS (Concurrent Versions System) was born. CVS is the original client-server architecture
of software versioning and allowed multiple developers to work on one code base without breaking too much!
CVS maintained a central repository of the code base and incremental changes made to that codebase. It
also began the management of a project in its entirety (with the use of tags) and not as individual files.
This led to the idea of a revision, representing the state of the entire project and each individual file
in that project at a particular point in time.

Eventually Subversion (and many other VCSs) came along, which concretized the concept of a project-wide
revision, meaning that even if just one file changed, the entire project was bumped up a revision. This made
a revision very static and created an implicit dependency on the previous state of the entire project.
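
The difference between per-file revisions (RCS) and project-wide revisions (Subversion) can be sketched with
a toy model. This is only an illustration of the numbering, not of how either tool stores its data; the class
names PerFileHistory and ProjectHistory are made up.

```python
class PerFileHistory:
    """RCS-like: every file carries its own revision counter."""

    def __init__(self):
        self.revisions = {}  # path -> list of contents, one entry per revision

    def commit(self, path, content):
        self.revisions.setdefault(path, []).append(content)
        return len(self.revisions[path])  # revision number of this file only


class ProjectHistory:
    """Subversion-like: one counter for the state of the whole project."""

    def __init__(self):
        self.snapshots = [{}]  # revision 0 is the empty project

    def commit(self, changes):
        snapshot = dict(self.snapshots[-1])  # start from the previous revision
        snapshot.update(changes)
        self.snapshots.append(snapshot)
        return len(self.snapshots) - 1  # revision number of the whole project


per_file = PerFileHistory()
per_file.commit("a.c", "v1")
per_file.commit("b.c", "v1")
print(per_file.commit("a.c", "v2"))  # only a.c moves on to its second revision

project = ProjectHistory()
project.commit({"a.c": "v1", "b.c": "v1"})
print(project.commit({"a.c": "v2"}))  # the project, b.c included, moves on
```

In the second model every revision is built on top of the one before it, which is the implicit dependency
described above.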

Subversion also maintained the centralized one-world view of software versioning. Of course, this type of
architecture is very important if each revision depends on the previous one – there is basically no easy
way of maintaining a linear list of software revisions with multiple servers.

All this led to centralized and linear thinking in the software development process. However, this did
open the way for community development and a larger contributor base for open source projects.

CDD – Community Driven Development

Open Source projects began using subversion for managing their codebase and that was great. It allowed a
group of developers to work concurrently and independently, without the risk of breaking or overwriting
existing changes.

Eventually what happened was that large open source projects still maintained changes and fixes via patches
because SVN did not easily allow contributors to provide changes. Although branching was a possibility, it
required that a contributor had to have commit privileges to the main repository – including “trunk”
(trunk generally being the branch that would end up being released).

So now a group of maintainers managed the stream of patches coming in from contributors without commit
privileges. This state of affairs introduced the concept of forking projects: taking the code base and
creating a new repo with changes made by the contributor. This was a particular issue for projects where
the maintainers did not have enough time to apply patches or simply ignored patches. One big driving
force behind this movement was (and still is) SourceForge.net – the first (noteworthy) subversion
repository hoster.

Thinking patches

Git came out of the requirements for software maintenance of open source projects with core maintainers,
with reviewers and with contributors. For example, the Linux kernel has a bunch of core maintainers that
can commit, individual teams for specific parts of the kernel (these act, in part, as reviewers who
submit patches to the core committers) and contributors who have fixes, improvements and features that
they would like to see in the kernel.

Again, this would all be patch-based. Since the Linux kernel is a modular piece of software, a patch for
a particular driver could be applied regardless of what happened to other parts of the kernel. Hence
there is no particular need to maintain one central revision off which patches needed to be made.
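
As a rough illustration of this point, here is a toy patch applier. It is a deliberate simplification: real
patch and git apply also use line numbers and surrounding context, whereas this sketch only looks for the
exact lines the patch expects. The file names, contents and function names are all hypothetical.

```python
class PatchDoesNotApply(Exception):
    """Raised when the lines a patch expects are no longer in the file."""


def apply_patch(lines, old_block, new_block):
    """Swap the first occurrence of old_block in lines for new_block."""
    n = len(old_block)
    for start in range(len(lines) - n + 1):
        if lines[start:start + n] == old_block:
            return lines[:start] + new_block + lines[start + n:]
    raise PatchDoesNotApply("expected lines not found")


def apply_to_tree(tree, path, old_block, new_block):
    """Apply a patch to one file of a project; all other files are left alone."""
    patched = dict(tree)
    patched[path] = apply_patch(tree[path], old_block, new_block)
    return patched


# The contributor's fix: change the timeout in one driver.
path = "drivers/net.c"
old_block = ["timeout = 10;"]
new_block = ["timeout = 30;"]

# The maintainer's tree has moved on: fs/ext.c was rewritten and a comment was
# added to the driver, but the line the patch touches is unchanged.
maintainer_tree = {
    "drivers/net.c": ["/* new comment */", "int timeout;", "timeout = 10;"],
    "fs/ext.c": ["completely rewritten"],
}
patched_tree = apply_to_tree(maintainer_tree, path, old_block, new_block)
print(patched_tree["drivers/net.c"])

# If the maintainer had changed that very line, the patch needs to be redone.
conflicting_tree = {"drivers/net.c": ["int timeout;", "timeout = 20;"]}
try:
    apply_to_tree(conflicting_tree, path, old_block, new_block)
except PatchDoesNotApply as err:
    print("needs rework:", err)
```

The patch never asks which revision the tree is at; it only cares whether the lines it wants to change are
still there.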

Out of this, Git was born. Git is basically a distributed patch management service and not a software
versioning system. Hence thinking of each commit as being a patch makes working with Git easier. Branches
are just a collection of patches, patches may be merged into a single patch, branches may be checked out and
most things can be undone (in the simplest case, git checkout master takes you back to the master branch).

Git even explicitly supports patches by allowing for their creation (git format-patch) and application
(git am & git apply). In fact, Git does little to interfere with any existing development process one might
have; it still supports a centralized development process (although without providing any easy way of having
a linear versioning of code).

Conclusion

Some things in life never change, and patching code has been part of the software development process since
the dawn of the epoch. Diff and patch were the basis for many a good piece of software, and both are still with us.

Git builds on this and provides a tool that has become essential to community-driven
development, providing versioning, patching, branching and merging … and undo!
