@gilesbowkett
Last active March 11, 2016 02:03
GitHub and "canonical" vs "original"

so re this:

https://www.pandastrike.com/posts/20150610-thought-experiment-github-community-view

@bkeepers's response was this:

I think it’s fundamentally a question of governance, which GitHub has been agnostic on thus far. @gilesgoatboy

— Brandon Keepers (@bkeepers) 10 March 2016

I disagree. I think GitHub's made some very specific assumptions about governance:

  1. Governance exists
  2. The people who use the project are in communication with the person who originated the project
  3. The originator has any opinion at all about what the project does next
  4. If you create code which other people adopt and modify, you have an obligation to settle any disagreements they have amongst themselves, and an obligation to pay attention to them in the first place
  5. If your repo is more widely-used and current than the originator's repo, but GitHub users can't discover your repo and always end up at the originator's repo, that's not a user interface problem, it's a problem with the governance of the originator's repo and (somehow) with yours as well.

Number 5 doesn't describe me, and I take the assumptions in number 4 with a big grain of salt.

But, if my thought experiment revealed any kind of problem with GitHub, I think that problem would be that GitHub calls this agnosticism in the first place. (If I understand correctly, @bkeepers works for GitHub.) This is not agnostic. This is a set of assumptions about social obligations, social roles, and therefore governance.

GitHub's awesome, but I think this is indeed a subtle problem in its design assumptions, and it's the kind of subtle problem which evolves into a glaringly obvious problem as a user base gets bigger. I'd need to be a time traveller in order to tell you definitively whether or not it will ever become a big problem. However, it was a tiny problem years ago, and it's a small problem now. I don't think that's a good sign.

@gilesbowkett
Author

I also think this is basically a generation gap, and GitHub's success and ubiquity are part of the reason for it. All five of those assumptions were pretty legit in 2007 or 2008, when GitHub was new.

@gdinwiddie

With new capabilities come new paradigms. When CVS was new, I couldn't imagine using a non-locking repository. After I did it for a while, it became second nature.

Is your proposal the answer? I dunno. I'm not terribly good at predicting the future. It may be that the next step forward would be to implement what you could of it outside of GitHub, but pointing to the GitHub forks. Try it and see if it works, both technically and socially.

@bkeepers

First, I think you raise a lot of great points and am enjoying exploring this.

> But, if my thought experiment revealed any kind of problem with GitHub, I think that problem would be that GitHub calls this agnosticism in the first place.

Ok, "agnostic" may not be the best word choice here. There had to be some kind of structure, and the one GitHub chose gave each user a namespace where they can create and fork other projects. I call this "agnostic" because, in 2008, giving users their own namespace was a way of saying: we don't know what governance looks like in a world where code can be distributed but communities can't.

Eight years later, that is becoming a lot more clear. I'm really interested in continuing to explore this. Maybe it's a community view, as proposed. Maybe it's something else entirely.

@searls

searls commented Mar 10, 2016

I'm a big @bkeepers and @gilesbowkett fan and I agree strongly with both of the things you said.

I don't know if I have a lot else to add to this conversation right now, but I'd love to talk to either/both of you about it in the future.

@gilesbowkett
Author

@gdinwiddie:

> It may be that the next step forward would be to implement what you could of it outside of GitHub, but pointing to the GitHub forks. Try it and see if it works, both technically and socially.

I've thought about it, but

  1. it's a bit of a big data thing, where having direct access to the infrastructure would make it easier, and
  2. there's also a bunch of people already doing it here and there.

When I wrote that blog post I found a Stack Overflow page where people were tracking which repo was the "most canonical" for a given project, and an app for doing the same thing, and today somebody on either Twitter or HN showed me another such project. You kind of run into the same fundamental problem: which of the "which is most canonical?" projects is itself the most canonical? So I really kind of feel it's out of my hands and I just hope GitHub finds a solution, because obviously, any solution they come up with is going to have a pretty good claim to being the canonical one.

@bkeepers:
Yeah, there are a few different ways you could go. I hope GitHub doesn't end up having to build its own little Google, because if that's it, it'll probably never happen. Maybe you could just do a "forks" link which ranked forks by popularity and/or frequency of updates, or even just showed the top 5 by those criteria.
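
To make that last idea concrete, here's a rough sketch of what "top 5 forks by popularity, then by how recently they were updated" could look like. Everything in it is an assumption for illustration: the `Fork` record, its field names, and the choice to rank on stars first and staleness second are made up, and none of it is meant to describe how GitHub stores or would rank forks.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Fork:
    # Hypothetical record; the field names are illustrative only.
    full_name: str
    stars: int
    last_push: date


def top_forks(forks, today, limit=5):
    """Return up to `limit` forks, most-starred first, ties broken by recency."""
    def sort_key(fork):
        days_stale = (today - fork.last_push).days
        # Negate stars so higher counts sort first; fewer stale days sort first.
        return (-fork.stars, days_stale)

    return sorted(forks, key=sort_key)[:limit]


# Made-up sample data: the originator's repo is old, two forks are current.
forks = [
    Fork("originator/project", stars=120, last_push=date(2012, 5, 1)),
    Fork("someone/project", stars=300, last_push=date(2016, 3, 1)),
    Fork("another/project", stars=120, last_push=date(2016, 2, 1)),
]

for fork in top_forks(forks, today=date(2016, 3, 10)):
    print(fork.full_name)
```

With this sample data the stale originator repo ends up last, behind both active forks. Whether stars and push dates are actually the right signals is a separate question, and a pretty open one.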

Glad you're enjoying the conversation, it's not intended as any kind of ferocious rant. 😄

I don't totally understand this, but I think I disagree with it:

> a world where code can be distributed but communities can't

Obviously a lot of open source communities are geographically distributed, so that can't be what you're saying. "A world where multiple repos can exist, but a project is just one project"?

The thing is, in real life, communities fracture all the time, and open source communities fracture too. Sometimes that's bad, but sometimes it's good. The Node.js/io.js fork is probably the best story of a fork resulting in better software overall. Someone on HN was saying that the story of ffmpeg vs libav is a similar story, although I hadn't heard of it before today. It's pretty easy to make the argument that, in an ideal world, a person who was looking for the most up-to-date Node.js fork should, for a time, have been directed to io.js instead.

So, sometimes a project is just one project, but other times, a project becomes more than one project, and sometimes a project which had become more than one project goes back to being just one project again. In high-profile situations like Node.js/io.js, everybody knows about it anyway, so it's not a huge problem. It's only in the smaller communities where this seems to be a real problem, at least so far.

@bkeepers

> I don't totally understand this, but I think I disagree with it:
>
> > a world where code can be distributed but communities can't

Distributed version control means that when I clone or fork a repository, I get a full copy of all the changes to the code. But I don't get a full copy of the community. Forking divides the community. This was a big concern with forks in 2008, and I still don't think we have a full grasp on the repercussions.

> The Node.js/io.js fork is probably the best story of a fork resulting in better software overall.

This could be an example of why forking should be hard. It's hard to imagine a better scenario than the current state, where the entire community coalesced around one implementation. Would the community be better off if GitHub had automatically pointed people to io.js? I honestly don't know. I started to get a little terrified when *-iojs forks of node modules were popping up. Eventually I imagine we'd start to see something akin to web standards for node implementations, where each "vendor" implements the spec plus some extra sugar. I don't think I could make the argument that this is better.

> In high-profile situations like Node.js/io.js, everybody knows about it anyway, so it's not a huge problem. It's only in the smaller communities where this seems to be a real problem, at least so far.

This is a really great point, and a pain point I have experienced personally.

I don't know the answer. I just know there are huge repercussions that need to be carefully considered.
