A guide about accepting contributions for Open source projects.
Ever since you could send patches by mail we'd interact over thousands of miles, exchanging code snipped between people we don't even know beforehand.
That and the transient nature of contributions might be the main the reasons it's very hard to find out why things go wrong.
The guide tries to pin down most common issues and bring to light some possible reasons.
I'm writing based on my experiences with some projects, the feedback I heard from frustrated users and overloaded devs. In case of Check_MK I've even seen all sides, but this writeup is based on a lot more projects / interactions. In most cases I'm just a mere user. At rare times I've been intrigued, but most often I've been driven mad.
Yes, some of this is guesswork, some of it is deduced from the different stories I kept hearing.
Maybe you can recognize some of these?
As a non-dev user, you can see your infrastructure go to ruins by upstream changes or despair when a small utility starts getting overengineered and then proudly rewritten to some incompatible mutant of "all you ever needed".
As a normal contributor you basically have no remote clue how to add your code. You try to get it really good, send the PR (you did that often enough) and then weird docbook formatting is said to be wrong, something that has no relation to what you tried to add, and you're lost. It's a busy week and basically you forget about your patch. For about two years, till you upgrade and suddenly find that part of the upstream is still broken.
With the dev hat on you can die inside having to explain to a contributor why it's not OK to write this formula differently than everywhere else in the whole code base. As if you'd never thought about it, right?
As someone doing community support you feel pretty hopeless if there's no place where you can find out if a certain workaround for a bug from two years ago is still needed? The users still apply it, just to be safe. The dev doesn't even remember the issue, but thinks it should be fixed. If you ask him if he actually tested that fix, he unfortunately didn't answer about that one weird edge case from comment #37 in the bugtracker.
*The thing is - you actually experienced each of those issues some time, right???*
I think that's the point where you know things are broken and we need to do something!
This guide is designed as a TL;DR broken out into three sections:
1. The issue
2. Reasons why it might be happening
3. How to improve
I hope this helps you reflect on your contribution processes and improves your community (and its code).
Any colour you like as long as it's black
-- from the secret book of contributor advice
- crappy code
- stuff you just rewrite instead of signing it off
- missing pieces (i.e. no example in/output)
- code where someone got excited and it grew and grew and now it's 50 lines for the 3 lines it should have been
- code written totally different than you'd have done it
- missing tests or doc
- you don't have a style guide
- you have high (and possibly unreasonable) requirements
- people don't read (and that's your fault. People's speed differs both for reading and processing information. So there can be people that understand your code immediately but can't read more than 20 consecutive lines!)
- people are trying to build code the way THEY like it
- not varied enough test examples, or your tests aren't even public
- it's awkward to write tests or doc for your project because you overengineered their format
- you left out some functionality that people need (you haven't yet implemented something yet, or don't yet handled some condition (including errors) which your project scope requires)
- people don't understand the effects of letting out a required part (i.e. you write a plugin to manage the backlights of your Tesla. if you don't submit a dummy test, should the devs go and buy one out of their own pocket?)
how to manage:
- try to work your real quality goals into the style guide (not just indentation but when it's OK to break out to a module, etc. if you try to be consistent about something, it should be documented somewhere)
- make your style guide short and concise, highlight important parts
- center the style guide around a checklist (so: don't have the check list as a part of the style guide, instead link chapters from the list. This give a sense of progress and safety for the contributor)
- Don't leave stuff that takes a while to figure out as an exercise. people aren't there to exercise
- can you dedicate someone to help contributors? Having a resource ready to clean up stuff so this becomes a fast routine instead of a awkward thing you do in(stead of) pastime
- be darn hard about what your code looks like and make it visible that it'll not be OK if it's not matching to what the project wants
- if you need extra snippets (i.e. data from other tools or tests), and a patch is missing those, fire it back immediately, not after a few weeks)
- if you need external data, be aware that i.e. every(!) single(!) person in professional IT is under NDAs and can't just submit you that. Work out a way that works for both of you
- it's NOT OK to let patches go stale
- don't let anyone "obsess" over things that aren't changing the scope for sake of "correctness". i.e. if rpm handles slightly malformed packages and XYZ handles RPM packages, it should be able to with the same packages. The only useful exception is if your tools' scope is to crash on malformed input
Keep an eye out in your committer team for people that try to suppress things that aren't a "perfect" solution.
First problem is they value correctness higher than functionality. That is a broken mental model.
If you (or i.e. money, having a job) makes them to pull in something they don't like they often give it extra-sloppy review and delay it. Subconsciously they’re fully dedicated to set it up to fail.
I think actually this is quite easy to resolve, even if you notice it in yourself.
We just slightly mediate between the two positions.
The code should be escorted by good fail-safes and a hardened interface.
You definitely need to add a note that explains why this use case mattered (i.e. to be feature complete) so you don't get stuck hating it forever but instead first see it as some necessary evil. Once you got used to it you can often identify why it’s causing you so much grief. Often it’s an architectural issue in another place. It takes at least as long to admit that as it’ll take to solve it later. The end result will be satisfying though!
Otherwise I bet it'll turn into a regression on your next rewrite of that piece of code - which would mean you keep breaking stuff for some people over and OVER and OVER AGAIN.
What do you think will people do with a piece of software they use that has functional holes and they send a patch and the patch is lost every 2 years? What will it do to your user base over a longer run?
- need days or months to integrate
- might still influence other parts
- might take longer than anyone's normal attention span to review (up to multiple months)
- means other features or issues need to be pushed back for the contrib
- it's too hard to contribute for them!
- high-profile project they're afraid to interact with, afraid to contribute to
- lack of secure contrib channel (if your laptop hostname leaks, we'll smile. if an enterprise server's hostname leaks, there will be trouble... you know industrial espionage happens longer than you live?)
- standard functionality too trivial
- complex features that some need but never get attention foster big "drops"
- little requests are ignored a few times, so they stop sending little contributions
- other case if they i.e. add support for a new hardware which is not yet public (how can someone under an NDA contribute to your project ahead of release and be legally safe from lapses on your end? Oh, of course they really can't unless you thought that through)
- they hack until they're sure it "could be OK." This means, basically, they keep adding stuff outside their main goal just to raise chance it'll get accepted. or they just go crazy about it and keep adding anything they can think of because they're proud
manage this by:
- have guideline for larger contributions or such that need time to cook up (open issues & PR early)
- only accept such drops if they commit to helping with the documentation
- have a public CI as gatekeeper (people less afraid of that, and they can make it "pluggable" independently, so effort scales better)
- make sure they understand it's totally normal if they break their own branch at first
- let others do this support (thus, have everything public, so they can ask and work well without being a core dev)
- offer a *contact* for them, if they got a 20K line commit for you they should get in touch there. two people can sort out a lot of issues by a single irc chat.
- convince vendors to commit some bit of support ressources once they send in their large patch. Ask who you'll be able to ask *next year*!
contribs that subvert the project goals
You got a large contribution that does something that, well, just doesn't have anything to do or in common with all the other parts of your software you love.
- the user loves your software!
- he ended up using it as a hammer for EVERY nail
- he's got other problems to tackle but doesn't want to have yet another interface to do it
- you might have code still around that also went to a different direction. you don't think about it anymore, and you might never have liked it because it didn't play out nicely. but others still see it and might just be riding that elephant in your room
- On rare occasions it does happen: someone thinks further ahead and codes better than you. But they basically hand you a spaceship you can't even fly!
manage this by:
- make sure your software has a well-defined scope and architecture. if you make it sound almighty, you can't complain about random additions :)
- if you have that scope publicly defined it's also fine to adjust your scope as time goes. Maybe you want to shrink it even further, or add a bit. Either is fine if you do it publicly instead of leaving your community in the dark, at your peril
- be interoperable! your software should fit under a common umbrella (i.e. having other tools directly integrate your software and call it from the outside)
- be interoperable! your software should be able to *become* an umbrella without needing to extend it internally (so yes, hooks/plugins, and please expose the full internal state to them)
- decide, consistently, if you should remove non-core code you hate
- then announce and do it visibly
NOTE: but while deciding, make sure you honestly evaluate the potential of the feature. you never know, someone else might contribute a nice idea that makes it work better. if deleting that code removes a whole class of functionality from your software, it's a bit of a weird thing to do.
If it's unmaintainable for you because i.e. it works with some standard you don't normally work with, state that somewhere in your project. no one thinks you can know everything. if they know it's a maintenance issue, maybe they can do something about it.
For example, large corporations with 100s of storage arrays will be better at testing against them than someone writing a small plugin in their software to support them.
They also have the ressources to press vendor bugfixes you'd never see happen. So let them help you there.
Don't make sneaky deletions, i.e. a cleanup of the implementation of feature "X" which also drops out "Y" in one place because you didn't like "Y". Feel free to kill "Y" but do it visibly! Each time you don't do that, you will ruin someone's day, maybe even years later.
let people know what the core teams' strengths are. This gives you a safe position to demand contributors keep technical ownership of the contribution instead of offloading decades of ongoing maintenance to you.
undefined random features get added randomly:
- you randomly get outside of tree contributions for insane features
- major contributions to your project happen but are later forgotten
- your project is called Xen
- your software is really successful
- your software is alive and used i.e. as a base for research
- your interfaces are apparently well-designed and it's possible to build castle upon skies using your software. this is awesome
- your development process and project leadership are horrible
- people who start coding major additions to your software are NOT aware who to get in touch with and when to do it
how to fix it:
- define a community manager with decision power (Xen did that, fixed their issues)
- Provide ways to communicate with the community, both real-time (like Gitter, IRC) and asychronous (like GitHub Issues)
- Another good example is the FreeBSD-core team
- Linux is a horrific example: LKML is too hostile and the susbsys maintainers are free to irrationally do what they want. If they had to present their choices to the other maintainers a lot would improve
(Best example is how a storage maintainer dropped FC-IP support because he thought it's a storage-only protocol. It’s not)
But this is just an *example* for an undefined, subconscious process of dealing with components).
constant flow of questions and complaints
- you're having fake FAQs
- most projects separate developers and "contributors"
- this, in essence, says that people maintaining your wiki, faq or spending many hours answering support questions are somehow less important.
- if you feel overwhelmed integrating contributions or doing supporttesters and doc writers are your best line of defence against fragile contributions, but they have no say
- your devs, not the testers, decide on releases
how manage this:
- don't separate developers and contributors. name teams, or name everyone
- take help to stop having TODOS in your documentation. , you need to find ways to invite more people to the project
- take help to stop inadvertedly breaking edge case functionality
- stop having to worry any of this - instead say thanks to someone who keeps their eys on that
- your FAQ should not consist of the question a devs considers 'worthy', and people shouldn't have to beg for something to be added to it. let someone review your support channels for common issues and write the FAQ based on that
- make testers your peers when the roadmap is decided on. if they spot a weird issue, it will fall on your head later anyway. You might manage to be so burnt out by then that you don’t remember being warned, but I think “I wanna ruin my life” is not a project goal?
- listen to doc writers. (i.e. a shitty cli can ruin your project worse than some other flaws)
- agree on a list of release criteria. if the shiny new core rewrite doesn't fullfil some bits, this is the time to announce it and put them on the roadmap. if it's not a major release, then DON'T release if you're not sure if stuff is working. That includes even the case where the QA guys just broke the QA
- like devs, testers can work on multiple projects and improve their output quality with the experiences from other projects
failed / unacceptable contributions
- most people who aren't the project inventor will deliver worse code
- having no requirements will not let them know it's too bad
- often they'll also just not be devs but experts in another area
- contributions can also just be so bad it's faster to rewrite them
how to manage:
- make QA happen external of the dev team
- have strict style guide
- let QA people fight that battle, let pull requests belong to them
- give them authority to rewrite and commit what they liked. you can fix it anyway, but don't stop them after they cleaned stuff up
- make clear which areas are NOT OK for any "non-core-dev" contributions. i.e.: Crypto support. The Debian issues came from letting a "normal" level person work on code that needed extreme skills
*Remember: Normally you need to answer every pull request you get*
- how does it scale?
- did you consider letting more people do this?
- describe a way to ask for help if one just doesn't figure out how to get it right?
- describe a list of criteria that will never be accepted?
- also, this will be needed, make clear when someone stepped over any line. kick them out and let them know. (but please, don't take yourself too seriously: if your software is constantly causing data loss, don't feel insulted if someone jokes about it)
More tactics to keep in mind
- you should have a staging area for contributions. they can go there but they can't leave without a maintainer
- have a community manager who can reach out to the people behind the software or whoever took over the next internship at their research lab
- ideally they can get in touch with the authors early on and lead their way to be one of your contributors
- pull out-of-tree contributions to it. Scan your mailinglists, search github, and collect everything in your staging area. DO THAT
- run all your tests against them so it'll be visible publicly when they break due to not being rebased
- rank all contributions by code quality and test successes
- the general goal should be to make them work with and on your base
That way you can make sure at least basic project management or what EVER you like in your project can apply to their code, too.
- ceph blueprints:
- forces "requester" to outline what they really want
- allows to keep track of far-fetched 'killer features' noone can build now.
- channels interaction:
- anyone wanting to add a feature like that can be pointed at the community approved design
- allows testers / users / devs for a feature to materialize and meet before implementation even starts. they'll be ready to support whoever tackles the hard bits
- bacula vote:
- anyone who has a good idea of a missing feature can spec it out
- they themselves have no extra influence on when if / it'll happen
- anyone can pick up on it if they have the resources
- users could (till 2009, which incidentially is, when project went downhill) vote which thing they needed most
- if contributors want to change the implementation away from the spec, they can do that in discussion with the people who raised they hand to be interested in that feature. avoids getting 3 different conflicting implementations of a thing over time
the scheme behind this is:
- use your community ressources to spec out features
- use those specs to channel contributions and avoid single-use-case implementations ("Keep It Simple, Stupid" BUT NOT "Keep It Simple and Stupid"
- you are raising the chances of getting a really nice, working implementation of something users wanted without doing much on your side. This is what makes OSS economically viable for you!
- do not lose those specs, requirements engineering is *expensive*.
- you are getting valuable contributions there, don't think this is worthless
- if you ignore this stuff, you get forked badly
you'll have to write the same code as they wanted on average a year after the fork because your community will most definitely demand it once it's in the fork. (Hello Samba/SambaTNG, Bacula/Bareos, pfSense/OPNsense, Nagios/Icinga/Naemon...)
we have a large tooling problem there, make sure those things don't get opened as github issues but as something else. otherwise you end up in a trap where you need to close them to maintain project visibility.
Finale and Thank You
I tried to reflect on why it's sometimes so incredibly hard to integrate even smallest contributions, explain some of the frustrations and to write down all the little traps we can fall into when interacting with others.
I think Github can be seen as both a pain and a remedy in this - To me stays a platform centered around avoiding real human interaction.
Formalizes things well, but fails in others (i.e. how to tell how dead a project is and which of the 5 forks is missing which patches and where you should actually contrib).
We need to get better at determining long-term cost and value of contributions.
No longer we should decide purely on "great indentation, applied well, good contributor" base. Growing our projects doesn't work like that.
* Make contributions easy
* Help to lift code in, but offload maintenance if needed
* Open up your infrastructure, accept in when it's green
* Don't let feature patches rot
* Do not distinguish between code / doc / qa contributions
* Make sure the project is approachable
* Inspired by Matt
* Started by Flo
* Reviewed and extended! by Marco (@brontolinux) and Matt (@mjbrender)
* The Xen joke was further smiled at by Cecile
Almost solid examples:
Gitlabs contrib page is beautiful and helpful but it packs too much info into one place, I'm not sure it's good code contribution info and issue management and everything there. it works if you're deep in the project but will be too much for random visitors.