
Last active Jan 22, 2020
Apache Airflow Feedback 1/22/2020

Airflow Feedback

This is feedback about contribution to Apache Airflow.


I first opened a pull request in early March 2019 to add a Singularity operator to Airflow. I am fairly experienced with Singularity (but not Airflow), so my strategy was largely to use an existing operator (Docker) as an example and go from there. The interaction felt more controlled than usual because I was required to open a JIRA issue, but this wasn't terrible and is understandable for a larger community. A reviewer was very quick to respond, and I addressed a set of early issues and did my best to answer questions about Singularity (it was better known in the HPC/academic community than anywhere else). I ran into trouble when it came to writing tests, because I hadn't used the mock library previously, and the same reviewer gave fantastic help and support to address those issues. We went back and forth about testing and linting, and this was also good.

It was after everything was green on May 22nd that the PR seemed to go quiet. It went stale after 4 months (it was flagged by the stale bot in October), and by the time it was re-opened, the same reviewer did his best to help, and (I think) it was briefly green again, but conflicts arose quickly. I'm not actually sure how the helpful reviewer was updating my branch. The PR went stale again, and as a contributor I felt really powerless. I didn't know enough about the code base to fix conflicts (or even why they appeared for other files) and had to rely on help. Then another contributor +1'd the addition of the integration, and a reviewer asked me why I was editing files that weren't relevant. I was just as confused as this reviewer: I hadn't touched those files, and it wasn't clear why they had changed. I felt like I was being placed at some fault, but I had no idea what had happened.

This (within the last week) was when a bunch of contributors and community members flooded the PR with links to guides and documentation, and there was conversation on Twitter (I believe someone pinging me there is what prompted the discussion to open again).

I opened the PR again, fresh so as not to have conflicts, and the support was overall fantastic. There was one comment that said something along the lines of "Why is this so hard for you" and I found that to be a little condescending. No contributor should be singled out like that, and personal comments about the contributor (and not the code) should generally be avoided. I'll address in the next section how I went about this interaction, and how I view the contributor mindset.

The Contributor Mindset

In this section I want to write about (on a higher, more abstract level) what it means (for me, based on my experience) to contribute.

1. Starting Mindset

When a new contributor enters a community, he or she will have a particular bias. The bias is usually based on a general perception of the community, which can be influenced by everything from the branding to previous interactions. In my case, for my first pull request I had only heard of Airflow, and I was excited about it because it is implemented in Python, and at the time it felt that it would be very powerful to add support for Singularity. Airflow is a tool that would be friendly to the scientific community, because Python is a language of scientific programming and Airflow can execute scientific workflows. I was excited! It felt like an amazing opportunity. So I went into the first interaction in a very positive, motivated mindset.

I entered the second pull request hesitantly and with far less motivation, because the first experience was somewhat negative. Why was it negative? It felt like the maintainers (or community) had dropped the ball, and then put the responsibility on me to do everything again. It was frustrating. The takeaway from this first step is that every contributor enters with an expectation and a mindset, and having that be positive is hugely valuable for a better lifecycle of the pull request, and for the continued participation of the contributor in the community.

2. Contributor Investment

Once a contributor decides to engage, he or she typically has limited bandwidth. In my case, it was another thing I was doing in spare time (not asked or suggested by anyone related to my work) because it felt important to do. This means that I might look for the basics (e.g., opening a JIRA ticket) but I'm not going to go through extensive tutorials, guides, or random files on GitHub. I think that contributors are generally reactive - we digest the minimal information to contribute, and then look deeper into an issue only when it comes up. For example, if a test didn't pass and someone pointed me to a markdown document that would help, I'd read it then. When I think about it now, I likely didn't look at anything beyond the code for my first contribution - my intuition told me (based on seeing other containers as orchestrators) that it would minimally be good to bring up for discussion, and the actual development time wasn't too much because I wrote the Singularity Python library and felt fairly capable with Python. Some might argue here that it's best practice to open an issue first, but (although I sometimes support this view) I also think it's sometimes easier and more influential to just do the work, and present it.

To summarize this point, my investment is typically "the minimum amount to accomplish my goal," and I am usually not thinking that I need to become an expert by studying documentation in advance. This works fairly well for a lot of open source contributions, because discussion about details picks up after the pull request (or draft) is originally opened. For most open source communities that I've participated in, I learn the most by looking at the current code base (using it as an example), and then going back and forth on details within the pull request. Slack or other discussion communities could be an avenue to pursue, but for the most part I don't look to jump into another Slack community for just a small contribution to a code base (I already have way too many tabs for Slack, and largely it's distracting).

3. Contribution Expectations

Okay, so we've established entering an interaction with a specific goal to achieve, having a particular mindset, and having investigated the most minimal set of background to do the contribution. My expectations are that:

  • I can open a pull request and some pull request template will give me pokes about the laws of the land (e.g., opening the JIRA ticket).
  • Some small percentage of the time I'll read documentation in advance. But this doesn't usually happen; typically it is triggered after I open the pull request and there is another bullet that tells me to do it.
  • I can open the pull request, and the tests will guide me about what I need to fix.
  • If the tests don't provide enough guidance, the community or maintainers will help.
  • I don't expect extra or special knowledge to be required to contribute.

Two things turned out to be challenging with Airflow. The first is that the ecosystem was rapidly changing - I don't think the same kind of tooling / CI setup existed when I first opened a PR, or whether there was equivalent documentation (very likely I would have looked at documentation for mock testing because I struggled with it). But arguably, even if these docs were available 10 months ago, I still wouldn't have looked unless someone pointed me to them. The second is that communication was so detailed that it actually got confusing. My typical response to failed testing would have been to just look at the testing errors and respond to them, but instead there were long, detailed descriptions of tooling I should use locally and restatements of the errors. In retrospect it distracted me for some time from actually looking at the errors, because I was focused on responding to the long threads. This is something I find fascinating - that too much attention or detail can actually hinder a developer, given that it goes against the grain of how he or she would normally troubleshoot or solve problems.

With respect to the documentation, it felt more overwhelming than helpful. I did look carefully at the pre-commit checks, and installed them for pre-commit and pre-push, but in practice they would run for a long time and fail, and then actually prevent me from committing the manual changes that I had started making based on looking at Travis. I wound up installing and removing them twice, and they failed both times. It was just bad luck - the build was broken. When nothing seemed to be working, I stepped back and realized that I might do better to ignore the threads and just look at the testing errors. When I responded to the testing errors (and removed the extra local tooling that wasn't working), I finally made progress and the tests ultimately passed. We're now at the point where tests pass, and I need to think more about what to try for testing Singularity. This is a good spot to be in; this is something I can do.

4. Communication

Looking back on the threads, this particular comment really broke my spirit:

"it seems that for some reason it is quite difficult to follow (for you at least)."

And to be honest, I didn't really want to continue after that, and my response reflects that I was subtly hurt by the comment. I think that having empathy, and (even when you are frustrated with a contributor asking questions or not getting it) being aware of how even subtle comments can be interpreted as hurtful is really important. This is actually a hugely challenging problem because there are also cultural differences in how we communicate. Those differences are compounded by text-based communication. As someone from the US (where everything is wrapped in niceties) I expect that a US-based project would be wrapped with those same niceties. But not to digress, this is a hugely challenging problem because we are all human. Contributors and community members acting as reviewers can be equally frustrated, and make comments that single out a person, not in a good way, without even intending it. How is it best to respond? I don't have a good answer, but I think that explicitly stating how I perceived it here to open up a dialogue is an okay idea.

But to say that it wasn't appropriate in this context is not to say it's not appropriate in any context. This is where I think there should have been separation between (more professionally) discussing the issue, and talking about the meta experience around it. I think it was a fair question to ask about the Title validator, because mousing over the automated check didn't provide much detail about the issue. My general feedback here is that yes, it is likely in many cases that you have documentation somewhere to answer a question, but a community member or maintainer should not belittle a contributor in this way for asking a very straightforward question. As I mentioned earlier, it's one thing to say "The answer to your question is X, and here is a link if you want to read more." versus "Why didn't you see the answer is here? Why are you having so much trouble?"

5. It Comes Down to People

This entire document is about the contributor mindset. As I've outlined, it's a changing and adaptive thing based on expectations and interactions with people, and it's much easier to turn it in a negative direction than a positive one because many of us carry a negative attribution bias.

Robust technical documentation (and professional writers to boot!) is really great, but at the end of the day, the experience of a pull request comes down to the interactions of the community or maintainers with the contributors. A new contributor will almost never have taken the time to become an expert, and will definitely not be familiar with uncommonly used tooling (pre-commit, for example). If it isn't in the traditional "open source contribution steps" that are done for 99% of projects on GitHub, the contributor won't even be aware of it. The best a community can do is provide very specific things to do / run locally as a checklist in the pull request template, and then proceed with the review. During the review, respect and kindness are so important. It has to be okay to ask questions, even if you have documentation for it. It's better to answer the question and provide a link with more details, as opposed to answering a question and responding "I don't know why you are having so much trouble, we have clear instructions [here]." The first response feels like the person is helping me, and providing more details if I'm interested to learn more. The second feels like the person is annoyed with me, and thinks I am an idiot for not finding the answer on my own.

6. Future Contributions

The contributor mindset determines if the contributor decides to further engage with the community, or invest time elsewhere. Any community would want to encourage a positive feedback cycle - a new contributor comes in, has a positive experience, sees a very positive example from those that support him or her, and then becomes a reviewer. The community grows in both people and positive association. I'm not sure that a vision of total automation for contribution, at least for new contributions, could ever be realistic when you have new contributors who hold expectations around interacting with people.

I hope that this is helpful! Please let me know if you'd like further details on any particular points. Arguably, someone could write an equivalent document about the reviewer mindset, and largely it would come down to the same points but from the opposite perspective - having empathy, and reacting with kindness and patience.
