vsoch/AIRFLOW_FEEDBACK.md

## AIRFLOW_FEEDBACK.md

      
Airflow Feedback

This is feedback about contributing to Apache Airflow.

Background

I first opened a pull request in early March 2019 to add a Singularity operator to Airflow. I am fairly experienced with Singularity (but not Airflow), so my strategy was largely to use an existing
operator (Docker) as an example, and go from there. The interaction felt more controlled
than usual because I was required to open a JIRA issue, but this wasn't
terrible and is understandable for a larger community. A reviewer was very quick
to respond, and I addressed a set of early issues, and did my best to answer questions
about Singularity (it was better known in the HPC/academic community than anywhere else).
I ran into trouble when it came to writing tests, because I hadn't used the mock library previously,
and the same reviewer gave fantastic help and support to address those issues.
We went back and forth about testing and linting, and this was also good.
It was after everything was green on May 22nd that the PR seemed to go quiet. It then
went stale after 4 months (the stale bot flagged it in October). When
it was re-opened, the same reviewer did his best to help, and (I think)
it was briefly green again, but conflicts arose quickly. I'm not actually sure how the helpful
reviewer was updating my branch. The PR went stale again, and as a contributor I felt really powerless. I didn't know enough about the code base to fix conflicts (or even why they appeared
for other files) and had to rely on help. Then another contributor +1'd
the addition of the integration, and a reviewer asked me why I was editing files that
weren't relevant. I was just as confused as this reviewer: I hadn't touched those
files, and it wasn't clear why they were changed. I felt like I was being blamed,
but I had no idea what had happened. This (recently, in the last week)
was when a bunch of contributors and community members flooded the PR with links to
guides and documentation, and there was conversation on Twitter (I believe someone pinging
me on there is what prompted the discussion to open again).

I opened the PR again, fresh so as not to have conflicts, and the support was overall fantastic. There was one comment that said something along the
lines of "Why is this so hard for you" and I found that to be a little condescending.
No contributor should be singled out like that, and actually, personal comments
about the contributor (and not the code) should generally be avoided.
I'll address in the next section how I went about this interaction, and
how I view the contributor mindset.

The Contributor Mindset

In this section I want to write about (on a higher, more abstract level) what it
means (for me, based on my experience) to contribute.

1. Starting Mindset

When a new contributor enters a community, he or she will have a particular bias.
The bias is usually based on a general perception of the community, which
can be influenced by everything from the branding to previous interactions.
In my case, for my first pull request I had only heard of Airflow, and I was
excited about it because it was implemented in Python, and at the time it felt
that it would be very powerful to add support for Singularity. Airflow is a tool
that would be friendly to the scientific community because Python is a language
of scientific programming and Airflow can execute scientific workflows. I was excited!
It felt like an amazing opportunity. So I went into the first
interaction in a very positive, motivated mindset.

The second pull request I entered hesitantly and with far less motivation,
and this was because the first experience was somewhat negative. Why was it negative?
It felt like the maintainers (or community) had dropped the ball, and then put
the responsibility on me to do everything again. It was frustrating. The takeaway
point from this first step is that every contributor enters with an expectation
and a mindset, and having that be positive is hugely valuable for a better lifecycle
of the pull request, and actually, for the continued participation of the contributor
in the community.

2. Contributor Investment

Once a contributor decides to engage, he or she typically has limited bandwidth. In my
case, it was another thing I was doing in spare time (not asked or suggested by
anyone related to my work) because it felt important to do. This means that
I might look for a CONTRIBUTING.md for the basics (e.g., opening a JIRA ticket)
but I'm not going to go through extensive tutorials, guides, or random files on GitHub.
I think that contributors are generally reactive - we digest the minimal information
to contribute, and then look deeper into an issue only when it comes up. For example,
if a test didn't pass and someone pointed me to a markdown document that would help,
I'd read it then. When I think about it now, I likely didn't look at anything beyond the
code for my first contribution - my intuition told me (based on seeing other containers
as orchestrators) that it would minimally be good to bring up for discussion, and
the actual development time wasn't too much because I wrote the Singularity Python library
and felt fairly capable with Python. Some might argue here that it's best practice to
open an issue first, but (although I sometimes support this view) I also think it's sometimes
easier and more influential to just do the work, and present it.

To summarize this point, my investment is typically "the minimum amount to accomplish my
goal," and I am usually not thinking that I need to become an expert by studying documentation in advance. This works fairly well for a lot of open source contributions, because
discussion about details picks up after the pull request (or draft) is originally
opened. For most open source communities that I've participated in, I learn the most
by looking at the current code base (using it as an example), and then going back
and forth on details within the pull request. Slack or other discussion communities
could be an avenue to pursue, but for the most part I don't look to jump into
another Slack community for just a small contribution to a code base (I already have
way too many tabs for Slack, and largely it's distracting).

3. Contribution Expectations

Okay, so we've established entering an interaction with a specific goal to achieve,
having a particular mindset, and having investigated the minimal background needed to
do the contribution. My expectations are that:

- I can open a pull request and some pull request template will give me pokes about the laws of the land (e.g., opening the JIRA ticket).
- Some small percentage of the time I'll read a CONTRIBUTING.md in advance. But this doesn't usually happen; typically it is triggered after I open the pull request and there is another bullet that tells me to do it.
- I can open the pull request, and the tests will guide me about what I need to fix.
- If the tests don't provide enough guidance, the community or maintainers will help.
- No extra or special knowledge will be required to contribute.

Two things turned out to be challenging with Airflow. The first is that the ecosystem was rapidly changing - I don't think the same kind of tooling / CI setup existed when I first opened a PR, and I'm not sure whether equivalent documentation did either (very likely I would have looked at documentation for mock testing because I struggled with it). But arguably, even if these docs were available 10 months ago, I still wouldn't have looked unless someone pointed me to them. The second thing is that communication was so detailed that it actually got confusing. My typical response to failed testing would have been to just look at the testing errors and respond to them, but instead there were long, detailed
descriptions of tooling I should use locally and restatements of the errors. In retrospect
it distracted me for some time from actually looking at the errors, because I was focused on responding to the long threads. This is actually
something I find fascinating - that too much attention or details can actually hinder a
developer, given that it's going against the grain of how he or she would normally
troubleshoot or solve problems.

With respect to the documentation, it felt more overwhelming than helpful. I did
look carefully at the pre-commit checks, and installed them for pre-commit and pre-push,
but in practice they would run for a long time and fail, and then actually prevent me from
committing the manual changes that I started to make based on looking at Travis. I wound up
installing and removing them twice, and they failed both times. It was just bad luck - the
build was broken. When nothing seemed to be working, I stepped back and realized that I might
do better to ignore the threads and just look at the testing errors. When I responded to
the testing errors (and removed the extra local tooling that wasn't working), that is
when I finally made progress and the tests ultimately passed. We're now at the point where
tests pass, and I need to think more about what to try for testing Singularity. This is
a good spot to be in, and this is something I can do.

4. Communication

Looking back on the threads, this particular comment really broke my spirit:

> it seems that for some reason it is quite difficult to follow (for you at least).

And to be honest, I didn't want to really continue after that, and my response reflects that
I was subtly hurt by the comment. I think that having empathy, and (even when you are frustrated
with a contributor asking questions or not getting it) being aware of how even subtle comments
can be interpreted as hurtful is really important. This is actually a hugely challenging problem
because there are also cultural differences in how we communicate. Those differences
are compounded by text-based communication. As someone from the US (where everything
is wrapped in niceties) I expect that a US-based project would be wrapped with those same
niceties. But not to digress, this is a hugely challenging problem because we are all
human. Contributors and community members acting as reviewers can be equally frustrated,
and make comments that single out a person, not in a good way, without even intending it.
How is it best to respond? I don't have a good answer, but I think that explicitly stating
how I perceived it here to open up a dialogue is an okay idea.
But to say that it wasn't appropriate in this context is not to say it's not appropriate in
any context. This is where I think there should have been a separation between (more professionally) discussing the issue, and talking about the meta experience around it. I think it was a fair question to ask about the Title validator, because mousing over the automated check didn't provide much detail about the issue. My general feedback here is that yes, it is likely in many cases
that you have documentation somewhere to answer a question, but a community member
or maintainer should not belittle a contributor in this way for asking a very
straightforward question. As I mentioned earlier, it's one thing to say "The answer to
your question is X, and here is a link if you want to read more." versus "Why didn't you
see the answer is here? Why are you having so much trouble?"

5. It Comes Down to People

This entire document is about the contributor mindset. As I've outlined,
it's a changing and adaptive thing based on expectations and interactions
with people, and it's much easier to turn it in a negative direction than
a positive one because many of us carry a negative attribution bias.
Robust technical documentation (and professional writers to boot!) is really
great, but at the end of the day, the experience of a pull request comes
down to the interactions of the community or maintainers with the contributors.
A new contributor will almost never have taken the time to become an expert, and
will definitely not be familiar with uncommonly used tooling (pre-commit, for example).
If it isn't in the traditional "open source contribution steps" that are done for 99%
of projects on GitHub, the contributor won't even be aware of it.
The best a community can do is provide very specific things to do / run
locally as a checklist in the pull request template, and then proceed with the review.
During the review, respect and kindness are so important. It has to be okay to
ask questions, even if you have documentation for it. It's better to answer the
question and provide a link with more details, as opposed to answering a question
and responding "I don't know why you are having so much trouble, we have clear
instructions [here]." The first response feels like the person is helping me,
and providing more details if I'm interested to learn more. The second feels like
the person is annoyed with me, and thinks I am an idiot for not finding the
answer on my own.

6. Future Contributions

The contributor mindset determines whether the contributor decides to further engage with the community, or invest time elsewhere. Any community would want to encourage a positive feedback cycle -
a new contributor comes in, has a positive experience, sees a very positive example
from those that support him or her, and then becomes a reviewer. The community
grows in both people and positive association. I'm not sure that a vision
of total automation for contribution, at least for new contributors, could ever
be realistic when new contributors hold expectations
around interacting with people.

I hope that this is helpful! Please let me know if you'd like further details
on any particular points. Arguably, someone could write an equivalent document
about the reviewer mindset, and largely it would come down to the same points
but from the opposite perspective - having empathy, and reacting with kindness
and patience.