
@yoavg
Last active September 26, 2023 19:03

Putting papers on arxiv early vs the protections of blind review

The tension between putting papers on arxiv as soon as possible and the double-blind peer review process is ever present. Some people favor the fast pace of progress facilitated by making papers available before or during the peer review process, while others favor the protection of double-blind reviewing (actually, of author-blind reviewing; reviewer anonymity is not part of the debate).

As I now serve on an ACL committee tasked with assessing this tension, I've spent a longer-than-usual time thinking about it, and came up with an analysis which I find informative, and which others may also find useful. These are my personal opinions and are not representative of the committee, though naturally I will share them there as well.

The analysis examines the dynamics of review bias due to author identities being exposed through a pre-print, and its effect on other authors at the same conference. The conclusion, as usual with me, is that it's more nuanced than the straightforward story. The negative effects on others do exist, but they are also restricted in a predictable way.

I hope you find it useful, and I would love to hear critique of this, as I am still forming my opinion.

Let's start

Why is author anonymity important in reviewing? To guard against biases.

There are two kinds of author-related biases:

  1. bias against certain groups of authors
  2. bias in favor of certain groups of authors

Note that there isn't any proposal that I am aware of that drops protections against (1). Authors who wish for their work to remain fully anonymous in review can do so by simply not uploading to arxiv. The protection may break if only a tiny fraction of submissions opt to remain fully blind, thus marking them as suspicious. But evidence from ML conferences that allow unrestricted pre-printing shows this doesn't happen.

The rest of this post will then focus on (2), which is what pre-printing during review and publicizing on social media allows: authors post their work on arxiv (and maybe advertise it on social media), the reviewer knows who wrote the paper, is impressed by that person's prestige or qualifications, and reviews it more positively than other papers.

Working assumption: I assume that the vast majority of authors who choose to pre-print their work do not do it explicitly in order to gain from positive author bias, but for other reasons (like wanting to put their work out there, fear of getting scooped, etc). As a consequence of their actions they may also enjoy positive bias, but it is not their primary motivation. You may agree or disagree with me on this one, but that's how I view the system. Whether this holds or not does not have a direct effect on the analysis (maybe on some of its implications), but that's the assumption I held when producing it. If you disagree with this assumption, read critically and try to adjust for it.

Now, let's examine the effects of author-positive-bias.

Personally, I doubt that the significance of such bias is very large in the NLP community: people with a reputation strong enough to benefit from this bias are also very experienced in writing papers that get accepted, which likely has a much larger effect. But let's assume for the rest of this piece that this bias is real and strong, and that famous authors do obtain higher review scores because of it.

The criterion: negative effect on others. It is clear why some groups of authors may benefit from this bias, but does it mean that other groups of authors are negatively affected?

Fairness vs no-negative-effect. Note that "not being negatively affected" is not the same as "being fair". Fame/prestige-based bias is never fair: it means that some individual put in less effort than some other individual to get to the same outcome. For me personally, this kind of "fairness" is also irrelevant. The world is not a fair place. That individual who worked less hard than me because of their reputation is likely also richer than me, has a more supportive environment than me, got better training, etc, etc. Or maybe they are also inherently smarter. I just take the fact that some people have to work harder than other people as a given, and try to do my best work. If other people succeed in their career more than I do for the same amount of effort, well, what do I care? This is not a zero-sum game. It would be nice if we could make the world a tiny bit more equal through conference submission policies, but I don't see it as a priority. I mean, that person could just submit to NeurIPS or ICML and achieve the same good outcome for themselves.

What I do care about is being negatively affected by the inequality. Assuming that the other person and I each submitted a paper, will the (prestige-based) increased chances of acceptance for their paper mean decreased chances of acceptance for mine? This is the thing I am going to analyze now. The answer is "it's complicated" and also "in many cases it is really not very bad".

Conference acceptance as a zero-sum game. The popular story is that conference reviewing is a zero-sum game. 1000 papers are submitted, but only 200 get in. So if the other person's paper gets a spot within the 200, there is one fewer spot left for my paper to get in. This story is, of course, true. But it is also very incomplete. The reality (of ACL and similar conferences) is much more nuanced and complicated. Let's dive in:

Complication 1: tiers

As a rough approximation, conference acceptance decisions are made as follows: there is some soft quota target, let's say roughly 20% of the papers. Each paper is assigned a "quality score" (which can also be based on the review text, etc; it doesn't have to be a numeric field), and then papers are grouped into bins: "accept", "reject" and "borderline". Some processes attempt to get more nuanced than that, with distinctions between "strong accept" and "weak accept" etc, but these systems can also be reduced to the three cases of yes/maybe/no. Then, all the "yes" papers get in, filling part of the quota. The remaining quota is selected from within the "maybe" group (this is where the randomness of the process kicks in: the comparison between the "maybe" papers is very noisy). The "no" papers do not get in. Now, what does this have to do with the question of being negatively affected by some other paper getting a higher score? Well, it severely limits the effect.

First, note that if my paper was in the "no" group to begin with, then I am never actually ranked against the other paper. Its score doesn't affect me.

If both of our papers had sufficiently high scores to both be in the "yes" category, we are also not ranked against each other.

The cases where the other person's score may affect me are:

  1. If the unfair boost in score shifted them from "maybe" to "yes" AND if I was the last in the "yes" group, and ended up in the "maybe" group as a result. This also still leaves me in pretty good shape, as I am now at the top of the "maybe" group, but if that group is rather flat, I do have a non-negligible chance of not getting in.

  2. If we are both in the "maybe" group. In this condition I may really be negatively affected by a small increase in score for that other paper.

That last case is serious: I can really be negatively affected by their unfair advantage. But we also ruled out many other cases where I am not affected. The negative effect due to positive bias in favor of the other author is only relevant when both papers are in the "maybe" category (plus a small corner case on the exact border between "maybe" and "yes").
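To make the case analysis concrete, here is a toy sketch of the yes/maybe/no model in Python. The scores, thresholds, and quota are invented numbers, not the actual ACL procedure, and the noisy selection within "maybe" is simplified to a deterministic by-score ranking; the only point is that a boost to the other paper changes my outcome exactly when both of us sit in the "maybe" bin near the cutoff.

```python
# A toy sketch of the yes/maybe/no tier model described above; scores,
# thresholds, and the quota are invented, not the actual ACL procedure.

def decide(scores, accept=4.0, reject=2.5, quota=0.3):
    """All 'yes' papers get in; the leftover quota is filled from 'maybe' by
    score; 'no' papers never compete for a slot."""
    n_slots = round(quota * len(scores))
    yes = [p for p, s in scores.items() if s >= accept]
    maybe = sorted((p for p, s in scores.items() if reject <= s < accept),
                   key=scores.get, reverse=True)
    return set(yes) | set(maybe[: max(0, n_slots - len(yes))])

scores = {
    "strong1": 4.6, "strong2": 4.3,             # clear "yes": unaffected by any boost
    "mine": 3.6, "famous": 3.5, "border": 3.2,  # "maybe": the only place a boost can hurt
    "weak1": 2.0, "weak2": 1.8, "weak3": 1.6,   # clear "no": also unaffected
    "weak4": 1.3, "weak5": 1.1,
}

# hypothetical prestige bump: reviewers score the famous author's paper higher
boosted = dict(scores, famous=scores["famous"] + 0.3)

print("mine accepted, no boost:  ", "mine" in decide(scores))   # True
print("mine accepted, with boost:", "mine" in decide(boosted))  # False: we were both "maybe"
```

Shifting any of the made-up numbers so that my paper lands in the clear "yes" or clear "no" bin makes the boost irrelevant to me, which is exactly the restriction described above.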

Complication 2: tracks

The story I told above about how conference reviewing works was also incomplete. In NLP conferences, there are also topical tracks. We submit papers to a given track. Each track's papers are then handled separately, resulting in a "yes" list for each track, plus a small number of "would be nice if there is space" papers for that track. The number of "yes" papers roughly follows the quota, so if the total target quota is 20%, each track will pass roughly 20% of its own papers as a "yes". The program chairs then merge the lists of the given tracks, and may increase quotas for some tracks at the expense of others; this may affect the nice-to-haves, but the "yes" papers in each track are generally safe. Even for the nice-to-haves, it is rare that the program chairs make big decisions. To a very good approximation, the ranking is handled separately for each track, and individual papers from different tracks don't compete with each other. What does it mean for me? Well, it means that if I worked on dialog systems, and the famous person with the unfair advantage worked on large language models, their boost in score does not negatively affect me. I am not compared to them. I will be compared to other dialog-system papers.
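Continuing the same made-up sketch, tracks can be modeled as running the selection independently against each track's own quota, so a boosted paper in another track never displaces mine. Track names, scores, and the (unrealistically large) quota are again invented just to keep the example tiny.

```python
# Toy sketch of the track complication: acceptance is (to a good approximation)
# decided per track against that track's own quota. All numbers are invented.

def accept_per_track(submissions, quota=0.5):
    """submissions: {track: {paper_id: score}}. Take (roughly) the top `quota`
    fraction of each track independently and merge the accept lists."""
    accepted = set()
    for track, scores in submissions.items():
        n_slots = round(quota * len(scores))
        ranked = sorted(scores, key=scores.get, reverse=True)
        accepted |= set(ranked[:n_slots])
    return accepted

submissions = {
    "dialogue": {"mine": 3.6, "dlg_a": 4.4, "dlg_b": 3.3, "dlg_c": 2.1},
    "LLMs":     {"famous": 3.5, "llm_a": 4.7, "llm_b": 3.4, "llm_c": 2.6},
}

# hypothetical prestige bump for the famous author's paper in the LLMs track
boosted = {**submissions,
           "LLMs": dict(submissions["LLMs"], famous=submissions["LLMs"]["famous"] + 0.3)}

# "famous" competes only inside the LLMs track; my dialogue paper is untouched.
print("mine accepted, no boost:  ", "mine" in accept_per_track(submissions))  # True
print("mine accepted, with boost:", "mine" in accept_per_track(boosted))      # True
```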

Implications

We see, then, that the possibility of gaining higher scores due to positive author bias is real, but that its negative effect on other authors is restricted.

To a very good approximation, the negative effect on other authors due to positive bias towards famous authors only applies to (a) authors in the same track as the positively-affected paper; and (b) only in the case that the positively-affected paper is borderline in this track, and so is the other paper.

Now that you have this info, I will leave you to assess the potential negative consequences for your kind of submissions on your own.

I do want to highlight one more aspect, though: the strong incentive to pre-print-while-under-review and to publicize-your-work-on-social-media is restricted to "hot" areas where the research pace is "fast", and these are often nicely siloed into tracks already. If I work on a non-hot topic, the track I submit to already provides me with quite a bit of protection against the advantage of non-anonymous works by strong authors. If, however, I work on a hot topic, then the famous authors' advantage may indeed negatively affect my chances. However, when working on a hot topic my own incentive is also to put my work on arxiv as soon as possible. Otherwise I may get scooped by others who pre-print (either in NLP conferences or in ML ones, where there are no restrictions) before I even get my reviews back. NLP conferences do not exist in isolation, so working on a hot topic while also adhering to fully-anonymous review is a failing strategy, regardless of what NLP conferences do.

Last remark: privileged-author advantage can also be good for non-privileged authors.

Lastly, I would like to remark on a common argument that comes up, according to which some disadvantaged groups of authors also correlate with topics that are less likely to get accepted, but if a famous person were to work on these topics, the paper might get in due to the unfair positive bias. The implication being that it's unfair that a privileged and non-anonymous author can get a paper in on a topic that other authors cannot, only because of their non-anonymity.

I argue that in this case it might be unfair to the disadvantaged group, but at the same time it is also in the interest of the disadvantaged group for this to happen. Why? Let's say I work on a topic which the community doesn't really care about / doesn't find appealing / doesn't consider worthy of research. Let's say Treebanking for HPSG Parsing. I keep submitting papers and they never get in. I tried submitting them with my name and affiliation unblinded, and no one cares. Poor me. Now some famous author from a famous lab also submitted a paper on Treebanking for HPSG Parsing; they also put the paper on arxiv and publicized it heavily on social media, so the reviewers knew it was from them, were impressed, and the paper got in. On the one hand, how could they get a paper about my topic into ACL? There were never papers on this topic at ACL, no matter how hard I tried. This is unfair! On the other hand, now there is a paper on this topic in ACL. This means the topic is now somewhat more legit as a valid topic for ACL conferences. This means my future submissions on this topic have a somewhat higher chance of getting in. Here, the author bias in favor of the other author worked in favor of my niche topic, and, as a result, it also worked in favor of me.

Things are almost never black or white.

@lianghuang3

I completely disagree with your last remark. In your scenario, I (a disadvantaged author) got (partially or completely) scooped, rather than promoted, by a well-known group, and it will make my work even harder to get published!

@yoavg
Author

yoavg commented Sep 8, 2023

@lianghuang3 If you get scooped, it is indeed not to your advantage. But I still think that if the paper is on your topic, but does something different from yours, then your later publications will be easier, because the topic becomes more "legitimate".

@sarahwie

Interesting insights into the review process. Thanks for writing!
