The Internet has been abuzz today about Facebook data scientists publishing a paper in the Proceedings of the National Academy of Sciences describing experiments in which they deliberately removed posts from users' news feeds. I decided to actually read the paper, describe what was done, and write up some of my observations both as a scientist and as a Facebook user. I try my best to use language that should be accessible to almost everyone, not just other scientists.
What they did
The Facebook data scientists selected 689,003 people who viewed Facebook in English as subjects for this experiment. The experiment ran for one week, from January 11th to 18th, 2012. The basic idea was to measure the effect of removing positive (or negative) posts from people's news feeds on how positive (or negative) their own posts were in the days after these changes were made.
Now on to the details of the experiment. The Facebook scientists created four different categories (there were about 155,000 people in each category, after filtering out people who did not post anything during the week the experiment was running):
(a) people from whose news feeds positive posts were removed
(b) people from whose news feeds negative posts were removed
(c) people from whose news feeds approximately 5% of posts were removed, and
(d) people from whose news feeds approximately 2% of posts were removed
Categories (c) and (d) are what's called control categories; they are necessary to measure whether it is the removal of positive and negative posts that has an effect on the emotional content of people's posts or whether it is simply the act of removing (any) posts from news feeds. Two different control groups are needed because, on average, there are more positive posts on Facebook than negative ones, so removing positive posts takes away a larger share of a news feed than removing negative posts does. So, (c) is the control group for (a) and (d) is the control group for (b).
To identify a post as positive or negative, the Facebook data scientists used a list of words from the Linguistic Inquiry and Word Count analysis software. This list was developed by a team of social psychologists and has been widely used to analyze the language in a variety of texts.
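To make the word-list idea concrete, here is a minimal Python sketch. It is not the authors' code: the two word sets are tiny hypothetical stand-ins for the (much larger, proprietary) LIWC lists, and the rule that a post counts as positive or negative if it contains at least one listed word is an assumption of this sketch. The function names are made up for illustration.

```python
import re

# Hypothetical stand-in word lists; the real LIWC dictionaries are far larger.
POSITIVE_WORDS = {"happy", "great", "love", "nice"}
NEGATIVE_WORDS = {"sad", "hate", "awful", "hurt"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def label_post(text):
    """Return the labels triggered by the word lists (assumed rule:
    one listed word is enough to label the post)."""
    tokens = set(tokenize(text))
    labels = set()
    if tokens & POSITIVE_WORDS:
        labels.add("positive")
    if tokens & NEGATIVE_WORDS:
        labels.add("negative")
    return labels


def word_percentages(posts):
    """Percentage of all words across the posts that appear on each list."""
    tokens = [t for post in posts for t in tokenize(post)]
    if not tokens:
        return {"positive": 0.0, "negative": 0.0}
    pos = sum(t in POSITIVE_WORDS for t in tokens)
    neg = sum(t in NEGATIVE_WORDS for t in tokens)
    return {
        "positive": 100.0 * pos / len(tokens),
        "negative": 100.0 * neg / len(tokens),
    }


if __name__ == "__main__":
    print(label_post("I love this place"))
    print(word_percentages(["I love this place", "What an awful day"]))
```

The first function corresponds to deciding which posts to remove from a feed; the second corresponds to the outcome measure, i.e. the percentage of positive and negative words in what people wrote afterwards.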
So what were the final results? The authors of the paper found that, as compared to their respective control group, people from whose news feeds negative posts were removed produced a larger percentage of positive words, as well as a smaller percentage of negative words, in their posts. The group of people from whose news feeds positive posts were removed showed the corresponding tendencies in the other direction.
- The idea of "emotional contagion" in large social networks is not a novel one and the paper certainly acknowledges that, in the abstract as well as the first few introductory paragraphs.
- The paper says that the content shown in a user's news feed is (even today) decided by a ranking algorithm that Facebook continually develops and tests in the interest of showing viewers the content they will find most relevant and engaging. Obviously, this specific experiment was not necessarily designed to be in the interest of the users and the paper does not say if and how the results of such a study might bring any benefits to users in the future.
- The paper clearly says that any removed content was always available by viewing a friend's content directly by going to that friend's timeline and that such content may have appeared on prior or subsequent views of the [same] news feed. Finally, it says that the experiment did not affect any direct messages sent from one user to another. I never go to any of my friends' timelines unless there's a reason to, and since the subjects of this experiment were unaware that their news feed was being interfered with, the fact that the content was available on friends' timelines does very little to mitigate the interference with the news feeds. However, the fact that the removed content may have been made available after the experiment was over does make me feel a bit better (the "may" is bothersome though).
- This paper does not prove that emotional contagion over large-scale social networks is a real phenomenon. To their credit, the Facebook scientists are somewhat honest about that. They say that "Although these data provide ... some of the first experimental evidence to support the claims, ... the effect sizes are small." A small effect size means, in crude terms, that the strength of the emotional contagion phenomenon is small. However, they are also right that, with such a large number of subjects, even a small effect size is far from negligible.
- I am not entirely sure why they report N=689,003 when it sounds like they actually filtered some of those people out since they report about 155,000 people in each of the four categories later in the paper.
- The paper also says that unlike previous studies their results show that no in-person interaction is necessary and that textual content alone seems to be a sufficient channel for the emotional contagion to spread, as it were. Without commenting on whether this specific claim is true (see next section), I will say that the power of social media to influence people's moods is far from surprising in 2014.
- The authors of the paper acknowledge a whole bunch of people at Facebook. It seems odd, if not incredibly short-sighted, that they would not acknowledge the large number of Facebook users who (unknowingly) volunteered to be subjects for their experiment.
- Intriguingly, the paper also says that "data processing systems, per-user aggregates, and anonymized results [are] available upon request". I wonder how easy it will be to make that request and have it followed up.
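The observation above about small effect sizes and large samples can be illustrated with a short Python sketch. The means and standard deviation below are hypothetical numbers chosen only for illustration, not values from the paper; only the group size of roughly 155,000 comes from the description above. The sketch computes a standardized effect size (Cohen's d) and a two-sample z statistic under a normal approximation, to show that a tiny effect can still be clearly detectable with this many subjects.

```python
import math


def cohens_d(mean_a, mean_b, pooled_sd):
    """Standardized mean difference between two groups."""
    return (mean_a - mean_b) / pooled_sd


def two_sample_z(mean_a, mean_b, pooled_sd, n_per_group):
    """z statistic for a difference in means, equal group sizes and SDs."""
    standard_error = pooled_sd * math.sqrt(2.0 / n_per_group)
    return (mean_a - mean_b) / standard_error


def two_sided_p(z):
    """Two-sided p-value under a standard normal approximation."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical values (NOT from the paper): percentage of positive words
# per person in a treatment group and in its control group.
treatment_mean = 5.3
control_mean = 5.2
pooled_sd = 5.0
n_per_group = 155_000  # approximate group size mentioned in the write-up

d = cohens_d(treatment_mean, control_mean, pooled_sd)
z = two_sample_z(treatment_mean, control_mean, pooled_sd, n_per_group)
p = two_sided_p(z)

if __name__ == "__main__":
    print(f"Cohen's d = {d:.3f}, z = {z:.2f}, p = {p:.2e}")
```

With these made-up inputs the effect size is about 0.02, which is tiny by the usual rules of thumb, yet the z statistic is above 5. Statistical detectability and practical importance are separate questions, which is why the effect size matters when reading the results.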
I can easily separate my final thoughts into two categories: first on the quality of the science described in the paper and second on the revelation that users' news feeds were "interfered" with without their knowledge.
So, about the science. Far and away, my biggest complaint is that the Facebook scientists simply used a word list to determine whether a post was positive or negative. As someone who works in natural language processing (including on the task of analyzing sentiment in documents), I can say that such a rudimentary system would be treated with extreme skepticism in our conferences and journals. There are just too many problems with the approach, e.g. negation ("I am not very happy today because ..."). From the paper, it doesn't look like the authors tried to address these problems. In short, I am skeptical about whether the experiment actually measures anything useful. One way to address comments such as mine is to actually release the data to the public along with some honest error analysis about how well such a naive approach actually worked.
(Note: See Brendan's comment below for an even more insightful analysis of the shortcomings of using LIWC to analyze social media posts.)
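The negation problem mentioned above is easy to reproduce. The following Python sketch uses a hypothetical three-word positive list and a hypothetical list of negators (neither is from LIWC or from the paper). It compares plain word counting with a crude alternative that ignores a positive word when a negator appears within a few tokens before it. The alternative is only an illustration of the issue, not a recommended sentiment method and not something the paper did.

```python
import re

# Hypothetical stand-ins, not taken from LIWC or from the paper.
POSITIVE_WORDS = {"happy", "great", "love"}
NEGATORS = {"not", "no", "never", "don't", "isn't", "can't"}


def tokenize(text):
    """Lowercase the text and split it into word tokens (apostrophes kept)."""
    return re.findall(r"[a-z']+", text.lower())


def count_positive_naive(text):
    """Count every token that is on the positive list, ignoring context."""
    return sum(t in POSITIVE_WORDS for t in tokenize(text))


def count_positive_negation_aware(text, window=3):
    """Count positive tokens, skipping any that have a negator among the
    `window` tokens immediately before them (a crude heuristic)."""
    tokens = tokenize(text)
    count = 0
    for i, tok in enumerate(tokens):
        if tok in POSITIVE_WORDS:
            preceding = tokens[max(0, i - window):i]
            if not any(p in NEGATORS for p in preceding):
                count += 1
    return count


if __name__ == "__main__":
    example = "I am not very happy today"
    print(count_positive_naive(example))           # counts "happy"
    print(count_positive_negation_aware(example))  # skips negated "happy"
```

Plain counting treats the example sentence as containing one positive word, while the window heuristic treats it as containing none. Even the heuristic fails on sarcasm, quoted text, and longer-range negation, which is why an error analysis on real posts would be valuable.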
As a Facebook user, I am not entirely surprised that Facebook conducted the experiment without the users' knowledge. Personally, I am resigned to the fact that everything I post on Facebook is entirely public. I understand that Facebook couldn't make the exact details of the experiment public and ask people to opt-in because that would clearly introduce a serious bias in the results. However, I wonder if the following strategy wouldn't have worked better than what they actually did:
- Facebook announces that they are interested in running scientific experiments with users' data along with a clear motivation on how such experiments will make the Facebook experience better for everyone. Obviously, they do not disclose the exact details of the experiments.
- They ask users for help and to explicitly opt in if they are interested (with the option to opt out at any time).
- They randomly choose subjects from this pool of actual volunteers.
- After each experiment is published, they publicize and explain the results to their users, including why conducting the experiment was (or will be) useful to them.
Public for a reason
I am making this gist public so that people can correct me in all the places I may have been wrong in my analysis, of which there are likely to be many given that said analysis was done in two hours on a Saturday night. So, comment away.
Nice write-up, Nitin! On the issues with word list counting -- I just don't think previous work has ever established how well the relative frequencies of LIWC words in social media posts correlate with the emotions/moods of the authors. If this were established, all the concerns about negations or opinion targets and other semantic argument structures would go away, since they'd be implicitly incorporated into the noise of the measurement instrument (word frequencies). It looks like the paper finds the main effect of their manipulation to be a 0.1% change in the relative frequencies of LIWC words. A good correlation validation study would tell us what that effect size actually means, if anything. I want to know: how many z-scores of, say, person-to-person emotion variation, does a 0.1% word frequency change mean? Is it substantial, like a 0.1 z-score? Or more like a 0.001 z-score?
They cite 4 papers on prior use of LIWC. Three of them are:
(7) Guillory (a coauthor of the paper), in CHI-2011
(8) Kramer (a coauthor of the paper), in CHI-2012
(10) Golder and Macy, in Science, 2011
None of these really validate LIWC. They're examples of people running LIWC on web data, and publishing in venues where reviewers do not care about honest psychometric validation.
The most relevant thing they cite is an undated webpage:
(9) Pennebaker et al, the "How LIWC Works" webpage
It describes several N=30 or so studies with psych undergrads, where they are assigned to write on particular topics, and their writing is correlated against questionnaire measurements of affect. Correlating against questionnaires is the right approach, but the writing situation is just radically different from the social media setting -- people could be describing what they just saw on television; could be copying text from their friend; people post only when they want to, not because they have to; they might have different writing styles than psych undergrads at the University of Texas; etc.
Ironically, Facebook's experimental platform is the perfect setting to run such a study (correlating behavior against questionnaire answers). Seems like a missed opportunity.