# In Defense of Google Flu

April 16, 2014
Disclaimer: I have not worked on Google Flu and have no inside knowledge about the project.
tl;dr Saying that Google Flu doesn't work is a bit like pulling up a spam classifier trained on data from 2009, applying it to spam today, seeing that the results are pretty shitty, and then concluding... well, Bayesian classification doesn't work.
Recent reports have concluded that Google Flu "doesn't work". See, for example, an article by Lazer et al. and a piece by Steven Salzberg. Google Flu is now being cited as an example of "big data hubris".
To me, the failure of Google Flu says only one thing: algorithmic effectiveness changes over time, and models need to be constantly tuned and updated. Let me propose a simple alternative explanation for why Google Flu "stopped working" despite the initial success publicized in the original article: the Google engineers who worked on it left the company, moved on to other projects, or simply stopped caring. As a result, the original prediction model hasn't been tuned or updated... and it's been sitting there, making predictions the whole time. It's as if our email systems were still running a spam classifier from 2009... it doesn't work anymore... well, DUH!
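The "stale model" effect described above is usually called concept drift. Below is a minimal, entirely hypothetical Python sketch of it: synthetic weekly "search volume" and "flu rate" series in which the relationship between the two slowly shifts, compared against two models. One is fit once and then frozen; the other is refit every week on a recent window. The data, the drift, and the names `simulate`, `fit_slope`, and `compare` are made up for illustration; this is not how Google Flu Trends actually worked.

```python
import random


def fit_slope(xs, ys):
    """Least-squares slope for a line through the origin: y ~ b * x."""
    num = sum(x * y for x, y in zip(xs, ys))
    den = sum(x * x for x in xs)
    return num / den


def simulate(n_weeks=200, drift_start=100, seed=0):
    """Hypothetical data: flu rate = k(t) * search volume + noise.

    k stays at 1.0 until drift_start, then declines linearly toward 0.6,
    i.e. the same flu level starts producing more searches (assumed drift).
    """
    rng = random.Random(seed)
    searches, flu = [], []
    for t in range(n_weeks):
        x = rng.uniform(10.0, 100.0)  # made-up weekly search volume
        if t < drift_start:
            k = 1.0
        else:
            k = 1.0 - 0.4 * (t - drift_start) / (n_weeks - drift_start)
        searches.append(x)
        flu.append(k * x + rng.gauss(0.0, 1.0))
    return searches, flu


def compare(window=20, drift_start=100):
    """Return mean absolute error of (frozen, refit) models after the drift starts."""
    xs, ys = simulate(drift_start=drift_start)
    # "Frozen" model: fit once on pre-drift data and never touched again.
    frozen = fit_slope(xs[:drift_start], ys[:drift_start])
    frozen_err = refit_err = 0.0
    test_weeks = range(drift_start, len(xs))
    for t in test_weeks:
        # "Maintained" model: refit each week on the previous `window` weeks only.
        recent = fit_slope(xs[t - window:t], ys[t - window:t])
        frozen_err += abs(frozen * xs[t] - ys[t])
        refit_err += abs(recent * xs[t] - ys[t])
    n = len(test_weeks)
    return frozen_err / n, refit_err / n


if __name__ == "__main__":
    frozen_mae, refit_mae = compare()
    print(f"frozen model MAE: {frozen_mae:.2f}")
    print(f"refit model MAE:  {refit_mae:.2f}")
```

Running it, you should expect the frozen model's error to be several times larger than the refit model's. The method is the same least-squares fit in both cases; the only difference is whether anyone keeps maintaining the model.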
Update: April 17, 2014
David Lazer (lead author of the paper cited above) and I had a nice exchange about this, which I captured in this Twitter timeline.
I thought that the attack on Google Flu Trends (GFT) was unfair because no experienced data scientist would throw away everything we know about a subject and rely on data-driven techniques exclusively, as the article suggests. David responded that GFT is held up as a poster child for the efficacy of big data. I concede this point; perhaps the original authors came across as a bit too flippant.
Admittedly, I haven't looked into the details of the algorithm, but I trust David's assessment that there are methodological flaws in the current form of GFT. Even so, that is an indictment of one particular data analysis, not of data science and data-driven techniques in general. We are in agreement on this point.
Still, I wonder to what extent GFT is wrong because, quite frankly, it's not a problem Google cares about. If Google's revenue started depending on correct predictions tomorrow, I suspect the accuracy would get better very quickly...