@anjackson
Last active January 4, 2021 21:12
2020-01-03 US General Election Voting Data Problem

When digging too deep into Twitter, following some conspiracy tweets about Trump's election loss, I came to this odd site: https://hereistheevidence.com/

This site claims to provide tools and links to data to show alleged voting irregularities, and gives examples like this: https://twitter.com/indio007/status/1331828590552428544

Returned BEFORE ballot was mailed
23305 ballots pic.twitter.com/t0O5mUMWKh

— noone special (@indio007) November 26, 2020

Out of curiosity, I thought I'd see if I could reproduce the alleged irregularities.

The here-is-the-evidence site provides tools to download, but I'm not going to go anywhere near those. Installing software from a site like this would be very risky. And anyway, basic tools like grep would be enough to perform basic checks.

The site also provides links to download the data, which is a less risky activity, so I followed the link to https://siasky.net/AAD0TfiKxaBWqzTnFbgv6lUA2X4N_Cl3DN7yT5FdPC8vzA/ to find the '2020 ELECTION DATA HUB - Powered by a decentralized blockhchain cloud.' I downloaded that data and looked for one of the example lines where the dates are weird, and sure enough, it's there.

$ grep LEHIGH 2020_General_Election_Mail_Ballot_Requests_Department_of_State.csv | grep "05/05/1960" | grep MAILIN
LEHIGH,D,05/05/1960,MAILIN,08/25/2020,08/25/2020,10/19/2020,10/15/2020,187TH LEGISLATIVE DISTRICT,16TH SENATORIAL DISTRICT,7TH CONGRESSIONAL DISTRICT

i.e. the 10/19/2020,10/15/2020 part, where the former is the date the ballot was posted out and the latter is the date it was received back, four days earlier.
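As a rough sketch, the same check can be written in Python so that it compares the two dates rather than just matching text. The column positions (6 and 7, counting from zero) are an assumption read off the sample line above, not taken from the file's header, and the function name is made up for illustration.

```python
import csv
from datetime import datetime

# Assumption: 0-based positions of the posted-out and received dates,
# read off the sample line rather than from the file's header row.
MAILED_IDX = 6
RETURNED_IDX = 7


def parse_date(text):
    """Parse an MM/DD/YYYY date; return None for an empty field."""
    text = text.strip()
    return datetime.strptime(text, "%m/%d/%Y").date() if text else None


def returned_before_mailed(line):
    """True if the row's received date is earlier than its posted-out date."""
    fields = next(csv.reader([line]))
    mailed = parse_date(fields[MAILED_IDX])
    returned = parse_date(fields[RETURNED_IDX])
    return mailed is not None and returned is not None and returned < mailed


sample = (
    "LEHIGH,D,05/05/1960,MAILIN,08/25/2020,08/25/2020,10/19/2020,10/15/2020,"
    "187TH LEGISLATIVE DISTRICT,16TH SENATORIAL DISTRICT,"
    "7TH CONGRESSIONAL DISTRICT"
)
print(returned_before_mailed(sample))  # True: received 10/15 is before posted 10/19
```

Running the function over every line of a file and counting the `True` results would give a figure comparable to the tweet's, for whichever copy of the data is being checked.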

But is this data right? Well, it just appears to be a copy of the data from here: https://data.pa.gov/Government-Efficiency-Citizen-Engagement/2020-General-Election-Mail-Ballot-Requests-Departm/mcba-yywm

Downloading that data and running the same check, lo and behold:

$ grep LEHIGH 2020_General_Election_Mail_Ballot_Requests_Department_of_State\(1\).csv | grep "05/05/1960" | grep MAILIN
LEHIGH,D,05/05/1960,MAILIN,08/25/2020,08/25/2020,09/25/2020,10/15/2020,187TH LEGISLATIVE DISTRICT,16TH SENATORIAL DISTRICT,7TH CONGRESSIONAL DISTRICT

This data has a different posted-out date, prior to the receipt date, i.e. it's all fine. Comparing the two datasets is hard because there are a lot of differences between the current official version and the version here-is-the-evidence points to.
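One way to get a handle on the differences, given that the rows don't appear to carry a unique identifier, is to treat each file as a multiset of rows and count how many rows in one have no identical counterpart in the other. The sketch below does that, with an option to ignore one column. The column index and function names are assumptions for illustration, and the demo uses only the two lines quoted above rather than the full files.

```python
import csv
import io
from collections import Counter

MAILED_IDX = 6  # assumption: 0-based position of the posted-out date


def row_counts(csv_text, drop_idx=None):
    """Count identical rows, optionally ignoring one column."""
    counts = Counter()
    for row in csv.reader(io.StringIO(csv_text)):
        if drop_idx is not None:
            row = row[:drop_idx] + row[drop_idx + 1:]
        counts[tuple(row)] += 1
    return counts


def unmatched_rows(text_a, text_b, drop_idx=None):
    """Number of rows in A with no identical counterpart left in B."""
    remaining = row_counts(text_a, drop_idx) - row_counts(text_b, drop_idx)
    return sum(remaining.values())


tail = (",10/15/2020,187TH LEGISLATIVE DISTRICT,16TH SENATORIAL DISTRICT,"
        "7TH CONGRESSIONAL DISTRICT\n")
head = "LEHIGH,D,05/05/1960,MAILIN,08/25/2020,08/25/2020,"
unofficial = head + "10/19/2020" + tail
official = head + "09/25/2020" + tail

print(unmatched_rows(unofficial, official))                       # 1
print(unmatched_rows(unofficial, official, drop_idx=MAILED_IDX))  # 0
```

On the real files, the text would come from `open(path, newline="").read()`. If the unmatched count dropped sharply once the posted-out column was ignored, that would suggest the differences are concentrated in that one column, though it would not by itself say which copy changed.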

At this point, it's impossible to prove what's going on. The dataset's official home page says it was updated on the 16th of December, but there's no indication of the nature of any changes, e.g. adding new data, or fixing data problems. If earlier versions aren't available in some way, it's impossible to check if the here-is-the-evidence one is an old version from the official source.
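For what it's worth, recording a checksum of each downloaded copy alongside the download date at least makes it possible to tell later whether two files are byte-for-byte the same version. This is a minimal sketch using SHA-256 from Python's standard library. The function name is made up, and nothing here implies that either site publishes checksums.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Usage (paths are whatever the two downloads were saved as):
#   sha256_of_file("official.csv") == sha256_of_file("unofficial.csv")
```

Matching digests would show the two copies are identical. Differing digests only show that something changed, not what or by whom.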

Looking at the unofficial file, it seems like the difference could be as simple as 'accidentally' sorting the date-posted column on its own, rather than sorting the whole file by that column. Overall, whether by accident or by design, it seems likely that the data problems were introduced in the preparation of the unofficially hosted version. The hosting may be blockchain-powered, but without any connection to the original publisher, that brings no assurance that the data is as it was published.
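To illustrate how sorting a single column on its own can manufacture this kind of anomaly, here is a small sketch using made-up (posted, received) date pairs. None of these rows come from either dataset, and this is only a demonstration of the mechanism, not evidence of what actually happened to the unofficial file.

```python
from datetime import date

# Made-up rows, each internally consistent (posted before received).
rows = [
    (date(2020, 10, 19), date(2020, 10, 26)),
    (date(2020, 10, 5), date(2020, 10, 20)),
    (date(2020, 9, 25), date(2020, 10, 15)),
]


def count_received_before_posted(rows):
    """Count rows whose received date is earlier than the posted date."""
    return sum(1 for posted, received in rows if received < posted)


def sort_posted_column_only(rows):
    """Sort the posted dates in isolation, leaving received dates in place."""
    posted_sorted = sorted(posted for posted, _ in rows)
    return [(p, received) for p, (_, received) in zip(posted_sorted, rows)]


def sort_whole_rows(rows):
    """Sort complete rows by posted date, keeping each pair together."""
    return sorted(rows, key=lambda row: row[0])


print(count_received_before_posted(rows))                           # 0
print(count_received_before_posted(sort_whole_rows(rows)))          # 0
print(count_received_before_posted(sort_posted_column_only(rows)))  # 1
```

Sorting whole rows keeps every pair intact, so no anomalies appear. Sorting only the posted column reattaches the latest posted date to a row with an earlier received date, producing a "received before posted" row out of data that was originally fine.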
