@quinncomendant
Last active May 26, 2016 19:38
A service that crowdsources fact checking of web pages. This text is embryonic (copied from an email to a friend), but it's a powerful idea and needs to start somewhere. I would appreciate comments: help identify problems, offer suggestions for its design, and ideas for how to solve adoption. Email me at quinn@strangecode.com

The idea is to create a means for web users to rate the factuality of content on a web page: a browser extension that allows individual phrases of text to be selected, then rated for accuracy and commented upon. The extension displays a score of the page's factuality, based on the crowdsourced ratings. Clicking the extension reveals more detail, including comments from fact-checkers and selected passages from the text with notes and external references. With the extension running, it could also automatically add a light green or red underline to text on the page that has been rated, indicating exactly which phrases have been reviewed. Hovering the cursor over these phrases brings up a popup of notes for that statement.
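
Roughly, I imagine the extension's content script working something like the sketch below. Every name, type, and endpoint here is a made-up placeholder to illustrate the idea, not a real API.

```typescript
// Hypothetical content-script sketch: underline phrases that have crowdsourced ratings.
// All names, types, and the server endpoint are illustrative assumptions.

interface PhraseRating {
  phrase: string;     // exact text that was selected and rated
  score: number;      // aggregate factuality score, e.g. -1.0 (bullshit) to 1.0 (factual)
  comments: string[]; // fact-checkers' notes and citations
}

// Fetch ratings for the current page from the (hypothetical) central server.
async function fetchRatings(pageUrl: string): Promise<PhraseRating[]> {
  const res = await fetch(`https://example.org/api/ratings?url=${encodeURIComponent(pageUrl)}`);
  return res.ok ? res.json() : [];
}

// Wrap each rated phrase in a span with a light green or red underline.
function underlineRatedPhrases(ratings: PhraseRating[]): void {
  for (const rating of ratings) {
    const walker = document.createTreeWalker(document.body, NodeFilter.SHOW_TEXT);
    let node: Node | null;
    while ((node = walker.nextNode())) {
      const text = node.textContent ?? "";
      const index = text.indexOf(rating.phrase);
      if (index === -1) continue;
      const range = document.createRange();
      range.setStart(node, index);
      range.setEnd(node, index + rating.phrase.length);
      const span = document.createElement("span");
      span.style.textDecoration = "underline";
      span.style.textDecorationColor = rating.score >= 0 ? "lightgreen" : "red";
      span.title = rating.comments.join("\n"); // crude stand-in for the hover popup
      range.surroundContents(span);
      break; // one highlight per rating is enough for this sketch
    }
  }
}

fetchRatings(location.href).then(underlineRatedPhrases);
```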

But here's the exciting part: each phrase of text that is selected, rated, and commented upon (hopefully with cited references explaining why the phrase is factual or bullshit) will be sent to a central server along with the rating metadata. With a lot of usage, a huge corpus of text phrases and corresponding rating metadata will accumulate. This can then be parsed with machine-learning software to distill the phrases into meaningful tokens of what is factual and what is bullshit.
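
As a minimal sketch, a single submitted rating might look something like this (the field names and endpoint are my assumptions, not a defined format):

```typescript
// Hypothetical shape of one crowdsourced rating submission.
// Field names and the endpoint are assumptions for illustration only.

interface RatingSubmission {
  pageUrl: string;                // page the phrase was found on
  phrase: string;                 // exact selected text
  rating: "factual" | "bullshit";
  confidence: number;             // submitter's confidence, e.g. 1-10
  comment: string;                // why the phrase is factual or bullshit
  citations: string[];            // URLs of supporting references
  submittedAt: string;            // ISO 8601 timestamp
}

// Send the submission to the (hypothetical) central server.
async function submitRating(submission: RatingSubmission): Promise<void> {
  await fetch("https://example.org/api/ratings", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(submission),
  });
}
```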

One use of this would be to generate distilled stem-phrases, which can then be used to automatically assign factuality ratings to text found on any web page containing phrases that match known stem-phrases. Those phrases may then be underlined in orange, which, when hovered, would show related matching phrases (from other web pages) and their human-submitted ratings and comments. The user would have to decide whether the machine's logic generated a correct match, and could submit feedback to supplement the computer's interpretation. (A naive sketch of this matching follows the example below.)

Example: Pages with the following phrases are marked as bullshit, and submitted to the system:

"[…] vitamin C helps protect the body against colds and flus […]"
"[…] boost your immune system by taking vitamin C […]"
"[…] should take vitamin C if they feel a cold coming on […]"

The machine-learning server condenses this to:

stem-phrase: "vitamin C prevents colds and flu"
rating: bullshit 
confidence: 9
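
To make the matching step concrete, here is a deliberately naive sketch that matches page text against known stem-phrases using simple word overlap. A real system would use proper machine learning; everything named here is illustrative.

```typescript
// Hypothetical, deliberately naive stem-phrase matcher.
// A real system would use ML/NLP; this only demonstrates the idea.

interface StemPhrase {
  text: string;                   // e.g. "vitamin C prevents colds and flu"
  rating: "factual" | "bullshit";
  confidence: number;             // e.g. 9
}

// Crude similarity: fraction of stem-phrase words that also appear in the candidate sentence.
function wordOverlap(stem: string, candidate: string): number {
  const stemWords = stem.toLowerCase().split(/\W+/).filter(Boolean);
  const candidateWords = new Set(candidate.toLowerCase().split(/\W+/).filter(Boolean));
  const hits = stemWords.filter((w) => candidateWords.has(w)).length;
  return hits / stemWords.length;
}

// Return known stem-phrases that plausibly match a sentence found on some web page.
function matchStemPhrases(sentence: string, known: StemPhrase[], threshold = 0.7): StemPhrase[] {
  return known.filter((stem) => wordOverlap(stem.text, sentence) >= threshold);
}

// Using the example above: a new page containing this sentence would be flagged
// (underlined in orange) as a probable match for the known bullshit stem-phrase.
const known: StemPhrase[] = [
  { text: "vitamin C prevents colds and flu", rating: "bullshit", confidence: 9 },
];
console.log(matchStemPhrases("Doctors agree you should take vitamin C to prevent colds and the flu.", known));
```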

And of course this could be applied to any text in a web browser, such as phrases within email content (privacy issues FTW).

Having a huge corpus of crowdsourced fact-checking could certainly lend itself to other amazing, unfathomable (and monetizable) uses.

The internet is a cesspool of self-replicating bullshit on an unimaginable scale. And it is harmful: just try searching for helpful medical remedies and you'll discover an endless quagmire of web forums recommending quackery, anecdotes told as truth, and profiteers selling snake oil. I think it's no stretch to say most people are led astray.

A factuality browser extension might be able to help here. But there are two major problems:

  1. It would be totally invisible except to those who voluntarily install the extension. It would be possible, however, to offer a JavaScript API that website publishers could embed on their sites to enable the functionality without a browser extension. Even so, adoption rates may be abysmal.

  2. Those who provide the factuality ratings may be the same assholes who perpetuate bullshit in the cesspool internet. Why should we trust ratings from users? Many people will have incentives to lie or deceive: to sell their own products or beliefs. Users with opposing beliefs may turn it into a blood bath of repeated counter-rating ("Is so!" "Is not!" "Is so!!!" "Is not!!!!" …etc.). Solutions include a merit-based reputation system (à la Stack Exchange), an iterative rate-the-ratings system (👍 or 👎 on existing ratings), and sane requirements for submission (a URL citation from a peer-reviewed journal, perhaps). Allowing users to rate the ratings provides metadata about which ratings form a consensus, and gives feedback about the validity of a contributor's voice. A reputation system would take into account contribution validity (review reputation), seniority (has the user been active for days, months, years?), and external reputation (number of FB friends/Twitter followers, reputation score on Stack Exchange, etc.); a rough sketch of such a score follows this list. It is extremely hard to build a system that is impossible to game, but there are some projects that are succeeding at this.
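
Here is a sketch of how those signals might combine into a single reputation score; the weights and inputs are arbitrary assumptions, just to show the shape of it.

```typescript
// Hypothetical reputation score combining the signals described above.
// Weights and inputs are arbitrary assumptions for illustration only.

interface Contributor {
  upvotesOnRatings: number;   // 👍 received on this user's ratings
  downvotesOnRatings: number; // 👎 received on this user's ratings
  accountAgeDays: number;     // seniority
  externalReputation: number; // e.g. normalized Stack Exchange rep or follower count, 0-1
}

function reputationScore(c: Contributor): number {
  const totalVotes = c.upvotesOnRatings + c.downvotesOnRatings;
  // Fraction of this user's ratings the community agreed with (0.5 if no votes yet).
  const agreement = totalVotes > 0 ? c.upvotesOnRatings / totalVotes : 0.5;
  // Seniority saturates: a multi-year account isn't infinitely more trusted than a one-year one.
  const seniority = Math.min(c.accountAgeDays / 365, 1);
  // Weighted blend; agreement with peers matters most.
  return 0.6 * agreement + 0.25 * seniority + 0.15 * c.externalReputation;
}

// New ratings could then be weighted by the submitter's reputation score when
// aggregating a page's factuality score.
```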

While these two are big hurdles, the basic idea and framework would actually be pretty easy to build. The idea is enticing enough that it might be worth pursuing even if there is a high chance of failure. It would be an interesting experiment. It would help, of course, if there were funding, or if it could be done in partnership with an existing foundation or NGO that builds software products in a similar field (Academia? EFF? PolitiFact? Skeptics groups?).

The project could also be initially adopted by a specific, sequestered internet sub-culture, e.g., academics. It could be built into the website of a peer-reviewed journal, used only by registered journal users, and only on articles in the journal. It would also help greatly to have a period of priming the database with initial content, contributed by a trusted legion of beta testers: known, responsible members of academia or research groups. This would vet its functionality and allow building functional algorithms before releasing it to the greater cesspool.

The project would be worth developing, if only as an experiment for internal use, to show what is possible and spark further development in the essential field of fact checking.

@quinncomendant (Author):

So, perhaps the only reply to this post need be: this work is already being done by @hypothesis. =)
