@shubhscoder
Last active August 28, 2019 02:48
GSoC 2019 Project Report

Lumendatabase

Lumen is an independent third-party research project that studies cease and desist letters concerning online content. Lumen collects and analyzes requests to remove material from the web. Its main goals are to educate the public and to facilitate research about the different kinds of complaints and requests for removal, both legitimate and questionable, that are sent to Internet publishers and service providers. Lumendatabase contains millions of notices.

Project Summary

A statistics dashboard is a must-have feature for Lumendatabase. The dashboard gives a quick overview of the number of links that were added, who added them, the links added in a particular time frame, and so on. It is particularly useful for summarizing these statistics. The data collected would also be useful for future plans to introduce some type of recommendation system in Lumendatabase. The second part of the project was the creation of web archives, using either perma.cc or a custom archival mechanism.

Proposed features

  • Total number of notices
  • Total number of URLs
  • Notices by sender, receiver, and submitter
  • Notices involving a particular domain
  • Visitors by country
  • Number of URLs per entity
  • Word cloud from notice texts
  • Total number of unique entities
  • Web archival with perma / custom implementation
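
To make the statistics above concrete, here is a minimal Python sketch of how a few of them could be computed over a small in-memory list of notices. It is not the project's implementation: the `Notice` class, its field names, and the `dashboard_stats` function are hypothetical and exist only for illustration, and "URLs per entity" is interpreted here as URLs per sender.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Notice:
    """Hypothetical stand-in for a notice record (not the real schema)."""
    sender: str
    receiver: str
    submitter: str
    urls: list = field(default_factory=list)


def dashboard_stats(notices):
    """Compute a few of the proposed dashboard statistics."""
    # Number of notices attributed to each sender.
    notices_by_sender = Counter(n.sender for n in notices)

    # Number of URLs attributed to each sender (one reading of "URLs per entity").
    urls_per_sender = Counter()
    for n in notices:
        urls_per_sender[n.sender] += len(n.urls)

    # Every distinct name that appears in any of the three roles.
    entities = {
        name
        for n in notices
        for name in (n.sender, n.receiver, n.submitter)
    }

    return {
        "total_notices": len(notices),
        "total_urls": sum(len(n.urls) for n in notices),
        "notices_by_sender": dict(notices_by_sender),
        "urls_per_sender": dict(urls_per_sender),
        "unique_entities": len(entities),
    }
```

With millions of notices, aggregates like these would more plausibly be computed in the database or search index than in application memory; the sketch only pins down what each number means.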

Work done

Completed Code

Open Pull requests

Work left to be done

  • Addressing review comments for open PRs till they get merged
  • Web archival with perma / custom implementation
  • Defining user access restrictions of the dashboard

Learnings

  • Designing a feature / product end to end, from gathering requirements to testing the product
  • Unique and varied features of the Postgres database
  • Handling large amounts of data and processing it efficiently
  • Rails conventions, code quality, and working with elasticsearch-rails
  • Importance of unit and integration tests