@ad6398
Last active December 4, 2020 19:30

GSoC'19: Social Street Smart (final report)

Amardeep Kumar

GitLab link: AOSSIE: Social Street Smart

Live API endpoint: http://13.233.115.232:8091/predict

Motive for Project

With the advent of the Internet, the problems people face have also grown. These include abusive language, fake news articles, click-baits, malicious websites, and security attacks. Fake news has become increasingly prevalent over the last few years, and its adverse effects grow as people's access to social media and the Internet increases. Fake news not only creates communal hatred but also polarizes general elections. Click-baits waste a lot of people's productive time: the headlines are written in a catchy manner so that people are tempted to click the links, yet the pages contain no relevant information, making it necessary to warn users about click-bait. The aim of this project is to develop a Chrome extension that makes the Internet a safer and more productive service for its users.

Developer Manual

This manual covers mainly the fake-news part of the project.

Tech Stacks and skills used

  • JavaScript, Chrome extension development (Node.js, Gulp packages, web development)
  • Python, Flask REST API, PostgreSQL database, Flask-SQLAlchemy
  • Good hands-on experience with machine learning and natural language processing: NN, RNN, SVM, feature engineering, text processing, text similarity, word and sentence embeddings, etc.
  • Familiarity with AWS instances, hosting an API on them, and universal port forwarding.

Process flow Diagram

(process flow diagram: gsoc19)

Modules

The project is divided into two parts:

  1. Frontend/Chrome extension: a regular structure like other Chrome extensions. The content scripts facebook.js, twitter.js, and newsWeb.js scrape news posts and send them to the server side for verification.
  2. Backend/server side:
    • Fake News/ML: consists of all the Python scripts for text processing, generating embeddings, generating hand-crafted features, and producing the model weights for the ML model. A Jupyter notebook on the trained model is also included in this repo.
    • Fake News/News_Scrapper: scripts that scrape news and authentic news from two websites at regular intervals with the help of the news-please API.
    • Fake News/DB: contains the Postgres table definitions for storing news.
    • Fake News/Cache: definition of the cache table that stores results from various sources, so the ML model does not have to be called again for the same post.
    • Fake News/application.py: the main Flask app, which handles POST requests, scheduled scraping of news, etc.
    • Fake News/API_manager.py: app manager that handles DB migrations and hosting.
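The cache idea described above can be sketched as follows. This is a minimal illustration using in-memory SQLite and hypothetical names (the project itself uses a PostgreSQL table via Flask-SQLAlchemy), not the actual implementation:

```python
import hashlib
import sqlite3

# Stand-in for the project's Postgres cache table (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cache (post_hash TEXT PRIMARY KEY, label TEXT)")

def classify(text):
    # Placeholder for the real ML model call.
    return "fake" if "shocking" in text.lower() else "real"

def predict_with_cache(text):
    key = hashlib.sha256(text.encode("utf-8")).hexdigest()
    row = conn.execute(
        "SELECT label FROM cache WHERE post_hash = ?", (key,)
    ).fetchone()
    if row:                       # cache hit: skip the model entirely
        return row[0]
    label = classify(text)        # cache miss: run the model once
    conn.execute("INSERT INTO cache VALUES (?, ?)", (key, label))
    return label
```

Repeated requests for the same post then cost only one table lookup instead of a model inference, which is the point of the cache table.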

Installation guide

Clone the repository :

git clone https://gitlab.com/aossie/social-street-smart.git

For the Chrome extension

  1. Install Node.js and Git

  2. Change the directory: cd social-street-smart/

  3. Install the dependencies: npm install

  4. Build the extension: gulp build

For the backend/server

Change directory to server/Fake News/

  1. Install and activate a virtualenv

  2. Install all required libraries:

    pip install -r requirements.txt

  3. Also install the NLTK data files; open a Python terminal and run:

      import nltk
      nltk.download('punkt')
      nltk.download('wordnet')
      nltk.download('stopwords')
    
  4. Install Postgres for the DB and cache

    sudo apt install postgresql postgresql-contrib

  5. Create a database

    sudo -u postgres createdb fakeNewsDB

    If there is an error and a user needs to be created first:

      sudo -u postgres createuser "usernameOfUrPC"
    
  6. Install ChromeDriver for news web scraping (if not installed already)

    sudo apt-get install chromium-chromedriver

    If there is a path error, pass the driver path explicitly on the Selenium line:

    driver = webdriver.Chrome("/usr/lib/chromium-browser/chromedriver")

  7. Create all tables and handle migrations (do this only once)

    python API_manager.py db init
    
    python API_manager.py db migrate
    
    python API_manager.py db upgrade
    
  8. To test whether data is committed to the DB properly

    1. Log in to the DB:

      sudo -u postgres psql

    2. Connect to the database (then \dt lists all tables present):

      \c fakeNewsDB

    3. To inspect a relation/table:

      SELECT * FROM "table_name";

  9. API request structure for

    1. Facebook request (fb_req screenshot)

    2. Twitter request (twitter_req screenshot)

    3. Web-news request (webNews_req screenshot)

  10. Run python API_manager.py runserver to start the API on a local machine
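A client call to the prediction endpoint might look like the sketch below. The payload field names (source, url, text) are assumptions made for illustration, since the actual request structures are only shown as screenshots above; the endpoint URL is the one listed in this report:

```python
import json
import urllib.request

# Hypothetical payload shape; the real field names are defined by the
# request-structure screenshots in the gist, not by this sketch.
payload = {
    "source": "twitter",
    "url": "https://twitter.com/some_user/status/123",
    "text": "Shocking claim making the rounds today...",
}

req = urllib.request.Request(
    "http://13.233.115.232:8091/predict",   # live endpoint from the report
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# urllib.request.urlopen(req) would send the request; it is left out here
# so the sketch does not depend on the server being reachable.
```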

Work and PRs

The summer of 2019 was a great learning period for me. I learned a lot about writing production-level, readable code: writing code alone is not enough, it should be readable as well as reproducible, and I realized the value of both developer and user manuals. All work as proposed was completed, and the relevant PRs were sent regularly and on time.

List of PRs during the GSoC working period:

Future work and improvement

  • The UI can be improved.

  • A custom word embedding trained on an Indian news corpus would produce better results.

  • The lists of news websites, Facebook pages, and Twitter handles need to be extended for scalability.

  • A session manager and activity logs for the API hosted on the server.

  • Integration of a news-origin detector.
