Amardeep Kumar
GitLab link: AOSSIE: Social Street Smart
Live API endpoint: http://13.233.115.232:8091/predict
With the advent of the Internet, the problems people face online have also grown. These include abusive language, fake news articles, clickbait, malicious websites and security attacks. Fake news has become increasingly prevalent over the last few years, and its adverse effects are more and more visible as people's access to social media and the Internet keeps increasing. Fake news not only fuels communal hatred but also polarizes general elections. Clickbait wastes a lot of people's productive time: these headlines are written in a catchy manner so that people are tempted to click the links, yet the pages contain no relevant information, which makes it necessary to warn users about clickbait. The aim of this project is to develop a Chrome extension that makes the Internet a safer and more productive service for its users.
These are mainly for the fake news part of the project.
- JavaScript, Chrome extension development (Node.js, Gulp packages, web development)
- Python, Flask REST API, PostgreSQL database, Flask-SQLAlchemy
- Good hands-on experience with Machine Learning and Natural Language Processing: NN, RNN, SVM, feature engineering, text processing, text similarity, word and sentence embeddings, etc.
- Familiar with AWS instances, hosting an API on them, and universal port forwarding.
The project is divided into two parts:
- Frontend / Chrome extension: the regular structure shared by all Chrome extensions. The content scripts facebook.js, twitter.js and newsWeb.js scrape news posts and send them to the server side for verification.
- Backend / server side:
  - Fake News/ML: all Python scripts for text processing, generating embeddings, generating hand-crafted features and the model weights for the ML model. A Jupyter notebook of the trained model is also included in this repo.
  - Fake News/News_Scrapper: scripts that scrape news, including authentic news, from two websites at a regular interval with the help of the news-please API.
  - Fake News/DB: PostgreSQL table definitions for storing news.
  - Fake News/Cache: definition of the cache table that stores results from the various sources, so that the ML model does not have to be called again for the same post.
  - Fake News/application.py: the main Flask app, which handles POST requests, scheduled news scraping, etc.
  - Fake News/API_manager.py: the app manager that handles DB migrations and hosting.
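The Cache table exists so that the ML model is not called again for a post it has already judged. A minimal sketch of that check-then-compute flow is below, assuming a normalized SHA-256 hash of the post text as the cache key. It uses an in-memory sqlite3 database purely as a stand-in for the PostgreSQL cache table; the table name `cache`, its columns and the `model` callable are hypothetical, not the project's actual schema or classifier.

```python
import hashlib
import sqlite3
from typing import Callable, Tuple

# Hypothetical schema; sqlite3 stands in for the PostgreSQL cache table
# only so this sketch runs without a database server.
SCHEMA = """
CREATE TABLE IF NOT EXISTS cache (
    post_hash TEXT PRIMARY KEY,
    verdict   TEXT NOT NULL
)
"""


def post_key(text: str) -> str:
    """Lowercase and collapse whitespace, then hash, so re-shared copies of a post hit the cache."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def verify_post(conn: sqlite3.Connection, text: str,
                model: Callable[[str], str]) -> Tuple[str, bool]:
    """Return (verdict, from_cache); `model` is only called on a cache miss."""
    key = post_key(text)
    row = conn.execute("SELECT verdict FROM cache WHERE post_hash = ?", (key,)).fetchone()
    if row is not None:
        return row[0], True
    verdict = model(text)  # hypothetical placeholder for the trained ML model
    conn.execute("INSERT INTO cache (post_hash, verdict) VALUES (?, ?)", (key, verdict))
    conn.commit()
    return verdict, False


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    calls = []

    def dummy_model(text: str) -> str:
        # Toy stand-in: records each call so cache hits are visible
        calls.append(text)
        return "fake" if "shocking" in text.lower() else "real"

    print(verify_post(conn, "Shocking news about elections", dummy_model))
    print(verify_post(conn, "  shocking NEWS about elections ", dummy_model))
    print("model calls:", len(calls))
```

The second lookup differs only in case and spacing, so it is served from the cache and the model runs once. Whether the real project normalizes posts before caching is not stated here; exact-text keys would work the same way with a lower hit rate.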
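The news scrapers run at a regular interval, with the scheduling handled inside application.py. As a rough stdlib-only illustration, and not the project's implementation, a fixed-rate loop that keeps going when a single scrape fails could look like this; `run_periodically` and the job passed to it are hypothetical names.

```python
import time
from typing import Callable


def run_periodically(job: Callable[[], None], interval_s: float, iterations: int) -> int:
    """Run `job` every `interval_s` seconds for `iterations` rounds.

    Returns the number of runs that finished without raising. A failing run is
    reported and skipped so that one bad scrape does not stop the schedule.
    """
    succeeded = 0
    next_run = time.monotonic()
    for i in range(iterations):
        if i > 0:
            # Fixed-rate schedule: wait until the next slot, never a negative sleep
            next_run += interval_s
            time.sleep(max(0.0, next_run - time.monotonic()))
        try:
            job()
            succeeded += 1
        except Exception as exc:  # keep the loop alive on scraper errors
            print(f"scrape failed: {exc!r}")
    return succeeded
```

For example, `run_periodically(scrape_sites, interval_s=6 * 3600, iterations=4)` would scrape four times, six hours apart, where `scrape_sites` is a hypothetical function wrapping the News_Scrapper scripts. Inside a long-running Flask server, a background scheduler thread is usually preferable to a blocking loop like this one.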
Clone the repository:
git clone https://gitlab.com/aossie/social-street-smart.git
For the Chrome extension:
-
Install Node.js and git.
-
Change the directory:
cd social-street-smart/
-
Install the dependencies:
npm install
-
Build the extension:
gulp build
For the backend/server:
Change the directory to "server/Fake News/" (quote the path in the shell, since it contains a space).
-
Install and activate a virtualenv.
-
Install all required libraries:
pip install -r requirements.txt
-
Also download the NLTK data files. Open a Python shell and run:
import nltk
nltk.download('punkt')
nltk.download('wordnet')
nltk.download('stopwords')
-
Install PostgreSQL for the DB and cache:
sudo apt install postgresql postgresql-contrib
-
Create a DB:
sudo -u postgres createdb fakeNewsDB
If there is an error and a PostgreSQL user needs to be created:
sudo -u postgres createuser "usernameOfUrPC"
-
Install ChromeDriver for news web scraping (if not already installed):
sudo apt-get install chromium-chromedriver
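If there is a path error, give Selenium the driver path explicitly (Selenium 4 syntax):
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
driver = webdriver.Chrome(service=Service("/usr/lib/chromium-browser/chromedriver"))  # path to the apt-installed chromedriver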
-
Create all tables and handle migrations (do this once):
python API_manager.py db init
python API_manager.py db migrate
python API_manager.py db upgrade
-
To test whether data is committed to the DB properly:
-
Log in to the DB:
sudo -u postgres psql
-
Connect to the DB and list all tables present:
\c fakeNewsDB
\dt
-
Open a relation/table:
SELECT * FROM "table_name";
-
API request struct for
-
Start the API on a local machine:
python API_manager.py runserver
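The API is served at the live endpoint listed at the top of this report, or locally once the server is started. The exact request structure is not reproduced in this report, so the sketch below only shows how a client could build a JSON POST request with Python's standard library. The payload keys `text` and `source` are hypothetical placeholders; the real fields are whatever application.py expects.

```python
import json
import urllib.request

# Live endpoint from this report; pass a different `url` for a local server.
LIVE_ENDPOINT = "http://13.233.115.232:8091/predict"


def build_predict_request(post_text: str, source: str,
                          url: str = LIVE_ENDPOINT) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for the /predict endpoint.

    The payload keys are hypothetical; check application.py for the real ones.
    """
    body = json.dumps({"text": post_text, "source": source}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 10.0) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

`send(build_predict_request("Headline text", "twitter"))` would then perform the call. Keeping request construction separate from sending makes the payload easy to inspect without network access.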
It was a great learning period for me in the summer of 2019. I learned a lot, such as writing production-level, readable code. Writing code alone is not enough: it should also be readable and reproducible, and I realized the value of both developer and user manuals. All the proposed work was completed, and the relevant PRs were sent on time and regularly.
-
MR 8 (merged): created the directory structure, basic UI and popup
-
MR 10 (merged): data sets and trained ML model for the clickbait classifier
-
MR 27 (open): scraping scheduler, testing, and DB and cache integration
-
The UI can be improved.
-
A custom word embedding trained on an Indian news corpus should produce better results.
-
The list of news websites, Facebook pages and Twitter handles needs to be extended for scalability.
-
A session manager and activity logs for the API hosted on the server.
-
Integration of a news origin detector.