@amitt001
Last active December 9, 2015 12:27
Sentiment Analysis Project Details
All the sentiment analysis data is present in the folder named "senti"
Directory structure:
senti
├── Trainingset_creator
│   ├── README.rst
│   ├── appsid
│   ├── reviews_crawler.py
│   └── settings.py
├── config.py
├── rate_opinion.py
├── reviews_sentiment.py
├── reviews_sentiment_read.py
├── reviews_sentiment_write.py
└── sentiment_mod.py
The main files:
* rate_opinion.py: The main script. It calls sentiment_mod internally and, based on the module's response,
saves the data in the MongoDB database. To run it, enter this in a terminal:
$ python rate_opinion.py
This script takes a long time to finish because there are more than 0.2 million apps to process.
* sentiment_mod.py: The module that computes the sentiment. It can be used directly. Usage:
In a Python console:
>>> # Call the sentiment method. It returns pos for positive or neg for negative.
>>> # It may also return neu for neutral. Neutral means none of the words were present
>>> # in the feature set, e.g. Hindi words that are not in the feature set.
>>> import sentiment_mod
>>> sentiment_mod.sentiment('test text for testing.')
pos  # or neg
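The neu case described above can be pictured with a minimal sketch. It assumes a bag-of-words feature set. The names FEATURE_WORDS, find_features, sentiment_or_neutral and toy_classify are hypothetical and are not taken from sentiment_mod.py; the classifier is passed in as a plain callable.

```python
import re

# Hypothetical feature set: the words the classifiers were trained on.
FEATURE_WORDS = {"good", "great", "bad", "slow", "love", "crash"}


def find_features(text, feature_words=FEATURE_WORDS):
    """Map every feature word to True/False depending on whether it occurs in text."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    return {w: (w in words) for w in feature_words}


def sentiment_or_neutral(text, classify):
    """Return 'neu' when no feature word occurs, else whatever classify returns."""
    features = find_features(text)
    if not any(features.values()):
        return "neu"  # nothing the classifiers know about, e.g. Hindi-only text
    return classify(features)


def toy_classify(features):
    """Stand-in for the real trained classifiers (assumption, for illustration only)."""
    positive = sum(features[w] for w in ("good", "great", "love"))
    negative = sum(features[w] for w in ("bad", "slow", "crash"))
    return "pos" if positive >= negative else "neg"
```

In this sketch a review made up only of words outside the feature set never reaches the classifier, which is one plausible way a neu result could come about.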
___________________________________________________________
1. Trainingset_creator:
This directory is of no use right now. I used the reviews_crawler.py script inside this directory to create
the training set for sentiment analysis. Now that the sentiment analysis models have already been created, this directory is not required.
2. config.py: Configuration for reading data from and writing data to the MongoDB database.
3. reviews_sentiment.py: Not used.
4. reviews_sentiment_write.py: Trains the classifiers and then PICKLES them in the pickle directory.
5. reviews_sentiment_read.py: This file's code is similar to 'reviews_sentiment_write.py', but it READS FROM
THE ALREADY PICKLED FILES in the pickle directory. FIRST RUN 'reviews_sentiment_write.py' and then RUN THIS ONLY
TO CHECK ACCURACY FROM THE ALREADY PICKLED FILES. ALL CHANGES MUST BE MADE IN 'reviews_sentiment_write.py'.
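The write-then-read workflow of the two scripts can be sketched as follows. This is a minimal illustration using only the standard pickle module. The helper names save_classifier and load_classifier and the file naming scheme are assumptions, not taken from the project, and any picklable object (here a plain dict in the usage note) can stand in for a trained classifier.

```python
import pickle
from pathlib import Path


def save_classifier(classifier, name, directory="pickle"):
    """Serialize a trained classifier to <directory>/<name>.pickle (the write step)."""
    path = Path(directory)
    path.mkdir(parents=True, exist_ok=True)
    target = path / f"{name}.pickle"
    with open(target, "wb") as handle:
        pickle.dump(classifier, handle)
    return target


def load_classifier(name, directory="pickle"):
    """Load a previously pickled classifier (the read step).

    Raises FileNotFoundError if the write step has not been run yet.
    """
    with open(Path(directory) / f"{name}.pickle", "rb") as handle:
        return pickle.load(handle)
```

For example, save_classifier({"good": 0.9}, "toy") followed by load_classifier("toy") gives back an equal dict without retraining, which is why the write script has to run before the read script.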
Classifiers used:
1. Naive Bayes classifier
2. Nu Support Vector Machine classifier (NuSVC)
3. Linear Support Vector Machine classifier
4. Stochastic Gradient Descent classifier
5. Multinomial Naive Bayes classifier
6. Bernoulli Naive Bayes classifier
7. Logistic Regression classifier
How the algorithm works:
I am using the scikit-learn package for Python for classification. Each classifier rates a review independently, and the
review is then given the rating that received the most votes. For example, if classifiers 1, 2, 3 and 7 vote an app's review
as positive and classifiers 4, 5 and 6 rate it as negative, positive is taken as the final result because more than 50% of the votes are for it.
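The voting step can be sketched in a few lines. This is an illustration of majority voting in general, not the project's actual code; majority_vote is a hypothetical helper, and the returned confidence (the share of votes for the winning label) is an addition for illustration.

```python
from collections import Counter


def majority_vote(votes):
    """Return (label, confidence) for a list of per-classifier labels."""
    if not votes:
        raise ValueError("at least one vote is required")
    label, count = Counter(votes).most_common(1)[0]
    return label, count / len(votes)


# Votes from classifiers 1..7 in the example: 1, 2, 3, 7 say pos; 4, 5, 6 say neg.
votes = ["pos", "pos", "pos", "neg", "neg", "neg", "pos"]
label, confidence = majority_vote(votes)
print(label, round(confidence, 2))  # pos 0.57
```

With seven classifiers and two labels a tie is impossible, so one label always gets more than 50% of the votes.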
Note: Positive.txt, Negative.txt and the pickle directory are missing from the GitHub repo. These files are present on the server.
* pickle: This directory contains the serialized data for all 7 algorithms. So, instead of training and testing on the data
each time we want to test the sentiment of an app's review, the pickled files are read and loaded into memory.
* Positive.txt: Contains the positive training reviews. I used this file to train the classifiers.
* Negative.txt: Same as above, for the negative training and testing data.
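As an illustration of how the two text files could be turned into labeled training and testing data, here is a minimal sketch. It assumes one review per line in each file, which is not stated above. The helpers load_labeled_reviews and split_train_test are hypothetical, and the 80/20 split is an arbitrary choice.

```python
import random
from pathlib import Path


def load_labeled_reviews(positive_path, negative_path):
    """Read both files and return a list of (review_text, label) pairs."""
    labeled = []
    for path, label in ((positive_path, "pos"), (negative_path, "neg")):
        text = Path(path).read_text(encoding="utf-8", errors="replace")
        for line in text.splitlines():
            line = line.strip()
            if line:  # skip blank lines
                labeled.append((line, label))
    return labeled


def split_train_test(labeled, train_fraction=0.8, seed=0):
    """Shuffle reproducibly and split into (training_set, testing_set)."""
    shuffled = list(labeled)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]
```

Shuffling before the split keeps positive and negative reviews mixed in both sets, so accuracy is not measured on one class only.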