amitt001/Details

## Details
All the sentiment analysis files are in the folder named "senti".
Directory structure:

senti
├── Trainingset_creator
│   ├── README.rst
│   ├── appsid
│   ├── reviews_crawler.py
│   └── settings.py
├── config.py
├── rate_opinion.py
├── reviews_sentiment.py
├── reviews_sentiment_read.py
├── reviews_sentiment_write.py
└── sentiment_mod.py


The main files:
* rate_opinion.py: The main script. It calls sentiment_mod internally and, based on the module's response,
saves the data in the MongoDB database. To run it, type this in a terminal:
$ python rate_opinion.py
Note that this script takes a long time to finish because there are more than 0.2 million (200,000) apps.

* sentiment_mod.py: Module to get the sentiment. It can be used directly. Usage:
In the Python console:

>>> # Call the sentiment function. It returns pos for positive or neg for negative.
>>> # It may also return neu for neutral. Neutral means none of the words were present in the feature set,
>>> # for example Hindi words that are not in the feature set.
>>> import sentiment_mod
>>> sentiment_mod.sentiment('test text for testing.')
pos  # or neg
___________________________________________________________
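
The neu result described in the usage notes above can be illustrated with a small sketch. This is not the code in sentiment_mod.py; `FEATURE_SCORES` and `sentiment_sketch` are hypothetical names, and the toy word scores stand in for the trained classifiers. It only shows the idea that text with no known words falls back to neu.

```python
# Hypothetical toy feature set; the real one is built from the training reviews.
FEATURE_SCORES = {'good': 1, 'great': 1, 'bad': -1, 'awful': -1}

def sentiment_sketch(text):
    """Return 'pos', 'neg', or 'neu' when no word of text is in the feature set."""
    words = [w.strip('.,!?') for w in text.lower().split()]
    known = [w for w in words if w in FEATURE_SCORES]
    if not known:
        # Nothing recognizable (for example, text in another language).
        return 'neu'
    total = sum(FEATURE_SCORES[w] for w in known)
    return 'pos' if total >= 0 else 'neg'

print(sentiment_sketch('Great app, works good'))  # pos
print(sentiment_sketch('bahut accha hai'))        # neu
```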

1. Trainingset_creator:
This directory is of no use right now. I used the reviews_crawler.py script inside this directory to create the
training set for sentiment analysis. Now that the sentiment analysis models have already been created, this directory is not required.

2. config.py: Configuration for reading data from and writing data to the MongoDB database.

3. reviews_sentiment.py: Not used.

4. reviews_sentiment_write.py: Trains the classifiers and then pickles them in the pickle directory.

5. reviews_sentiment_read.py: This file's code is similar to 'reviews_sentiment_write.py', but it reads from
the already pickled files in the pickle directory. First run 'reviews_sentiment_write.py', and then run this file only
to check accuracy from the already pickled files. All changes must be made in 'reviews_sentiment_write.py'.
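
The write-then-read workflow can be sketched with Python's standard pickle module. This is a minimal illustration, not the project's code: the `model` dictionary is a hypothetical stand-in for a trained classifier, and the temporary directory and the file name `model.pickle` are assumptions, since the real pickle directory is not in the repo.

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for a trained classifier: a word -> score table.
model = {'good': 1, 'great': 1, 'bad': -1, 'awful': -1}

# Hypothetical location used only for this sketch.
pickle_dir = tempfile.mkdtemp()
path = os.path.join(pickle_dir, 'model.pickle')

# "Write" step: serialize the trained object once.
with open(path, 'wb') as f:
    pickle.dump(model, f)

# "Read" step: load it back later without retraining.
with open(path, 'rb') as f:
    loaded = pickle.load(f)

print(loaded == model)  # True
```

Only load pickle files that you created yourself, because unpickling untrusted data can execute arbitrary code.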

Classifiers used:

1. Naive Bayes classifier
2. Nu Support Vector Machine classifier
3. Linear Support Vector Machine classifier
4. Stochastic Gradient Descent classifier
5. Multinomial Naive Bayes classifier
6. Bernoulli Naive Bayes classifier
7. Logistic Regression classifier

How the algorithm works:
I am using the scikit-learn package for Python for classification. Each of the classifiers rates a review, and the
rating with the most votes becomes the final result. For example, if classifiers 1, 2, 3 and 7 vote an app's review as positive
and classifiers 4, 5 and 6 rate it as negative, positive is taken as the final result because more than 50% of the votes are for it.
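
The voting step can be sketched as follows. This is a minimal illustration under assumptions, not the project's implementation: `majority_vote` is a hypothetical helper, and the hard-coded votes stand in for the output of the seven classifiers.

```python
from collections import Counter

def majority_vote(votes):
    """Return the label with the most votes and the share of votes it received."""
    counts = Counter(votes)
    label, n = counts.most_common(1)[0]
    return label, n / len(votes)

# Hypothetical votes: classifiers 1, 2, 3 and 7 say pos; 4, 5 and 6 say neg.
votes = ['pos', 'pos', 'pos', 'neg', 'neg', 'neg', 'pos']
label, share = majority_vote(votes)
print(label, round(share, 2))  # pos 0.57
```

With seven voters and two labels there can be no tie, which is one reason to use an odd number of classifiers.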

Note: Positive.txt, Negative.txt and the pickle directory are missing from the GitHub repo. These files are present on the server.
* pickle: This directory contains the serialized data for all 7 algorithms. So, instead of training and testing on the data
each time we want to get the sentiment of an app's review, the pickled files are read and loaded into memory.
* Positive.txt: Contains the positive training reviews. I used this file to train the classifiers.
* Negative.txt: Same as above, for the negative training and testing data.