rongzhe Azure-rong

## textprocessing.py
#! /usr/bin/env python2.7
#coding=utf-8

"""
Read data from Excel and txt files.
Chinese word segmentation, part-of-speech tagging and sentence splitting functions.

"""

import xlrd
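The sentence-splitting step can be sketched with a regular expression. This is a minimal illustration rather than the module's actual function: `cut_sentences` is a hypothetical name, and the set of sentence-ending punctuation marks is an assumption.

```python
#coding=utf-8
import re

# Hypothetical sentence splitter: a run of terminal punctuation (full-width or
# ASCII) ends a sentence, and the punctuation stays attached to its sentence.
SENTENCE_END = re.compile(u'([。！？!?；;…]+)')


def cut_sentences(text):
    parts = SENTENCE_END.split(text)
    sentences = []
    # With a capturing group, re.split alternates [body, delimiter, body, ...].
    for i in range(0, len(parts), 2):
        body = parts[i].strip()
        delimiter = parts[i + 1] if i + 1 < len(parts) else u''
        if body:
            sentences.append(body + delimiter)
    return sentences


sentences = cut_sentences(u'手机很好用。电池续航一般！值得买吗？')
print(len(sentences))  # 3
```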

## similarity feature.py
#! /usr/bin/env python2.7
#coding=utf-8

"""
Compute the similarity feature between the editorial review and each product review.

This module uses gensim to build a TF-IDF model of the reviews and computes the similarity between every review and a given txt file.
It therefore needs an Excel file containing all reviews and a txt file containing the editorial review as input data.

"""

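As a rough illustration of the similarity computation, the sketch below replaces gensim with plain Python: it builds TF-IDF vectors for the tokenised reviews and scores each one by cosine similarity against the tokenised editorial review. The function names are hypothetical, and the weighting (raw term frequency times log2 IDF) is an assumption that may differ in detail from gensim's `TfidfModel`.

```python
#coding=utf-8
import math
from collections import Counter


def weigh(tf, df, n):
    # Raw term frequency times log2(N / df); terms unseen in the reviews are dropped.
    return dict((t, c * math.log(float(n) / df[t], 2)) for t, c in tf.items() if t in df)


def build_tfidf(reviews):
    """Return (one {term: weight} dict per review, document frequencies, N)."""
    n = len(reviews)
    df = Counter()
    for tokens in reviews:
        df.update(set(tokens))
    vectors = [weigh(Counter(tokens), df, n) for tokens in reviews]
    return vectors, df, n


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similarity_feature(reviews, editorial):
    """Cosine similarity of each tokenised review to the tokenised editorial review."""
    vectors, df, n = build_tfidf(reviews)
    query = weigh(Counter(editorial), df, n)
    return [cosine(v, query) for v in vectors]


reviews = [[u'屏幕', u'清晰', u'电池'], [u'物流', u'很快'], [u'屏幕', u'电池', u'耐用']]
print(similarity_feature(reviews, [u'屏幕', u'电池', u'清晰']))
```

Words in the editorial review that never occur in any product review are ignored, because their IDF is undefined on this corpus.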
## centroid feature.py
#! /usr/bin/env python2.7
#coding=utf-8

"""
Compute a review's centroid score by combining the TF-IDF scores of its words.
This module uses filtered review data from a txt file and a gensim TF-IDF model to extract this feature.

"""

import textprocessing as tp
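The docstring does not say exactly how the word scores are combined. One plausible reading, sketched here as an assumption, is MEAD-style centroid scoring: average each word's TF-IDF weight over all reviews to form the centroid, then score a review by summing the centroid values of the distinct words it contains. `centroid_scores` and its `threshold` parameter are hypothetical.

```python
#coding=utf-8
import math
from collections import Counter


def centroid_scores(reviews, threshold=0.0):
    """Hypothetical MEAD-style centroid score for each tokenised review."""
    n = len(reviews)
    df = Counter()
    for tokens in reviews:
        df.update(set(tokens))
    # Centroid value of a word: its TF-IDF weight averaged over all reviews.
    totals = Counter()
    for tokens in reviews:
        for t, c in Counter(tokens).items():
            totals[t] += c * math.log(float(n) / df[t], 2)
    centroid = dict((t, w / n) for t, w in totals.items() if w / n > threshold)
    # Score = sum of the centroid values of the distinct words in the review.
    return [sum(centroid.get(t, 0.0) for t in set(tokens)) for tokens in reviews]


print(centroid_scores([[u'好', u'屏幕'], [u'好', u'电池'], [u'好']]))
```

A word that appears in every review has an IDF of zero, so it never contributes to the centroid.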

## word sentence length feature.py
#! /usr/bin/env python2.7
#coding=utf-8

"""
Count a review's words, sentences and length.
This module extracts the word count, sentence count and review length features.

"""

import textprocessing as tp
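A minimal sketch of the three counts, assuming the review has already been segmented into words with punctuation removed, and that "length" means the number of characters in the raw text. `length_features` and its sentence-splitting rule are hypothetical.

```python
#coding=utf-8
import re

# Assumed rule: a run of these marks ends a sentence.
SENTENCE_END = re.compile(u'[。！？!?；;]+')


def length_features(raw_text, words):
    """Return (word count, sentence count, length in characters).

    raw_text should be unicode so len() counts characters, not bytes;
    words is the segmented review with punctuation already removed.
    """
    sentences = [s for s in SENTENCE_END.split(raw_text) if s.strip()]
    return len(words), len(sentences), len(raw_text)


print(length_features(u'屏幕很清晰。电池一般！', [u'屏幕', u'很', u'清晰', u'电池', u'一般']))  # (5, 2, 11)
```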

## name brand attribute feature.py
#! /usr/bin/env python2.7
#coding=utf-8

"""
Count how many times the product name, product brand and product attributes appear in the review.
This module extracts the product name, brand and attribute features.

"""

import textprocessing as tp
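Counting these mentions can be sketched as lexicon lookups over the segmented review. The product name, brand and attribute word lists are assumed inputs, and the function name is hypothetical.

```python
#coding=utf-8


def name_brand_attribute_counts(words, names, brands, attributes):
    """Count tokens of a segmented review that match each (assumed) lexicon."""
    names, brands, attributes = set(names), set(brands), set(attributes)
    name_count = sum(1 for w in words if w in names)
    brand_count = sum(1 for w in words if w in brands)
    attribute_count = sum(1 for w in words if w in attributes)
    return name_count, brand_count, attribute_count


words = [u'小米', u'手机', u'屏幕', u'小米', u'电池']
print(name_brand_attribute_counts(words, [u'手机'], [u'小米'], [u'屏幕', u'电池']))  # (1, 2, 2)
```

Because matching is exact per token, a multi-character name only counts if the segmenter emits it as a single word.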

## entropy perplexity feature.py
#! /usr/bin/env python2.7
#coding=utf-8

"""
Compute a review's entropy and perplexity.
This module builds an n-gram language model of the reviews, then computes each review's entropy and perplexity as features.

"""

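The docstring does not specify the n-gram order or the smoothing, so the sketch below assumes an add-one (Laplace) smoothed bigram model. A review's entropy is taken as the average negative log2 probability per word, H = -(1/N) Σ log2 P(w_i | w_{i-1}), and its perplexity as 2^H. The class and method names are hypothetical.

```python
#coding=utf-8
import math
from collections import Counter


class BigramLM(object):
    """Add-one smoothed bigram model over tokenised reviews (assumed design)."""

    def __init__(self, corpus):
        self.history = Counter()
        self.bigrams = Counter()
        vocab = set()
        for tokens in corpus:
            tokens = list(tokens)
            padded = ['<s>'] + tokens
            vocab.update(tokens)
            self.history.update(padded[:-1])
            self.bigrams.update(zip(padded, padded[1:]))
        # One extra slot so unseen words still get a non-zero probability.
        self.vocab_size = len(vocab) + 1

    def prob(self, prev, word):
        # Laplace: (count(prev, word) + 1) / (count(prev as history) + V)
        return (self.bigrams[(prev, word)] + 1.0) / (self.history[prev] + self.vocab_size)

    def entropy(self, tokens):
        """Average negative log2 probability per word of one review."""
        tokens = list(tokens)
        if not tokens:
            raise ValueError('entropy is undefined for an empty review')
        padded = ['<s>'] + tokens
        log_sum = sum(math.log(self.prob(p, w), 2) for p, w in zip(padded, padded[1:]))
        return -log_sum / len(tokens)

    def perplexity(self, tokens):
        return 2 ** self.entropy(tokens)


lm = BigramLM([[u'屏幕', u'清晰'], [u'屏幕', u'一般'], [u'电池', u'耐用']])
print(lm.entropy([u'屏幕', u'清晰']))
print(lm.perplexity([u'屏幕', u'清晰']))
```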

## adj adv v feature.py
#! /usr/bin/env python2.7
#coding=utf-8

"""
Count the adjectives, adverbs and verbs in the review.
This module extracts the adjective, adverb and verb count features.

"""

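Given a POS-tagged review, the three counts reduce to tag-prefix matching. The sketch assumes a PKU/ICTCLAS-style tag set in which adjective tags start with `a`, adverb tags with `d` and verb tags with `v`; with prefix matching, tags such as `vn` also count as verbs. The function name is hypothetical.

```python
#coding=utf-8


def adj_adv_verb_counts(tagged):
    """tagged: list of (word, tag) pairs from a Chinese POS tagger."""
    adjectives = sum(1 for _, tag in tagged if tag.startswith('a'))
    adverbs = sum(1 for _, tag in tagged if tag.startswith('d'))
    verbs = sum(1 for _, tag in tagged if tag.startswith('v'))
    return adjectives, adverbs, verbs


tagged = [(u'屏幕', 'n'), (u'非常', 'd'), (u'清晰', 'a'), (u'推荐', 'v'), (u'购买', 'v')]
print(adj_adv_verb_counts(tagged))  # (1, 1, 2)
```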

## pos neg(senti dict) feature.py
#! /usr/bin/env python2.7
#coding=utf-8

"""
Compute a review's positive and negative scores, their averages and their standard deviations.
This module extracts the review's positive/negative score, average score and standard deviation features (6 features in total).
Sentiment analysis is based on a sentiment dictionary.

"""

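One way the six features could be computed is sketched below under stated assumptions: each sentence's positive (negative) score is the number of its words found in a positive (negative) word list, and the features are the totals, per-sentence means and population standard deviations of those scores. The word lists, the feature order and the function name are hypothetical, and the sketch ignores degree adverbs and negation words.

```python
#coding=utf-8
import math


def senti_dict_features(sentences, pos_words, neg_words):
    """Six dictionary-based sentiment features for one review (hypothetical layout).

    sentences: the review split into sentences, each a list of words.
    Returns [pos total, neg total, pos mean, neg mean, pos std, neg std].
    """
    pos_scores = [sum(1 for w in s if w in pos_words) for s in sentences]
    neg_scores = [sum(1 for w in s if w in neg_words) for s in sentences]

    def mean_std(scores):
        if not scores:
            return 0.0, 0.0
        mean = float(sum(scores)) / len(scores)
        # Population standard deviation of the per-sentence scores.
        std = math.sqrt(sum((x - mean) ** 2 for x in scores) / len(scores))
        return mean, std

    pos_mean, pos_std = mean_std(pos_scores)
    neg_mean, neg_std = mean_std(neg_scores)
    return [sum(pos_scores), sum(neg_scores), pos_mean, neg_mean, pos_std, neg_std]


review = [[u'屏幕', u'很', u'好'], [u'电池', u'差', u'差'], [u'好', u'看']]
print(senti_dict_features(review, set([u'好']), set([u'差'])))
```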
## store sentiment classifier.py
#! /usr/bin/env python2.7
#coding=utf-8

"""
Use the positive and negative review sets as a corpus to train a sentiment classifier.
This module uses labeled positive and negative reviews as the training set, then uses NLTK's scikit-learn API to perform classification.
The aim is to train a classifier that automatically identifies whether a review is positive or negative, and to use the predicted probability as a review helpfulness feature.

"""

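The train-and-store step can be illustrated without NLTK or scikit-learn. The sketch below substitutes a hand-written multinomial Naive Bayes classifier with add-one smoothing for the scikit-learn model, and stores it with the standard `pickle` module. The class, its methods, the `store` helper and the `'pos'`/`'neg'` labels are hypothetical.

```python
#coding=utf-8
import math
import pickle
from collections import Counter


class NaiveBayesSentiment(object):
    """Multinomial Naive Bayes with add-one smoothing over tokenised reviews.

    A stand-in for the scikit-learn model; both training sets must be non-empty.
    """

    def train(self, pos_reviews, neg_reviews):
        self.counts = {'pos': Counter(), 'neg': Counter()}
        for tokens in pos_reviews:
            self.counts['pos'].update(tokens)
        for tokens in neg_reviews:
            self.counts['neg'].update(tokens)
        total = float(len(pos_reviews) + len(neg_reviews))
        self.log_prior = {'pos': math.log(len(pos_reviews) / total),
                          'neg': math.log(len(neg_reviews) / total)}
        self.vocab = set(self.counts['pos']) | set(self.counts['neg'])
        self.totals = dict((c, sum(self.counts[c].values())) for c in ('pos', 'neg'))
        return self

    def prob_pos(self, tokens):
        """P(pos | review); words never seen in training are ignored."""
        v = len(self.vocab)
        scores = {}
        for c in ('pos', 'neg'):
            score = self.log_prior[c]
            for t in tokens:
                if t in self.vocab:
                    score += math.log((self.counts[c][t] + 1.0) / (self.totals[c] + v))
            scores[c] = score
        # Subtract the max log score before exponentiating to avoid underflow.
        top = max(scores.values())
        e_pos = math.exp(scores['pos'] - top)
        e_neg = math.exp(scores['neg'] - top)
        return e_pos / (e_pos + e_neg)


def store(classifier, path):
    """Pickle the trained classifier so the feature module can reload it."""
    with open(path, 'wb') as f:
        pickle.dump(classifier, f)


clf = NaiveBayesSentiment().train([[u'好', u'喜欢'], [u'好', u'推荐']],
                                  [[u'差', u'退货'], [u'差']])
print(clf.prob_pos([u'好']))
# store(clf, 'sentiment_classifier.pickle')  # hypothetical file name
```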
## pos neg(machine learning) feature.py
#! /usr/bin/env python2.7
#coding=utf-8

"""
Use a stored sentiment classifier to estimate a review's positive and negative probabilities.
This module extracts these sentiment probabilities as review helpfulness features.

"""