#! /usr/bin/env python2.7
#coding=utf-8
"""
Read data from Excel and txt files.
Provide Chinese word segmentation, part-of-speech tagging, and sentence splitting functions.
"""
import xlrd
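The sentence splitting step mentioned in the docstring above can be sketched with the standard library alone. This is a minimal sketch, not the module's actual implementation: the function name `cut_sentences` and the set of sentence-ending punctuation marks are assumptions made for illustration.

```python
import re

# Assumed delimiter set: Chinese and ASCII sentence-ending punctuation.
# The capture group makes re.split keep the delimiters in its output.
_SENTENCE_END = re.compile(u"([\u3002\uff01\uff1f\uff1b!?;]+)")


def cut_sentences(text):
    """Split text into sentences, keeping each sentence's end punctuation."""
    # With one capture group, split returns [body, delim, body, delim, ..., tail].
    parts = _SENTENCE_END.split(text)
    sentences = []
    for i in range(0, len(parts) - 1, 2):
        sentence = (parts[i] + parts[i + 1]).strip()
        if sentence:
            sentences.append(sentence)
    # Text after the last delimiter (if any) forms a final sentence.
    tail = parts[-1].strip()
    if tail:
        sentences.append(tail)
    return sentences
```

Keeping the punctuation attached to each sentence is a design choice; a caller that only needs the words can strip it afterwards.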
#! /usr/bin/env python2.7
#coding=utf-8
"""
Compute the similarity between the editorial review and each product review.
This module uses gensim to build a tf-idf model of the reviews and computes the similarity between every review and a given txt file.
It therefore needs two inputs: an Excel file containing all reviews and a txt file containing the editorial review.
"""
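The editorial-review similarity feature described in the preceding module can be illustrated without gensim. The sketch below is a standard-library stand-in, not the gensim pipeline the module uses: it assumes reviews are already segmented into word lists, weights terms with a plain `log(N / df)` idf, and compares vectors by cosine similarity. All function names are hypothetical.

```python
import math
from collections import Counter


def build_idf(tokenized_docs):
    """Map each term to log(N / document frequency)."""
    n_docs = len(tokenized_docs)
    doc_freq = Counter()
    for tokens in tokenized_docs:
        doc_freq.update(set(tokens))
    return {t: math.log(float(n_docs) / df) for t, df in doc_freq.items()}


def tfidf_vector(tokens, idf):
    """Sparse tf-idf vector; terms unseen in the review corpus are ignored."""
    counts = Counter(t for t in tokens if t in idf)
    return {t: c * idf[t] for t, c in counts.items()}


def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_to_editorial(reviews, editorial):
    """Return one similarity score per review against the editorial text."""
    idf = build_idf(reviews)
    target = tfidf_vector(editorial, idf)
    return [cosine(tfidf_vector(r, idf), target) for r in reviews]
```

A review that shares no weighted terms with the editorial text scores 0, and a review with the same term distribution scores 1.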
#! /usr/bin/env python2.7
#coding=utf-8
"""
Compute a review's centroid score by combining the tf-idf scores of its words.
This module uses filtered review data in a txt file and a gensim tf-idf model to extract this review feature.
"""
import textprocessing as tp
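One plausible reading of "combining every word's tf-idf score" is a simple sum over the words of a review. The sketch below assumes that reading and uses a standard-library tf-idf rather than the gensim model; the name `centroid_scores` and the `log(N / df)` weighting are assumptions, and the module itself may combine the scores differently.

```python
import math
from collections import Counter


def centroid_scores(tokenized_reviews):
    """Return, for each review, the sum of tf * idf over its distinct words."""
    n_docs = len(tokenized_reviews)
    doc_freq = Counter()
    for tokens in tokenized_reviews:
        doc_freq.update(set(tokens))
    scores = []
    for tokens in tokenized_reviews:
        term_freq = Counter(tokens)
        scores.append(sum(count * math.log(float(n_docs) / doc_freq[term])
                          for term, count in term_freq.items()))
    return scores
```

Under this weighting, a review made up only of words that occur in every review scores 0, while rare words push the score up.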
#! /usr/bin/env python2.7
#coding=utf-8
"""
Count a review's words, sentences, and length.
This module extracts the word count, sentence count, and review length features.
"""
import textprocessing as tp
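The three counts can be sketched as follows, assuming the review has already been split into sentences and each sentence into words. The function name is hypothetical, and "review length" is taken here to mean the number of characters across all words, which is an assumption.

```python
def length_features(segmented_review):
    """segmented_review: list of sentences, each a list of words.

    Returns (word_count, sentence_count, character_length).
    """
    sentence_count = len(segmented_review)
    word_count = sum(len(sentence) for sentence in segmented_review)
    # Assumed definition of review length: total characters in all words.
    char_length = sum(len(word) for sentence in segmented_review for word in sentence)
    return word_count, sentence_count, char_length
```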
#! /usr/bin/env python2.7
#coding=utf-8
"""
Count how many times the product name, product brand, and product attributes appear in the review.
This module extracts the product name, brand, and attribute features.
"""
import textprocessing as tp
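A minimal sketch of the counting step, assuming the review is a flat list of segmented words and that the product names, brands, and attributes are supplied as word lists. The function names and the exact-match rule are assumptions.

```python
def count_mentions(words, vocabulary):
    """Count how many words of the review appear in the given vocabulary."""
    vocab = set(vocabulary)
    return sum(1 for word in words if word in vocab)


def product_features(words, names, brands, attributes):
    """Return (name_count, brand_count, attribute_count) for one review."""
    return (count_mentions(words, names),
            count_mentions(words, brands),
            count_mentions(words, attributes))
```

Exact matching only works if the word segmenter keeps multi-character product terms as single tokens, so the term lists would normally also be given to the segmenter as a user dictionary.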
#! /usr/bin/env python2.7
#coding=utf-8
"""
Compute a review's entropy and perplexity.
This module builds an n-gram language model of the reviews, then computes each review's entropy and perplexity as features.
"""
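The entropy and perplexity features of the preceding module can be illustrated with the simplest n-gram case. The sketch below assumes a unigram model (n = 1) with add-one smoothing; the module's actual n-gram order and smoothing are not stated in its docstring, and the function names are hypothetical. Entropy is the average negative log2 probability per word, and perplexity is 2 raised to that entropy.

```python
import math
from collections import Counter


def train_unigram(tokenized_corpus):
    """Return a function mapping a token to its add-one smoothed probability."""
    counts = Counter(t for tokens in tokenized_corpus for t in tokens)
    total = sum(counts.values())
    vocab_size = len(counts) + 1  # one extra slot shared by all unseen words

    def prob(token):
        return (counts[token] + 1.0) / (total + vocab_size)

    return prob


def entropy_and_perplexity(tokens, prob):
    """Per-word cross-entropy (in bits) of a review and its perplexity."""
    if not tokens:
        return 0.0, 1.0
    entropy = -sum(math.log(prob(t), 2) for t in tokens) / len(tokens)
    return entropy, 2.0 ** entropy
```

Reviews made of words the model has rarely or never seen get higher entropy and perplexity than reviews made of common words.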
#! /usr/bin/env python2.7
#coding=utf-8
"""
Count the adjectives, adverbs, and verbs in the review.
This module extracts the adjective, adverb, and verb count features.
"""
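The part-of-speech counts of the preceding module can be sketched from already tagged text. The sketch assumes the tagger yields `(word, tag)` pairs and that adjectives, adverbs, and verbs carry the tags `a`, `d`, and `v`; both the tag names and the function name are assumptions, so pass different tags if the tagger in use differs.

```python
def count_pos(tagged_words, adjective_tag="a", adverb_tag="d", verb_tag="v"):
    """Return (adjective_count, adverb_count, verb_count) for one review."""
    counts = {adjective_tag: 0, adverb_tag: 0, verb_tag: 0}
    for _word, tag in tagged_words:
        if tag in counts:
            counts[tag] += 1
    return counts[adjective_tag], counts[adverb_tag], counts[verb_tag]
```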
#! /usr/bin/env python2.7
#coding=utf-8
"""
Compute a review's positive and negative scores, their averages, and their standard deviations.
This module extracts the review's positive/negative score, average score, and standard deviation features (6 features in total).
Sentiment analysis is based on a sentiment dictionary.
"""
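The six dictionary-based sentiment features of the preceding module can be sketched as below. This is a deliberately simplified version: each sentence is scored by counting dictionary hits, with no handling of degree adverbs or negation, and the standard deviation is the population form. The function names and the per-sentence scoring rule are assumptions.

```python
import math


def _mean(values):
    return sum(values) / float(len(values)) if values else 0.0


def _std(values):
    """Population standard deviation; 0.0 for an empty list."""
    if not values:
        return 0.0
    mean = _mean(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / float(len(values)))


def sentiment_features(sentences, positive_words, negative_words):
    """sentences: list of word lists. Returns the six features as a tuple:
    (pos_score, neg_score, pos_mean, neg_mean, pos_std, neg_std).
    """
    pos = [sum(1 for w in s if w in positive_words) for s in sentences]
    neg = [sum(1 for w in s if w in negative_words) for s in sentences]
    return (sum(pos), sum(neg), _mean(pos), _mean(neg), _std(pos), _std(neg))
```

The mean and standard deviation are taken over per-sentence scores, so a review whose sentiment is concentrated in one sentence has a higher deviation than one that spreads it evenly.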
#! /usr/bin/env python2.7
#coding=utf-8
"""
Use sets of positive and negative reviews as a corpus to train a sentiment classifier.
This module uses labeled positive and negative reviews as the training set, then uses NLTK's scikit-learn API to perform the classification task.
The aim is to train a classifier that automatically identifies whether a review's sentiment is positive or negative, and to use the probability as a review helpfulness feature.
"""
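The idea of the preceding module, training on labeled reviews and reading off a class probability, can be shown with a small multinomial naive Bayes written against the standard library. This is a stand-in for illustration, not the NLTK/scikit-learn classifier the module uses; the function names, the dict-based model layout, and the add-one smoothing are all assumptions.

```python
import math
from collections import Counter


def train_naive_bayes(labeled_reviews):
    """labeled_reviews: list of (tokens, label) pairs, e.g. (["good"], "pos")."""
    word_counts = {}
    label_counts = Counter()
    vocab = set()
    for tokens, label in labeled_reviews:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokens)
        vocab.update(tokens)
    return {"word_counts": word_counts, "label_counts": label_counts, "vocab": vocab}


def predict_proba(model, tokens):
    """Return {label: probability} for one tokenized review."""
    total_docs = sum(model["label_counts"].values())
    vocab_size = len(model["vocab"])
    log_scores = {}
    for label, n_docs in model["label_counts"].items():
        counts = model["word_counts"][label]
        total_words = sum(counts.values())
        score = math.log(n_docs / float(total_docs))  # class prior
        for token in tokens:
            if token in model["vocab"]:  # words never seen in training are skipped
                score += math.log((counts[token] + 1.0) / (total_words + vocab_size))
        log_scores[label] = score
    # Normalize in log space to avoid underflow on long reviews.
    peak = max(log_scores.values())
    exp_scores = {label: math.exp(s - peak) for label, s in log_scores.items()}
    norm = sum(exp_scores.values())
    return {label: e / norm for label, e in exp_scores.items()}
```

The probability assigned to the positive label is the value that would serve as the helpfulness feature.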
#! /usr/bin/env python2.7
#coding=utf-8
"""
Use a stored sentiment classifier to compute a review's positive and negative probabilities.
This module extracts the review's sentiment probabilities as review helpfulness features.
"""
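Storing a classifier and reloading it to score reviews, as the last module describes, can be sketched with `pickle`. The model format here, a dict holding a bias and per-word log-odds weights scored with a logistic function, is a hypothetical stand-in for whatever classifier object was actually stored, and the function names are assumptions.

```python
import math
import pickle


def save_model(model, fileobj):
    """Serialize the model to a binary file object."""
    pickle.dump(model, fileobj, protocol=2)


def load_model(fileobj):
    """Load a model previously written by save_model."""
    return pickle.load(fileobj)


def sentiment_probabilities(model, tokens):
    """Return (positive_probability, negative_probability) for one review.

    Assumed model layout: {"bias": float, "weights": {word: log-odds}}.
    """
    z = model["bias"] + sum(model["weights"].get(t, 0.0) for t in tokens)
    positive = 1.0 / (1.0 + math.exp(-z))
    return positive, 1.0 - positive
```

Only unpickle files from a trusted source, since loading a pickle can execute arbitrary code.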