from math import floor
import nltk
from nltk.corpus import sentiwordnet as swn
from nltk.tag.perceptron import PerceptronTagger
from sklearn.metrics import accuracy_score
from sklearn.pipeline import Pipeline
from utilities import DataClean
from utilities import load_data, cross_validate
tagger = PerceptronTagger()
from sklearn.cross_validation import StratifiedKFold
from pandas import read_csv
import numpy as np
from sklearn.metrics import confusion_matrix
from bs4 import BeautifulSoup
import re
from nltk.corpus import stopwords
class DataClean:

Introduction

Hello guys! I am going to walk you through my implementation of SentiWordNet 3.0 on movie reviews to find the overall sentiment of each review. The datasets and more details about SentiWordNet are covered below. I will be using Python 2.7 for the code, along with a few of its libraries: pandas, sklearn and nltk. NLTK has built-in modules for SentiWordNet and a POS tagger, both of which are used in our code. So let's get started!

SentiWordNet

SentiWordNet is a lexical resource for opinion mining. SentiWordNet assigns to each synset of WordNet three sentiment scores: positivity, negativity, objectivity. Each of the three scores ranges in the interval [0.0 ; 1.0], and their sum is 1.0 for each synset.
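To make the "three scores that sum to 1.0" idea concrete, here is a minimal sketch in plain Python. The `SentiScores` container, the `make_scores` helper and the example values are all hypothetical and only for illustration; they are not part of NLTK or of the SentiWordNet data. The sketch assumes objectivity is simply whatever is left after positivity and negativity.

```python
from collections import namedtuple

# Hypothetical container for illustration only; not an NLTK class.
SentiScores = namedtuple("SentiScores", ["pos", "neg", "obj"])


def make_scores(pos, neg):
    """Build a (pos, neg, obj) triple from positivity and negativity.

    Assumption: objectivity is the remainder, so the three sum to 1.0.
    """
    if not (0.0 <= pos <= 1.0 and 0.0 <= neg <= 1.0):
        raise ValueError("each score must lie in [0.0, 1.0]")
    if pos + neg > 1.0:
        raise ValueError("pos + neg cannot exceed 1.0")
    return SentiScores(pos, neg, 1.0 - (pos + neg))


# Made-up values, not looked up from SentiWordNet.
example = make_scores(0.75, 0.0)
print(example)
```

In NLTK itself, the objects returned by `swn.senti_synsets(...)` expose the same three numbers through `pos_score()`, `neg_score()` and `obj_score()`.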

SentiWordNet was built by ranking the subjectivity of every term (synset) according to the part of speech it belongs to. The parts of speech covered by SentiWordNet are adjective, noun, adverb and verb, represented as 'a', 'n', 'r' and 'v' respectively. The database has five col