@bittlingmayer
Last active May 29, 2017 11:31
Amazon Reviews Sentiment with fastText [example]

This code uses fastText supervised learning to predict output labels from input text.

Approach

This is the baseline code: it runs fastText's supervised classifier without any changes.

Preprocessing

I applied lowercasing, so "This is a TEST!" becomes "this is a test!".
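A quick way to check the lowercasing step on its own. The `lowercase` function name is hypothetical; its body is the same `tr` command that preproc.sh runs.

```shell
#!/bin/sh
# Hypothetical helper: the same mapping preproc.sh applies to each file.
lowercase() {
  tr '[:upper:]' '[:lower:]'
}

# Reproduce the example from the text.
echo "This is a TEST!" | lowercase   # prints: this is a test!
```

Depending on the `tr` implementation and locale, non-ASCII capitals such as "É" may pass through unchanged, because some `tr` versions work byte by byte.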

Parameters

This is the baseline code: sentiment.sh passes no hyperparameters, so ./fasttext supervised runs with its defaults.
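For reference, a sketch of where hyperparameters would go. The values below are what I understand fastText's supervised defaults to be (running `./fasttext supervised` with no arguments prints the defaults for your build), so this should train the same model as sentiment.sh. `train_explicit` and the `FASTTEXT` override are hypothetical names, not part of this gist.

```shell
#!/bin/sh
# Hypothetical variant of the training line in sentiment.sh with the
# hyperparameters spelled out. Values are assumed fastText supervised defaults.
# FASTTEXT lets you point at a different binary; it defaults to ./fasttext.
train_explicit() {
  ft="${FASTTEXT:-./fasttext}"
  "$ft" supervised -input "$1" -output model_amzn \
    -lr 0.1 -epoch 5 -dim 100 -wordNgrams 1 -minCount 1
}
```

For example, `train_explicit data/amazon_reviews/train.ft.txt`. To experiment, change one value at a time (say `-wordNgrams 2` or `-epoch 10`) and compare the test numbers.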

Results

Precision and recall on the test set (test.ft.txt) were both 0.916.
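Why the two numbers coincide: `./fasttext test` reports precision and recall at k = 1 by default, and (assuming, as in this dataset, that every review carries exactly one label) the number of gold labels equals the number of test reviews. With N test reviews and C correct top-1 predictions:

```latex
P@1 = \frac{C}{N \cdot 1} = \frac{C}{N},
\qquad
R@1 = \frac{C}{\text{number of gold labels}} = \frac{C}{N}
```

So both are simply the accuracy C/N.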

Data

Before running, you should download the data to a directory like data/amazon_reviews.

You can download the datasets from kaggle.com/bittlingmayer/amazonreviews.

The directory structure should then look like this:

├── README.md
├── preproc.sh  
├── sentiment.sh  
└── data/ 
    └── amazon_reviews/ 
        ├── train.ft.txt  
        └── test.ft.txt  

The code assumes that the data files are in the fastText format.
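In that format each line starts with a label token prefixed by `__label__`, followed by the text. A small sketch for checking a file before training; `label_counts` is a hypothetical helper, and it assumes exactly one label per line (as far as I can tell, the Kaggle files use `__label__1` and `__label__2`).

```shell
#!/bin/sh
# Hypothetical helper: print "<label> <count>" for a fastText-format file.
# Assumes exactly one label, as the first space-separated token of each line.
label_counts() {
  cut -d ' ' -f 1 "$1" | sort | uniq -c | awk '{ print $2, $1 }'
}
```

For example, `label_counts data/amazon_reviews/train.ft.txt`. Any line whose first token does not start with `__label__` shows up as its own "label", which makes format problems easy to spot.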

The code also assumes that the fastText binary is available as ./fasttext in the working directory.
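A minimal guard, assuming the scripts are run from the directory that contains the binary; `check_fasttext` is a hypothetical helper. fastText itself is built from https://github.com/facebookresearch/fastText, where running `make` in a clone produces the `fasttext` binary.

```shell
#!/bin/sh
# Hypothetical helper: return 0 if ./fasttext exists and is executable,
# otherwise print a hint to stderr and return 1.
check_fasttext() {
  if [ -x ./fasttext ]; then
    return 0
  fi
  echo "./fasttext not found: build fastText (run make in a clone of" \
       "https://github.com/facebookresearch/fastText) and copy the binary here" >&2
  return 1
}
```

For example, `check_fasttext || exit 1` at the top of sentiment.sh stops early instead of failing on the first `./fasttext` call.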

To Run

To preprocess:

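# Write to a temporary file first: "> same_file" would empty the input before it is read.
sh preproc.sh data/amazon_reviews/train.ft.txt > data/amazon_reviews/train.tmp && mv data/amazon_reviews/train.tmp data/amazon_reviews/train.ft.txt
sh preproc.sh data/amazon_reviews/test.ft.txt > data/amazon_reviews/test.tmp && mv data/amazon_reviews/test.tmp data/amazon_reviews/test.ft.txt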
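Lowercasing a file in place needs a temporary file: in `cmd file > file`, the shell truncates `file` before `cmd` starts reading it, so the input is lost. A sketch of that step as a reusable function; `lowercase_inplace` is a hypothetical name, and it runs the `tr` command from preproc.sh directly.

```shell
#!/bin/sh
# Hypothetical helper: lowercase a file in place, safely.
# The temp file is created next to the target so the final mv is a rename.
# Note: the result keeps mktemp's restrictive (owner-only) permissions.
lowercase_inplace() {
  tmp=$(mktemp "$1.XXXXXX") || return 1
  if tr '[:upper:]' '[:lower:]' < "$1" > "$tmp"; then
    mv "$tmp" "$1"
  else
    rm -f "$tmp"
    return 1
  fi
}
```

For example: `for f in data/amazon_reviews/train.ft.txt data/amazon_reviews/test.ft.txt; do lowercase_inplace "$f"; done`.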

To train the model and evaluate its predictions on the test set:

sh sentiment.sh data/amazon_reviews/train.ft.txt data/amazon_reviews/test.ft.txt
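
preproc.sh:

# Lowercase every line of the input file ($1); the result goes to stdout.
tr '[:upper:]' '[:lower:]' < "$1"

sentiment.sh:

# Train a supervised classifier on $1; writes model_amzn.bin and model_amzn.vec.
./fasttext supervised -input "$1" -output model_amzn
# Evaluate the model on $2; prints precision and recall at 1.
./fasttext test model_amzn.bin "$2"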
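sentiment.sh does no input checking, so a missing or misspelled path only surfaces as a fastText error. A sketch of a more defensive version, written as a function so it can be tried without the data; `run_sentiment` and the `FASTTEXT` override are hypothetical, not part of the gist.

```shell
#!/bin/sh
# Hypothetical, more defensive variant of sentiment.sh (not the original).
# FASTTEXT lets you point at a different binary; it defaults to ./fasttext.
run_sentiment() {
  ft="${FASTTEXT:-./fasttext}"
  if [ "$#" -ne 2 ]; then
    echo "usage: run_sentiment TRAIN_FILE TEST_FILE" >&2
    return 2
  fi
  # Both input files must exist and be readable.
  for f in "$1" "$2"; do
    if [ ! -r "$f" ]; then
      echo "cannot read $f" >&2
      return 1
    fi
  done
  # Train, then evaluate; skip evaluation if training fails.
  "$ft" supervised -input "$1" -output model_amzn || return 1
  "$ft" test model_amzn.bin "$2"
}
```

For example, `run_sentiment data/amazon_reviews/train.ft.txt data/amazon_reviews/test.ft.txt`. The `|| return 1` after training keeps the test step from running against a stale model_amzn.bin left over from an earlier run.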