This code uses fastText supervised learning to predict output labels from input text.
This is the baseline code: I have not added any hyperparameters in sentiment.sh.
The only preprocessing applied is lowercasing, so "This is a TEST!" becomes "this is a test!".
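The lowercasing step mentioned above can be sketched roughly as follows. This is only an illustration: the `lowercase` function is a hypothetical stand-in, and the actual preproc.sh may do it differently.

```shell
# Hypothetical stand-in for the lowercasing step; preproc.sh itself may differ.
lowercase() {
    # tr maps uppercase letters to lowercase. GNU tr works byte by byte,
    # so in practice only ASCII letters are affected.
    tr '[:upper:]' '[:lower:]'
}

# The __label__ prefix is already lowercase, so it passes through unchanged.
printf '%s\n' '__label__2 This is a TEST!' | lowercase
```

Because the function reads stdin and writes stdout, it composes with redirection in the same way as the preproc.sh commands in this README.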
Precision and recall were 0.916.
Before running, download the data into a directory such as data/amazon_reviews.
You can download the datasets from kaggle.com/bittlingmayer/amazonreviews.
So the directory structure will look like this:
├── README.md
├── preproc.sh
├── sentiment.sh
└── data/
    └── amazon_reviews/
        ├── train.ft.txt
        └── test.ft.txt
The code assumes that the data files are in the fastText format.
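In the fastText format, each line starts with one or more `__label__` prefixed labels followed by the text. A quick sanity check along these lines may help before training. The `count_unlabeled` helper and the sample lines are made up for illustration and are not part of this repository.

```shell
# Hypothetical helper: count lines that do not start with a fastText label.
count_unlabeled() {
    # grep -c prints the number of selected lines; -v selects non-matching ones.
    # grep exits non-zero when nothing is selected, so "|| true" keeps the
    # function from failing when the file is clean.
    grep -vc '^__label__' "$1" || true
}

# Made-up sample lines written to a temporary file for demonstration.
sample=$(mktemp)
printf '%s\n' \
    '__label__2 Great product, works as described.' \
    '__label__1 Broke after two days.' \
    'a line without a label' > "$sample"

count_unlabeled "$sample"   # prints 1
rm -f "$sample"
```

A result of 0 on train.ft.txt and test.ft.txt would suggest every line carries a label.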
The code assumes that the fastText binary can be run as ./fasttext from the current directory.
To preprocess:
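sh preproc.sh data/amazon_reviews/train.ft.txt > data/amazon_reviews/train.tmp && mv data/amazon_reviews/train.tmp data/amazon_reviews/train.ft.txt
sh preproc.sh data/amazon_reviews/test.ft.txt > data/amazon_reviews/test.tmp && mv data/amazon_reviews/test.tmp data/amazon_reviews/test.ft.txt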
To evaluate predictions:
sh sentiment.sh data/amazon_reviews/train.ft.txt data/amazon_reviews/test.ft.txt
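The contents of sentiment.sh are not shown here. As a rough guess, a baseline run with default hyperparameters could look like the sketch below, which uses the standard `supervised` and `test` subcommands of the fastText CLI. The `train_and_test` function, the `FASTTEXT` variable, and the model prefix argument are assumptions made for illustration.

```shell
# Path to the fastText binary; defaults to ./fasttext (an assumption).
FASTTEXT=${FASTTEXT:-./fasttext}

# Hypothetical sketch of a baseline run; the real sentiment.sh may differ.
# $1 = training file, $2 = test file, $3 = output model prefix
train_and_test() {
    # Train a supervised classifier with default hyperparameters.
    # This writes "$3.bin" (the model) and "$3.vec" (the word vectors).
    "$FASTTEXT" supervised -input "$1" -output "$3" &&
    # Report the number of examples, precision at 1 and recall at 1.
    "$FASTTEXT" test "$3.bin" "$2"
}
```

It could be called as `train_and_test data/amazon_reviews/train.ft.txt data/amazon_reviews/test.ft.txt model`. When each example has exactly one label, precision at 1 and recall at 1 coincide, which fits a single figure being reported for both.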