bittlingmayer/README.md

## README.md

      
This code uses fastText supervised learning to predict output labels from input text.
Approach

This is the baseline code.  I have not changed anything.
Preprocessing

I applied lowercasing, so "This is a TEST!" becomes "this is a test!".
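
As a quick check of the lowercasing step, the sketch below wraps the same `tr` call that preproc.sh uses in a small function and runs it on the example sentence. The function name `lowercase` is only illustrative and is not part of this code.

```shell
# lowercase: read text on stdin and write it lowercased to stdout,
# using the same tr invocation as preproc.sh.
# (The function name is hypothetical; preproc.sh reads from a file instead.)
lowercase() {
  tr '[:upper:]' '[:lower:]'
}

printf '%s\n' 'This is a TEST!' | lowercase
# prints: this is a test!
```

Punctuation and digits pass through unchanged. Assuming the data uses fastText's default lowercase `__label__` prefix, the labels should also be unaffected.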
Parameters

This is the baseline code. I have not set any hyperparameters in sentiment.sh, so fastText trains with its default settings.
Results

Precision and recall (reported by fasttext test as P@1 and R@1) were both 0.916.
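
The two numbers coincide for a simple reason, assuming each review carries exactly one label and the model makes one prediction per review: precision divides the correct predictions by the number of predictions, recall divides them by the number of true labels, and both counts equal the number of reviews. The sketch below illustrates that arithmetic. The function `p_at_1` and its two input files (one label per line, true labels and predicted labels in the same order) are hypothetical and not part of this code.

```shell
# p_at_1 GOLD_FILE PRED_FILE
# Both files are assumed to hold exactly one label per line, in the same order.
# With one true label and one prediction per example, this single ratio
# is both precision at 1 and recall at 1.
p_at_1() {
  paste "$1" "$2" | awk '
    $1 == $2 { correct++ }                                # prediction matches the true label
    END { if (NR > 0) printf "%.3f\n", correct / NR }'    # correct / number of examples
}
```

For example, with four reviews of which three are predicted correctly, the function prints 0.750.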
Data

Before running, you should download the data to a directory like data/amazon_reviews.
You can download the datasets from kaggle.com/bittlingmayer/amazonreviews.
So the directory structure will look like this:
├── README.md
├── preproc.sh  
├── sentiment.sh  
└── data/ 
    └── amazon_reviews/ 
        ├── train.ft.txt  
        └── test.ft.txt  

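The code assumes that the data files are in the fastText format, where each line starts with a __label__ prefix followed by the text.
The code assumes that the fastText binary can be run as ./fasttext from this directory.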
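
Before training, it can help to confirm that a data file really is in that format. The sketch below is a minimal check, assuming the default fastText label prefix `__label__` at the start of every line. The helper name `check_ft_format` is hypothetical and not part of this code.

```shell
# check_ft_format FILE
# Succeeds only if FILE is readable and every line starts with __label__.
# (Hypothetical helper; assumes fastText's default label prefix.)
check_ft_format() {
  [ -r "$1" ] || { echo "cannot read $1" >&2; return 1; }
  # grep -c -v counts the lines that do NOT match the pattern
  bad=$(grep -c -v '^__label__' "$1")
  if [ "$bad" -eq 0 ]; then
    echo "ok: $1"
  else
    echo "$bad line(s) without a label in $1" >&2
    return 1
  fi
}
```

It could be run as `check_ft_format data/amazon_reviews/train.ft.txt` once the data has been downloaded.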
To Run

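To preprocess (each command writes to a temporary file first, because redirecting output onto the input file would truncate it before it is read):
sh preproc.sh data/amazon_reviews/train.ft.txt > train.tmp && mv train.tmp data/amazon_reviews/train.ft.txt
sh preproc.sh data/amazon_reviews/test.ft.txt > test.tmp && mv test.tmp data/amazon_reviews/test.ft.txt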

To train the model and evaluate its predictions:
sh sentiment.sh data/amazon_reviews/train.ft.txt data/amazon_reviews/test.ft.txt


## preproc.sh
tr '[:upper:]' '[:lower:]' < "$1"

## sentiment.sh
./fasttext supervised -input "$1" -output model_amzn

./fasttext test model_amzn.bin "$2"