rhoit/ANSWERS.org

## ANSWERS.org

      
    Assignment 2

1. Run the train/test script

# With Training Set
* Per label statistics
 0       Pr=1.00  Rc=1.00  F1=1.00
 1       Pr=0.98  Rc=0.96  F1=0.97

# With Test Set
* Per label statistics
 0       Pr=1.00  Rc=1.00  F1=1.00
 1       Pr=0.93  Rc=0.84  F1=0.88
# With Training Set
* Per label statistics
 0       Pr=1.00  Rc=1.00  F1=1.00
 1       Pr=0.97  Rc=0.93  F1=0.95

# With Test Set
* Per label statistics
 0       Pr=1.00  Rc=1.00  F1=1.00
 1       Pr=0.93  Rc=0.84  F1=0.90
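
For reference, the F1 column is the harmonic mean of precision (Pr) and recall (Rc). The sketch below is not part of the assignment scripts; the function name `f1_score` is hypothetical. It recomputes F1 from the rounded Pr/Rc values reported for label 1 in the first run. Because the script output is already rounded to two decimals, a value recomputed this way can differ slightly from the printed F1.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Label 1, training set, first run: Pr=0.98, Rc=0.96
print(f"{f1_score(0.98, 0.96):.2f}")  # 0.97
# Label 1, test set, first run: Pr=0.93, Rc=0.84
print(f"{f1_score(0.93, 0.84):.2f}")  # 0.88
```
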
2. Run the label_dev script

* Per label statistics
 0       Pr=1.00  Rc=1.00  F1=1.00
 1       Pr=0.98  Rc=0.94  F1=0.96
* Per label statistics
 0       Pr=1.00  Rc=1.00  F1=1.00
 1       Pr=0.98  Rc=0.97  F1=0.98
Error Analysis of dev set.

See the file `a3diff.out` for the output dump.

  Time ranges were missed,
    e.g. 1:30 to 3:00
  Dates written without am/pm and followed by a venue/function were missed,
    e.g. 3:45, doherty hall 1112
    e.g. 3:30 \n place:

3. Adding features to improve accuracy

A. Improving tokenizer

  normalize upper/lower case
  others
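
A minimal sketch of the case-normalization step, assuming a regex-based tokenizer. The assignment's actual tokenizer is not shown here, so the pattern `TOKEN_RE` and the function `tokenize` are hypothetical. The pattern keeps clock times such as 3:45 as a single token and splits punctuation off as separate tokens.

```python
import re
from typing import List

# Hypothetical token pattern: a clock time (3:45), a run of word characters,
# or a single punctuation character.
TOKEN_RE = re.compile(r"\d{1,2}:\d{2}|\w+|[^\w\s]")


def tokenize(text: str) -> List[str]:
    """Lowercase the text, then split it into tokens."""
    return TOKEN_RE.findall(text.lower())


print(tokenize("3:45, Doherty Hall 1112"))
# ['3:45', ',', 'doherty', 'hall', '1112']
```

Lowercasing before feature extraction means that "PM", "Pm", and "pm" all map to the same token, so the classifier sees one feature value instead of three.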

Features

  AMPM in next token
  Comma in next token if current or next token is date or am/pm
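
The two features above could be computed roughly as follows. This is one possible reading, assuming whitespace-split tokens (so a comma stays attached, as in "pm,"). The names `AMPM`, `TIME_RE`, `is_ampm`, `is_time_like`, and `next_token_features` are hypothetical, and the regex is only an assumed shape for a date/time token.

```python
import re
from typing import Dict, List

AMPM = {"am", "pm"}                          # assumed marker spellings
TIME_RE = re.compile(r"^\d{1,2}(:\d{2})?$")  # assumed shape of a date/time token


def is_ampm(token: str) -> bool:
    """True if the token is an am/pm marker, ignoring case and trailing punctuation."""
    return token.lower().strip(".,") in AMPM


def is_time_like(token: str) -> bool:
    """True if the token (minus any attached comma) looks like a time or am/pm marker."""
    core = token.strip(",")
    return bool(TIME_RE.match(core)) or is_ampm(core)


def next_token_features(tokens: List[str], i: int) -> Dict[str, bool]:
    """Features for tokens[i] that look one token ahead."""
    cur = tokens[i]
    nxt = tokens[i + 1] if i + 1 < len(tokens) else ""
    return {
        "next_is_ampm": is_ampm(nxt),
        # Fire only when the comma sits next to something time-like.
        "comma_in_next": "," in nxt and (is_time_like(cur) or is_time_like(nxt)),
    }


tokens = "at 3:45 pm, doherty hall".split()
print(next_token_features(tokens, 1))
# {'next_is_ampm': True, 'comma_in_next': True}
```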

4. Comparing Precision/Recall/F1 after adding features

The additional feature set improved performance on DEVTESTSET but had no effect on DEVSET.
Feature Set

  Previous token
  Current token
  Current token DCD/NO
  Next token
  Next token AMPM/NO
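
As an illustration, the five features listed above could be assembled per token as in the sketch below. It is not the assignment's actual code: the function name `extract_features`, the padding symbols `<s>` and `</s>`, and the reading of "DCD" as a digit-colon-digit pattern (e.g. 3:30) are all assumptions.

```python
import re
from typing import Dict, List

DCD_RE = re.compile(r"^\d{1,2}:\d{2}$")  # assumption: DCD = digit-colon-digit, e.g. 3:30
AMPM = {"am", "pm"}                      # assumed marker spellings


def extract_features(tokens: List[str], i: int) -> Dict[str, str]:
    """Build the feature dictionary for tokens[i]."""
    prev_tok = tokens[i - 1] if i > 0 else "<s>"                # hypothetical start padding
    next_tok = tokens[i + 1] if i + 1 < len(tokens) else "</s>"  # hypothetical end padding
    cur = tokens[i]
    return {
        "prev": prev_tok,
        "cur": cur,
        "cur_shape": "DCD" if DCD_RE.match(cur) else "NO",
        "next": next_tok,
        "next_ampm": "AMPM" if next_tok.lower() in AMPM else "NO",
    }


print(extract_features(["at", "3:30", "pm"], 1))
# {'prev': 'at', 'cur': '3:30', 'cur_shape': 'DCD', 'next': 'pm', 'next_ampm': 'AMPM'}
```
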
# With Training Set
* Per label statistics
 0       Pr=1.00  Rc=1.00  F1=1.00
 1       Pr=0.96  Rc=0.93  F1=0.95

# With Test Set
* Per label statistics
 0       Pr=1.00  Rc=1.00  F1=1.00
 1       Pr=0.95  Rc=0.89  F1=0.92
    
  
Running on the devset showed no change:
* Per label statistics
 0       Pr=1.00  Rc=1.00  F1=1.00
 1       Pr=0.98  Rc=0.97  F1=0.98