
@rhoit
Last active August 29, 2015 14:06

Assignment 2

1. Run the train/test script

# With Training Set
* Per label statistics
 0       Pr=1.00  Rc=1.00  F1=1.00
 1       Pr=0.98  Rc=0.96  F1=0.97

# With Test Set
* Per label statistics
 0       Pr=1.00  Rc=1.00  F1=1.00
 1       Pr=0.93  Rc=0.84  F1=0.88

# With Training Set
* Per label statistics
 0       Pr=1.00  Rc=1.00  F1=1.00
 1       Pr=0.97  Rc=0.93  F1=0.95

# With Test Set
* Per label statistics
 0       Pr=1.00  Rc=1.00  F1=1.00
 1       Pr=0.93  Rc=0.84  F1=0.90
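
The per-label figures above are precision (Pr), recall (Rc), and their harmonic mean (F1). As a minimal sketch of how such a table can be produced from gold and predicted labels: the function names `f1_score` and `per_label_stats` and the toy label lists are assumptions made for illustration, not the assignment's actual train/test script.

```python
from collections import Counter


def f1_score(pr, rc):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    return 2 * pr * rc / (pr + rc) if pr + rc else 0.0


def per_label_stats(gold, pred):
    """Return {label: (precision, recall, f1)} for two equal-length label lists."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # label p was predicted but is wrong
            fn[g] += 1  # gold label g was missed
    stats = {}
    for label in sorted(set(gold) | set(pred)):
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        pr = tp[label] / predicted if predicted else 0.0
        rc = tp[label] / actual if actual else 0.0
        stats[label] = (pr, rc, f1_score(pr, rc))
    return stats


# Toy data (hypothetical): 1 marks a time token, 0 marks everything else.
gold = [0, 0, 0, 1, 1, 1, 1, 0]
pred = [0, 0, 0, 1, 1, 0, 1, 1]

print("* Per label statistics")
for label, (pr, rc, f1) in per_label_stats(gold, pred).items():
    print(f" {label}       Pr={pr:.2f}  Rc={rc:.2f}  F1={f1:.2f}")
```

For instance, `f1_score(0.93, 0.84)` is roughly 0.88. Recomputing F1 from the rounded Pr/Rc values can differ slightly from a reported figure, since the script presumably works from unrounded counts.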

2. Run the label_dev script

* Per label statistics
 0       Pr=1.00  Rc=1.00  F1=1.00
 1       Pr=0.98  Rc=0.94  F1=0.96

* Per label statistics
 0       Pr=1.00  Rc=1.00  F1=1.00
 1       Pr=0.98  Rc=0.97  F1=0.98

Error Analysis of dev set.

See file `a3diff.out` for the output dump.

  • Time ranges were missed, e.g. `1:30 to 3:00`.
  • Times written without am/pm and followed by a venue or function were missed, e.g. `3:45, doherty hall 1112` or `3:30 \n place:`.

3. Adding features to improve accuracy

A.

Improving tokenizer

  • normalize upper/lower case
  • others
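
As a rough illustration of the case normalization step, the sketch below lowercases the text and splits it into tokens while keeping clock times such as `3:45` in one piece. The pattern `TOKEN_RE` and the function `tokenize` are hypothetical; the assignment's real tokenizer is not shown here and may differ.

```python
import re

# Assumed token pattern: a clock time (3:45), a run of word characters,
# or a single punctuation mark. The time alternative is tried first so
# that "3:45" is not split at the colon.
TOKEN_RE = re.compile(r"\d{1,2}:\d{2}|\w+|[^\w\s]")


def tokenize(text):
    """Lowercase the text and return its tokens as a list of strings."""
    return TOKEN_RE.findall(text.lower())


print(tokenize("3:45 PM, Doherty Hall 1112"))
```

Splitting the comma into its own token is a design choice that makes it visible to features that look at the next token.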

Features

  • AMPM in next token
  • Comma in next token if current or next token is date or am/pm

4. Comparing Precision/Recall/F1 after adding features.

The additional feature set improved performance on DEVTESTSET but had no effect on DEVSET.

Feature Set

  1. Previous token
  2. Current token
  3. Current token DCD/NO
  4. Next token
  5. Next token AMPM/NO
    # With Training Set
    * Per label statistics
     0       Pr=1.00  Rc=1.00  F1=1.00
     1       Pr=0.96  Rc=0.93  F1=0.95
    
    # With Test Set
    * Per label statistics
     0       Pr=1.00  Rc=1.00  F1=1.00
     1       Pr=0.95  Rc=0.89  F1=0.92
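
The five features listed above could be extracted per token roughly as follows. This is a sketch under stated assumptions: `DCD` is taken to mean a digits-colon-digits shape such as `3:45`, the `AMPM` set and the boundary markers `<s>` and `</s>` are invented for the example, and `token_features` is a hypothetical name rather than the function used in the assignment.

```python
import re

# Assumption: tokens that count as an am/pm marker.
AMPM = {"am", "pm", "a.m.", "p.m."}

# Assumption: "DCD" means digit(s), colon, digits, e.g. "3:45".
DCD_RE = re.compile(r"^\d{1,2}:\d{2}$")


def token_features(tokens, i):
    """Build the five-feature dictionary for tokens[i]."""
    cur = tokens[i].lower()
    # Hypothetical padding symbols for the sentence boundaries.
    prev_tok = tokens[i - 1].lower() if i > 0 else "<s>"
    next_tok = tokens[i + 1].lower() if i + 1 < len(tokens) else "</s>"
    return {
        "prev": prev_tok,                                      # 1. previous token
        "cur": cur,                                            # 2. current token
        "cur_shape": "DCD" if DCD_RE.match(cur) else "NO",     # 3. current token DCD/NO
        "next": next_tok,                                      # 4. next token
        "next_ampm": "AMPM" if next_tok in AMPM else "NO",     # 5. next token AMPM/NO
    }


tokens = ["at", "3:45", "PM", ",", "doherty", "hall"]
for i in range(len(tokens)):
    print(token_features(tokens, i))
```

A dictionary of named features like this can be passed to most classifiers that accept sparse categorical input; which learner the assignment used is not stated here.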
        

Running on the dev set showed no change:

* Per label statistics
 0       Pr=1.00  Rc=1.00  F1=1.00
 1       Pr=0.98  Rc=0.97  F1=0.98