# With Training Set
* Per label statistics
0 Pr=1.00 Rc=1.00 F1=1.00
1 Pr=0.98 Rc=0.96 F1=0.97
# With Test Set
* Per label statistics
0 Pr=1.00 Rc=1.00 F1=1.00
1 Pr=0.93 Rc=0.84 F1=0.88
# With Training Set
* Per label statistics
0 Pr=1.00 Rc=1.00 F1=1.00
1 Pr=0.97 Rc=0.93 F1=0.95
# With Test Set
* Per label statistics
0 Pr=1.00 Rc=1.00 F1=1.00
1 Pr=0.93 Rc=0.84 F1=0.90
* Per label statistics
0 Pr=1.00 Rc=1.00 F1=1.00
1 Pr=0.98 Rc=0.94 F1=0.96
* Per label statistics
0 Pr=1.00 Rc=1.00 F1=1.00
1 Pr=0.98 Rc=0.97 F1=0.98
See file `a3diff.out` for the output dump.
- Ranges were missed, e.g. `1:30 to 3:00`
- Dates not followed by am/pm but followed by a venue/function were missed, e.g. `3:45, doherty hall 1112` or `3:30 \n place:`
A.
Improving tokenizer
- normalize upper/lower case
- others
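
A minimal sketch of the case-normalization step, assuming a regex-based tokenizer. The pattern `TOKEN_RE` and the function `tokenize` are hypothetical names for illustration, not the tokenizer actually used here.

```python
import re

# Hypothetical token pattern: clock times such as 3:45, then words,
# then single punctuation marks.
TOKEN_RE = re.compile(r"\d{1,2}:\d{2}|\w+|[^\w\s]")

def tokenize(text):
    """Lower-case the text, then split it into time, word and punctuation tokens."""
    return TOKEN_RE.findall(text.lower())

if __name__ == "__main__":
    print(tokenize("3:45, Doherty Hall 1112"))
```

Lower-casing before matching means `PM`, `Pm` and `pm` all reach the feature extractor as the same token.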
Features
- AMPM in next token
- Comma in next token if current or next token is date or am/pm
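
One possible reading of these two features, sketched under assumptions: a "date" is taken to be a clock-time token such as `3:45`, and the comma feature fires when a comma directly follows a date or am/pm token at the current or next position. The helpers `is_time`, `is_ampm` and `extra_features` are hypothetical.

```python
import re

TIME_RE = re.compile(r"\d{1,2}(:\d{2})?")  # assumption: "date" means 3 or 3:45
AMPM = {"am", "pm"}                         # assumption: am/pm vocabulary

def is_time(tok):
    return TIME_RE.fullmatch(tok) is not None

def is_ampm(tok):
    return tok.lower() in AMPM

def extra_features(tokens, i):
    """Return the two additional features for the token at position i."""
    nxt = tokens[i + 1] if i + 1 < len(tokens) else ""
    nxt2 = tokens[i + 2] if i + 2 < len(tokens) else ""
    cur_hit = is_time(tokens[i]) or is_ampm(tokens[i])
    nxt_hit = is_time(nxt) or is_ampm(nxt)
    return {
        # am/pm marker in the next token
        "next_is_ampm": is_ampm(nxt),
        # comma right after a date or am/pm token (current or next position)
        "comma_follows": (cur_hit and nxt == ",") or (nxt_hit and nxt2 == ","),
    }
```

Under this reading, the missed case `3:45, doherty hall 1112` sets `comma_follows` on the `3:45` token even though no am/pm marker is present.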
The additional feature set improved performance on DEVTESTSET but had no effect on DEVSET.
Feature Set
- Previous token
- Current token
- Current token DCD/NO
- Next token
- Next token AMPM/NO
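
The feature set above could be extracted per token roughly as follows. This is a sketch, assuming that DCD stands for a digit-colon-digit shape (e.g. `3:30`) and that the boundary markers `<S>` and `</S>` pad the sequence. The function `token_features` and the dictionary keys are hypothetical.

```python
import re

DCD_RE = re.compile(r"\d{1,2}:\d{2}")  # assumption: DCD = digits, colon, digits
AMPM = {"am", "pm"}                     # assumption: am/pm vocabulary

def token_features(tokens, i):
    """Build the five features for the token at position i."""
    prev_tok = tokens[i - 1] if i > 0 else "<S>"
    next_tok = tokens[i + 1] if i + 1 < len(tokens) else "</S>"
    return {
        "prev": prev_tok,
        "cur": tokens[i],
        "cur_shape": "DCD" if DCD_RE.fullmatch(tokens[i]) else "NO",
        "next": next_tok,
        "next_shape": "AMPM" if next_tok.lower() in AMPM else "NO",
    }

if __name__ == "__main__":
    print(token_features(["at", "3:30", "pm"], 1))
```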
# With Training Set
* Per label statistics
0 Pr=1.00 Rc=1.00 F1=1.00
1 Pr=0.96 Rc=0.93 F1=0.95
# With Test Set
* Per label statistics
0 Pr=1.00 Rc=1.00 F1=1.00
1 Pr=0.95 Rc=0.89 F1=0.92
Running on the devset showed no change:
* Per label statistics
0 Pr=1.00 Rc=1.00 F1=1.00
1 Pr=0.98 Rc=0.97 F1=0.98