- CRFsuite - Documentation
- ★: useful commands for error analysis
- data: train, dev, test
- list of (NE, Token 'w', POS 'pos', Chunk 'chk'):
O -DOCSTART- -X- O
B-ORG EU NNP B-NP
O rejects VBZ B-VP
B-MISC German JJ B-NP
O call NN I-NP
O to TO B-VP
O boycott VB I-VP
B-MISC British JJ B-NP
O lamb NN I-NP
...
- ↓ feature extraction:
$ ./feature.py < train > train.f, ...
- feature file: train.f, dev.f, test.f
O w[0]=-DOCSTART- pos[0]=-X- chk[0]=O p4[0]=-DOC p5[0]=-DOCS ...
O w[0]=CRICKET w[1]=- w[0]|w[1]=CRICKET|- w[1]|w[2]=-|LEICESTERSHIRE pos[0]=NNP pos[1]=__COLON__ pos[0]|pos[1]=NNP|__COLON__ chk[0]=B-NP p4[0]=CRIC p5[0]=CRICK ...
O w[-1]=CRICKET w[0]=- w[1]=LEICESTERSHIRE w[0]|w[1]=-|LEICESTERSHIRE w[1]|w[2]=LEICESTERSHIRE|TAKE pos[-1]=NNP pos[0]=__COLON__ pos[1]=NNP pos[0]|pos[1]=__COLON__|NNP chk[0]=O p4[0]=False p5[0]=False ...
B-ORG w[-1]=- w[0]=LEICESTERSHIRE w[1]=TAKE w[0]|w[1]=LEICESTERSHIRE|TAKE w[1]|w[2]=TAKE|OVER pos[-1]=__COLON__ pos[0]=NNP pos[1]=NNP pos[0]|pos[1]=NNP|NNP ...
...
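The feature lines above use CRFsuite's text format: a label followed by tab-separated attributes, one line per token. The actual feature.py is not shown in these notes, so the following is only a minimal sketch of what such a script might do. It assumes each input row is a (NE, token, POS, chunk) tuple, produces only a subset of the attributes in the sample, and the names `escape`, `token_features` and `to_crfsuite_lines` are hypothetical.

```python
def escape(value):
    # ':' separates an attribute name from its weight in CRFsuite's
    # input format, so it is replaced (the sample shows __COLON__).
    return value.replace(":", "__COLON__")


def token_features(sent, i):
    """Attributes for row i of sent, a list of (ne, token, pos, chunk)."""
    feats = []
    # Unigram attributes in a window of -1..+1 around the current token.
    for name, col in (("w", 1), ("pos", 2)):
        for off in (-1, 0, 1):
            j = i + off
            if 0 <= j < len(sent):
                feats.append("%s[%d]=%s" % (name, off, escape(sent[j][col])))
    # Token bigram with the following word.
    if i + 1 < len(sent):
        feats.append("w[0]|w[1]=%s|%s"
                     % (escape(sent[i][1]), escape(sent[i + 1][1])))
    feats.append("chk[0]=%s" % escape(sent[i][3]))
    return feats


def to_crfsuite_lines(sent):
    """One 'LABEL<TAB>attr<TAB>attr...' line per token."""
    return ["\t".join([row[0]] + token_features(sent, i))
            for i, row in enumerate(sent)]


# Two rows taken from the data sample in these notes.
example = [("B-ORG", "EU", "NNP", "B-NP"), ("O", "rejects", "VBZ", "B-VP")]
for line in to_crfsuite_lines(example):
    print(line)
```

Sentences would be written one after another with an empty line between them, which is how CRFsuite separates sequences.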
- ↓ training:
$ crfsuite learn -a ap -p max_iterations=20 -m ner.model train.f
- model: ner.model
- → ★ dump (check weights):
$ crfsuite dump ner.model > ner.dump
...
TRANSITIONS = {
(1) O --> O: 29.006869
(1) O --> B-ORG: 22.085804
(1) O --> B-MISC: 22.575587
(1) O --> B-PER: 32.707861
...
STATE_FEATURES = {
(0) w[0]=-DOCSTART- --> O: 8.808922
(0) pos[0]=-X- --> O: 8.808922
(0) chk[0]=O --> O: 0.113915
(0) chk[0]=O --> B-ORG: -7.506798
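The dump is long, so when checking weights it can help to pull out the largest ones for a single label. The sketch below is only an illustration: it assumes every weight line looks like the ones shown in the dump sample (`(N) source --> label: weight`), and `parse_dump` and `top_weights` are hypothetical helper names, not part of CRFsuite.

```python
import re

# Matches lines such as "(0) chk[0]=O --> B-ORG: -7.506798".
LINE = re.compile(r"^\s*\(\d+\)\s+(.+) --> (\S+): (-?\d+(?:\.\d+)?)\s*$")


def parse_dump(lines):
    """Return (source, label, weight) triples for transition and state lines."""
    triples = []
    for line in lines:
        m = LINE.match(line)
        if m:  # headers such as "TRANSITIONS = {" are skipped
            triples.append((m.group(1), m.group(2), float(m.group(3))))
    return triples


def top_weights(triples, label, n=10):
    """The n largest-magnitude weights that point to the given label."""
    hits = [t for t in triples if t[1] == label]
    return sorted(hits, key=lambda t: abs(t[2]), reverse=True)[:n]


# A few lines copied from the dump sample; in practice, read ner.dump.
sample = [
    "TRANSITIONS = {",
    "  (1) O --> O: 29.006869",
    "  (1) O --> B-ORG: 22.085804",
    "STATE_FEATURES = {",
    "  (0) chk[0]=O --> O: 0.113915",
    "  (0) chk[0]=O --> B-ORG: -7.506798",
]
for source, label, weight in top_weights(parse_dump(sample), "B-ORG"):
    print(source, "-->", label, weight)
```

Sorting by absolute value keeps strongly negative weights visible as well, since they are often the interesting ones in error analysis.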
- ↓ test (tagging, prediction):
$ crfsuite tag -r -m ner.model < dev.f > dev.eval
-i: also output the marginal probability of each predicted label
- (gold, predicted value): dev.eval, test.eval
O O
O O
O O
B-ORG B-PER
O O
O O
...
- → evaluation:
$ conlleval.py < dev.eval
processed 51578 tokens with 5943 phrases; found: 5906 phrases; correct: ...
accuracy: ...
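The evaluation output counts phrases (chunks), not single tokens. As a rough illustration of how the phrase counts relate to precision, recall and F1, here is a small sketch. It is not the conlleval.py implementation: it assumes IOB2 tags (every phrase starts with `B-`), and `chunks` and `prf` are hypothetical names.

```python
def chunks(tags):
    """Set of (start, end, type) spans in a sequence of IOB2 tags."""
    spans = set()
    start, kind = None, None
    # A trailing "O" closes a chunk that runs to the end of the sequence.
    for i, tag in enumerate(list(tags) + ["O"]):
        prefix, _, label = tag.partition("-")
        continues = prefix == "I" and label == kind
        if start is not None and not continues:
            spans.add((start, i, kind))
            start, kind = None, None
        # A stray "I-" without an open chunk is treated as a chunk start.
        if prefix == "B" or (prefix == "I" and start is None):
            start, kind = i, label
    return spans


def prf(gold_tags, pred_tags):
    """Phrase-level precision, recall and F1 for one tag sequence."""
    gold, pred = chunks(gold_tags), chunks(pred_tags)
    correct = len(gold & pred)  # same boundaries and same type
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if correct else 0.0
    return precision, recall, f1


gold = ["O", "B-ORG", "I-ORG", "O", "B-PER"]
pred = ["O", "B-ORG", "I-ORG", "O", "B-ORG"]
print(prf(gold, pred))
```

In the example, the ORG phrase is found with the right boundaries and type, while the PER phrase is predicted as ORG, so one of two phrases is correct on each side. For a whole file, the counts would be summed over sentences before dividing.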
- → error analysis:
$ crfsuite tag -m ner.model -r < test.f | merge.py test > test.error
- → ★ error analysis (only sentences that contain false instances, i.e. mispredicted tokens):
$ crfsuite tag -m ner.model -r < test.f | merge.py test | false_instance.py > test.error
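merge.py and false_instance.py are not shown in these notes, so the filtering step can only be sketched under assumptions. The version below assumes sentences are separated by empty lines, columns are separated by whitespace, and the gold and predicted labels sit in columns 0 and 1 (the order of the tagger's (gold, predicted) output). The real column layout of the merged file may differ, and all function names are hypothetical.

```python
def split_sentences(lines):
    """Yield sentences as lists of column lists, split on empty lines."""
    sent = []
    for line in lines:
        if line.strip():
            sent.append(line.split())
        elif sent:
            yield sent
            sent = []
    if sent:  # last sentence without a trailing empty line
        yield sent


def has_error(sent, gold_col=0, pred_col=1):
    """True if any token's predicted label differs from its gold label."""
    return any(row[gold_col] != row[pred_col] for row in sent)


def false_sentences(lines, gold_col=0, pred_col=1):
    """Keep only the sentences that contain at least one wrong prediction."""
    return [s for s in split_sentences(lines) if has_error(s, gold_col, pred_col)]


# Hypothetical merged lines: gold, predicted, token.
merged = [
    "O O call",
    "B-ORG B-PER EU",
    "",
    "O O to",
    "O O boycott",
    "",
]
for sentence in false_sentences(merged):
    for row in sentence:
        print(" ".join(row))
    print()
```

Keeping whole sentences rather than single wrong tokens preserves the context needed to see why the model made the error.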
- ★: useful commands
$ less FILE
$ some commands | less (piping)
- /: search forward for a pattern ★
- ?: search backward for a pattern
- n: next (repeat previous search) ★
- N: previous (repeat previous search, but in the reverse direction)
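- line ★
  - j, Ctrl+n, Enter, ↓: forward one line
  - k, Ctrl+p, ↑: backward one line
- window ★
  - f, Ctrl+v, Space: forward one window
  - b, Alt+v: backward one window
- half window
  - Ctrl+d: forward half a window
  - Ctrl+u: backward half a window
- file ★
  - g: go to the first line
  - G (Shift+g): go to the last line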
- ma: mark the current position with the letter 'a'
- 'a: go to the marked position 'a'
- q, ZZ: exit ★
- v: open the file with the default editor