http://www.chokkan.org/software/crfsuite/manual.html
This section describes the data format that CRFsuite uses for training and tagging. A data set consists of a set of item sequences. Each item sequence is represented by consecutive lines and terminated by an empty line. An item sequence consists of a series of items, and each item's characteristics (label and attributes) are described on one line. An item line begins with its label, followed by its attributes separated by TAB (\t) characters.
Here is an example of training data (taken from the CoNLL 2000 chunking shared task):
Figure 1. Sample data for CRFsuite (image: http://www.chokkan.org/software/crfsuite/data_sample.png)
This example contains four item sequences (the last one is shown only partially). The first item of the first sequence is annotated with the label B-NP and has attributes such as w[0]=An, w[1]=AP, pos[0]=DT, pos[1]=NNP, and __BOS__.

The labels and attributes in this example follow a particular naming convention (feature design):
- B-NP indicates that the current token begins a noun phrase.
- w[0]=An indicates that the surface form of the current item is "An".
- pos[1]=NNP indicates that the next token is a proper noun.
- __BOS__ indicates that the current item is the first item of the sequence.

CRFsuite, however, does not care about the naming convention or feature design of labels and attributes. It treats them simply as strings. CRFsuite learns associations (feature weights) between attributes and labels without knowing what they mean. For example, an item with the attribute pos[0]=DT is likely to have the label B-NP, even though CRFsuite does not know what B-NP means. In other words, you can design and use arbitrary features simply by writing label and attribute names into the data set.
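Because feature design happens entirely outside CRFsuite, a data set is usually produced by a small preprocessing script. Below is a minimal Python sketch of one. It turns a tokenized, POS-tagged sentence into item lines using the naming convention of the sample (w[0], w[1], pos[0], pos[1], __BOS__). The function names `item_lines` and `escape`, as well as the example sentence and its labels, are hypothetical. Any other attribute names would work just as well. The sketch also escapes ':' and '\' in words, as required for attribute names.

```python
from typing import List, Tuple


def escape(s: str) -> str:
    # Escape backslashes first, then colons, so the added backslashes are not doubled.
    return s.replace("\\", "\\\\").replace(":", "\\:")


def item_lines(tokens: List[Tuple[str, str, str]]) -> List[str]:
    """Convert (word, pos, label) triples into CRFsuite item lines.

    The attribute names mimic the sample data's convention; CRFsuite itself
    never interprets them.
    """
    lines = []
    for i, (word, pos, label) in enumerate(tokens):
        has_next = i + 1 < len(tokens)
        attrs = [f"w[0]={escape(word)}"]
        if has_next:
            attrs.append(f"w[1]={escape(tokens[i + 1][0])}")
        attrs.append(f"pos[0]={escape(pos)}")
        if has_next:
            attrs.append(f"pos[1]={escape(tokens[i + 1][1])}")
        if i == 0:
            attrs.append("__BOS__")  # first item of the sequence
        # An item line is the label followed by TAB-separated attributes.
        lines.append("\t".join([label] + attrs))
    return lines


if __name__ == "__main__":
    sentence = [("An", "DT", "B-NP"), ("AP", "NNP", "I-NP")]  # made-up input
    print("\n".join(item_lines(sentence)))
    print()  # an empty line terminates the item sequence
```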
An attribute can have a scaling value, separated from the attribute name by a colon character (:). Formally, the contribution of a feature is the product of its feature weight and the scaling value of the corresponding attribute. Roughly speaking, the scaling value of an attribute has an effect similar to the number of times the attribute occurs, but it need not be an integer. Note that large scaling values may cause overflows (range errors) during training.

Because the colon character has a special role in the data set, CRFsuite uses escape sequences in attribute names: "\:" represents ":" and "\\" represents "\". If the scaling value is omitted (no colon character), CRFsuite assumes a scaling value of 1. For example, the following three items are identical in terms of attributes and scaling values:
B-NP w[1..4]=a:2 w[1..4]=man w[1..4]=eats
B-NP w[1..4]=a w[1..4]=a w[1..4]=man w[1..4]=eats
B-NP w[1..4]=a:2.0 w[1..4]=man:1.0 w[1..4]=eats:1.0
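The following Python sketch reads an attribute field under these rules. It decodes the two escape sequences, splits off the scaling value at the first unescaped colon, and defaults the scaling value to 1. It then sums the scaling values of repeated attributes, which is what makes the three items above equivalent. The names `parse_attribute` and `attribute_totals` are hypothetical. A backslash followed by any other character is kept literally here; the manual does not specify that case, so this is an assumption.

```python
from collections import Counter
from typing import Tuple


def parse_attribute(field: str) -> Tuple[str, float]:
    """Split 'name' or 'name:scaling', decoding the escapes '\\:' and '\\\\'."""
    chars = []
    i = 0
    while i < len(field):
        c = field[i]
        if c == "\\" and i + 1 < len(field) and field[i + 1] in ":\\":
            chars.append(field[i + 1])  # escaped ':' or '\' belongs to the name
            i += 2
        elif c == ":":
            # The first unescaped ':' starts the scaling value.
            return "".join(chars), float(field[i + 1:])
        else:
            chars.append(c)  # assumption: any other character is kept as-is
            i += 1
    return "".join(chars), 1.0  # no colon: the scaling value defaults to 1


def attribute_totals(line: str) -> Counter:
    """Sum the scaling values of each attribute in one TAB-separated item line."""
    _label, *fields = line.split("\t")
    totals: Counter = Counter()
    for field in fields:
        name, scale = parse_attribute(field)
        totals[name] += scale
    return totals
```

With TAB separators, the three example items all yield the same totals: 2.0 for w[1..4]=a, and 1.0 each for w[1..4]=man and w[1..4]=eats.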
The data format for tagging is exactly the same as the format for training. Labels in tagging data may be empty, in which case the item line begins with a TAB character, but they cannot be omitted. When tagging, CRFsuite either ignores the labels in the input data or uses them to measure prediction performance.
The data format is described by the following BNF:
<line> ::= <item> | <eos>
<item> ::= <label> ('\t' <attribute>)+ <br>
<eos> ::= <br>
<label> ::= <string>
<attribute> ::= <name> | <name> ':' <scaling>
<name> ::= (<letter> | "\:" | "\\")+
<scaling> ::= <numeric>
<br> ::= '\n'
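A reader for this grammar only has to group item lines between empty lines. The Python sketch below splits each item line into its label (possibly empty, as in tagging data) and its raw attribute fields. The name `read_sequences` is hypothetical. The sketch tolerates a missing final empty line and repeated empty lines; whether CRFsuite is equally lenient is not stated in this manual.

```python
import io
from typing import Iterable, List, Tuple

Item = Tuple[str, List[str]]  # (label, raw attribute fields)


def read_sequences(lines: Iterable[str]) -> List[List[Item]]:
    """Group item lines into item sequences following the BNF."""
    sequences: List[List[Item]] = []
    current: List[Item] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line:  # <eos>: an empty line ends the current sequence
            if current:
                sequences.append(current)
                current = []
            continue
        # <item>: a label, then TAB-separated attributes; the label may be empty.
        label, *attributes = line.split("\t")
        current.append((label, attributes))
    if current:  # be lenient if the final empty line is missing
        sequences.append(current)
    return sequences


if __name__ == "__main__":
    data = "B-NP\tw[0]=An\t__BOS__\nI-NP\tw[0]=AP\n\n\tw[0]=x\n\n"
    for seq in read_sequences(io.StringIO(data)):
        print(seq)
```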
The easiest way to install CRFsuite is to use a binary distribution. Binaries are currently distributed for Win32 and Linux (Intel 32-bit and 64-bit architectures).
Since CRFsuite 0.5, the source package no longer includes libLBFGS. To build CRFsuite, you must first download and build libLBFGS.
On Windows, open the Visual Studio solution file of libLBFGS (lbfgs.sln) and build it. The solution builds the static link library lbfgs.lib (release build) or lbfgs_debug.lib (debug build) in the Release or Debug directory. The CRFsuite solution file (crfsuite.sln) expects the libLBFGS header and library files to be in the win32/lbfgs directory. Create that directory and copy lbfgs.h, lbfgs.lib, and/or lbfgs_debug.lib into it. Then open the solution file (crfsuite.sln) and build it.
On Linux, download and build the libLBFGS source package. If you do not want to install libLBFGS into the operating system, pass the "--prefix" option to the configure script. This example installs libLBFGS into the local directory under the home directory ($HOME):
$ ./configure --prefix=$HOME/local
$ make
$ make install
Now you are ready to build CRFsuite. If you installed libLBFGS in a non-default directory, pass that directory as the argument of the "--with-liblbfgs" option:
$ ./configure --prefix=$HOME/local --with-liblbfgs=$HOME/local
$ make
$ make install
The CRFsuite utility expects the command name as its first command-line argument:
- learn: Train a CRF model from a training set.
- tag: Tag sequences using a CRF model.
- dump: Dump a CRF model in plain-text format.
To show the command-line syntax, use the -h (--help) option.
$ crfsuite -h
CRFSuite 0.12 Copyright (c) 2007-2011 Naoaki Okazaki
USAGE: crfsuite <COMMAND> [OPTIONS]
COMMAND Command name to specify the processing
OPTIONS Arguments for the command (optional; command-specific)
COMMAND:
learn Obtain a model from a training set of instances
tag Assign suitable labels to given instances by using a model
dump Output a model in a plain-text format
For the usage of each command, specify -h option in the command argument.
To train a CRF model from a training set, enter the following command:
$ crfsuite learn [OPTIONS] [DATA]
If the argument DATA is omitted or is '-', the utility reads the training data from STDIN. To see the usage of the learn command, specify the -h (--help) option.
$ crfsuite learn -h
CRFSuite 0.12 Copyright (c) 2007-2011 Naoaki Okazaki
USAGE: crfsuite learn [OPTIONS] [DATA1] [DATA2] ...
Trains a model using training data set(s).
DATA file(s) corresponding to data set(s) for training; if multiple N files
are specified, this utility assigns a group number (1...N) to the
instances in each file; if a file name is '-', the utility reads a
data set from STDIN
OPTIONS:
-t, --type=TYPE specify a graphical model (DEFAULT='1d'):
(this option is reserved for the future use)
1d 1st-order Markov CRF with state and transition
features; transition features are not conditioned
on observations
-a, --algorithm=NAME specify a training algorithm (DEFAULT='lbfgs')
lbfgs L-BFGS with L1/L2 regularization
l2sgd SGD with L2-regularization
ap Averaged Perceptron
pa Passive Aggressive
arow Adaptive Regularization of Weights (AROW)
-p, --set=NAME=VALUE set the algorithm-specific parameter NAME to VALUE;
use '-H' or '--help-parameters' with the algorithm name
specified by '-a' or '--algorithm' and the graphical
model specified by '-t' or '--type' to see the list of
algorithm-specific parameters
-m, --model=FILE store the model to FILE (DEFAULT=''); if the value is
empty, this utility does not store the model
-g, --split=N split the instances into N groups; this option is
useful for holdout evaluation and cross validation
-e, --holdout=M use the M-th data for holdout evaluation and the rest
for training
-x, --cross-validate repeat holdout evaluations for #i in {1, ..., N} groups
(N-fold cross validation)
-l, --log-to-file write the training log to a file instead of to STDOUT;
The filename is determined automatically by the training
algorithm, parameters, and source files
-L, --logbase=BASE set the base name for a log file (used with -l option)
-h, --help show the usage of this command and exit
-H, --help-parameters show the help message of algorithm-specific parameters;
specify an algorithm with '-a' or '--algorithm' option,
and specify a graphical model with '-t' or '--type' option
The following options are available for training.
-t, --type=TYPE
Specifies the graphical model used for feature generation. The default value is "1d".
1d
1st-order Markov CRF with state and transition features (dyad features). State features are conditioned on combinations of attributes and labels, and transition features are conditioned on label bigrams.
-a, --algorithm=NAME
Specifies the training algorithm. The default value is "lbfgs".
lbfgs
Gradient descent using the L-BFGS method
l2sgd
Stochastic Gradient Descent with an L2 regularization term
ap
Averaged Perceptron
pa
Passive Aggressive (PA)
arow
Adaptive Regularization of Weight Vectors (AROW)
-p, --param=NAME=VALUE (listed as -p, --set=NAME=VALUE in the usage message above)
Sets a training parameter: CRFsuite sets the parameter NAME to VALUE. The available parameters depend on the selected graphical model and training algorithm. To see the help message for the available parameters, use the -H (--help-parameters) option together with the algorithm name specified by -a (--algorithm) and the graphical model specified by -t (--type).
-m, --model=MODEL
Stores the trained model in the file MODEL. The default value is "" (empty). If MODEL is empty, CRFsuite does not save the model to a file.
-g, --split=N
Splits the instances into N groups and assigns the group numbers {1, ..., N} to them. This option is mainly used for N-fold cross validation (with the -x option). By default, CRFsuite does not split the input data into groups.
-e, --holdout=M
Uses the instances in group M for holdout evaluation. CRFsuite does not use the instances in group M for training. By default, CRFsuite does not perform holdout evaluation.
-x, --cross-validate
Performs N-fold cross validation. Specify the number of groups N with the -g option. By default, CRFsuite does not perform cross validation.
-l, --log-to-file
Writes the training log to a file. The file name is determined automatically from the command-line arguments (training algorithm, graphical model, parameters, source files, etc.). By default, CRFsuite writes log messages to STDOUT.
-L, --logbase=BASE
Specifies the base name of the log file (used with the -l option). By default, the base name is "log.crfsuite".
-h, --help
Shows the usage of this command and exits.
-H, --help-parameters
Shows the list of parameters and their descriptions. Specify the graphical model and the training algorithm with the -t and -a options, respectively.
-p, --param=NAME=VALUE
Sets a training parameter: CRFsuite sets the parameter NAME to VALUE. Use the -H (--help-parameters) option to see the list of parameters and their descriptions.
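The -g, -e, and -x options combine into holdout evaluation and N-fold cross validation. The Python sketch below illustrates that logic. The round-robin group assignment is a hypothetical choice; this manual does not say how CRFsuite distributes instances among groups when -g is used. When several data files are given instead (as in `learn -e2 train.txt test.txt`), each file's instances get that file's group number. The function names are hypothetical.

```python
from typing import Iterator, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def assign_groups(n_instances: int, n: int) -> List[int]:
    # Hypothetical round-robin assignment of group numbers 1..N (as with -g N).
    # The manual does not say how CRFsuite itself distributes instances.
    return [i % n + 1 for i in range(n_instances)]


def holdout_split(instances: Sequence[T], groups: Sequence[int],
                  m: int) -> Tuple[List[T], List[T]]:
    # -e M: group M is held out for evaluation and excluded from training.
    train = [x for x, g in zip(instances, groups) if g != m]
    test = [x for x, g in zip(instances, groups) if g == m]
    return train, test


def cross_validation(instances: Sequence[T],
                     n: int) -> Iterator[Tuple[int, List[T], List[T]]]:
    # -g N -x: repeat the holdout evaluation once for every group 1..N.
    groups = assign_groups(len(instances), n)
    for m in range(1, n + 1):
        train, test = holdout_split(instances, groups, m)
        yield m, train, test
```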
Here are some examples of CRFsuite command lines for training.
Train a CRF model from train.txt with the default parameters and store the model in CRF.model.
$ crfsuite learn -m CRF.model train.txt
Train a CRF model from STDIN with the default parameters. Because -m is omitted, the model is not stored.
$ cat train.txt | crfsuite learn -
Train a CRF model from train.txt (group #1), and evaluate the model on the holdout data test.txt (group #2) during training.
$ crfsuite learn -e2 train.txt test.txt
Perform 10-fold cross validation on the training data train.txt. The log output is stored in log.crfsuite_lbfgs. The log file name may vary with the training parameters.
$ crfsuite learn -g10 -x -l train.txt
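The 1d graphical model is a 1st-order Markov CRF with state and transition features (dyad features). State features are conditioned on combinations of attributes and labels, and transition features are conditioned on label bigrams. It accepts the following parameters.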
feature.minfreq=VALUE
The cut-off threshold for feature frequency. CRFsuite ignores features whose frequency of occurrence in the training data is less than VALUE. The default value is 0 (i.e., no cut-off).
feature.possible_states=BOOL
Specifies whether CRFsuite generates state features that do not occur in the training data (i.e., negative state features). If BOOL is 1, CRFsuite generates state features for all possible combinations of attributes and labels. Given A attributes and L labels, this yields (A * L) state features. Enabling this option may improve labeling accuracy, because the CRF model can then learn the conditions under which an item is not predicted to have its reference label. However, it increases the number of features and may slow down training considerably. This option is disabled by default.
feature.possible_transitions=BOOL
Specifies whether CRFsuite generates transition features that do not occur in the training data (i.e., negative transition features). If BOOL is 1, CRFsuite generates transition features for all possible pairs of labels. Given L labels in the training data, this yields (L * L) transition features. This option is disabled by default.
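To make the feature counts concrete, the Python sketch below counts the state features (attribute, label) and transition features (label, label) that actually occur in a small data set, and applies feature.minfreq as "drop features seen fewer than VALUE times". It also computes the sizes of the dense sets that possible_states=1 (A * L) and possible_transitions=1 (L * L) would produce. The sketch counts raw occurrences (scaling value 1). CRFsuite may accumulate scaling values instead, so treat that as an assumption. Function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

ItemSeq = List[Tuple[str, List[str]]]  # [(label, [attribute names]), ...]


def observed_features(sequences: List[ItemSeq], minfreq: float = 0.0):
    """Count observed state (attribute, label) and transition (label, label) features."""
    state: Counter = Counter()
    trans: Counter = Counter()
    for seq in sequences:
        prev = None
        for label, attrs in seq:
            for a in attrs:
                state[(a, label)] += 1  # assumes every scaling value is 1
            if prev is not None:
                trans[(prev, label)] += 1
            prev = label

    def keep(counts: Counter) -> Dict[tuple, int]:
        # feature.minfreq: drop features that occur fewer than minfreq times.
        return {k: v for k, v in counts.items() if v >= minfreq}

    return keep(state), keep(trans)


def dense_feature_counts(sequences: List[ItemSeq]) -> Tuple[int, int]:
    """Sizes produced by possible_states=1 (A * L) and possible_transitions=1 (L * L)."""
    labels = {label for seq in sequences for label, _ in seq}
    attributes = {a for seq in sequences for _, attrs in seq for a in attrs}
    return len(attributes) * len(labels), len(labels) ** 2
```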
Here are some examples of CRFsuite command lines using these parameters.
Features that occur fewer than two times are not used for training.
$ crfsuite learn -m CRF.model -p feature.minfreq=2 train.txt
Generate negative state and transition features (a.k.a. a dense feature set).
$ crfsuite learn -m CRF.model -p feature.possible_states=1 -p feature.possible_transitions=1 train.txt
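The lbfgs training algorithm maximizes the logarithm of the likelihood of the training data with L1 and/or L2 regularization terms, using the limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) method. When a non-zero coefficient for the L1 regularization term is specified, the algorithm switches to the Orthant-Wise Limited-memory Quasi-Newton (OWL-QN) method. In practice, this algorithm improves the feature weights very slowly at the beginning of training, but eventually converges quickly to the optimal feature weights.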
c1=VALUE
The coefficient for L1 regularization. If a non-zero value is specified, CRFsuite switches to the Orthant-Wise Limited-memory Quasi-Newton (OWL-QN) method. The default value is zero (no L1 regularization).
c2=VALUE
The coefficient for L2 regularization. The default value is 1.
max_iterations=NUM
The maximum number of iterations for L-BFGS optimization. The L-BFGS routine terminates when the number of iterations exceeds this value. The default value is the largest integer on the machine (INT_MAX).
num_memories=NUM
The number of limited memories that L-BFGS uses to approximate the inverse Hessian matrix. The default value is 6.
epsilon=VALUE
The epsilon parameter that determines the convergence condition. The default value is 1e-5.
stop=NUM
The number of past iterations over which the stopping criterion is tested. The default value is 10.
delta=VALUE
The threshold for the stopping criterion. The L-BFGS iterations stop when the improvement of the log-likelihood over the last ${stop} iterations is no greater than this threshold. The default value is 1e-5.
linesearch=STRING
The line search method used in the L-BFGS algorithm. The available methods are:
- "MoreThuente": the method proposed by More and Thuente.
- "Backtracking": backtracking with the regular Wolfe condition.
- "StrongBacktracking": backtracking with the strong Wolfe condition.
The default method is "MoreThuente".
max_linesearch=NUM
The maximum number of trials for the line search algorithm. The default value is 20.
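The manual does not spell out the exact formulas behind epsilon and delta. The Python sketch below assumes the convergence tests documented for libLBFGS, on which the lbfgs trainer is built. The first test is the gradient test ||g|| < epsilon * max(1, ||x||). The second is the relative-improvement test (f_past - f) / f < delta, where f_past is the objective `stop` iterations earlier. Here f is the objective being minimized, i.e., the negative regularized log-likelihood, which is assumed to be positive. This illustrates those assumed conditions and is not CRFsuite source code; the function names are hypothetical.

```python
import math
from typing import List, Sequence


def gradient_converged(x: Sequence[float], g: Sequence[float],
                       epsilon: float = 1e-5) -> bool:
    """Assumed libLBFGS test: ||g|| < epsilon * max(1, ||x||)."""
    xnorm = math.sqrt(sum(v * v for v in x))
    gnorm = math.sqrt(sum(v * v for v in g))
    return gnorm < epsilon * max(1.0, xnorm)


def delta_converged(history: List[float], stop: int = 10,
                    delta: float = 1e-5) -> bool:
    """Assumed libLBFGS test: (f_past - f) / f < delta.

    `history` holds the objective value (negative regularized log-likelihood,
    assumed positive) after each iteration, oldest first; f_past is the value
    `stop` iterations before the latest one.
    """
    if len(history) <= stop:
        return False  # not enough iterations yet to look `stop` steps back
    f_past, f = history[-1 - stop], history[-1]
    return (f_past - f) / f < delta
```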
Here are some examples of command lines for L-BFGS training.
Train a model with L2 regularization (c1 = 0, c2 = 1.0).
$ crfsuite learn -m CRF.model -a lbfgs -p c2=1 train.txt
Train a model with L1 regularization (c1 = 1.0, c2 = 0).
$ crfsuite learn -m CRF.model -a lbfgs -p c1=1 -p c2=0 train.txt
Train a model with both L1 and L2 regularization (c1 = 1.0, c2 = 1.0).
$ crfsuite learn -m CRF.model -a lbfgs -p c1=1 -p c2=1 train.txt
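The l2sgd training algorithm maximizes the logarithm of the likelihood of the training data with an L2 regularization term, using Stochastic Gradient Descent (SGD) with a batch size of 1. This algorithm usually approaches the optimal feature weights very quickly, but converges slowly at the end.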
c2=VALUE
The coefficient for L2 regularization. The default value is 1.
max_iterations=NUM
The maximum number of iterations (epochs) for SGD optimization. The optimization routine terminates when the number of iterations exceeds this value. The default value is 1000.
period=NUM
The number of past iterations over which the stopping criterion is tested. The default value is 10.
delta=VALUE
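The threshold for the stopping criterion. The optimization process stops when the improvement of the log-likelihood over the last ${period} iterations is no greater than this threshold. The default value is 1e-5.
calibration.eta=VALUE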
The initial value of the learning rate (eta) used for calibration. The default value is 0.1.
calibration.rate=VALUE
The factor by which the learning rate is increased or decreased during calibration. The default value is 2.
calibration.samples=NUM
The number of instances used for calibration. The calibration routine randomly chooses no more than NUM instances. The default value is 1000.
calibration.candidates=NUM
The number of candidate learning rates. The calibration routine terminates after finding NUM candidate learning rates that increase the log-likelihood. The default value is 10.
calibration.max_trials=NUM
The maximum number of learning rates tried during calibration. The calibration routine terminates after trying NUM learning-rate values. The default value is 20.
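The calibration.* parameters describe a search for a good initial learning rate. The Python sketch below is one simplified loop that uses them, and CRFsuite's actual procedure may differ. Starting from calibration.eta, it multiplies eta by calibration.rate while the loss keeps improving. On the first failure, it restarts below the initial value and keeps dividing by the rate. It stops after `candidates` improving rates or `max_trials` attempts and returns the best rate seen. The callback `loss_after` is hypothetical. It is assumed to run SGD with the given eta on a random sample of instances and return the resulting loss (negative log-likelihood, so a lower loss means a higher log-likelihood).

```python
import math
from typing import Callable


def calibrate_eta(loss_after: Callable[[float], float], initial_loss: float,
                  eta: float = 0.1, rate: float = 2.0,
                  candidates: int = 10, max_trials: int = 20) -> float:
    """Pick a learning rate, loosely following the calibration.* parameters.

    loss_after(eta) is a hypothetical callback that runs SGD with learning
    rate eta on a random sample of (at most calibration.samples) instances and
    returns the resulting loss; initial_loss is the loss before training.
    """
    best_eta, best_loss = eta, math.inf
    found = trials = 0
    current, increasing = eta, True
    while trials < max_trials and found < candidates:
        trials += 1
        loss = loss_after(current)
        improved = math.isfinite(loss) and loss < initial_loss
        if improved:
            found += 1  # one more candidate that reduces the loss
            if loss < best_loss:
                best_eta, best_loss = current, loss
        if increasing and improved:
            current *= rate  # keep enlarging eta while it still helps
        elif increasing:
            increasing = False  # first failure: restart below the initial eta
            current = eta / rate
        else:
            current /= rate
    return best_eta
```

If no tried rate improves the loss, the sketch falls back to the initial eta.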
次ã«ãSGDãã¬ãŒãã³ã°ã®ã³ãã³ãã©ã€ã³ã®äŸã瀺ããŸãã
Train a model with L2 regularization (c2 = 1.0).
$ crfsuite learn -m CRF.model -a l2sgd -p c2=1 train.txt
The ap (Averaged Perceptron) training algorithm applies a perceptron update to the model whenever the current model parameters fail to predict an item sequence correctly. It averages the feature weights over all updates in the training process. This algorithm is the fastest in terms of training speed. Although it is very simple, it shows high prediction performance. In practice, the training process must be stopped by specifying the maximum number of iterations, which should be tuned on a development set.
max_iterations=NUM
The maximum number of iterations (epochs). The optimization routine terminates when the number of iterations exceeds this value. The default value is 100.
epsilon=VALUE
The epsilon parameter that determines the convergence condition. The optimization routine terminates when the ratio of incorrect labels predicted by the model is no greater than VALUE. The default value is 1e-5.
Here is an example of a command line for the Averaged Perceptron.
Train a model for 10 iterations.
$ crfsuite learn -m CRF.model -a ap -p max_iterations=10 train.txt
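The update rule described for ap can be sketched as an averaged structured perceptron in Python. The sketch uses state features (attribute, label) and transition features (label, label), Viterbi decoding, an update only when the predicted sequence is wrong, and the average of the weight vector over all steps. It deliberately ignores scaling values and averages naively by summing the full weight vector after every instance (efficient implementations use a lazy-averaging trick). It is an illustration, not CRFsuite's implementation, and all names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Weights = Dict[tuple, float]
Instance = List[Tuple[str, List[str]]]  # [(label, [attributes]), ...]


def viterbi(w: Weights, xs: Sequence[List[str]], labels: List[str]) -> List[str]:
    """Best labels under state ("S", attr, label) and transition ("T", prev, label) weights."""
    def state(t: int, y: str) -> float:
        return sum(w.get(("S", a, y), 0.0) for a in xs[t])

    table = [{y: (state(0, y), None) for y in labels}]
    for t in range(1, len(xs)):
        row = {}
        for y in labels:
            p = max(labels, key=lambda q: table[-1][q][0] + w.get(("T", q, y), 0.0))
            row[y] = (table[-1][p][0] + w.get(("T", p, y), 0.0) + state(t, y), p)
        table.append(row)
    y = max(labels, key=lambda q: table[-1][q][0])
    path = [y]
    for t in range(len(xs) - 1, 0, -1):  # follow the back-pointers
        y = table[t][y][1]
        path.append(y)
    return path[::-1]


def features(xs: Sequence[List[str]], ys: Sequence[str]) -> Dict[tuple, float]:
    f: Dict[tuple, float] = defaultdict(float)
    for t, y in enumerate(ys):
        for a in xs[t]:
            f[("S", a, y)] += 1.0
        if t > 0:
            f[("T", ys[t - 1], y)] += 1.0
    return f


def train_averaged_perceptron(data: List[Instance], max_iterations: int = 10):
    labels = sorted({y for seq in data for y, _ in seq})
    w: Dict[tuple, float] = defaultdict(float)
    total: Dict[tuple, float] = defaultdict(float)  # sum of w after every instance
    steps = 0
    for _ in range(max_iterations):  # epochs
        for seq in data:
            ys = [y for y, _ in seq]
            xs = [attrs for _, attrs in seq]
            pred = viterbi(w, xs, labels)
            if pred != ys:  # perceptron update only on a wrong prediction
                for k, v in features(xs, ys).items():
                    w[k] += v
                for k, v in features(xs, pred).items():
                    w[k] -= v
            for k, v in w.items():
                total[k] += v
            steps += 1
    averaged = {k: v / steps for k, v in total.items()}
    return averaged, labels
```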
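The pa (Passive Aggressive) training algorithm works as follows. Given an item sequence (x, y) in the training data, it computes the loss s(x, y') - s(x, y) + sqrt(d(y', y)), where:
- s(x, y') is the score of the Viterbi label sequence y',
- s(x, y) is the score of the label sequence y in the training data, and
- d(y', y) is the number of incorrect labels in the Viterbi label sequence y' compared with the reference label sequence y.
The square-root term is included when error_sensitive is true. If the item incurs a non-negative loss, the algorithm updates the model based on the loss.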
type=NUM
The strategy for updating feature weights: PA without slack variables (0), PA type I (1), or PA type II (2). The default value is 1.
c=VALUE
The aggressiveness parameter (used only for PA-I and PA-II). This parameter controls the influence of the slack term on the objective function. The default value is 1.
error_sensitive=BOOL
If this parameter is true (non-zero), the optimization routine includes the square root of the number of incorrect labels predicted by the model in the objective function. The default value is 1 (true).
averaging=BOOL
If this parameter is true (non-zero), the optimization routine computes the average of the feature weights over all updates in the training process (similarly to the Averaged Perceptron). The default value is 1 (true).
max_iterations=NUM
The maximum number of iterations (epochs). The optimization routine terminates when the number of iterations exceeds this value. The default value is 100.
epsilon=VALUE
The epsilon parameter that determines the convergence condition. The optimization routine terminates when the average loss is no greater than VALUE. The default value is 1e-5.
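The type and c parameters select how large a correction each update makes. The Python sketch below computes the loss described above and the step size tau. It uses the standard PA, PA-I, and PA-II formulas of Crammer et al. (2006), under the assumption that CRFsuite's update has the usual form w <- w + tau * (Phi(x, y) - Phi(x, y')). Here Phi is the feature vector of a labeled sequence, and sq_norm is the squared norm of that difference. The function names are hypothetical.

```python
import math


def pa_loss(score_viterbi: float, score_reference: float, num_incorrect: int,
            error_sensitive: bool = True) -> float:
    """Loss s(x, y') - s(x, y), plus sqrt(d(y', y)) when error_sensitive is true."""
    loss = score_viterbi - score_reference
    if error_sensitive:
        loss += math.sqrt(num_incorrect)
    return loss


def pa_step_size(loss: float, sq_norm: float, pa_type: int = 1,
                 c: float = 1.0) -> float:
    """Step size tau for PA (0), PA-I (1) and PA-II (2), as in Crammer et al. (2006).

    sq_norm is ||Phi(x, y) - Phi(x, y')||^2, the squared norm of the difference
    between the reference and Viterbi feature vectors.
    """
    if loss <= 0.0 or sq_norm == 0.0:
        return 0.0  # nothing to correct
    if pa_type == 0:
        return loss / sq_norm  # no slack variable
    if pa_type == 1:
        return min(c, loss / sq_norm)  # PA-I: tau is capped by c
    if pa_type == 2:
        return loss / (sq_norm + 1.0 / (2.0 * c))  # PA-II: quadratic slack penalty
    raise ValueError("pa_type must be 0, 1 or 2")
```

A smaller c makes PA-I and PA-II less aggressive, which can help with noisy labels.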
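The arow (Adaptive Regularization of Weight Vectors) training algorithm works as follows. Given an item sequence (x, y) in the training data, it computes a loss from s(x, y'), the score of the Viterbi label sequence y', and s(x, y), the score of the label sequence y in the training data.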
variance=VALUE
The initial variance of every feature weight. The algorithm initializes the vector of feature weights as a multivariate Gaussian distribution with mean 0 and variance VALUE. The default value is 1.
gamma=VALUE
The tradeoff between the loss function and the changes of feature weights. The default value is 1.
max_iterations=NUM
The maximum number of iterations (epochs). The optimization routine terminates when the number of iterations exceeds this value. The default value is 100.
epsilon=VALUE
The epsilon parameter that determines the convergence condition. The optimization routine terminates when the average loss is no greater than VALUE. The default value is 1e-5.
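To show what variance and gamma control, here is a Python sketch of the binary-classification form of AROW (Crammer, Kulesza and Dredze, 2009) with a diagonal covariance. Each weight has a mean and a variance. The variance starts at `variance`, and `gamma` plays the role of the regularization constant r that trades the loss off against weight changes. CRFsuite applies AROW to label sequences, so this binary sketch is only an illustration of the update, not its implementation. The class name is hypothetical.

```python
from collections import defaultdict
from typing import Dict


class DiagonalAROW:
    """Binary AROW with a diagonal covariance; an illustrative sketch only."""

    def __init__(self, variance: float = 1.0, gamma: float = 1.0):
        self.mu: Dict[str, float] = defaultdict(float)                # weight means
        self.sigma: Dict[str, float] = defaultdict(lambda: variance)  # initial variance
        self.gamma = gamma  # r: tradeoff between loss and weight changes

    def margin(self, x: Dict[str, float]) -> float:
        return sum(self.mu[k] * v for k, v in x.items())

    def update(self, x: Dict[str, float], y: int) -> None:
        """x maps feature names to values; y is +1 or -1."""
        loss = max(0.0, 1.0 - y * self.margin(x))  # hinge loss
        if loss == 0.0:
            return
        confidence = sum(self.sigma[k] * v * v for k, v in x.items())
        beta = 1.0 / (confidence + self.gamma)
        alpha = loss * beta
        for k, v in x.items():
            s = self.sigma[k]
            self.mu[k] += alpha * y * s * v          # larger steps for uncertain weights
            self.sigma[k] = s - beta * (s * v) ** 2  # shrink the variance of seen features
```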
To tag data with a CRF model, enter the following command:
$ crfsuite tag [OPTIONS] [DATA]
If the argument DATA is omitted or is '-', CRFsuite reads the data from STDIN. To see the usage of the tag command, specify the -h (--help) option.
$ crfsuite tag -h
CRFSuite 0.12 Copyright (c) 2007-2011 Naoaki Okazaki
USAGE: crfsuite tag [OPTIONS] [DATA]
Assign suitable labels to the instances in the data set given by a file (DATA).
If the argument DATA is omitted or '-', this utility reads a data from STDIN.
Evaluate the performance of the model on labeled instances (with -t option).
OPTIONS:
-m, --model=MODEL Read a model from a file (MODEL)
-t, --test Report the performance of the model on the data
-r, --reference Output the reference labels in the input data
-p, --probability Output the probability of the label sequences
-i, --marginal Output the marginal probabilities of items
-q, --quiet Suppress tagging results (useful for test mode)
-h, --help Show the usage of this command and exit
The following options are available for tagging.
-m, --model=MODEL
The name of the file from which CRFsuite reads the CRF model.
-t, --test
Evaluates the performance of the CRF model (accuracy, precision, recall, and F1 score), assuming that the input data is labeled. This option is disabled by default.
-r, --reference
Outputs the reference labels alongside the predicted labels, assuming that the input data is labeled. This option is disabled by default.
-p, --probability
Outputs the probability of the label sequence predicted by the model. When this option is enabled, each label sequence begins with a line "@probability\tx.xxxx", where "x.xxxx" is the probability of the sequence and "\t" is a TAB character. This option is disabled by default.
-i, --marginal
Outputs the marginal probabilities of labels. When this option is enabled, each predicted label is followed by ":x.xxxx", where "x.xxxx" is the probability of the label. This option is disabled by default.
-q, --quiet
Suppresses the output of tagged labels. This is useful when evaluating a CRF model with the -t option.
-h, --help
Shows the usage of this command and exits.
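When the tagger output is consumed by another program, the -p and -i annotations need to be parsed. The Python sketch below collects, for each sequence, the "@probability" value and the (label, marginal) pairs. It assumes that output sequences are separated by empty lines, as in the input format. Because -i appends ":x.xxxx" to a label, a label that itself ends in ":<number>" would be ambiguous. The sample text in the test is made up, and the function names are hypothetical.

```python
from typing import List, Optional, Tuple

Tagged = Tuple[Optional[float], List[Tuple[str, Optional[float]]]]


def _as_float(s: str) -> Optional[float]:
    try:
        return float(s)
    except ValueError:
        return None


def parse_tagger_output(text: str) -> List[Tagged]:
    """Parse `crfsuite tag -p -i` style output into (sequence probability, items)."""
    results: List[Tagged] = []
    prob: Optional[float] = None
    items: List[Tuple[str, Optional[float]]] = []
    for line in text.splitlines():
        if not line:  # assumed: an empty line ends each output sequence
            if items or prob is not None:
                results.append((prob, items))
            prob, items = None, []
        elif line.startswith("@probability\t"):  # added by -p
            prob = float(line.split("\t", 1)[1])
        else:
            label, sep, tail = line.rpartition(":")  # -i appends ':x.xxxx'
            marginal = _as_float(tail) if sep else None
            items.append((label, marginal) if marginal is not None else (line, None))
    if items or prob is not None:
        results.append((prob, items))
    return results
```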
Here are some examples of CRFsuite command lines for tagging.
Tag the data test.txt using the CRF model CRF.model.
$ crfsuite tag -m CRF.model test.txt
Evaluate the CRF model CRF.model on the labeled data test.txt. The combined flags -qt mean -q and -t, so only the evaluation results are printed.
$ crfsuite tag -m CRF.model -qt test.txt
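For reference, the quantities that the -t evaluation reports have standard definitions. The Python sketch below computes them from reference and predicted label sequences:
- item accuracy,
- sequence accuracy,
- per-label precision, recall, and F1.
It does not reproduce CRFsuite's report layout or any averaging across labels it may print. The function name `evaluate` is hypothetical.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple


def evaluate(references: Sequence[Sequence[str]],
             predictions: Sequence[Sequence[str]]
             ) -> Tuple[float, float, Dict[str, Tuple[float, float, float]]]:
    """Return item accuracy, sequence accuracy, and {label: (precision, recall, f1)}."""
    tp: Counter = Counter()
    fp: Counter = Counter()
    fn: Counter = Counter()
    correct_items = total_items = correct_seqs = 0
    for ref, pred in zip(references, predictions):
        correct_seqs += list(ref) == list(pred)  # whole sequence correct
        for r, p in zip(ref, pred):
            total_items += 1
            if r == p:
                correct_items += 1
                tp[r] += 1
            else:
                fp[p] += 1  # predicted p where it was wrong
                fn[r] += 1  # missed the reference label r
    per_label = {}
    for label in sorted(set(tp) | set(fp) | set(fn)):
        prec = tp[label] / (tp[label] + fp[label]) if tp[label] + fp[label] else 0.0
        rec = tp[label] / (tp[label] + fn[label]) if tp[label] + fn[label] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        per_label[label] = (prec, rec, f1)
    return correct_items / total_items, correct_seqs / len(references), per_label
```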
To dump a CRF model in plain-text format, enter the following command:
$ crfsuite dump <MODEL>