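// Characters whose rendering depends on the platform (circled numbers, Roman numerals, unit ligatures, and so on).
var dependentCharacters = /[①②③④⑤⑥⑦⑧⑨⑩⑪⑫⑬⑭⑮⑯⑰⑱⑲⑳ⅠⅡⅢⅣⅤⅥⅦⅧⅨⅩ㍉㌔㌢㍍㌘㌧㌃㌶㍑㍗㌍㌦㌣㌫㍊㌻㎜㎝㎞㎎㎏㏄㎡㍻〝〟№㏍℡㊤㊥㊦㊧㊨㈱㈲㈹㍾㍽㍼]/g;

// Called for each sentence; reports every machine-dependent character it contains.
function validateSentence(sentence) {
    var content = sentence.getContent();
    var match; // declared locally so the loop does not create an implicit global
    while ((match = dependentCharacters.exec(content)) !== null) {
        addError("Found machine dependent character: '" + match[0] + "'", sentence);
    }
}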
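The while loop in validateSentence depends on the g flag of the pattern: with that flag, each call to exec resumes from the pattern's lastIndex, and a failed match resets lastIndex to 0, so the same pattern object can be reused for the next sentence. The following is a minimal sketch of that behavior in plain JavaScript. The shortened character class, the sample text, and the names pattern and positions are assumptions made for illustration and are not part of the validator.

```javascript
// Shortened character class, for illustration only (not the full list used by the validator).
var pattern = /[①㌢㍍]/g;
var text = "①は500㌢";

var positions = [];
var match;
while ((match = pattern.exec(text)) !== null) {
    // match.index is where this hit starts; the next exec call resumes from pattern.lastIndex.
    positions.push(match[0] + "@" + match.index);
}

console.log(positions.join(", ")); // ①@0, ㌢@5
console.log(pattern.lastIndex);    // 0, because the final failed match resets it
```

Without the g flag, exec would return the first hit on every call and the loop would never terminate, so the flag is required for this style of loop.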
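To use this validator in RedPen, add the JavaScript validator to the configuration file and save the script as dependent-char-validator.js in the directory specified by script-path.
The following is a sample configuration file.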
<redpen-conf lang="ja">
  <validators>
    <!--Rules on sentence length-->
    <validator name="SentenceLength">
      <property name="max_len" value="100"/>
    </validator>
    <validator name="CommaNumber" />
    <!--Rules on expressions-->
    <validator name="SuccessiveWord" />
    <validator name="JapaneseStyle" />
    <validator name="InvalidExpression" />
    <validator name="DoubleNegative" />
    <validator name="Okurigana"/>
    <validator name="JapaneseNumberExpression"/>
    <validator name="JapaneseAmbiguousNounConjunction" />
    <validator name="LongKanjiChain" />
    <validator name="DoubledConjunctiveParticleGa" />
    <!--<validator name="SuggestExpression" />-->
    <!--Rules on symbols and terminologies-->
    <validator name="InvalidSymbol"/>
    <validator name="KatakanaEndHyphen"/>
    <validator name="KatakanaSpellCheck"/>
    <validator name="SpaceBetweenAlphabeticalWord" />
    <validator name="ParenthesizedSentence">
      <property name="max_count" value="3"/>
      <property name="max_nesting_level" value="1"/>
      <property name="max_length" value="10"/>
    </validator>
    <!--Rules on sections and paragraphs-->
    <validator name="SectionLength">
      <property name="max_num" value="1500"/>
    </validator>
    <validator name="EmptySection" />
    <validator name="GappedSection" />
    <validator name="SectionLevel" />
    <validator name="ParagraphNumber"/>
    <!--Load JavaScript validators-->
    <validator name="JavaScript">
      <property name="script-path" value="./js" />
    </validator>
  </validators>
</redpen-conf>
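Running the redpen command with a sentence that contains a machine-dependent character as its argument prints a validation error, as shown in the following output.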
bash-3.2$ redpen -c redpen-conf-ja.xml -s "500㌢進んだ"
[2017-04-08 23:34:10.873][INFO ] cc.redpen.Main - Configuration file: /Users/ito/work/dependent-char/redpen-conf-ja.xml
[2017-04-08 23:34:10.879][INFO ] cc.redpen.config.ConfigurationLoader - Loading config from specified config file: "/Users/ito/work/dependent-char/redpen-conf-ja.xml"
[2017-04-08 23:34:10.889][INFO ] cc.redpen.config.ConfigurationLoader - Succeeded to load configuration file
[2017-04-08 23:34:10.889][INFO ] cc.redpen.config.ConfigurationLoader - Language is set to "ja"
[2017-04-08 23:34:10.889][WARN ] cc.redpen.config.ConfigurationLoader - No variant configuration...
[2017-04-08 23:34:10.890][INFO ] cc.redpen.config.ConfigurationLoader - No "symbols" block found in the configuration
[2017-04-08 23:34:10.893][INFO ] cc.redpen.config.SymbolTable - "ja" is specified.
[2017-04-08 23:34:10.893][INFO ] cc.redpen.config.SymbolTable - "zenkaku" variant is specified
[2017-04-08 23:34:11.288][INFO ] cc.redpen.parser.SentenceExtractor - "[。, ?, !]" are added as a end of sentence characters
[2017-04-08 23:34:11.288][INFO ] cc.redpen.parser.SentenceExtractor - "[’, ”]" are added as a right quotation characters
[2017-04-08 23:34:11.579][INFO ] org.reflections.Reflections - Reflections took 43 ms to scan 1 urls, producing 5 keys and 53 values
[2017-04-08 23:34:11.646][WARN ] cc.redpen.validator.ValidatorFactory - cc.redpen.validator.sentence.SpaceBeginningOfSentenceValidator is deprecated
[2017-04-08 23:34:11.652][WARN ] cc.redpen.validator.ValidatorFactory - cc.redpen.validator.section.VoidSectionValidator is deprecated
[2017-04-08 23:34:11.662][INFO ] org.reflections.Reflections - Reflections took 2 ms to scan 1 urls, producing 160 keys and 163 values
[2017-04-08 23:34:11.673][INFO ] cc.redpen.util.DictionaryLoader - Succeeded to load InvalidExpressionValidator default dictionary.
[2017-04-08 23:34:11.680][INFO ] cc.redpen.util.DictionaryLoader - Succeeded to load double negative expression rules.
[2017-04-08 23:34:11.681][INFO ] cc.redpen.util.DictionaryLoader - Succeeded to load double negative words.
[2017-04-08 23:34:11.691][INFO ] cc.redpen.util.DictionaryLoader - Succeeded to load katakana word dictionary.
[2017-04-08 23:34:11.693][INFO ] cc.redpen.validator.JavaScriptValidator - JavaScript validators directory: ./js
1: ValidationError[dependent-char-validator.js], Found machine dependent character: '㌢' at line: 500㌢進んだ
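The validator logic can also be exercised outside RedPen, for example with Node.js, before the script is placed in the script-path directory. The following is a rough sketch under stated assumptions: the addError stub, the makeSentence helper, and the errors array are hypothetical stand-ins for what RedPen supplies at runtime, and the character class is shortened for readability.

```javascript
// Hypothetical stand-ins for what RedPen provides to a script at runtime.
var errors = [];
function addError(message, sentence) {
    errors.push(message);
}
function makeSentence(text) {
    // Only getContent() is stubbed, since that is the only method the validator calls.
    return { getContent: function () { return text; } };
}

// Same logic as the validator script, with a shortened character class (illustration only).
var dependentCharacters = /[㌢㍍㎏№㈱]/g;
function validateSentence(sentence) {
    var content = sentence.getContent();
    var match;
    while ((match = dependentCharacters.exec(content)) !== null) {
        addError("Found machine dependent character: '" + match[0] + "'", sentence);
    }
}

validateSentence(makeSentence("500㌢進んだ"));
console.log(errors); // [ "Found machine dependent character: '㌢'" ]
```

The message collected here matches the text in the ValidationError line of the redpen output; the line number and script name in that output are added by RedPen itself and are not reproduced by this stub.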