Sample Validator for detecting machine-dependent characters with RedPen

Validator file that detects machine-dependent characters

var dependentCharacters = RegExp(/[①②③④⑤⑥⑦⑧⑨⑩⑪⑫⑬⑭⑮⑯⑰⑱⑲⑳ⅠⅡⅢⅣⅤⅥⅦⅧⅨⅩ㍉㌔㌢㍍㌘㌧㌃㌶㍑㍗㌍㌦㌣㌫㍊㌻㎜㎝㎞㎎㎏㏄㎡㍻〝〟№㏍℡㊤㊥㊦㊧㊨㈱㈲㈹㍾㍽㍼]/g);

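function validateSentence(sentence) {
    var content = sentence.getContent();
    var match; // declared locally so it does not leak into the global scope
    // The regex is global and shared across calls, so restart the scan at index 0.
    dependentCharacters.lastIndex = 0;
    while ((match = dependentCharacters.exec(content)) !== null) {
        addError("Found machine dependent character: '" + match[0] + "'", sentence);
    }
}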
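
The matching loop can be tried outside RedPen before the file is deployed. The following is a minimal sketch, assuming a plain JavaScript runtime such as Node.js. The `addError` stub, the `makeSentence` helper, and the `errors` array are hypothetical stand-ins for what RedPen supplies at runtime, and the character class is abbreviated for readability.

```javascript
// Abbreviated character class for illustration; the real validator lists many more.
var dependentCharacters = /[①②③㌢㍍㈱№℡]/g;

// Hypothetical stand-ins for the objects RedPen provides at runtime.
var errors = [];
function addError(message, sentence) {
    errors.push(message);
}
function makeSentence(text) {
    return { getContent: function () { return text; } };
}

// Same loop as the validator: one error per matched character.
function validateSentence(sentence) {
    var content = sentence.getContent();
    var match;
    dependentCharacters.lastIndex = 0;
    while ((match = dependentCharacters.exec(content)) !== null) {
        addError("Found machine dependent character: '" + match[0] + "'", sentence);
    }
}

validateSentence(makeSentence("500㌢進んだ"));   // one match
validateSentence(makeSentence("①と②を参照"));   // two matches
validateSentence(makeSentence("問題のない文"));  // no match
console.log(errors);
```

Because the regular expression carries the `g` flag, each call to `exec` resumes after the previous match, so a sentence with several listed characters yields one error per character.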

Usage

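To use the validator above with RedPen, add the JavaScript validator to the configuration file and save the script as dependent-char-validator.js in the directory specified by script-path. A sample configuration file follows.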

<redpen-conf lang="ja">
    <validators>
        <!--Rules on sentence length-->
        <validator name="SentenceLength">
            <property name="max_len" value="100"/>
        </validator>
        <validator name="CommaNumber" />

        <!--Rules on expressions-->
        <validator name="SuccessiveWord" />
        <validator name="JapaneseStyle" />
        <validator name="InvalidExpression" />
        <validator name="DoubleNegative" />
        <validator name="Okurigana"/>
        <validator name="JapaneseNumberExpression"/>
        <validator name="JapaneseAmbiguousNounConjunction" />
        <validator name="LongKanjiChain" />
        <validator name="DoubledConjunctiveParticleGa" />
        <!--<validator name="SuggestExpression" />-->

        <!--Rules on symbols and terminologies-->
        <validator name="InvalidSymbol"/>
        <validator name="KatakanaEndHyphen"/>
        <validator name="KatakanaSpellCheck"/>
        <validator name="SpaceBetweenAlphabeticalWord" />
        <validator name="ParenthesizedSentence">
            <property name="max_count" value="3"/>
            <property name="max_nesting_level" value="1"/>
            <property name="max_length" value="10"/>
        </validator>

        <!--Rules on sections and paragraphs-->
        <validator name="SectionLength">
            <property name="max_num" value="1500"/>
        </validator>
        <validator name="EmptySection" />
        <validator name="GappedSection" />
        <validator name="SectionLevel" />
        <validator name="ParagraphNumber"/>

        <!--Load JavaScript validators-->
        <validator name="JavaScript">
          <property name="script-path" value="./js" />
        </validator>
    </validators>
</redpen-conf>

Running the redpen command with a sentence that contains a machine-dependent character as its argument prints a validation error like the following.

bash-3.2$ redpen -c redpen-conf-ja.xml -s "500㌢進んだ"
[2017-04-08 23:34:10.873][INFO ] cc.redpen.Main - Configuration file: /Users/ito/work/dependent-char/redpen-conf-ja.xml
[2017-04-08 23:34:10.879][INFO ] cc.redpen.config.ConfigurationLoader - Loading config from specified config file: "/Users/ito/work/dependent-char/redpen-conf-ja.xml"
[2017-04-08 23:34:10.889][INFO ] cc.redpen.config.ConfigurationLoader - Succeeded to load configuration file
[2017-04-08 23:34:10.889][INFO ] cc.redpen.config.ConfigurationLoader - Language is set to "ja"
[2017-04-08 23:34:10.889][WARN ] cc.redpen.config.ConfigurationLoader - No variant configuration...
[2017-04-08 23:34:10.890][INFO ] cc.redpen.config.ConfigurationLoader - No "symbols" block found in the configuration
[2017-04-08 23:34:10.893][INFO ] cc.redpen.config.SymbolTable - "ja" is specified.
[2017-04-08 23:34:10.893][INFO ] cc.redpen.config.SymbolTable - "zenkaku" variant is specified
[2017-04-08 23:34:11.288][INFO ] cc.redpen.parser.SentenceExtractor - "[。, ?, !]" are added as a end of sentence characters
[2017-04-08 23:34:11.288][INFO ] cc.redpen.parser.SentenceExtractor - "[’, ”]" are added as a right quotation characters
[2017-04-08 23:34:11.579][INFO ] org.reflections.Reflections - Reflections took 43 ms to scan 1 urls, producing 5 keys and 53 values
[2017-04-08 23:34:11.646][WARN ] cc.redpen.validator.ValidatorFactory - cc.redpen.validator.sentence.SpaceBeginningOfSentenceValidator is deprecated
[2017-04-08 23:34:11.652][WARN ] cc.redpen.validator.ValidatorFactory - cc.redpen.validator.section.VoidSectionValidator is deprecated
[2017-04-08 23:34:11.662][INFO ] org.reflections.Reflections - Reflections took 2 ms to scan 1 urls, producing 160 keys and 163 values
[2017-04-08 23:34:11.673][INFO ] cc.redpen.util.DictionaryLoader - Succeeded to load InvalidExpressionValidator default dictionary.
[2017-04-08 23:34:11.680][INFO ] cc.redpen.util.DictionaryLoader - Succeeded to load double negative expression rules.
[2017-04-08 23:34:11.681][INFO ] cc.redpen.util.DictionaryLoader - Succeeded to load double negative words.
[2017-04-08 23:34:11.691][INFO ] cc.redpen.util.DictionaryLoader - Succeeded to load katakana word dictionary.
[2017-04-08 23:34:11.693][INFO ] cc.redpen.validator.JavaScriptValidator - JavaScript validators directory: ./js
1: ValidationError[dependent-char-validator.js], Found machine dependent character: '㌢' at line: 500㌢進んだ
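
The error above names the character and the sentence, but not where in the sentence the character sits. For long sentences it may help to include the offset in the message. The following is a sketch under assumptions, not RedPen's own behavior: `findDependentCharacters` and `describe` are hypothetical helpers, the character class is abbreviated, and the resulting string would simply be passed to `addError` inside the validator.

```javascript
// Abbreviated character class for illustration only.
var dependentCharacters = /[①②③㌢㍍㈱№℡]/g;

// Hypothetical helper (not part of RedPen): collects each offending
// character together with its zero-based offset in the text.
function findDependentCharacters(text) {
    var found = [];
    var match;
    dependentCharacters.lastIndex = 0;
    while ((match = dependentCharacters.exec(text)) !== null) {
        found.push({ character: match[0], offset: match.index });
    }
    return found;
}

// Builds the message text; inside RedPen this string could be passed to addError.
function describe(hit) {
    return "Found machine dependent character: '" + hit.character +
        "' at offset " + hit.offset;
}

var hits = findDependentCharacters("500㌢進んだ");
var messages = hits.map(describe);
console.log(messages);
```

The offset comes from `match.index`, which `exec` sets on every successful match, so no extra scanning is needed.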