Install the following software beforehand. MeCab, mecab-ipadic, and mecab-perl are required to use the Tokenizer::MeCab plugin. Encode::JIS2K, Encode::HanExtra, and Encode::EUCJPMS are optional, but they improve detection accuracy for Japanese and Chinese.
- Character set encoding detector: Encode::Detect
  - Distributed at: https://metacpan.org/pod/Encode::Detect
- Morphological analysis engine: MeCab
  - Requirements: 0.996 or later
  - Distributed at: https://taku910.github.io/mecab/
  - Note: Make sure you add --with-charset=utf8 to ./configure
- Dictionary for MeCab: mecab-ipadic
  - Requirements: 2.7.0-20070801 or later
  - Distributed at: https://taku910.github.io/mecab/
  - Note: Make sure you add --with-charset=utf8 to ./configure
- Perl binding of MeCab: mecab-perl
  - Requirements: 0.996 or later
  - Distributed at: https://taku910.github.io/mecab/
- Japanese JIS X 0213 charset encoding module: Encode::JIS2K
  - Distributed at: https://metacpan.org/pod/Encode::JIS2K
- Chinese extended encoding module: Encode::HanExtra
  - Distributed at: https://metacpan.org/pod/Encode::HanExtra
- Microsoft Windows compatible Japanese charset encoding module: Encode::EUCJPMS
  - Distributed at: https://metacpan.org/pod/Encode::EUCJPMS
You can download patches here: https://github.com/heartbeatsjp/spamassassin_ja/patches/
These files are required:
- spamassassin-3.4.x-japanese-tokenizer.patch (a patch file for Japanese Tokenizer)
- tokenizer.pre (a configuration file for Japanese Tokenizer)
Extract the SpamAssassin tarball and apply the patch.
cd Mail-SpamAssassin-3.4.x
patch -p1 < spamassassin-3.4.x-japanese-tokenizer.patch
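Before changing the source tree, you may want to confirm that the patch applies cleanly. The following is a minimal sketch, assuming GNU patch (which provides the --dry-run option); `apply_patch_checked` is a hypothetical helper name.

```shell
#!/bin/sh
# apply_patch_checked is a hypothetical helper.
# Usage: apply_patch_checked <source-dir> <absolute-path-to-patch-file>
# Example (not run here):
#   apply_patch_checked Mail-SpamAssassin-3.4.x "$PWD/spamassassin-3.4.x-japanese-tokenizer.patch"
# Assumes GNU patch, whose --dry-run option tests a patch without changing files.
apply_patch_checked() {
    src_dir=$1
    patch_file=$2   # must be absolute, because we change directory below
    (
        cd "$src_dir" || exit 1
        # First pass: check that every hunk applies; nothing is written.
        patch -p1 --dry-run < "$patch_file" >/dev/null || exit 1
        # Second pass: apply the patch for real.
        patch -p1 < "$patch_file"
    )
}
```

If the dry run fails, the tree is left untouched, so you can check that the patch matches your SpamAssassin version before trying again.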
All that is left is the standard SpamAssassin installation procedure.
- Add this line to local.cf:
    normalize_charset 1
- Put tokenizer.pre in the same directory as local.cf.
- (If you use the MeCab plugin) Edit tokenizer.pre: comment out the SimpleJA plugin line and uncomment the MeCab plugin line.
    # Tokenizer::SimpleJA
    #
    #loadplugin Mail::SpamAssassin::Plugin::Tokenizer::SimpleJA
    # Tokenizer::MeCab
    #
    loadplugin Mail::SpamAssassin::Plugin::Tokenizer::MeCab
- Run spamassassin --lint to confirm that there are no warnings.
    spamassassin --lint
- Everything is done. Restart the daemon if you are running SpamAssassin as a daemon.
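The tokenizer.pre edit for the MeCab plugin can also be scripted. The following is a minimal sketch, assuming tokenizer.pre contains one loadplugin line for Tokenizer::SimpleJA and one for Tokenizer::MeCab; `enable_mecab` is a hypothetical helper name. It writes through a temporary file rather than relying on sed -i, which differs between sed implementations.

```shell
#!/bin/sh
# enable_mecab is a hypothetical helper.
# Usage: enable_mecab /path/to/tokenizer.pre
# It comments out the SimpleJA loadplugin line and uncomments the MeCab one.
# Lines that are already in the desired state are left unchanged.
enable_mecab() {
    pre_file=$1
    tmp_file=$(mktemp) || return 1
    sed \
        -e 's/^[[:space:]]*loadplugin[[:space:]]\{1,\}\(Mail::SpamAssassin::Plugin::Tokenizer::SimpleJA\)/#loadplugin \1/' \
        -e 's/^#[[:space:]]*loadplugin[[:space:]]\{1,\}\(Mail::SpamAssassin::Plugin::Tokenizer::MeCab\)/loadplugin \1/' \
        "$pre_file" > "$tmp_file" && cat "$tmp_file" > "$pre_file"
    status=$?
    rm -f "$tmp_file"
    return $status
}
```

After running it on your tokenizer.pre, run spamassassin --lint again to confirm that the configuration still loads without warnings.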