Redis statistics
----------------
From version 1.1 it is also possible to use Redis as a backend for
statistics and for the cache of learned messages. Redis is recommended for
clustered configurations, as it allows simultaneous learning and checks and
is also very fast. To set up Redis, use the redis backend for a classifier;
the learn cache is then stored on the same servers.
The following configuration is a full-featured example of how to set up
Redis for statistics. Edit /etc/rspamd/local.d/statistic.conf and paste the
code below.
For a Redis classifier, you need to set the backend to "redis". It is
important to define the "servers" parameter, as it is not taken from the
global configuration (you might have defined Redis there for Lua modules). If
you want Bayes auto-learning, enable it in the configuration file (see below
for further explanation of this parameter).
Bayes tokens can be stored per user. For this, you can define a Lua function.
The statfile blocks define the key names used in Redis. You must also
specify which symbol stands for spam and which for ham.
At the end of this configuration file you will find a learn_condition Lua
function. It prevents re-learning messages that are already classified in the
intended class with high probability.
classifier "bayes" {
tokenizer {
name = "osb";
}
backend = "redis";
servers = "127.0.0.1:6379";
min_tokens = 11;
min_learns = 200;
autolearn = true;
per_user = <<EOD
return function(task)
local rcpt = task:get_recipients(1)
if rcpt then
one_rcpt = rcpt[1]
if one_rcpt['domain'] then
return one_rcpt['domain']
end
end
return nil
end
EOD
statfile {
symbol = "BAYES_HAM";
spam = false;
}
statfile {
symbol = "BAYES_SPAM";
spam = true;
}
learn_condition =<<EOD
return function(task, is_spam, is_unlearn)
local prob = task:get_mempool():get_variable('bayes_prob', 'double')
if prob then
local in_class = false
local cl
if is_spam then
cl = 'spam'
in_class = prob >= 0.95
else
cl = 'ham'
in_class = prob <= 0.05
end
if in_class then
return false,string.format('already in class %s; probability %.2f%%',
cl, math.abs((prob - 0.5) * 200.0))
end
end
return true
end
EOD
}
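The per_user function in the example above actually keys the statistics by recipient domain. As a minimal sketch (an assumption, not part of the original configuration), a truly per-address variant could return the full recipient address instead, using the addr field that Rspamd's address tables provide:
per_user = <<EOD
return function(task)
  -- envelope recipients, as in the example above
  local rcpt = task:get_recipients(1)
  -- use the full address (user@domain) as the per-user key, if available
  if rcpt and rcpt[1] and rcpt[1]['addr'] then
    return rcpt[1]['addr']
  end
  return nil
end
EOD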
per_languages is not supported by the Redis backend; it simply stores everything in the same place. write_servers are used for learning and follow a master-slave rotation by default, whilst servers are selected randomly for each request.
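As an illustration only (the hostnames are placeholders, not part of the original example), a master-slave setup might split writes and reads like this:
backend = "redis";
# learning (writes) goes to the master
write_servers = "redis-master.example.com:6379";
# checks (reads) are balanced across the listed servers
servers = "redis-slave1.example.com:6379,redis-slave2.example.com:6379";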
Supported parameters for the redis backend are:
- tokenizer
+ Leave it as shown for now; currently only osb is supported
- backend
+ Set it to redis
- servers
+ IP address or hostname with port of the Redis server. Use the IP address of
the loopback interface if you have defined localhost in /etc/hosts for both
IPv4 and IPv6; otherwise your Redis server will not be found!
- write_servers (optional)
+ If needed, define separate servers that are used for learning (see the
sketch above)
- password (optional)
+ Password for the redis server
- min_tokens
+ Minimum number of words required for statistics processing
- min_learns
+ Minimum learn count for both spam and ham classes required before
classification is performed
- autolearn (optional)
+ See the sketch after this list for details
- per_user (optional)
+ Lua function; see above
- statfile
+ Defines the keys for spam and ham mail. You must also set the spam
parameter.
- learn_condition (optional)
+ Lua function as described above
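As referenced in the autolearn entry above, here is a hedged sketch of that parameter. The boolean form used in the example enables the built-in auto-learning logic; Rspamd also accepts a pair of score thresholds (the numbers below are placeholders; check the documentation of your version for the exact semantics):
# learn as ham when the message score is below -5,
# learn as spam when the score is above 5 (placeholder thresholds)
autolearn = [-5, 5];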