@cpragadeesh
cpragadeesh / model_test.lua
Last active August 26, 2020 05:46
Model test + PCA
local lua_util = require "lua_util"
local lua_settings = require "lua_settings"
local rspamd_kann = require "rspamd_kann"
local ucl = require "ucl"
local argparse = require "argparse"
local rspamd_logger = require "rspamd_logger"
local rspamd_task = require "rspamd_task"
local tensor = require "rspamd_tensor"
local SPAM_LABEL = -1
/**
* Definition for a binary tree node.
* struct TreeNode {
* int val;
* TreeNode *left;
* TreeNode *right;
* TreeNode(int x) : val(x), left(NULL), right(NULL) {}
* };
*/
class Solution {
cpragadeesh / rescore_L2
Created September 15, 2017 23:49
rescore results with L2
Pre-rescore test stats
Statistics at threshold: 15
F-score: 0.09
False positive rate: 0.00 %
False negative rate: 95.51 %
Overall accuracy: 43.33 %
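For reference, the statistics above can be computed from a confusion matrix as in the sketch below. The script that produced these numbers is not shown here, so the standard definitions are assumed (spam is the positive class), and the function name `rescore_stats` and its arguments are hypothetical. The reported values are at least consistent with these definitions: a false positive rate of 0 implies a precision of 1, a false negative rate of 95.51 % implies a recall of about 0.045, and the resulting F1 is roughly 0.09.

```python
def rescore_stats(tp, fp, tn, fn):
    """Compute classification statistics from confusion-matrix counts.

    tp/fn: spam correctly / incorrectly classified,
    tn/fp: ham correctly / incorrectly classified.
    Standard definitions are assumed; the original script may differ.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f_score = 2 * precision * recall / (precision + recall)
    else:
        f_score = 0.0
    total = tp + fp + tn + fn
    return {
        "f_score": f_score,
        # Share of ham wrongly flagged as spam.
        "fp_rate": fp / (fp + tn) if (fp + tn) else 0.0,
        # Share of spam that slipped through.
        "fn_rate": fn / (fn + tp) if (fn + tp) else 0.0,
        "accuracy": (tp + tn) / total if total else 0.0,
    }
```

A very high false negative rate alongside a zero false positive rate, as in the pre-rescore numbers, means that almost no spam reaches the threshold of 15.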
test23123
cpragadeesh / GSoC_2017_work.md
Last active August 28, 2017 14:37
Google Summer of Code 2017 Rspamd symbol re-scoring project.

Corpus testing and Automatic Symbol score generation

Link to repository

Introduction

Rspamd scans each email and produces a list of the symbols that matched it (such as MISSING_SUBJECT or SPF_FAIL). Each symbol has a score, and an email's score is the sum of the scores of its symbols. This total determines the action taken on the email. So far, symbol scores have been set manually. This project aims to use neural networks to generate an optimal set of symbol scores and so improve email classification accuracy.
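To make the scoring rule concrete, here is a minimal Python sketch of how a message score is summed from its symbols and compared with a threshold. The symbol scores are OLD_SCORE values from the tables in this document. The threshold of 15 is borrowed from the rescore statistics. The helper names and action labels are hypothetical and are not Rspamd's actual API.

```python
# Manually set (old) scores, taken from the score tables in this document.
SYMBOL_SCORES = {
    "R_MIXED_CHARSET": 5.0,
    "SUBJ_ALL_CAPS": 3.0,
    "FAKE_REPLY_C": 6.0,
    "MID_BARE_IP": 2.0,
}

# Assumed single threshold for illustration only.
SPAM_THRESHOLD = 15.0


def message_score(symbols, scores=SYMBOL_SCORES):
    """Sum the scores of the symbols that fired; unknown symbols count as 0."""
    return sum(scores.get(name, 0.0) for name in symbols)


def classify(symbols, threshold=SPAM_THRESHOLD):
    """Return a hypothetical action label based on the total score."""
    return "reject" if message_score(symbols) >= threshold else "no action"
```

With these values, a message that triggers all four symbols scores 16.0 and crosses the threshold, while one that triggers only R_MIXED_CHARSET and SUBJ_ALL_CAPS scores 8.0 and does not. Re-scoring changes the numbers in the table, not the summation rule.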

Project

SYMBOL OLD_SCORE NEW_SCORE
R_MIXED_CHARSET 5 3.69
FORGED_MUA_THEBAT_BOUN 2 0.43
FORGED_MUA_THEBAT_MSGID_UNKNOWN 3 1.52
MID_BARE_IP 2 0.64
FROM_EXCESS_BASE64 1.5 1.71
SUBJ_ALL_CAPS 3 1.56
FAKE_REPLY_C 6 2.69
TO_DOM_EQ_FROM_DOM 0 3.69
R_BAD_CTE_7BIT 4 5.56
SYMBOL OLD_SCORE NEW_SCORE
R_MIXED_CHARSET 5 5.43
FORGED_MUA_THEBAT_BOUN 2 0.44
FORGED_MUA_THEBAT_MSGID_UNKNOWN 3 17.22
MID_BARE_IP 2 -5.16
FROM_EXCESS_BASE64 1.5 3.21
SUBJ_ALL_CAPS 3 20.95
FAKE_REPLY_C 6 2.69
TO_DOM_EQ_FROM_DOM 0 9.70
R_BAD_CTE_7BIT 4 22.36
# current error = 0.5285708144281
# current error = 0.52857081431571
.
.
.
# StochasticGradient: you have reached the maximum number of iterations
# training error = 0.52857081431571
SYMBOL OLD_SCORE NEW_SCORE
R_MIXED_CHARSET 5 4.65
SYMBOL OLD_SCORE NEW_SCORE
FAKE_REPLY 1.0 0.69
URI_COUNT_ODD 1.0 1.77
INVALID_FROM_8BIT 6.0 6.82
SUSPICIOUS_RECIPS 1.5 2.06
CT_EXTRA_SEMI 1.0 1.65
BROKEN_HEADERS 10.0 11.6
HAS_X_PRIO_FIVE 0.0 0.22
TO_EXCESS_BASE64 1.5 1.26
MID_RHS_IP_LITERAL 0.5 1.14
#include "LinkedList.cpp"
#include <utility>
#include <typeinfo>
using namespace std;
template<class key_type, class value_type>
class HashTable {
const int size;