Today, I got this email:
I finally found a home biz that actually works, I found it on Twitter. Here is the link: Click here I begun it a couple of weeks ago and doing quite great with it, thought I would share with ALL my contacts. best wishes!
It came as HTML, so the URL behind the link wasn't visible in the message. It looks harmless enough, but let's look at the URL:
Wow, it's actually something from LinkedIn... or is it? To know for sure, I need to expand the shortened link www.ow.ly/85v0l. Let's see:
Now it's clear that it's spam: "Work At Home Mom Makes $7,687 / Month Part-Time". But it's nearly impossible to tell this from the email body or from the URL in it. Moreover, this example is probably too close to legitimate (not_spam) messages to be useful as training data for machine learning (a spam classification task). This leaves us wishing that LinkedIn, or any other website with the same feature, would fix it.
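Unwrapping a link like this by hand takes two steps: read the real target out of the redirect URL's query string, then ask the shortener where it points without actually following it. A minimal Python sketch, assuming the redirect carries its target in a `url` query parameter (as in the LinkedIn example in the TL;DR) and that the shortener answers with a standard HTTP redirect. The function names are hypothetical, and `peek_location` needs network access.

```python
import http.client
from urllib.parse import parse_qs, urlsplit


def redirect_target(link):
    """Return the decoded value of the `url` query parameter, or None if absent."""
    values = parse_qs(urlsplit(link).query).get("url")
    return values[0] if values else None


def peek_location(url, timeout=10):
    """Send one HEAD request without following redirects; return the Location header, if any."""
    if "://" not in url:
        url = "http://" + url  # assumption: scheme-less links are plain HTTP
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        return conn.getresponse().getheader("Location")
    finally:
        conn.close()


# Offline step: pull the hidden target out of the LinkedIn-style link.
print(redirect_target("http://www.linkedin.com/redirect?url=www.ow.ly/8fPgO"))
```

Feeding the result of `redirect_target` to `peek_location` reveals the next hop; looping until there is no `Location` header gives the final page, which is where the spam finally becomes obvious.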
TL;DR - two main points:
- if you own a website, don't allow open redirects to arbitrary URLs unless you don't mind being blacklisted (check this one out: http://www.linkedin.com/redirect?url=www.ow.ly/8fPgO)
- there is a limit to what you can teach a machine: if a human cannot easily distinguish between spam and not_spam, a machine won't either
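For the first point, one way a site owner could close the hole is to validate the redirect target before issuing the redirect. A minimal Python sketch, assuming the site keeps a fixed allowlist of hosts it is willing to send users to. `ALLOWED_HOSTS` and `safe_redirect_target` are hypothetical names, not LinkedIn's actual code.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist: the only external hosts this site will redirect to.
ALLOWED_HOSTS = {"www.linkedin.com", "linkedin.com"}


def safe_redirect_target(target, default="/"):
    """Return target if it is a same-site path or an allowlisted http(s) URL, else default."""
    parts = urlsplit(target)
    if not parts.scheme and not parts.netloc:
        # Relative path on our own site, e.g. "/jobs?id=1".
        # Reject "//host"-style and backslash tricks that browsers may resolve off-site.
        if target.startswith("/") and not target.startswith("//") and "\\" not in target:
            return target
        return default
    if parts.scheme not in ("http", "https"):
        return default  # e.g. javascript:, data:, or scheme-relative "//evil.example"
    host = (parts.hostname or "").lower()
    return target if host in ALLOWED_HOSTS else default


# The spammer's target from the TL;DR gets sent to the home page instead.
print(safe_redirect_target("www.ow.ly/8fPgO"))
```

An allowlist is a deliberate choice here: a blocklist of known shorteners would always lag behind, while an allowlist fails closed for any host the site never intended to vouch for.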