@ArtOfCode-
Last active March 5, 2018 04:24
Autoflagging meta post

Things Left To Do

  • RFC from mods

Post follows.


Title: We'd like to cast more flags on spam automatically. What do you think?

TL;DR: Charcoal is the organisation behind SmokeDetector. Since January 2017, we've been casting up to 3 flags automatically on posts that our systems are confident are spam. We'd like to increase that to 5 automatic flags to reduce the time spam spends alive on the network.

Who are you?

Charcoal is a user-run organisation that is primarily responsible for the spam-detecting bot, SmokeDetector. Over the past four years, with the aid of SmokeDetector, we've looked for spam on the Stack Exchange network to flag and destroy manually. In January 2017, with the blessing of Stack Exchange, we started running an "autoflagging" project, wherein our systems automatically cast up to three flags on a post if they're confident that it's spam. If you missed that happening entirely, we wrote a meta post on Meta Stack Exchange - or there's a slightly more concise explanation on our website.

How's that been going for you?

Good. We currently have 215 users who have opted into the autoflagging system (you can sign up too, if you're interested). We've flagged around 30 000 (29 592) posts, of which the vast majority (29 526) were confirmed spam - that's 99.7% accurate.

What are you proposing?

We'd like to expand our autoflagging system. At present we cast up to 3 flags on posts we're confident are spam; we'd like to increase that number to 4 or 5 flags.

Why?

Just so we're up-front about this: this is an experiment. Ultimately, we're trying to do these things:

  • Reduce the time that spam spends on the sites before being deleted;
  • Lower the number of humans who involuntarily have to see or interact with spam.

Increasing the number of flags we cast automatically on spam should accomplish both of these things:

  • Automatic flags are near-instant, while manual flags take minutes to be cast. Increasing the ratio of automatic to manual flags therefore shortens the time before 6 flags accumulate and the spam is deleted.
  • Automatic flags are not cast by a human, so fewer humans are forced to see or interact with the spam.
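The effect described above can be sketched as simple arithmetic, under stated assumptions: a post is nuked at 6 spam flags, autoflags land instantly at detection, and manual flags arrive at a roughly constant interval. The interval value below is illustrative, not a measured figure:

```python
# Sketch: time to deletion as a function of automatic flag count.
# Assumptions (illustrative, not measured): a post is nuked at 6 spam
# flags, autoflags are cast instantly, and manual flags arrive every
# MANUAL_FLAG_INTERVAL minutes after the post appears.

DELETION_THRESHOLD = 6      # network rule: 6 spam flags nuke a post
MANUAL_FLAG_INTERVAL = 5.0  # minutes per manual flag (assumed)

def minutes_to_deletion(autoflags: int) -> float:
    """Minutes until the remaining manual flags accumulate."""
    manual_needed = max(DELETION_THRESHOLD - autoflags, 0)
    return manual_needed * MANUAL_FLAG_INTERVAL

for k in (0, 3, 5):
    print(k, minutes_to_deletion(k))
```

With these assumptions, going from 3 to 5 autoflags cuts the wait from 15 minutes to 5; the real-world gain depends on how quickly manual flaggers actually arrive.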

The data we have backs this up. We saw a significant drop in the time it took to delete spam when we started our autoflagging project - take a look at this graph from the meta post on the subject for an excellent visual representation. Before we started autoflagging, spam spent an average of 400 hours per day alive across the network; with autoflagging in place, that average has dropped roughly tenfold, to around 40 hours per day.

What would this change mean for sites?

If this change goes ahead, these things are likely to happen:

  • It will only take 1 or 2 manual flags from users to spam-nuke an autoflagged post, instead of the current 3. Posts that are not autoflagged will, of course, still require 6 flags to nuke.
  • There may be an increase in posts spam-nuked entirely by Charcoal members, who may or may not be active on the site.
  • You will see a reduction in the time spam spends on the site before being deleted.
  • Fewer humans will have to involuntarily see each spam post.

The last two of those are indisputably good things. The first two, however, are more controversial, and are the reason we want to have a discussion here on meta before we make this happen. What follows are the major concerns we've seen, and what we can do about them or why we don't think they're an issue - we'd like to hear your thoughts.

The major thing we're looking for out of this is a reduction in time to deletion. The following graph shows how long spam currently spends alive on the top few sites; we're hoping to see a moderate reduction in the average times, and a significant reduction in the top outliers.

The following graph is from an experiment we've been running over the past week, casting between 1 and 5 flags randomly on each post matching the settings we're considering.

In raw numbers, that's this:

PostCount  FlagCount  ATTD      StdDev  CommonMax
55         1          190.5091  197.59  585.69
55         2           85.0182  109.86  304.74
62         3           48.9355   83.57  216.07
68         4           26.9559   51.98  130.92
56         5           10.2143    5.60   21.41

PostCount is the sample size; FlagCount is the number of flags cast on each post in the sample; ATTD is the average time to deletion; and CommonMax is the upper bound of a 97% confidence interval. The major takeaway from these stats is that we're likely to see a ~5x drop in the average time to deletion, and a ~10x drop in the outliers.
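As a sanity check on the table, the CommonMax column appears to equal ATTD plus two standard deviations, which is roughly a one-sided 97% bound if deletion times were normally distributed. A minimal sketch, using the table's own figures:

```python
# Sanity check on the table above: CommonMax appears to equal
# ATTD + 2 * StdDev, i.e. roughly a one-sided ~97% bound under a
# normality assumption (an approximation, not a stated methodology).

def common_max(attd: float, stddev: float) -> float:
    """Upper bound of an approximate 97% confidence interval."""
    return attd + 2 * stddev

# FlagCount = 5 row: ATTD 10.2143, StdDev 5.60
print(round(common_max(10.2143, 5.60), 2))  # -> 21.41
```

The same formula reproduces the CommonMax value for every row, to rounding.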

Accuracy & false positives

Spam flags are a powerful feature that need some care in applying correctly. This is a concern that came up when we originally built the autoflagging system, so we already have safeguards built in.

  • We only flag a post if we're more than 99.5% sure it's spam. (Technically, the precise certainty varies by conditions set by the users whose accounts we use, but it's always above 99.5% - more detail on that on our website).
  • If the system breaks down or bugs out and starts flagging things it shouldn't, all Charcoal members and all network moderators have access to a command that immediately halts all flagging activity and requires intervention from a system administrator to re-enable. Outside of testing, that kill-switch has never had to be used.
  • We never unilaterally nuke a post. There are currently 3 manual flags required in addition to the automatic flags to nuke a post; this increase proposal still retains at least one manual flag.

We also make sure that everything has human oversight at all times. While only 3 humans currently have to manually flag the post, there are always more users than that reviewing the system's decisions and classifications; if a post is flagged that shouldn't have been, we are alerted and can contact the relevant moderators to resolve the issue. Again, this is very rare: over the past year, we've flagged 66 posts that shouldn't have been, out of 29 592 flagged posts (that's 99.7% accurate overall). We allow users to set their own flagging conditions, provided they don't go below our baseline 99.50% certainty. We recommend, however, a higher threshold whose historical accuracy is 100.00% - those who set their conditions below that are likely to see more false positives flagged using their account.

This proposal decreases the required manual involvement to nuke a post; to compensate for that lower human-involvement barrier, we will correspondingly increase the required accuracy before casting the extra automatic flags. For example, we currently require 99.5% accuracy before casting autoflags; we could require 99.9% accuracy for 4 autoflags, and 99.99% accuracy for 5 autoflags. (For reference, humans are accurate 95.4% of the time, or 87.3% on Stack Overflow - those are stats that jmac (a former Community Manager) looked up for us last year when we started autoflagging).
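The tiered thresholds above can be sketched as a simple mapping from model confidence to flag count. The 99.5/99.9/99.99 cutoffs are the examples given in the paragraph; the function itself is a hypothetical illustration, not Charcoal's actual implementation:

```python
# Sketch of the tiered-threshold idea: higher confidence unlocks more
# automatic flags. The cutoffs are the examples from the post; this
# function is illustrative, not Charcoal's real code.

def autoflag_count(confidence: float) -> int:
    """Map model confidence (0-1) to a number of automatic flags."""
    if confidence >= 0.9999:
        return 5
    if confidence >= 0.999:
        return 4
    if confidence >= 0.995:
        return 3
    return 0  # below the baseline: no autoflags, humans handle it

print(autoflag_count(0.9995))  # -> 4
```

The design point is that each extra automatic flag removes a human check, so it must be paid for with a stricter accuracy requirement.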

In the rare event of a legitimate post getting autoflagged, we also have systems in place to ensure it isn't accidentally deleted and forgotten about. Whether or not it's been autoflagged, multiple humans review every post we catch and manually classify it as spam or not spam. If an autoflagged post is classified as not-spam, the system posts an alert to chat to let us know, so that we can ping the necessary people to retract their flags and keep an eye on the post to make sure it doesn't get deleted.

To make it starkly clear how accurate this could be, here's a visualisation:

300x100 grid of green squares, with one red square near the top

That's a chronological representation (left-right, top-bottom) of every post that would have been flagged under the settings we're considering for 5 flags, and whether they were spam (green squares) or legitimate (red squares).

Community agency & involvement

As I said earlier, this proposal reduces the required manual involvement to nuke a post. Since Charcoal members also cast manual flags on top of the automatic flags cast by the system, that's also likely to increase the number of posts that are nuked entirely by Charcoal members, without involvement from users who are active on this site. Some posts already have 6 flags cast on them by Charcoal (including autoflagging and manual flags), but the proportion of posts that applies to is likely to increase.

We don't think this is an issue in terms of subject matter expertise: the spam we see on the Stack Exchange network is broadly the same wherever you go - you don't need any subject matter expertise or activity on a particular site to be able to tell what's spam and what's not. We do, however, recognise that it's possible that a site's community may want to handle its own spam; if that's the case, we're happy to turn the autoflagging system off on this site or to retain it at its current levels.

What now?

We want to increase the number of automated flags from 3 to 5 to reduce the time spam spends alive on the network. We'd like to hear your thoughts. We appreciate that quite a lot of the stuff we do at Charcoal is fairly invisible to the sites, so we want to be as open as possible. If you'd like data or specific reports, let us know and we'll try to add them in - we already have a lot of reporting around autoflagging, so it may already exist. If there's other stuff we can do to explain or to help you make an informed decision about whether you want this, drop an answer or a comment on this post. Charcoal members will be hanging around this post to respond to your concerns, or you can also visit us in chat.


1 There are some flags from Charcoal members that don't appear in these statistics, for various reasons, but the majority are there.

@ArtOfCode-
Copy link
Author

@gparyani I left "on behalf" out because it's kinda unnecessary in the TL;DR. If we were doing it on behalf of unwilling users, we wouldn't exactly be shouting about it on meta. It's explained further down. I'm leaving out commas as thousands separators because while the US and India and the UK and a bunch of other places all use them, much of Europe reverses the convention. US/UK style: 30,000.295. Continental Europe: 30.000,295.

@angussidney
Copy link

angussidney commented Mar 3, 2018

while we haven't decided on concrete numbers yet, it would be possible to require 99.9% accuracy for 4 autoflags

To make it starkly clear how accurate our recommended settings are, here's a visualisation

Those two sentences are conflicting. The first one says that we haven't decided on the numbers for >3 autoflags, whereas the graph is of our proposed setting (>300 weight). One of these will need to be changed.

Multiple people review each post we catch, whether it's autoflagged or not, and classify it as spam or not;

This sentence/paragraph sounds a bit clunky. Maybe it should be:

No matter whether it's been autoflagged or not, multiple humans review every post that we catch and manually classify it as spam or not spam. If an autoflagged post is classified as not-spam, the system posts an alert to chat to let us know, so that we can ping the necessary people to retract their flags and keep an eye on the post to make sure it doesn't get deleted.

@terdon
Copy link

terdon commented Mar 3, 2018

Could you please also add some data about how long spam survives currently? You are proposing a change to improve X (spam survival time) but give no data on X at all. As already discussed (at length) elsewhere, going from a survival rate of say 1 hour to one of 1 minute is a very different thing from going from 10 seconds to 5 seconds and I feel like the single most important point this post needs to make is to explain what improvement the change would offer. And it can't do this unless you discuss how long spam survives on the network at the moment.

Here's one you can use:

[Boxplot: time to deletion (sec) per site for the top sites]

The white dot is the mean time to deletion. I used this SQL query to get the data:

SELECT
    a.created_at,
    a.deleted_at,
    TIME_TO_SEC(TIMEDIFF(a.deleted_at, a.created_at)) AS ttd,
    b.site_name
FROM metasmoke_dump.posts a
INNER JOIN metasmoke_dump.sites b ON a.site_id = b.id
WHERE a.deleted_at IS NOT NULL
  AND a.is_tp = 1
  AND a.is_naa = 0
  AND a.created_at IS NOT NULL
  AND a.created_at > '2017-02-01'
  AND b.site_name IN (
      'Stack Overflow', 'Ask Ubuntu', 'Ask Different',
      'Graphic Design', 'Drupal Answers', 'Super User', 'The Workplace',
      'Meta Stack Exchange', 'Astronomy', 'Information Security'
  )
ORDER BY ttd DESC

Saved the results as topsites.tp.tsv and then, in R:

# Load the exported query results (tab-separated)
df <- read.table("smokey/topsites.tp.tsv", header = TRUE, sep = "\t")
library(ggplot2)
# Boxplot of time to deletion per site, y-axis trimmed to the
# 10th-90th percentiles; the white dot marks the mean.
ggplot(df, aes(x = site_name, y = ttd, fill = site_name)) +
  geom_boxplot() +
  scale_y_continuous(limits = quantile(df$ttd, c(0.1, 0.9))) +
  stat_summary(fun.y = mean, geom = "point", shape = 21, size = 3,
               col = "black", fill = "white") +
  scale_fill_brewer(palette = "Paired") +
  ylab("Time to Deletion (sec)") + xlab("")
ggsave("smokey/foo.png")

@Undo1
Copy link

Undo1 commented Mar 3, 2018

Maybe change this:

That's a chronological representation of every post that could have been flagged under the recommended settings, and whether they were spam (green squares) or legitimate (red squares).

to specify that it's looking at 300 weight and clarify the order:

That's a chronological representation (left to right, top to bottom) of every post that could have been flagged under the settings we're considering for 5 flags, and whether they were spam (green squares) or legitimate (red squares).

That's an absolutely insane graph. I love it. Just need to clarify that it's for 300 weight, not our recommended 280 for normal flags.

@AWegnerGitHub
Copy link

In the "what would this change mean" section, the "the last two..." paragraph should be directly after the bullet points and before the "The major thing we're looking for" paragraph and graph.

In the green graph, I am a bit confused on its meaning. In the preceding explanation, it says there are 66 misflagged posts. I see a single red dot; I'd expect 65 more scattered among the ~29.5K green dots.
