Community automated spam removal project updates

Hello. You may remember me from previous posts describing SmokeDetector and updates to the system to automatically apply flags to known spam. After the last post in March, there was a robust discussion regarding concerns, features and odds and ends that would make the community solution to spam on the Stack Exchange network even better.

Updates to the system since March 2018:

  • SmokeDetector itself now provides a flag on all posts that are automatically flagged. This helps moderators to see that a post was flagged by the system and not only by community members.
  • A dedicated RSS feed showing all posts that were automatically flagged and deleted, per site, is available. This can be accessed from metasmoke (see the blue RSS box underneath the graphs). This has been set up for moderation teams on several sites, so that it is pushed to a chat room for moderator review.
  • The system has been casting up to 4 flags (SmokeDetector + up to 3 users) on posts whose spam reasons pass the 99.9% historical confidence threshold.
  • A user script (SIM - SmokeDetector Info for Moderators) has been written to expose autoflagging activity on a post. More details are available at the top of the previous post.
  • The metasmoke review system has been expanded to allow tagging domains found in spam posts. This has helped fight spam outside Stack Exchange as well, with several thousand posts removed across WordPress, Medium, Weebly, Google Sites and others.
  • The system has been improved to ensure all automatically flagged posts are reviewed with feedback coming from multiple users, versus the minimum of only a single review previously.
  • Improved the metasmoke dashboard and implemented per site dashboards to provide better visibility of the actions taken and results per site.
  • Fixed a race condition that resulted in one post on Stack Overflow receiving 6 flags. On this post, the error was noticed in less than a minute and automatic flagging was stopped. Flagging remained offline for several hours while the issue was investigated and resolved. During that time, SmokeDetector remained online and reported potential spam via chatrooms as usual. The removed post was spam and remained deleted, despite the error on our part in issuing too many flags.

Change in the near term:

We will increase the automatic flags cast on specific criteria. In those specific cases, we'll cast an additional flag (up to 5 total). These are a set of conditions that have 100.0% historical accuracy in determining whether a post is spam or not. This post is to provide transparency and solicit feedback. Charcoal intends to implement these changes on August 3, 2018.
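
To make the two tiers concrete, here is a minimal sketch of the flag cap described above. It is an illustration only, not metasmoke's actual code: the function name, the constants, and the choice of `>=` comparisons are assumptions, and the 99.95% cutoff is simply what "100.0%" means at one decimal place. Posts below 99.9% are outside the scope of this post, so the sketch returns `None` for them rather than guessing.

```python
from typing import Optional

# Hypothetical constants based on the figures in this post (all in percent).
CURRENT_MAX_FLAGS = 4    # SmokeDetector + up to 3 users
PROPOSED_MAX_FLAGS = 5   # one additional flag on the strictest criteria
BASE_ACCURACY = 99.9
STRICT_ACCURACY = 99.95  # displays as 100.0% when rounded to one decimal


def max_auto_flags(historical_accuracy: float) -> Optional[int]:
    """Return the cap on automatic flags for a given historical accuracy.

    Returns None when the accuracy falls below the tiers this post
    describes (assumption: those cases are handled elsewhere).
    """
    if historical_accuracy >= STRICT_ACCURACY:
        return PROPOSED_MAX_FLAGS
    if historical_accuracy >= BASE_ACCURACY:
        return CURRENT_MAX_FLAGS
    return None


if __name__ == "__main__":
    for accuracy in (99.97, 99.92, 98.0):
        print(accuracy, max_auto_flags(accuracy))
```

Under these assumptions, only posts matching the strictest criteria get the extra flag; everything else keeps the existing cap of 4.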

@rschrieken

rschrieken commented Jul 24, 2018

Last paragraph: Just say what you're going to do, skip the Proposed

We will increase ... cases, we are going to cast ....

end with

This post is to inform you and to have a record of changes applied to the system.

Maybe set a date when those changes go in effect

@angussidney

ensure all automatically flagged posts are reviewed, with feedback coming from multiple users.

This suggests that they weren't reviewed at all in the past (due to the comma). I recommend changing this to specifically mention 'as opposed to a minimum of one review in the past'. Side note: do we want to mention the average number of feedbacks on posts?

This post is to inform you and to have a record of changes applied to the system. We'll be implementing this change on August 3, 2018.

This makes me feel like there is no choice/negotiation in the matter. Although Charcoal as an organisation strongly believes in moving forward with this, I think the meta crowd will get very angry if they don't feel like they have any input. Maybe soften up the wording a bit?

Other thoughts

  • The site dashboard wasn't mentioned in the original meta post outside the comments on some answers, as it was developed around the time of posting. Do we want to highlight this feature amongst our other improvements?
  • Do we need to mention how 6 flags were automatically cast on one post due to a race condition?
  • Do we need to clarify what we mean by 'nuke'? (The post is archived and still exists on the SE system, but can only be accessed via link and has a spam warning; it can be brought back by a mod/CM, who are easy to contact.) Someone seemed to be in disagreement/confusion about our terminology last time.
  • New abuse reports feature, and the efforts of Tripleee/Mith/others in getting some spam sites taken down?

@magisch

magisch commented Jul 25, 2018

I have a feeling this is going to go down like a lead balloon. Last time, people were really angry and generally malcontent with the idea that we would just be doing this and were just informing them, despite our using much softer language for the process. Now, this:

This post is to inform you and to have a record of changes applied to the system. We'll be implementing this change on August 3, 2018.

I expect this to be met with calls for CMs to stop us or with demands to close the room as a voting ring. People were already really on edge last time when we spent a ton more effort addressing concerns from the word go.

@fzql

fzql commented Jul 25, 2018

Some specific feedback:

This post is to inform you and to have a record of changes applied to the system. We'll be implementing this change on August 3, 2018.

As noted above, it would help to reword this. I would suggest that the post aims to "provide transparency and solicit feedback" and Charcoal "intends to implement these changes on August 3, 2018". This sentence should be set aside in a single paragraph. A bullet list under "Changes in the near term:" would be better, and "changes" implies more than one bullet point.

A user script has been written...

Consider stating the name of the user script here?

We will increase the automatic flags cast on specific criteria... On those specific cases, we'll cast an additional flag (up to 5 total).

I'm nitpicking here because important information should come up first. Something like "We will cast up to 5 automatic flags on specific criteria (up from 4)." should come before the specifics of the criteria. This also assumes users know why it's 5 (i.e. 6 flags does something special).

100.0%

Rounded, or exactly? This is critical information. If it's exactly 100.0%, say something like "100% (no rounding)"; if not, provide enough significance like 99.95%.

And freehand circles please :)

@Undo1

Undo1 commented Jul 25, 2018

@FYPetro 100.0% is significance; indicates "99.95% or higher" (which is what we have)
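
As a rough illustration of that rounding (a sketch, not metasmoke's code; the helper name and the half-up rounding mode are assumptions), any accuracy of 99.95% or higher displays as 100.0% at one decimal place:

```python
from decimal import Decimal, ROUND_HALF_UP


def display_accuracy(percent: str) -> str:
    """Round a percentage to one decimal place, half-up (hypothetical helper).

    The input is a string so that Decimal sees the exact decimal value
    instead of a binary floating-point approximation.
    """
    rounded = Decimal(percent).quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)
    return f"{rounded}%"


if __name__ == "__main__":
    for value in ("99.94", "99.95", "99.97", "100"):
        print(value, "->", display_accuracy(value))
```

So "100.0%" here states a precision of one decimal place; it does not claim the accuracy is exactly 100%.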

@magisch - People might make those calls, and it'd be fine (and maybe kinda amusing). CMs won't make us stop. Can't please everyone.

@CalvT

CalvT commented Jul 27, 2018

see the blue RSS box underneth the graphs

Small spelling error.

On the post in general: I kinda agree that it sounds too "We're going to do this" instead of "We want to do this". Doesn't sound like that matters much though?

I also like the way the idea was explained to terdon - 3 autoflags (this number because it removes it from the homepage), human confirmation, then 2 more autoflags. To me, putting it this way sounds better and is more understandable/relatable than just "we're increasing the number of autoflags".
