@rtanglao
Created June 21, 2012 22:48
Regular expression to find suspected Vietnamese spammers
#!/usr/bin/env ruby
# -*- coding: utf-8 -*-
require 'rubygems'
require 'json'
require 'time'
require 'date'
require 'mongo'
require 'pp'
MONGO_HOST = ENV["MONGO_HOST"]
raise(StandardError,"Set Mongo hostname in ENV: 'MONGO_HOST'") if !MONGO_HOST
MONGO_PORT = ENV["MONGO_PORT"]
raise(StandardError,"Set Mongo port in ENV: 'MONGO_PORT'") if !MONGO_PORT
MONGO_USER = ENV["MONGO_USER"]
raise(StandardError,"Set Mongo user in ENV: 'MONGO_USER'") if !MONGO_USER
MONGO_PASSWORD = ENV["MONGO_PASSWORD"]
raise(StandardError,"Set Mongo password in ENV: 'MONGO_PASSWORD'") if !MONGO_PASSWORD
db = Mongo::Connection.new(MONGO_HOST, MONGO_PORT.to_i).db("gs")
auth = db.authenticate(MONGO_USER, MONGO_PASSWORD)
if !auth
  # raise already aborts the script, so no separate exit is needed
  raise(StandardError, "Couldn't authenticate, exiting")
end
topicsColl = db.collection("topics")
t = topicsColl.find({"subject" => /[ảựăậ]/u}).count()
print t
# pp t["subject"]
In Ruby, a variant using a byte escape in the character class:
t = topicsColl.find({"subject" => /[\xF3]/}).count()
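The character class can be tried locally without a MongoDB connection. The following is a minimal sketch, not part of the original script: the helper name `suspected_spam?` and the sample subjects are made up for illustration, and the class is written with `\u` escapes on the assumption that the gist's characters are the precomposed forms of ả, ự, ă and ậ.

```ruby
# -*- coding: utf-8 -*-
# Same four characters as the gist's query, written as escapes:
# U+1EA3 (ả), U+1EF1 (ự), U+0103 (ă), U+1EAD (ậ).
# Assumption: the subjects use precomposed characters, not combining marks.
SPAM_PATTERN = /[\u1EA3\u1EF1\u0103\u1EAD]/

# Hypothetical helper (not in the original script): true when the subject
# contains at least one character from the class.
def suspected_spam?(subject)
  !!(subject =~ SPAM_PATTERN)
end

# Made-up sample subjects, for illustration only.
samples = ["Firefox crashes on startup", "qu\u1EA3ng c\u00E1o"]
samples.each { |s| puts "#{s}: #{suspected_spam?(s)}" }
```

A match only shows that a subject contains one of these characters, so it flags ordinary Vietnamese text as well; the results are candidates for review rather than confirmed spam.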