Last active September 17, 2018 06:20
Twitter cryptoscam detection proof of concept.
#!/usr/bin/env python3
# encoding: utf-8
# author: Lorenz Hübschle-Schneider
# This is really really simple. Twitter, you have no excuse for not doing something like this!
import codecs
import json
import re
from unicodedata import normalize
eth_regex = re.compile("0x[a-fA-F0-9]{40}")
btc_regex = re.compile("[13][a-km-zA-HJ-NP-Z1-9]{25,34}")
# from:
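def levenshteinDistance(s1, s2):
    # make s1 the shorter string so each row of the table stays short
    if len(s1) > len(s2):
        s1, s2 = s2, s1
    # distances[i] = edit distance between s1[:i] and the part of s2 seen so far
    distances = range(len(s1) + 1)
    for i2, c2 in enumerate(s2):
        distances_ = [i2+1]
        for i1, c1 in enumerate(s1):
            if c1 == c2:
                # characters match: carry over the diagonal value at no cost
                distances_.append(distances[i1])
            else:
                # one edit (substitution, insertion or deletion), whichever is cheapest
                distances_.append(1 + min((distances[i1], distances[i1 + 1], distances_[-1])))
        distances = distances_
    return distances[-1]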
# return edit distance between ascii-normalized versions of the two input strings
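def normalized_distance(original, query):
    # NFKD splits accented and compatibility characters into base character plus marks;
    # encoding to ASCII with 'ignore' then drops everything that is not plain ASCII
    normalized_original = normalize('NFKD', original).encode('ascii', 'ignore')
    normalized_query = normalize('NFKD', query).encode('ascii', 'ignore')
    return levenshteinDistance(normalized_original, normalized_query)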
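# compute a score for how likely a tweet is to be a cryptocurrency scam
# value ranges from near 0.0 (probably not) to 1.0 or more (pretty certain)
def classify_scam(tweet, original_tweet):
    score = 0.0
    text = tweet["full_text"]
    # a reply containing something that looks like an ETH or BTC address is suspicious
    if eth_regex.search(text) or btc_regex.search(text):
        score += 0.5
    # the closer the replier's names are to the original poster's, the higher the score
    displayname_distance = normalized_distance(original_tweet["user"]["name"],
                                               tweet["user"]["name"])
    username_distance = normalized_distance(original_tweet["user"]["screen_name"],
                                            tweet["user"]["screen_name"])
    score += 1.0/(displayname_distance + username_distance + 1)
    return score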
if __name__ == '__main__':
    import sys
    if len(sys.argv) == 1:
        print('Usage: {} twarc-replies-dump.json'.format(sys.argv[0]))
        print('This tool parses the output of "twarc replies <tweet-id>"')
        print('See for more information on twarc')
        sys.exit(1)
    filename = sys.argv[1]
    with codecs.open(filename, 'r', 'utf8') as inputfile:
        lines = inputfile.readlines()
    print('Read {} lines'.format(len(lines)))
    # the first line of the dump is the original tweet, the rest are replies
    original_tweet = json.loads(lines[0])
    suspects = []
    for line in lines[1:]:
        tweet = json.loads(line)
        score = classify_scam(tweet, original_tweet)
        if score > 0.2:
            suspects.append((score, tweet))
    # sort by score only: tweets are dicts, which Python 3 cannot compare on ties
    for (score, tweet) in sorted(suspects, key=lambda s: s[0], reverse=True):
        print('Found a likely scammy tweet, score {}:'.format(score))
        # pass str values straight to format(); encoded bytes would print as b'...' in Python 3
        print('\tfrom: {user} – {name}'.format(
            user = tweet["user"]["screen_name"],
            name = tweet["user"]["name"]))
        print('\ttext: {}'.format(tweet["full_text"]))
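
A note on the normalization step, separate from the script itself: the sketch below shows what folding a name through NFKD and an ASCII encode with `'ignore'` catches and what it leaves alone. The helper `ascii_fold` and the sample names are made up for illustration and are not part of the gist; the script compares the encoded bytes directly rather than decoding them back to text.

```python
from unicodedata import normalize


def ascii_fold(text):
    """NFKD-decompose text, then drop every character that is not ASCII."""
    return normalize('NFKD', text).encode('ascii', 'ignore').decode('ascii')


# Hypothetical display names an impersonator might pick.
samples = {
    'accented': 'El\u00f3n M\u00fask',    # o and u with acute accents
    'fullwidth': '\uff25lon Musk',        # fullwidth capital E
    'cyrillic': 'El\u043en Musk',         # Cyrillic small o, no ASCII decomposition
    'ascii lookalike': 'EIon Musk',       # capital I in place of lowercase l
}

folded = {label: ascii_fold(name) for label, name in samples.items()}
for label, name in folded.items():
    print('{:<16} {!r}'.format(label, name))
```

Accented and fullwidth letters fold back to the plain name, so they cost nothing in edit distance. A Cyrillic lookalike has no decomposition and is simply dropped, which costs one edit. ASCII lookalikes such as a capital I pass through unchanged and are left to the Levenshtein distance.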

lorenzhs commented Feb 8, 2018

Here's the output when applied to the first 753 replies to

Read 754 lines
Found a likely scammy tweet, score 1.0:
	from: elonmuski – Elon Musk

	text: @elonmusk Hi guys! I'm donating 250 Ethereum to the ETH community! First 250 transactions with 0.2 ETH sent to the address below will receive 1.0 ETH in the address the 0.2 ETH came from.


The promotion will last 24 hours! Hurry!
Found a likely scammy tweet, score 1.0:
	from: elonmuski – Elon Musk

	text: @elonmusk Hi guys! I'm donating 250 BITCOIN! to the BTC community! First 250 transactions with 0.2  BTC sent to the address below will receive 1.0 BTC in the address the 0.2 BTC came from.


The promotion will last 24 hours! Hurry!
Found a likely scammy tweet, score 0.833333333333:
	from: eIonmus_ – Elon Musk

	text: @elonmusk Hi guys! I'm donating 300 Ethereum to the ETH community! First 300 transactions with 0.25 ETH sent to the address below will receive 1.0 ETH in the address the 0.25 ETH came from.


The promotion will last 48 hours! Hurry!
Found a likely scammy tweet, score 0.833333333333:
	from: alon_musk – Elon Musk

	text: @elonmusk Hi guys! I'm donating 250 Ethereum to the ETH community! First 250 transactions with 0.2 ETH sent to the address below will receive 1.0 ETH in the address the 0.2 ETH came from.


The promotion will last 24 hours! Hurry!
Found a likely scammy tweet, score 0.75:
	from: elomnosk – Elon Musk

	text: @elonmusk By the way: I'm giving away 125 BTC to my followers. Just send 0.025 BTC to the address below and I'll send you 0.5 BTC back, through the same address you used in the transaction.


This is my way of thanking all my fans and friends. Thank you!
Found a likely scammy tweet, score 0.7:
	from: eeIIon_musk – Elon Musk

	text: @elonmusk Hi guys! I'm donating 250 Ethereum to the ETH community! First 250 transactions with 0.2 ETH sent to the address below will receive 1.0 ETH in the address the 0.2 ETH came from.


The promotion will last 24 hours! Hurry!
Found a likely scammy tweet, score 0.7:
	from: ElloonMusk – Elon Musk

	text: @elonmusk I'm happy and giving to my followers 100 Ethereum, send 0.2 Eth to the address below and you will receive 2.0 Ethereum.


Act fast! you don't want to miss out!!
Found a likely scammy tweet, score 0.666666666667:
	from: ElonMuskkkk – Elon Musk

	text: @elonmusk Guys please beware of scammers on my page! The only address to send your Eth is 


For every 0.2 Eth sent I'll send 2 Eth back! But hurry, this offer is limited!
Found a likely scammy tweet, score 0.538461538462:
	from: VitalikButtiren – Vitalik Buterin

	text: @elonmusk Hi guys! I'm donating 500 ETH to the ETH community.  First 2500 transaction. Just send 0.2 ETH to the address below and you will receive 2.0 ETH.
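
Most of the scores in this listing can be worked out by hand from the screen names. The following is a rough sketch rather than the script itself, and it rests on assumptions: the original tweet's screen name is `elonmusk`, its display name is `Elon Musk` exactly as printed, and every reply listed matched one of the address patterns (worth 0.5). The names `levenshtein` and `score` are illustrative.

```python
def levenshtein(s1, s2):
    """Plain Levenshtein edit distance, case-sensitive."""
    if len(s1) > len(s2):
        s1, s2 = s2, s1
    distances = list(range(len(s1) + 1))
    for i2, c2 in enumerate(s2):
        row = [i2 + 1]
        for i1, c1 in enumerate(s1):
            if c1 == c2:
                row.append(distances[i1])
            else:
                row.append(1 + min(distances[i1], distances[i1 + 1], row[-1]))
        distances = row
    return distances[-1]


def score(screen_name, display_name, has_address,
          original_screen_name='elonmusk', original_display_name='Elon Musk'):
    """0.5 for an address match, plus 1 / (total name distance + 1)."""
    total = 0.5 if has_address else 0.0
    distance = (levenshtein(original_display_name, display_name)
                + levenshtein(original_screen_name, screen_name))
    return total + 1.0 / (distance + 1)


# Screen names taken from the listing; display name assumed to be 'Elon Musk'.
handles = ['elonmuski', 'eIonmus_', 'alon_musk', 'elomnosk',
           'eeIIon_musk', 'ElloonMusk', 'ElonMuskkkk']
scores = {handle: score(handle, 'Elon Musk', True) for handle in handles}
for handle, value in scores.items():
    print('{:<12} {:.3f}'.format(handle, value))
```

Under those assumptions the values agree with the listing: `elonmuski` is one edit away (0.5 + 1/2 = 1.0), `eIonmus_` and `alon_musk` are two edits away (0.5 + 1/3), `elomnosk` three (0.75), `eeIIon_musk` and `ElloonMusk` four (0.7), and `ElonMuskkkk` five (0.5 + 1/6). The comparison is case-sensitive, which is why `ElonMuskkkk` counts five edits rather than three.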
