Skip to content

Instantly share code, notes, and snippets.

@jsha
Last active January 31, 2016 23:13
Show Gist options
  • Star 1 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save jsha/e6bd3860efd54224644b to your computer and use it in GitHub Desktop.
Save jsha/e6bd3860efd54224644b to your computer and use it in GitHub Desktop.
Python script to count occurrences of a field value in MySQL
#!/usr/bin/env python
#
# Using MySQL's grouping and counting features for very large tables is
# very slow. This Python script makes it much faster to count unique
# values of a given column. It selects the column values in reasonably-sized
# chunks (10k rows), and feeds those values into a collections.Counter.
#
# Assumptions: you have a .my.cnf set up to connect to the right DB with the
# right username and password. Your table has an id field, which is numeric and
# ideally autoincrement (for smooth distribution of values).
import subprocess
import collections
import threading
process = subprocess.Popen("mysql -B --skip-column-names", shell=True, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
field = "source_uid"
table = "Actions"
inc = 10000
bottom = 0
top = int(1e9)
def query():
for i in range(bottom, top, inc):
process.stdin.write("""select %s from %s where %d <= id and id < %d;\n""" % (
field, table, i, i + inc))
process.stdin.close()
t = threading.Thread(target=query)
t.start()
c = collections.Counter(process.stdout)
for (value, count) in c.most_common(50):
print value.rstrip('\n'), count
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment