@Niklas9
Created September 11, 2018 16:29
Find most frequent numbers from a Twilio CSV export
#!/usr/bin/python3
import csv
import operator
import sys

if len(sys.argv) < 2:
    print('need csv file as first arg')
    sys.exit(1)

numbers = {}  # numbers as keys, occurrences as values
no_of_sms = 0

# TODO(niklas9): * create proper class etc for this
# newline='' is the setting the csv module docs recommend for file objects
with open(sys.argv[1], 'r', encoding='utf-8', newline='') as f:
    reader = csv.reader(f)
    for row in reader:
        # TODO(niklas9):
        #  * should filter out header row if it exists
        #  * to increase perf for larger csv files, could split in
        #    n threads here where each thread works on part of the file, then
        #    merge the results
        if len(row) < 2:
            continue  # some lines might be blank or lack the number column
        number = row[1]  # the number is expected in the second column
        if number in numbers:
            numbers[number] += 1
        else:
            numbers[number] = 1
        no_of_sms += 1

# TODO(niklas9):
#  * actually just looking for the top ~5 numbers here, instead of sorting this
#    could be done in O(n*size) runtime complexity instead, where size=5 in this
#    example.. only beneficial though if size < log(n), using the same numbers if
#    n>=149 (as e^5=148.41..)
numbers_sorted = sorted(numbers.items(), key=operator.itemgetter(1),
                        reverse=True)
print('unique numbers = {:d}'.format(len(numbers_sorted)))
print('sms sent = {:d}\n'.format(no_of_sms))
for item in numbers_sorted[:5]:  # print top 5
    print(item)
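
The last TODO in the script notes that only the top ~5 numbers are needed, so a full sort is more work than necessary. One possible way to do that, as a minimal sketch, is below. It uses `heapq.nlargest` from the standard library, which is a different technique from the hand-rolled O(n*size) scan the TODO describes, together with `collections.Counter` for the counting. The function name `top_numbers`, the `number_column` parameter, and the sample rows are assumptions made for illustration and are not part of the original script.

```python
import heapq
from collections import Counter


def top_numbers(rows, size=5, number_column=1):
    """Count values in one column and return the `size` most frequent.

    Hypothetical helper, not part of the original script.
    Returns (rows_counted, unique_count, [(number, count), ...]).
    """
    counts = Counter()
    for row in rows:
        if len(row) <= number_column:
            continue  # skip blank or short rows, as the script does
        counts[row[number_column]] += 1
    # nlargest selects the top `size` items without sorting all of them
    top = heapq.nlargest(size, counts.items(), key=lambda item: item[1])
    return sum(counts.values()), len(counts), top


# Made-up sample rows standing in for csv.reader output
rows = [
    [],
    ['SM1', '+15550001'],
    ['SM2', '+15550002'],
    ['SM3', '+15550001'],
    ['SM4', '+15550003'],
    ['SM5', '+15550001'],
    ['SM6', '+15550002'],
]

total, unique, top = top_numbers(rows, size=2)
print('unique numbers = {:d}'.format(unique))
print('sms sent = {:d}\n'.format(total))
for item in top:
    print(item)
```

Since `rows` can be any iterable of lists, a `csv.reader` object could be passed in directly instead of the sample list. For a one-off export of modest size the full sort in the script is probably fine; the selection approach mainly matters when the number of unique numbers is large.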