@dlitz
Created March 14, 2010 05:15
#!/usr/bin/env python
# sort | uniq -c | sort -rn in Python
# Released into the public domain. No rights reserved.
import sys
# Build a dictionary of lines and their associated counts.
counts = {}
#input_file = open("/path/to/file", "r")
input_file = sys.stdin
try:
    for line in input_file:
        line = line.rstrip("\n").rstrip("\r") # Strip trailing LF/CRLF
        counts[line] = counts.get(line, 0) + 1
finally:
    if input_file is not sys.stdin:
        input_file.close()
# Build a list of [(lineA, countA), (lineB, countB), ... ]
sorted_counts = list(counts.items())
# Sort the list by (count, line) in reverse order.
sorted_counts.sort(key=lambda item: (item[1], item[0]), reverse=True)
# Output the lines
for line, count in sorted_counts:
    print("%7d %s" % (count, line))
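The same counting and ordering can also be sketched with `collections.Counter` from the standard library. This is an alternative take, not part of the original script: the function names `count_lines` and `format_counts` are hypothetical, and the sketch assumes Python 3.

```python
from collections import Counter


def count_lines(lines):
    """Return (line, count) pairs sorted by (count, line), highest first."""
    # Strip the trailing LF/CRLF so "a\n" and "a\r\n" count as the same line.
    counts = Counter(line.rstrip("\n").rstrip("\r") for line in lines)
    return sorted(counts.items(), key=lambda item: (item[1], item[0]), reverse=True)


def format_counts(pairs):
    """Format each pair as a right-aligned 7-character count, a space, then the line."""
    return ["%7d %s" % (count, line) for line, count in pairs]
```

To reproduce the script's behaviour, pass `sys.stdin` to `count_lines` and print each string returned by `format_counts`.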