@meqif
Created June 7, 2011 20:17
Short script for sorting 'du' output correctly. I did it in both Ruby and Python as practice.
#!/usr/bin/env python
"""
du_sort

Copyright (c) 2011 Ricardo Martins
Licensed under the MIT License.
http://www.opensource.org/licenses/mit-license.php

Don't you hate it when you want to quickly check what's using the most space
on your laptop's disk or server's RAID, but 'du -hsc *' gives you unsorted
output? Using 'du -hsc *| sort --numeric-sort' doesn't help, because the unit
suffixes after the sizes mess everything up. Using 'du -k' to display the
sizes in blocks would allow easy sorting, but defeats the purpose of having
human-readable units.

So I did what any respectable programmer would do: I wrote a script that sorts
the results exactly the way I want them. That's how this script was born.

I hope it's as useful to you as it has been to me. :)
"""

import sys


def sort_criterion(line):
    """
    Return the size in bytes or dimensionless units of a 'du' output line.

    Defines the sort criterion for 'du' lines in the form 'size filename',
    where size may be dimensionless (no unit suffix).

    >>> sort_criterion("4,0K\tLICENSE")
    4096.0
    >>> sort_criterion("0\tREADME")
    0.0
    >>> sort_criterion("5072\ttestfile")
    5072.0
    """
    size = line.split()[0]
    # some locales use commas as decimal separators
    size = size.replace(",", ".")
    units = ["B", "K", "M", "G", "T", "P"]
    # map each unit suffix to its power of 1024: B -> 0, K -> 1, M -> 2, ...
    EXPONENT = dict(zip(units, range(0, len(units))))
    if size[-1] in EXPONENT:
        return float(size[:-1]) * 1024 ** EXPONENT[size[-1]]
    else:  # size given in blocks, don't mess with it
        return float(size)


def main():
    # read from stdin when no file is given or the file name is '-'
    if len(sys.argv) == 1 or sys.argv[1] == '-':
        INPUT_FILE = sys.stdin
    else:
        INPUT_FILE = open(sys.argv[1])
    input = INPUT_FILE.readlines()
    ordered_data = sorted(input, key=sort_criterion)
    for line in ordered_data:
        # print() with a single argument works on both Python 2 and Python 3
        print(line.rstrip())


if __name__ == "__main__":
    main()
#!/usr/bin/env ruby

INPUT_FILE = STDIN
input = INPUT_FILE.readlines

ordered_data = input.sort_by do |line|
  size = line.split.first
  # some locales use commas as decimal separators
  size.sub!(",", ".")
  units = %w{B K M G T P}
  exponents = Hash[units.zip(0..units.length)]
  if exponents.has_key? size[-1]
    size = size.to_f * 1024 ** exponents[size[-1]]
  else # size given in blocks, don't mess with it
    size = size.to_f
  end
end

puts ordered_data
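
To make the motivation concrete, here is a minimal sketch comparing a sort that only looks at the leading number with the unit-aware criterion used by both scripts. The helper names `leading_number` and `size_in_bytes` and the sample lines are made up for illustration; `leading_number` is only a rough approximation of what a plain numeric sort sees.

```python
UNITS = "BKMGTP"  # suffix at index i stands for a factor of 1024 ** i


def size_in_bytes(line):
    """Unit-aware key: same idea as sort_criterion in the script."""
    size = line.split()[0].replace(",", ".")
    if size[-1] in UNITS:
        return float(size[:-1]) * 1024 ** UNITS.index(size[-1])
    return float(size)  # no suffix: leave the number untouched


def leading_number(line):
    """Naive key (illustrative): read the number and ignore the suffix."""
    size = line.split()[0].replace(",", ".")
    return float(size.rstrip(UNITS))


# made-up sample in 'size<TAB>name' form
sample = ["1.5G\tvideos", "4.0K\tLICENSE", "200M\tmusic", "0\tREADME"]

naive = sorted(sample, key=leading_number)
aware = sorted(sample, key=size_in_bytes)

print([line.split("\t")[1] for line in naive])
print([line.split("\t")[1] for line in aware])
```

The naive key ranks `videos` (1.5G) below `LICENSE` (4.0K) because 1.5 < 4.0, while the unit-aware key places it last, as the largest entry.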

meqif commented Jun 7, 2011

I could have written both without the intermediate variables INPUT_FILE, input and ordered_data, but that would make the code slightly less readable.

Ruby:

puts STDIN.readlines.sort_by { "…" }

Python:

for line in sorted(sys.stdin.readlines(), key=sort_criterion):
    print line.rstrip()
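
The Python snippet above uses the Python 2 `print` statement. A rough Python 3 equivalent might look like the sketch below. It assumes `sort_criterion` behaves as in the script (restated here in condensed form so the block runs on its own), and it uses an `io.StringIO` object with made-up contents as a stand-in for `sys.stdin`. `sorted` accepts any iterable of lines, so the explicit `readlines()` call is optional.

```python
import io


def sort_criterion(line):
    """Condensed restatement of the script's key function."""
    units = "BKMGTP"
    size = line.split()[0].replace(",", ".")
    if size[-1] in units:
        return float(size[:-1]) * 1024 ** units.index(size[-1])
    return float(size)


# stand-in for sys.stdin so the sketch runs without piped input
fake_stdin = io.StringIO("12M\tsrc\n4,0K\tLICENSE\n0\tREADME\n")

# a file-like object iterates line by line, so it can be sorted directly
for line in sorted(fake_stdin, key=sort_criterion):
    print(line.rstrip())
```

In a real script, `fake_stdin` would simply be replaced by `sys.stdin`.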

meqif commented Jun 8, 2011

I moved this into a proper repository: https://github.com/meqif/du_sort
