@varjmes
Created December 28, 2014 12:48
humansize.py
SUFFIXES = {1000: ['KB', 'MB', 'GB', 'TB', 'PB', 'EB', 'ZB', 'YB'],
            1024: ['KiB', 'MiB', 'GiB', 'TiB', 'PiB', 'EiB', 'ZiB', 'YiB']}

def approximate_size(size, a_kilobyte_is_1024_bytes=True):
    """Convert a file size to human-readable form.

    Keyword arguments:
    size -- file size in bytes
    a_kilobyte_is_1024_bytes -- if True (default), use multiples of 1024,
                                if False, use multiples of 1000

    Returns: string

    """
    if size < 0:
        raise ValueError("Number must be non-negative")

    multiple = 1024 if a_kilobyte_is_1024_bytes else 1000
    for suffix in SUFFIXES[multiple]:
        size /= multiple
        if size < multiple:
            return "{0:.1f} {1}".format(size, suffix)
        """
        Can someone tell me how the above for loop selects the correct suffix? It does it as if by magic.
        I understand how everything else works, but don't understand how the right suffix within the list
        is selected.
        """

    raise ValueError("Number too large")

if __name__ == "__main__":
    print(approximate_size(1000000000000, False))
    print(approximate_size(1000000000000))
@varjmes
Author

varjmes commented Dec 28, 2014

Can someone tell me how the above for loop selects the correct suffix? It does it as if by magic. I understand how everything else works, but don't understand how the right suffix within the list is selected.

@varjmes
Author

varjmes commented Dec 28, 2014

The loop...loops, until the size has been divided enough that it is less than the multiple (1024 or 1000). On each pass, suffix is bound to the next element of the suffixes list, so the number of passes made up to that point decides which suffix you end up with (e.g. after 4 passes it is the fourth element of the 1024 suffix list, 'TiB'). That suffix is added on to the now divided size. Voila.
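One way to see where the suffix comes from is to write the same loop with an explicit position counter instead of `for suffix in ...`. This is only a sketch for illustration: `approximate_size_indexed` and `SUFFIXES_1024` are made-up names, and the function handles only the 1024 case.

```python
# Hypothetical names; this mirrors the gist's loop for multiples of 1024 only.
SUFFIXES_1024 = ['KiB', 'MiB', 'GiB', 'TiB', 'PiB', 'EiB', 'ZiB', 'YiB']
MULTIPLE = 1024

def approximate_size_indexed(size):
    """Same logic as the gist's loop, with the list position made explicit."""
    if size < 0:
        raise ValueError("Number must be non-negative")
    index = 0                          # position of the suffix being tried
    while index < len(SUFFIXES_1024):
        size /= MULTIPLE               # one division per suffix tried
        if size < MULTIPLE:
            # The number of divisions so far is index + 1, so
            # SUFFIXES_1024[index] is the unit the value is now expressed in.
            return "{0:.1f} {1}".format(size, SUFFIXES_1024[index])
        index += 1
    raise ValueError("Number too large")

if __name__ == "__main__":
    print(approximate_size_indexed(1000000000000))
    print(approximate_size_indexed(5 * 1024 ** 4))
```

The `for suffix in SUFFIXES[multiple]` form in the gist does the same bookkeeping implicitly: the loop variable advances one element per pass, in step with the divisions.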

@owenjones

You can think of size /= multiple as size = size / multiple. On every iteration of the loop the size is divided by the multiple and then compared with it. If the size is still greater than or equal to the multiple (and so can be divided again), the loop moves on to the next suffix, continuing until it reaches the correct one.
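To watch the divide-then-compare step happen, a small tracing helper can record each pass. This is a sketch, not part of the gist: `trace_division` is a hypothetical name, and `SUFFIXES` is copied from humansize.py above.

```python
SUFFIXES = {1000: ['KB', 'MB', 'GB', 'TB', 'PB', 'EB', 'ZB', 'YB'],
            1024: ['KiB', 'MiB', 'GiB', 'TiB', 'PiB', 'EiB', 'ZiB', 'YiB']}

def trace_division(size, multiple=1024):
    """Return one (suffix, size_after_division, stops_here) tuple per loop pass."""
    steps = []
    for suffix in SUFFIXES[multiple]:
        size /= multiple              # same as size = size / multiple
        stops_here = size < multiple  # the test the gist uses to decide to return
        steps.append((suffix, size, stops_here))
        if stops_here:
            break
    return steps

if __name__ == "__main__":
    for suffix, value, stops_here in trace_division(1000000000000, multiple=1000):
        print(suffix, value, "stop" if stops_here else "keep going")
```

Each row shows the suffix the loop is currently on and the size after that pass's division; the first row where the size drops below the multiple is the one the gist's function returns.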

@junklight

you are basically looking to see "if you are there yet"

so you try each unit in turn

divide size by KB and see if you have less than 1000 (or 1024) KB left - if so it's something.something KB
otherwise you're at least in the next bracket for units
divide size by MB and see if you have less than 1000 (or 1024) MB left - if so it's something.something MB
etc

if it helps (sometimes seeing the same thing done a different way is useful), my version of the same function is:

units = [ "bytes" , "Kb" , "Mb" , "Gb" , "Tb" , "Pb" ]

def _nicesize( v , uidx ):
    # v is the size expressed in units[uidx]; see whether the next unit up fits
    k = float(v)/1024
    if k >= 1:  # >= so that exactly 1024 of a unit moves up to the next one
        return _nicesize( k , uidx + 1 )
    else:
        return ( v, uidx )

def nicesize( v ):
    ( v , uidx ) = _nicesize( v ,0 )
    return "%#.2f %s" % ( v , units[uidx] )

@junklight

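interestingly my version has a bug that yours doesn't - it never tests to see if it has 'run off the end' of the units list. Yours raises ValueError("Number too large") once it runs out of suffixes

I guess I've never used mine with anything bigger than petabytes & yours has a pretty high top end anyway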
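One way to guard against running off the end of the units list is sketched below as an iterative variant of nicesize. The names `nicesize_checked` and `UNITS` are hypothetical, and raising ValueError is an assumption borrowed from how approximate_size reports bad input.

```python
UNITS = ["bytes", "Kb", "Mb", "Gb", "Tb", "Pb"]

def nicesize_checked(v):
    """Format a byte count using UNITS; raise if it needs a unit past the last one."""
    if v < 0:
        raise ValueError("Number must be non-negative")
    value = float(v)
    uidx = 0
    while value >= 1024:
        if uidx == len(UNITS) - 1:
            # Already at the largest unit and the value still does not fit.
            raise ValueError("Number too large")
        value /= 1024
        uidx += 1
    return "%#.2f %s" % (value, UNITS[uidx])

if __name__ == "__main__":
    print(nicesize_checked(500))
    print(nicesize_checked(1024 ** 5))
```

Checking the index before dividing means the function fails loudly instead of raising an IndexError from the list lookup. Whether to raise or to keep printing a large number in the last unit is a design choice; this sketch raises.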
