@edvardm
Created February 21, 2018 14:57
Simple stats module
from statistics import mean, median_low, stdev, mode


def stat(lst):
    """Return summary statistics for lst as a dict, or None if lst is empty."""
    if not lst:
        return None

    _min = min(lst)
    _max = max(lst)
    _mean = mean(lst)

    return {
        'mean': _mean,
        'median': median_low(lst),
        'mode': mode(lst),
        'min': _min,
        'max': _max,
        # Sample standard deviation; passing the mean avoids recomputing it.
        # Note: stdev() raises StatisticsError for fewer than two data points.
        'stdev': stdev(lst, _mean),
        'n': len(lst),
        'range': _max - _min
    }


def fmt_stat(lst):
    """Format the statistics of lst as a one-line, human-readable string."""
    values = stat(lst)
    if not values:
        return 'N/A, empty data set'
    return 'n: {n}, mean/stdev: {mean:.3f}/{stdev:.2f}, median/mode: {median}/{mode}, min/max: {min}/{max}, range: {range}'.format(**values)


# Example usage:
# lst = [1, 2, 6, 2, 5, 2, 6, 7, 23, 5, 31, 11, 3]
# print(fmt_stat(lst))  # => n: 13, mean/stdev: 8.000/9.00, median/mode: 5/2, min/max: 1/31, range: 30
edvardm commented Feb 21, 2018

This is by no means super efficient or otherwise elegant. Check out NumPy if you want to do heavy lifting with Python & stats.

On choosing median_low over the other options, median_high and median: plain median averages the two middle values when the number of data points is even, so the result may not be a real data point. With median_low or median_high you can be certain that the result is an actual data point, and with cardinals (counts) that can be pretty important (you don't want to fit 2.7 humans somewhere). The choice between high and low was purely arbitrary, though.
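
To make the difference concrete, here is a small sketch comparing the three functions on an even number of data points. The head counts in `data` are made up for illustration and are not from the gist.

```python
from statistics import median, median_low, median_high

# Hypothetical head counts; an even number of data points.
data = [2, 3, 4, 5]

mid = median(data)        # averages the two middle values: (3 + 4) / 2
low = median_low(data)    # lower of the two middle values
high = median_high(data)  # higher of the two middle values

print(mid, low, high)                          # 3.5 3 4
print(mid in data, low in data, high in data)  # False True True
```

With an odd number of data points all three return the same value, so the choice only matters for even-sized data.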
