Skip to content

Instantly share code, notes, and snippets.

@salotz
Created January 9, 2020 18:16
Show Gist options
  • Star 4 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save salotz/0158a99a75078b47538452111ec0faa2 to your computer and use it in GitHub Desktop.
Save salotz/0158a99a75078b47538452111ec0faa2 to your computer and use it in GitHub Desktop.
Clean numpy-only implementation of the Shimazaki-Shinimoto histogram binning method as a function.
def shimazaki_shinimoto_binning(x, min_bins, max_bins):
"""The Shimazaki-Shinimoto histogram binning algorithm for choosing an
optimal constant number of bins.
Parameters
----------
x : array of int or float
The data you want to bin
min_bins : int
The minimum number of bins to consider for optimization. Must
be at least 1.
max_bins : int
The maximum number of bins to consider for optimization.
Returns
-------
optimal_bin_size : float
The optimal bin size calculated
bin_edges : array of float
The bin edges for the data using the optimal bin size
Notes
-----
Created on Thu Oct 25 11:32:47 2012
Histogram Binwidth Optimization Method
Shimazaki and Shinomoto, Neural Comput 19 1503-1527, 2007
2006 Author Hideaki Shimazaki, Matlab
Department of Physics, Kyoto University
shimazaki at ton.scphys.kyoto-u.ac.jp
Please feel free to use/modify this program.
Version in python adapted Érbet Almeida Costa
Bugfix by Takuma Torii 2.24.2013
Adapted as a library function and commented etc. by Samuel D. Lotz
2020-1-9 from:
http://176.32.89.45/~hideaki/res/histogram.html#Python1D
archived at:
https://web.archive.org/save/http://176.32.89.45/~hideaki/res/histogram.html#Python1D
References
----------
@article{Shimazaki_2007,
doi = {10.1162/neco.2007.19.6.1503},
url = {https://doi.org/10.1162%2Fneco.2007.19.6.1503},
year = 2007,
month = {jun},
publisher = {{MIT} Press - Journals},
volume = {19},
number = {6},
pages = {1503--1527},
author = {Hideaki Shimazaki and Shigeru Shinomoto},
title = {A Method for Selecting the Bin Size of a Time Histogram},
journal = {Neural Computation}
}
"""
import numpy as np
assert min_bins > 1, "must be more than 1 bin"
x_max = np.max(x)
x_min = np.min(x)
# The minimum and maximum number of bins will determine the number
# of trials in between them. This will go into the construction of
# the cost function to determine the optimal bin size
trials_num_bins = np.arange(min_bins, max_bins)
num_trials = len(trials_num_bins)
# compute the bin size for each trial based on the min and max of
# the data
trials_bin_sizes = (x_max - x_min) / trials_num_bins
# then for each trial we will have a cost value associated with it
cost_vector = []
# Computation of the cost values for each bin size trial
for trial_idx, num_bins in enumerate(trials_num_bins):
num_edges = num_bins[trial_idx] + 1
# bin edges are linearly spaced for this range
edges = np.linspace(x_min, x_max, num_edges)
# count number of events in bins
trial_hist, _ = np.histogram(x, bins=edges)
# mean of event count
mean_count = np.mean(trial_hist)
# variance of event count
var_count = sum((trial_hist - mean_count)**2) / num_bins[trial_idx]
# compute the cost for this trial
cost = (2 * mean_count - var_count) /\
(trials_bin_sizes[trial_idx] ** 2)
cost_vector.append(cost)
# Optimal Bin Size Selection
# just find the minimal cost
min_idx = np.argmin(cost_vector)
min_cost = cost_vector[min_idx]
# then choose the number of bins associated with that
optimal_bin_size = trials_bin_sizes[min_idx]
edges = np.arange(x_min, x_max, num_bins[min_idx]+1)
return optimal_bin_size, edges
@AstroPierre
Copy link

AstroPierre commented Mar 12, 2022 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment