I have been using climate-indices
python packages years ago, and the tool developed by James Adam is awesome, it help us to calculate various climate indices. There are two indices that I frequently produced using this tool: SPI and SPEI.
I also wrote a guideline on how to calculate these two indices using this package https://bennyistanto.github.io/spi/.
When I wrote this guideline, I never have any problem during the indices calculation. Then I noticed some people contacted me to ask about some error related to Pearson fitted also in the Github repository there are lot of Issues submitted. And surprisingly, I also experiencing the similar error when I try to update the indices using latest data.
Well, I doesn't need the Pearson-based calculation anyway and I always use the gamma-fitted based SPI and SPEI. I am sure James and other contributors are working very hard to solve the various issue related to this Pearson computation, but this is not an easy problem to solve instantly.
So, can I skip the Pearson fitting process? The answer is Yes.
Please follow below procedure!
Note
Below steps has been tested using climate-indices 2.0.0
To enforce the usage of the Gamma distribution only and entirely prevent the Pearson distribution from being used, you would need to remove or modify the logic that handles the Pearson distribution within the spi
and spei
functions inside file indices.py
.
For the spi
function
Start the modifications right after the initial validations and before checking the distribution:
Original code snippet from your spi
function:
# reshape precipitation values to (years, 12) for monthly,
# or to (years, 366) for daily
if periodicity is compute.Periodicity.monthly:
values = utils.reshape_to_2d(values, 12)
elif periodicity is compute.Periodicity.daily:
values = utils.reshape_to_2d(values, 366)
else:
raise ValueError(f"Invalid periodicity argument: {periodicity}")
After the above code, directly continue with:
# Force the use of the Gamma distribution
if True: # Ignore the 'distribution' argument, always use Gamma
if fitting_params is not None:
alphas = fitting_params["alpha"]
betas = fitting_params["beta"]
else:
alphas = None
betas = None
values = compute.transform_fitted_gamma(
values,
data_start_year,
calibration_year_initial,
calibration_year_final,
periodicity,
alphas,
betas,
)
values = np.clip(values, _FITTED_INDEX_VALID_MIN, _FITTED_INDEX_VALID_MAX).flatten()
return values[0:original_length]
After this modification, the spi
function will be like below
def spi(
values: np.ndarray,
scale: int,
distribution: Distribution,
data_start_year: int,
calibration_year_initial: int,
calibration_year_final: int,
periodicity: compute.Periodicity,
fitting_params: Dict = None,
) -> np.ndarray:
"""
Computes SPI (Standardized Precipitation Index).
:param values: 1-D numpy array of precipitation values, in any units,
first value assumed to correspond to January of the initial year if
the periodicity is monthly, or January 1st of the initial year if daily
:param scale: number of time steps over which the values should be scaled
before the index is computed
:param distribution: distribution type to be used for the internal
fitting/transform computation
:param data_start_year: the initial year of the input precipitation dataset
:param calibration_year_initial: initial year of the calibration period
:param calibration_year_final: final year of the calibration period
:param periodicity: the periodicity of the time series represented by the
input data, valid/supported values are 'monthly' and 'daily'
'monthly' indicates an array of monthly values, assumed to span full
years, i.e. the first value corresponds to January of the initial year
and any missing final months of the final year filled with NaN values,
with size == # of years * 12
'daily' indicates an array of full years of daily values with 366 days
per year, as if each year were a leap year and any missing final months
of the final year filled with NaN values, with array size == (# years * 366)
:param fitting_params: optional dictionary of pre-computed distribution
fitting parameters, if the distribution is gamma then this dict should
contain two arrays, keyed as "alpha" and "beta", and if the
distribution is Pearson then this dict should contain four arrays keyed
as "prob_zero", "loc", "scale", and "skew".
:return SPI values fitted to the gamma distribution at the specified time
step scale, unitless
:rtype: 1-D numpy.ndarray of floats of the same length as the input array
of precipitation values
"""
# we expect to operate upon a 1-D array, so if we've been passed a 2-D array
# then we flatten it, otherwise raise an error
shape = values.shape
if len(shape) == 2:
values = values.flatten()
elif len(shape) != 1:
message = f"Invalid shape of input array: {shape} -- only 1-D and 2-D arrays are supported"
_logger.error(message)
raise ValueError(message)
# if we're passed all missing values then we can't compute
# anything, so we return the same array of missing values
if (np.ma.is_masked(values) and values.mask.all()) or np.all(np.isnan(values)):
return values
# clip any negative values to zero
if np.amin(values) < 0.0:
_logger.warning("Input contains negative values -- all negatives clipped to zero")
values = np.clip(values, a_min=0.0, a_max=None)
# remember the original length of the array, in order to facilitate
# returning an array of the same size
original_length = values.size
# get a sliding sums array, with each time step's value scaled
# by the specified number of time steps
values = compute.sum_to_scale(values, scale)
# reshape precipitation values to (years, 12) for monthly,
# or to (years, 366) for daily
if periodicity is compute.Periodicity.monthly:
values = utils.reshape_to_2d(values, 12)
elif periodicity is compute.Periodicity.daily:
values = utils.reshape_to_2d(values, 366)
else:
raise ValueError(f"Invalid periodicity argument: {periodicity}")
# if distribution is Distribution.gamma:
# # get (optional) fitting parameters if provided
# if fitting_params is not None:
# alphas = fitting_params["alpha"]
# betas = fitting_params["beta"]
# else:
# alphas = None
# betas = None
# # fit the scaled values to a gamma distribution
# # and transform to corresponding normalized sigmas
# values = compute.transform_fitted_gamma(
# values,
# data_start_year,
# calibration_year_initial,
# calibration_year_final,
# periodicity,
# alphas,
# betas,
# )
# elif distribution is Distribution.pearson:
# # get (optional) fitting parameters if provided
# if fitting_params is not None:
# probabilities_of_zero = fitting_params["prob_zero"]
# locs = fitting_params["loc"]
# scales = fitting_params["scale"]
# skews = fitting_params["skew"]
# else:
# probabilities_of_zero = None
# locs = None
# scales = None
# skews = None
# # fit the scaled values to a Pearson Type III distribution
# # and transform to corresponding normalized sigmas
# values = compute.transform_fitted_pearson(
# values,
# data_start_year,
# calibration_year_initial,
# calibration_year_final,
# periodicity,
# probabilities_of_zero,
# locs,
# scales,
# skews,
# )
# else:
# message = f"Unsupported distribution argument: {distribution}"
# _logger.error(message)
# raise ValueError(message)
# Force the use of the Gamma distribution
if True: # Ignore the 'distribution' argument, always use Gamma
if fitting_params is not None:
alphas = fitting_params["alpha"]
betas = fitting_params["beta"]
else:
alphas = None
betas = None
values = compute.transform_fitted_gamma(
values,
data_start_year,
calibration_year_initial,
calibration_year_final,
periodicity,
alphas,
betas,
)
# clip values to within the valid range, reshape the array back to 1-D
values = np.clip(values, _FITTED_INDEX_VALID_MIN, _FITTED_INDEX_VALID_MAX).flatten()
# return the original size array
return values[0:original_length]
For the spei
function:
Start the modifications right after you prepare the scaled_values
variable:
Original code snippet from your spei
function:
# get a sliding sums array, with each element's value
# scaled by the specified number of time steps
scaled_values = compute.sum_to_scale(p_minus_pet, scale)
Immediately following this, continue with:
# Force the use of the Gamma distribution
if True: # Ignore the 'distribution' argument, always use Gamma
if fitting_params is not None:
alphas = fitting_params["alpha"]
betas = fitting_params["beta"]
else:
alphas = None
betas = None
transformed_fitted_values = compute.transform_fitted_gamma(
scaled_values,
data_start_year,
calibration_year_initial,
calibration_year_final,
periodicity,
alphas,
betas,
)
values = np.clip(transformed_fitted_values, _FITTED_INDEX_VALID_MIN, _FITTED_INDEX_VALID_MAX).flatten()
return values[0:original_length]
After this modification, the spei
function will be like below
def spei(
precips_mm: np.ndarray,
pet_mm: np.ndarray,
scale: int,
distribution: Distribution,
periodicity: compute.Periodicity,
data_start_year: int,
calibration_year_initial: int,
calibration_year_final: int,
fitting_params: dict = None,
) -> np.ndarray:
"""
Compute SPEI fitted to the specified distribution.
PET values are subtracted from the precipitation values to come up with an array
of (P - PET) values, which is then scaled to the specified months scale and
finally fitted/transformed to SPEI values corresponding to the input
precipitation time series.
:param precips_mm: an array of monthly total precipitation values,
in millimeters, should be of the same size (and shape?) as the input PET array
:param pet_mm: an array of monthly PET values, in millimeters,
should be of the same size (and shape?) as the input precipitation array
:param scale: the number of months over which the values should be scaled
before computing the indicator
:param distribution: distribution type to be used for the internal
fitting/transform computation
:param periodicity: the periodicity of the time series represented by the
input data, valid/supported values are 'monthly' and 'daily'
'monthly' indicates an array of monthly values, assumed to span full
years, i.e. the first value corresponds to January of the initial year
and any missing final months of the final year filled with NaN values,
with size == # of years * 12
'daily' indicates an array of full years of daily values with 366 days
per year, as if each year were a leap year and any missing final months
of the final year filled with NaN values, with array size == (# years * 366)
:param data_start_year: the initial year of the input datasets (assumes that
the two inputs cover the same period)
:param calibration_year_initial: initial year of the calibration period
:param calibration_year_final: final year of the calibration period
:param fitting_params: optional dictionary of pre-computed distribution
fitting parameters, if the distribution is gamma then this dict should
contain two arrays, keyed as "alpha" and "beta", and if the
distribution is Pearson then this dict should contain four arrays keyed
as "prob_zero", "loc", "scale", and "skew"
Older keys such as "alphas" and "probabilities_of_zero" are deprecated.
:return: an array of SPEI values
:rtype: numpy.ndarray of type float, of the same size and shape as the input
PET and precipitation arrays
"""
# Normalize fitting param keys
fitting_params = _norm_fitdict(fitting_params)
# if we're passed all missing values then we can't compute anything,
# so we return the same array of missing values
if (np.ma.is_masked(precips_mm) and precips_mm.mask.all()) or np.all(np.isnan(precips_mm)):
return precips_mm
# validate that the two input arrays are compatible
if precips_mm.size != pet_mm.size:
message = "Incompatible precipitation and PET arrays"
_logger.error(message)
raise ValueError(message)
# clip any negative values to zero
if np.amin(precips_mm) < 0.0:
_logger.warning("Input contains negative values -- all negatives clipped to zero")
precips_mm = np.clip(precips_mm, a_min=0.0, a_max=None)
# subtract the PET from precipitation, adding an offset
# to ensure that all values are positive
p_minus_pet = (precips_mm.flatten() - pet_mm.flatten()) + 1000.0
# remember the original length of the input array, in order to facilitate
# returning an array of the same size
original_length = precips_mm.size
# get a sliding sums array, with each element's value
# scaled by the specified number of time steps
scaled_values = compute.sum_to_scale(p_minus_pet, scale)
# if distribution is Distribution.gamma:
# # get (optional) fitting parameters if provided
# if fitting_params is not None:
# alphas = fitting_params["alpha"]
# betas = fitting_params["beta"]
# else:
# alphas = None
# betas = None
# # fit the scaled values to a gamma distribution and
# # transform to corresponding normalized sigmas
# transformed_fitted_values = compute.transform_fitted_gamma(
# scaled_values,
# data_start_year,
# calibration_year_initial,
# calibration_year_final,
# periodicity,
# alphas,
# betas,
# )
# elif distribution is Distribution.pearson:
# # get (optional) fitting parameters if provided
# if fitting_params is not None:
# probabilities_of_zero = fitting_params["prob_zero"]
# locs = fitting_params["loc"]
# scales = fitting_params["scale"]
# skews = fitting_params["skew"]
# else:
# probabilities_of_zero = None
# locs = None
# scales = None
# skews = None
# # fit the scaled values to a Pearson Type III distribution
# # and transform to corresponding normalized sigmas
# transformed_fitted_values = compute.transform_fitted_pearson(
# scaled_values,
# data_start_year,
# calibration_year_initial,
# calibration_year_final,
# periodicity,
# probabilities_of_zero,
# locs,
# scales,
# skews,
# )
# else:
# message = "Unsupported distribution argument: " + "{dist}".format(dist=distribution)
# _logger.error(message)
# raise ValueError(message)
# Force the use of the Gamma distribution
if True: # Ignore the 'distribution' argument, always use Gamma
if fitting_params is not None:
alphas = fitting_params["alpha"]
betas = fitting_params["beta"]
else:
alphas = None
betas = None
transformed_fitted_values = compute.transform_fitted_gamma(
scaled_values,
data_start_year,
calibration_year_initial,
calibration_year_final,
periodicity,
alphas,
betas,
)
# clip values to within the valid range, reshape the array back to 1-D
values = np.clip(transformed_fitted_values, _FITTED_INDEX_VALID_MIN, _FITTED_INDEX_VALID_MAX).flatten()
# return the original size array
return values[0:original_length]
These modifications will ensure that the spi
and spei
functions exclusively use the Gamma distribution, completely bypassing the code block that handles the Pearson distribution. This approach directly integrates into your original function flow and respects all existing data validations and setups.
The code snippet from __main__.py
indicates that the spi
and spei
functions are being called with the distribution
parameter explicitly passed from the parameters
dictionary.
Please go to line 1101-1123
def _spi(precips, parameters):
return indices.spi(
values=precips,
scale=parameters["scale"],
distribution=parameters["distribution"],
data_start_year=parameters["data_start_year"],
calibration_year_initial=parameters["calibration_year_initial"],
calibration_year_final=parameters["calibration_year_final"],
periodicity=parameters["periodicity"],
)
def _spei(precips, pet_mm, parameters):
return indices.spei(
precips_mm=precips,
pet_mm=pet_mm,
scale=parameters["scale"],
distribution=parameters["distribution"],
data_start_year=parameters["data_start_year"],
calibration_year_initial=parameters["calibration_year_initial"],
calibration_year_final=parameters["calibration_year_final"],
periodicity=parameters["periodicity"],
)
This setup is likely where the Pearson distribution option is getting included. To ensure that the Gamma distribution is always used regardless of the command line input or other configurations, you can modify these function calls in __main__.py
to ignore the distribution
setting from parameters
and hard-code it to use Distribution.gamma
. Here's how you can adjust your __main__.py
functions:
def _spi(precips, parameters):
return indices.spi(
values=precips,
scale=parameters["scale"],
distribution=indices.Distribution.gamma, # Force Gamma distribution
data_start_year=parameters["data_start_year"],
calibration_year_initial=parameters["calibration_year_initial"],
calibration_year_final=parameters["calibration_year_final"],
periodicity=parameters["periodicity"],
)
def _spei(precips, pet_mm, parameters):
return indices.spei(
precips_mm=precips,
pet_mm=pet_mm,
scale=parameters["scale"],
distribution=indices.Distribution.gamma, # Force Gamma distribution
data_start_year=parameters["data_start_year"],
calibration_year_initial=parameters["calibration_year_initial"],
calibration_year_final=parameters["calibration_year_final"],
periodicity=parameters["periodicity"],
)
This modification:
- Hard-codes the
distribution
parameter: This ensures that even if theparameters
dictionary has a different distribution set (like Pearson), your function calls will always use the Gamma distribution. - Overrides any external configuration: By ignoring the
distribution
key from theparameters
dictionary and setting it directly in the function calls, you prevent any external script or configuration from changing this behavior.
Next is modifying some lines inside def main()
.
Please go to line 1593-1621
# compute SPI if specified
if arguments.index in ["spi", "scaled", "all"]:
# prepare precipitation NetCDF in case dimensions not (lat, lon, time) or if any coordinates are descending
netcdf_precip = _prepare_file(arguments.netcdf_precip, arguments.var_name_precip)
# run SPI computations for each scale/distribution in turn
for scale in arguments.scales:
for dist in indices.Distribution:
# keyword arguments used for the SPI function
kwrgs = {
"index": "spi",
"netcdf_precip": netcdf_precip,
"var_name_precip": arguments.var_name_precip,
"input_type": input_type,
"scale": scale,
"distribution": dist,
"periodicity": arguments.periodicity,
"calibration_start_year": arguments.calibration_start_year,
"calibration_end_year": arguments.calibration_end_year,
"output_file_base": arguments.output_file_base,
"chunksizes": arguments.chunksizes,
}
# compute and write SPI
_compute_write_index(kwrgs)
# remove temporary file if one was created
if netcdf_precip != arguments.netcdf_precip:
os.remove(netcdf_precip)
and line 1648-1680
if arguments.index in ["spei", "scaled", "all"]:
# prepare NetCDFs in case dimensions not (lat, lon, time) or if any coordinates are descending
netcdf_precip = _prepare_file(arguments.netcdf_precip, arguments.var_name_precip)
netcdf_pet = _prepare_file(arguments.netcdf_pet, arguments.var_name_pet)
# run SPEI computations for each scale/distribution in turn
for scale in arguments.scales:
for dist in indices.Distribution:
# keyword arguments used for the SPI function
kwrgs = {
"index": "spei",
"netcdf_precip": netcdf_precip,
"var_name_precip": arguments.var_name_precip,
"netcdf_pet": netcdf_pet,
"var_name_pet": arguments.var_name_pet,
"input_type": input_type,
"scale": scale,
"distribution": dist,
"periodicity": arguments.periodicity,
"calibration_start_year": arguments.calibration_start_year,
"calibration_end_year": arguments.calibration_end_year,
"output_file_base": arguments.output_file_base,
"chunksizes": arguments.chunksizes,
}
# compute and write SPEI
_compute_write_index(kwrgs)
# remove temporary file if one was created
if netcdf_precip != arguments.netcdf_precip:
os.remove(netcdf_precip)
if netcdf_pet != arguments.netcdf_pet:
os.remove(netcdf_pet)
You should adjust the loop where dist in indices.Distribution
is iterated over. Currently, this loop iterates over all members of the Distribution
enum, which includes both Gamma and Pearson distributions. This loop is a key point where the distribution type is being set for the computations, leading to both being executed.
To enforce the use of only the Gamma distribution and avoid inadvertently triggering Pearson computations, you can modify this loop to only iterate over the Gamma distribution. However, since indices.Distribution.gamma
is not an iterable, you would use a list containing only indices.Distribution.gamma
:
Change for dist in indices.Distribution:
into for dist in [indices.Distribution.gamma]:
in line 1600 and 1655, after modification it will be like below:
Line 1593-1621
# compute SPI if specified
if arguments.index in ["spi", "scaled", "all"]:
# prepare precipitation NetCDF in case dimensions not (lat, lon, time) or if any coordinates are descending
netcdf_precip = _prepare_file(arguments.netcdf_precip, arguments.var_name_precip)
# run SPI computations for each scale/distribution in turn
for scale in arguments.scales:
for dist in [indices.Distribution.gamma]:
# keyword arguments used for the SPI function
kwrgs = {
"index": "spi",
"netcdf_precip": netcdf_precip,
"var_name_precip": arguments.var_name_precip,
"input_type": input_type,
"scale": scale,
"distribution": dist,
"periodicity": arguments.periodicity,
"calibration_start_year": arguments.calibration_start_year,
"calibration_end_year": arguments.calibration_end_year,
"output_file_base": arguments.output_file_base,
"chunksizes": arguments.chunksizes,
}
# compute and write SPI
_compute_write_index(kwrgs)
# remove temporary file if one was created
if netcdf_precip != arguments.netcdf_precip:
os.remove(netcdf_precip)
Line 1648-1680
if arguments.index in ["spei", "scaled", "all"]:
# prepare NetCDFs in case dimensions not (lat, lon, time) or if any coordinates are descending
netcdf_precip = _prepare_file(arguments.netcdf_precip, arguments.var_name_precip)
netcdf_pet = _prepare_file(arguments.netcdf_pet, arguments.var_name_pet)
# run SPEI computations for each scale/distribution in turn
for scale in arguments.scales:
for dist in [indices.Distribution.gamma]:
# keyword arguments used for the SPI function
kwrgs = {
"index": "spei",
"netcdf_precip": netcdf_precip,
"var_name_precip": arguments.var_name_precip,
"netcdf_pet": netcdf_pet,
"var_name_pet": arguments.var_name_pet,
"input_type": input_type,
"scale": scale,
"distribution": dist,
"periodicity": arguments.periodicity,
"calibration_start_year": arguments.calibration_start_year,
"calibration_end_year": arguments.calibration_end_year,
"output_file_base": arguments.output_file_base,
"chunksizes": arguments.chunksizes,
}
# compute and write SPEI
_compute_write_index(kwrgs)
# remove temporary file if one was created
if netcdf_precip != arguments.netcdf_precip:
os.remove(netcdf_precip)
if netcdf_pet != arguments.netcdf_pet:
os.remove(netcdf_pet)
By using [indices.Distribution.gamma]
:
- You limit the loop to only consider the Gamma distribution.
- This change prevents any chance of Pearson being used within this part of the script, regardless of external configurations or command line arguments.
This adjustment will ensure that when the _compute_write_index
function (or any similar function that eventually calls your spi
or spei
functions) is invoked, it always uses the Gamma distribution, which matches your requirements to avoid Pearson calculations.
Let's take a look inside __spi__.py
now.
Based on the content of __spi__.py
, the following modification should help to enforce the use of the Gamma distribution only during SPI calculations. This involves changing the distribution loop so it does not iterate over both Gamma and Pearson distributions.
Please go to line 642-647. Here's the relevant code snippet that needs revision:
...
for scale in keyword_arguments["scales"]:
for distribution in [indices.Distribution.gamma, indices.Distribution.pearson]: # This line needs modification
_logger.info(
f"Computing {scale}-{keyword_arguments['periodicity'].unit()} "
f"SPI ({distribution.value.capitalize()})",
)
...
To enforce only the Gamma distribution, modify the loop to iterate only over the Gamma distribution:
...
for scale in keyword_arguments["scales"]:
for distribution in [indices.Distribution.gamma]: # Force to only Gamma
_logger.info(
f"Computing {scale}-{keyword_arguments['periodicity'].unit()} "
f"SPI (Gamma)",
)
...
This change will prevent the loop from entering the Pearson distribution block and ensure that only Gamma-related computations are performed.
After making this change, ensure to test the script thoroughly in your environment to confirm that only the Gamma distribution is used and that there are no unintended side effects or errors due to these modifications.