Issue with the scheduler on a search head cluster (SHC) around search quotas: the "usage" metric continually increases until it surpasses the role-based quota, blocking scheduled jobs.
Affected users will show WARN messages in splunkd.log like the following. Note: these messages can be legitimate; the volume is the indicator.
We have found that affected users generate the warning below every 7 seconds for each scheduled search they have.
11-18-2015 11:10:34.638 -0600 WARN SHPMaster - Search not executed: Your maximum number of concurrent searches has been reached. usage=41 quota=30 user=someuser. for search: nobody;search;A Saved Search
Run the following as an alert over a 5-minute span:
index=_internal sourcetype=splunkd component=SHPMaster "Search not executed: Your maximum number of concurrent searches has been reached"
| rex "user\=(?<user>.+)\.\s+for search:\s(?<search_user>[^;]+);(?<search_context>[^;]+);(?<search_name>.+)"
| fields _time usage quota user search_*
| stats count by user search_name
| where count>40
| stats values(search_name) as affected_searches by user
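For sanity-checking the field extraction outside Splunk, the SPL rex above can be mirrored in Python against the sample WARN line shown earlier (the regex is a close translation, not Splunk's own parser):

```python
import re

# Mirrors the SPL rex: pulls the user and the scheduled-search
# triple (owner;app context;name) out of the SHPMaster WARN line.
WARN_RE = re.compile(
    r"user=(?P<user>.+?)\.\s+for search:\s"
    r"(?P<search_user>[^;]+);(?P<search_context>[^;]+);(?P<search_name>.+)"
)

line = (
    "11-18-2015 11:10:34.638 -0600 WARN SHPMaster - Search not executed: "
    "Your maximum number of concurrent searches has been reached. "
    "usage=41 quota=30 user=someuser. for search: nobody;search;A Saved Search"
)

m = WARN_RE.search(line)
if m:
    print(m.group("user"))         # someuser
    print(m.group("search_name"))  # A Saved Search
```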
When the condition is detected, perform a rolling restart of the search head cluster.
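The rolling restart can be issued from the SHC captain with the Splunk CLI; paths and credentials below are environment-specific placeholders:

```shell
# Run on the SHC captain. $SPLUNK_HOME and the credentials are placeholders.
# Optionally confirm cluster state and which member is captain first:
$SPLUNK_HOME/bin/splunk show shcluster-status -auth admin:changeme

# Kick off a rolling restart of all cluster members:
$SPLUNK_HOME/bin/splunk rolling-restart shcluster-members
```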
Set all quotas to zero. We were not sure whether explicit role settings override imported roles, so we applied the following to all roles, including the built-in 'user', 'power', and 'admin' roles.
Example: authorize.conf
[role_admin]
srchJobsQuota = 0
rtSrchJobsQuota = 0
cumulativeSrchJobsQuota = 0
cumulativeRTSrchJobsQuota = 0
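To confirm which setting actually wins after the change, btool can print the merged authorize.conf for a role along with the file each value came from (run on each member; output format varies by version):

```shell
# Show the effective, merged settings for the admin role,
# annotated with the source file for each line:
$SPLUNK_HOME/bin/splunk btool authorize list role_admin --debug
```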