Splunk SPL-109514

There is an issue with the scheduler on Search Head Clusters (SHC) related to search quotas: the "usage" metric continually increases until it exceeds the role-based quota, at which point scheduled jobs are blocked.

Example

Affected users will show WARNs in splunkd.log like the following. Note: these messages can be legitimate; a high volume of them is the indicator.

We have found that affected users generate the warning below every 7 seconds for each scheduled search they have.

11-18-2015 11:10:34.638 -0600 WARN  SHPMaster - Search not executed: Your maximum number of concurrent searches has been reached. usage=41 quota=30 user=someuser. for search: nobody;search;A Saved Search

Detection

Run the following as an alert over a 5-minute span:

index=_internal sourcetype=splunkd  component=SHPMaster "Search not executed: Your maximum number of concurrent searches has been reached" 
| rex "user\=(?<user>.+)\.\s+for search:\s(?<search_user>[^;]+);(?<search_context>[^;]+);(?<search_name>.+)" 
| fields _time usage quota user search_*  
| stats  count by user search_name 
| where count>40 
| stats values(search_name) as affected_searches by user
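
To run this as a scheduled alert, a savedsearches.conf stanza along the following lines should work. This is a sketch: the stanza name, 5-minute cron schedule, and trigger condition are our assumptions, not part of the original gist.

[SPL-109514 scheduler quota detection]
search = index=_internal sourcetype=splunkd component=SHPMaster "Search not executed: Your maximum number of concurrent searches has been reached" \
| rex "user\=(?<user>.+)\.\s+for search:\s(?<search_user>[^;]+);(?<search_context>[^;]+);(?<search_name>.+)" \
| stats count by user search_name \
| where count>40 \
| stats values(search_name) as affected_searches by user
dispatch.earliest_time = -5m
dispatch.latest_time = now
enableSched = 1
cron_schedule = */5 * * * *
counttype = number of events
relation = greater than
quantity = 0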

Fix

When detected, rolling-restart the search head cluster.
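
A minimal sketch of doing that with Splunk's standard SHC CLI, assuming you run it from the member that is currently captain (verify first):

splunk show shcluster-status
splunk rolling-restart shcluster-members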

Workaround

Set all search quotas to zero. We are not sure whether explicit role settings override imported roles, so we applied the following to all roles, including the built-in 'user', 'power', and 'admin' roles.

Ex: authorize.conf

[role_admin]
srchJobsQuota = 0
rtSrchJobsQuota = 0
cumulativeSrchJobsQuota = 0
cumulativeRTSrchJobsQuota = 0
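
On a search head cluster this change would normally be staged on the deployer and pushed to the members. The gist does not say how the setting was distributed, so the app name below is a hypothetical placeholder.

# on the deployer; 'quota_workaround' is a placeholder app name
cp authorize.conf $SPLUNK_HOME/etc/shcluster/apps/quota_workaround/local/
# push the bundle to the cluster (target any member's management port)
splunk apply shcluster-bundle -target https://<any-member>:8089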