Issue with the scheduler on a search head cluster (SHC) around search quotas: the "usage" metric continually increases until it surpasses the role-based quota, blocking scheduled jobs.
Affected users will show WARN messages in splunkd.log like the following. Note: these messages can be legitimate; the volume is the indicator.
We have found that affected users generate the warning below every 7 seconds for each scheduled search they have.
11-18-2015 11:10:34.638 -0600 WARN SHPMaster - Search not executed: Your maximum number of concurrent searches has been reached. usage=41 quota=30 user=someuser. for search: nobody;search;A Saved Search
Run the following as an alert over a 5-minute span:
index=_internal sourcetype=splunkd component=SHPMaster "Search not executed: Your maximum number of concurrent searches has been reached"
| rex "user\=(?<user>.+)\.\s+for search:\s(?<search_user>[^;]+);(?<search_context>[^;]+);(?<search_name>.+)"
| fields _time usage quota user search_*
| stats count by user search_name
| where count>40
| stats values(search_name) as affected_searches by user
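For sanity-checking the field extraction outside Splunk, the SPL rex above can be mirrored in Python against the sample WARN line shown earlier (the regex is a close translation, not Splunk's own parser):

```python
import re

# Mirrors the SPL rex: pulls the user and the scheduled-search
# triple (owner;app context;name) out of the SHPMaster WARN line.
WARN_RE = re.compile(
    r"user=(?P<user>.+?)\.\s+for search:\s"
    r"(?P<search_user>[^;]+);(?P<search_context>[^;]+);(?P<search_name>.+)"
)

line = (
    "11-18-2015 11:10:34.638 -0600 WARN SHPMaster - Search not executed: "
    "Your maximum number of concurrent searches has been reached. "
    "usage=41 quota=30 user=someuser. for search: nobody;search;A Saved Search"
)

m = WARN_RE.search(line)
if m:
    print(m.group("user"))         # someuser
    print(m.group("search_name"))  # A Saved Search
```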
When the condition is detected, perform a rolling restart of the search head cluster.
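The rolling restart can be issued from the SHC captain with the Splunk CLI; paths and credentials below are environment-specific placeholders:

```shell
# Run on the SHC captain. $SPLUNK_HOME and the credentials are placeholders.
# Optionally confirm cluster state and which member is captain first:
$SPLUNK_HOME/bin/splunk show shcluster-status -auth admin:changeme

# Kick off a rolling restart of all cluster members:
$SPLUNK_HOME/bin/splunk rolling-restart shcluster-members
```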
Set all quotas to zero. We were not sure whether explicit role settings override imported roles, so we applied the following to all roles, including the built-in 'user', 'power', and 'admin' roles.
Example: authorize.conf
[role_admin]
srchJobsQuota = 0
rtSrchJobsQuota = 0
cumulativeSrchJobsQuota = 0
cumulativeRTSrchJobsQuota = 0
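To confirm which setting actually wins after the change, btool can print the merged authorize.conf for a role along with the file each value came from (run on each member; output format varies by version):

```shell
# Show the effective, merged settings for the admin role,
# annotated with the source file for each line:
$SPLUNK_HOME/bin/splunk btool authorize list role_admin --debug
```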