SQL Server Performance Tuning Class - Day 1

Denver 2015

Getting ready steps

  • Write down 3 biggest pain points
    • Is it the database that's slowing things down, or is it environmental? Is perfmon the answer?
    • Deadlocks / locking. Should we be running in a different isolation mode?
    • Audit solution - triggers vs in-app, etc.
  • Gather SQL server stats
  • Gather Wait statistics

Day One

How To Triage

  • Batch Requests per Second (measure of workload)
  • "SQLQueryStress" tool
  • ROLLBACK will give you a % complete status, while KILL will not
  • ROLLBACK only ever executes on a single thread, regardless of DOP
  • sp_WhoIsActive only shows you the last command in the transaction for the session
    • make sure you're using the @get_locks parameter to see the locks held by the session (see the sketch after this list)
  • sp_AskBrent will diagnose whether or not there's really a problem with the SQL Server
  • SOS_SCHEDULER_YIELD points to a CPU issue
  • PAGEIOLATCH waits should return in less than 100ms; otherwise there may be a storage issue
  • exec sp_configure 'remote admin connections', 1 enables remote use of the Dedicated Admin Connection (DAC)
  • sp_BlitzCache
    • cpu weight is a measure of cpu load
    • focus on read weight to see who is doing the most work
    • exec sp_BlitzCache @top = 10, @sort_order = 'reads'. Also run it sorted by 'executions'
  • Run sp_BlitzCache once per week and review to get a feel for what the top queries should be
  • Triage tools in order
    1. Monitoring software (flatline)
    2. sp_WhoIsActive
    3. sp_AskBrent
    4. sp_BlitzCache
    5. sp_BlitzIndex
    6. sp_Blitz
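A minimal sketch of the triage commands above, assuming sp_WhoIsActive, sp_AskBrent, and sp_BlitzCache are already installed on the server:

```sql
-- Enable remote use of the Dedicated Admin Connection (DAC) so you can
-- still get in when the server stops answering normal connections.
EXEC sp_configure 'remote admin connections', 1;
RECONFIGURE;
GO

-- What is running right now, including the locks each session holds?
EXEC sp_WhoIsActive @get_locks = 1;

-- Is there really a problem with the SQL Server right now?
EXEC sp_AskBrent;

-- Top plan-cache offenders, sorted two different ways.
EXEC sp_BlitzCache @top = 10, @sort_order = 'reads';
EXEC sp_BlitzCache @top = 10, @sort_order = 'executions';
```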

T-SQL Anti-Patterns

  • SET STATISTICS IO ON
  • when using Profiler, set 75 milliseconds as a threshold to filter out the small statements
  • Extended Events - uses Event Tracing for Windows; lighter weight than a server-side trace
  • Single-statement table-valued functions - SQL Server can optimize and process these much more efficiently because it will inline the function
  • SQL Server 2014 estimates that 100 rows will come back from a multi-statement TVF; 2012 and earlier estimate 1
  • By pushing data into a temp table first, statistics are going to be accurate
  • Look for usages of fn_split in our larger procedures. SQL Server does a terrible job of estimating row counts when this function is involved
  • DON'T JOIN TO TABLE VALUE FUNCTIONS (or table variables, for that matter). Use temp tables instead
  • CTEs are executed every time they are referenced.
  • Dynamic SQL can result in a large plan cache if you're not using parameters properly
  • SARGable (search-ARGument-able). A non-SARGable predicate won't use an index seek; this can happen with optional sproc parameters
  • WHERE clauses that wrap the column in LEFT, UPPER, or YEAR, use a leading-wildcard LIKE, or compare a VARCHAR column to an @NVARCHAR parameter are all non-SARGable and will result in full table scans (see the sketch after this list)
  • Correlated subquery issues:
    • has some of the same problems as scalar functions
    • can use windowing functions ('OVER') instead
  • Nested Views - can cause performance issues. Be careful
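A sketch of the SARGability point above; dbo.Users and its column names are assumptions for illustration:

```sql
-- Non-SARGable: wrapping the column in a function hides it from the
-- index, so every row must be scanned.
SELECT Id FROM dbo.Users WHERE YEAR(CreationDate) = 2014;
SELECT Id FROM dbo.Users WHERE LEFT(DisplayName, 1) = 'A';

-- SARGable rewrites: leave the column bare and move the work to the other
-- side of the comparison, so an index seek is possible.
SELECT Id FROM dbo.Users
WHERE CreationDate >= '20140101' AND CreationDate < '20150101';

SELECT Id FROM dbo.Users WHERE DisplayName LIKE 'A%';
```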

Understanding Parallelism

  • CXPACKET means there were inefficiencies in how parallel work was handed out
  • Cost Threshold for Parallelism - the default of 5 is typically too low these days; a starting point should be 50 or 100 (see the sketch after this list)
  • Amdahl's law - adding more workers becomes counterproductive at some point
  • CXPACKET should be below 50%, but don't drive to 0
    • usually means there are missing indexes or query tuning opportunities
    • this ensures that only big queries go parallel
    • this is not fixed by setting max DOP, you need to index and tune queries
  • Don't look at %s when looking at wait types, look at clock time
  • Monitoring software will measure hours of wait time per hour. Very useful
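A sketch of raising the threshold to the suggested starting point (50 here is the class's suggestion, not a universal answer):

```sql
-- 'cost threshold for parallelism' is an advanced option, so expose it first.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;

-- Raise the threshold so only genuinely expensive queries go parallel.
EXEC sp_configure 'cost threshold for parallelism', 50;
RECONFIGURE;
```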

Parameter Sniffing

  • "Density Vector" - the average number of rows returned for any given value
  • If an execution plan gets seeded with a small number of estimated rows, then when the sproc runs again with different parameters it won't necessarily use statistics to pick a plan
  • Look in the actual execution plan XML and you'll see ParameterCompiledValue differ from ParameterRuntimeValue. When they differ, the query didn't use statistics for the runtime value; it just reused the cached plan
  • With procedures, look for differences in the avg duration between statements and the procedure. This could be an indication that parameter sniffing is an issue.
  • stored procedures use statistics differently from direct statements. This can be a real issue when you're trying to trace down perf issues with stored procs inside of mgmt studio.
  • `exec sp_helpstats 'dbo.Posts', 'All'`
  • DBCC SHOW_STATISTICS('Posts', 'kl_posts_OwnerUserId')
    • Steps shows you the number of histogram steps. 200 is the max (201 when NULLs are present)
  • If you use a local variable in a sproc then it uses a different set of statistics - density vector calculations instead. Talked with someone else at the conference who said they used this across all their sprocs to avoid parameter sniffing issues. This could be used instead of WITH RECOMPILE
  • You won't be able to reproduce the problem outside of a sproc
  • INSTEAD, create a temp sproc to find out what's going on...
  • With the WITH RECOMPILE option, the exec plan is re-compiled on every execution and not stored in cache
    • Note that this only applies to the outermost statement
    • Rather than 'WITH RECOMPILE' at the sproc level, you should use 'OPTION (RECOMPILE)' at the statement level within the sproc
    • DON'T put RECOMPILE hints in the header of the sproc
  • `OPTION (OPTIMIZE FOR UNKNOWN)` as an alternative to recompile. Forces it to calculate estimated rows based on averages
  • Local variables inside a sproc CANNOT be sniffed
  • `OPTION (OPTIMIZE FOR (@UserId = 557499))` - this will give you a consistent execution plan
  • Another option would be to have different versions of the sproc for small / large customers
  • Prepared statements act just like sprocs
  • NonClustered indexes always include the clustered index key
  • When testing slow-running sprocs, create a temp sproc, look at the 'compiled for' value from the slow-running instance, and then OPTIMIZE FOR that value (see the sketch after this list)
  • Updating statistics forces a recompile. This doesn't mean that the stats were bad, it could be that the cached plan was bad.
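A sketch of the statement-level options above in a hypothetical proc; the proc name and column list are made up, while dbo.Posts, OwnerUserId, and the 557499 value come from the class examples:

```sql
CREATE PROCEDURE dbo.GetPostsByOwner
    @OwnerUserId INT
AS
BEGIN
    -- Statement-level recompile: fresh estimates on every run, and the
    -- compile cost is paid only for this statement, not the whole proc.
    SELECT Id, Title
    FROM dbo.Posts
    WHERE OwnerUserId = @OwnerUserId
    OPTION (RECOMPILE);

    -- Alternatives from the notes above:
    -- OPTION (OPTIMIZE FOR UNKNOWN);                  -- estimate from the density vector
    -- OPTION (OPTIMIZE FOR (@OwnerUserId = 557499));  -- pin to a representative value
END;
GO
```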

Find and Fight Blocking

  • Read Committed is the current default (pessimistic) isolation level in the boxed version of SQL Server
  • Use monitoring tools or perfmon alerts to identify blocking issues
  • Create an alert in SQL Agent on General Statistics:Processes Blocked. You can also set the response to run sp_WhoIsActive when the alert fires. Make sure you set a soak time of around 5 minutes to prevent multiple alerts from coming in
  • LCK_M_IX, LCK_M_S, LCK_M_U, LCK_M_IS - locks are either row or object locks
  • LCK_M_SCH_S, LCK_M_SCH_M - index rebuilds will also grab a schema lock
    • ALTER TABLE ... DISABLE TRIGGER - nobody can use the table while it runs
  • Blocked process report (look for links in materials) can help you visualize the blocking chains every 5 seconds
  • The deadlock monitor runs every 5 seconds. It also reports on blocking, which can be picked up by a trace or by Extended Events
  • 'Lock Escalation' - SQL Server will escalate to a larger-scoped lock once you cross a certain threshold. If you do any type of clustered scan there's a risk that the lock could escalate
  • Look at wait stats to see what types of locking are occurring
    • Identify the root of the blocking chain
  • Optimistic Concurrency
    • Read Committed Snapshot Isolation (RCSI)
      • most recent committed version at the statement level
    • Snapshot Isolation
      • transactionally consistent within a transaction
    • Using snapshot isolation for reports can help without blocking the rest of the application. Make the big read-only reports use snapshot isolation. Best to use only on big read queries, not updates
  • You can run RCSI and SNAPSHOT isolation together. SNAPSHOT is great for long-running reports; just be sure to wrap the logic for the report in a transaction and set SNAPSHOT isolation (see the sketch after this list)
    • With RCSI and SNAPSHOT, performance of tempdb becomes vital.
    • Monitor tempdb perf and look for abandoned transactions (look back through notes on how to do this)
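A sketch of the optimistic-concurrency setup above; YourDb is a placeholder database name:

```sql
-- RCSI: readers see the most recent committed version at the statement
-- level. Flipping this needs exclusive access to the database.
ALTER DATABASE YourDb SET READ_COMMITTED_SNAPSHOT ON;

-- Allow SNAPSHOT isolation alongside RCSI for long-running reports.
ALTER DATABASE YourDb SET ALLOW_SNAPSHOT_ISOLATION ON;

-- In the report itself, wrap the reads in a transaction under SNAPSHOT so
-- every query sees one transactionally consistent point in time.
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
BEGIN TRANSACTION;
    SELECT COUNT(*) FROM dbo.Posts;
    -- ...the rest of the report's read-only queries...
COMMIT;
```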