mahemoff/tuning.md

## tuning.md

      
    Raw
  

              tuning.md
            
          
    A few general tips to deal with slow performance and hanging. Target system is Ubunty 20.04 and MySQL 8.0, but applicable to other configurations.
Relies on some functions in https://github.com/mahemoff/dotfiles
Slow log

MySQL can log slow queries. Enable it in /etc/mysql/mysql.conf.d/mysqld.cnf.
slow_query_log = 1
slow_query_log_file = /var/log/mysql/mysql-slow.log
long_query_time = 5 # How many seconds is considered a "slow" query, i.e. one that should be logged
#log-queries-not-using-indexes

It's convenient to monitor this while interacting with the website:
tail -f /var/log/mysq/mysql-slow.log

Hanging threads

Another way to monitor SQL is to check long-running threads. Use the mytaillong query for this.
Emergency measures to auto-restart

You can use the mykillall command to automatically kill long-running threads. You can keep this running indefinitely as follows:
while [ 1 ] ; do mykillall; sleep 120; done

As an emergency measure, if mysql is frequently hanging due to an unidentified long query or leak, you can periodically restart it, along with apache.
while [ 1 ] ; echo "$(date) restarting mysql"; do sudo service mysql restart; echo "$(date) restarting apache"; sudo service apache restart; sleep 120; done

Alternatively, you can run this as a cron job, e.g. to run every second minute:
*/2 * * * * echo "$(date) restarting mysql" >> /tmp/mysql_restart.log; service mysql restart; echo "$(date) restarting apache" >> /tmp/mysql_restart.log ; service apache2 restart

in which case you might want to verify it's running by monitoring the log file:
tail -f /tmp/mysql_restart.log

Note this will break any long-running web requests. You can also try restarting just mysql, not apach, to make the interruption slower and keep (some) requests alive. You can also try varying the sleep time, could be anywhere from 60 to 3600.
The restart interval is a trade-off. Longer intervals will obviously cause less interruptions, but could cause requests to become unacceptably slow or even completely hanging. This is because in the event of a leak, requests will get progressively slower until the restart happens. Finding the optimal interval in production requires careful monitoring.
These restarts are merely a band-aid solution, should normally be done while fixing the underlying problem. In practice, some systems can get by in "cockroach mode" with this workaround for months if needed. However, if you choose to do this, you'll have conceded a margin of safety. If something else were to go wrong, you won't have this "get out of jail" card to play! Bottom line: Monitor performance and user feedback carefully and use your judgement as to where to allocate resources.
Monitoring performance

Simplest way to check performance is of course to open URL in browser as well as check network tab in devtools.
A more automated way is to run perf function (in dotfiles) continuously, e.g.:
while [ 1 ] ; do date; perf; sleep 300; done

You might also run multiple of these in different windows. In case one shows a very long delay, you can check if the other did too at the same time. If it did, it means the whole process was hanging at that time. If not, it might be only an infrequent issue (that's assuming you don't see many of those long requests in history).