@dormando
Last active August 29, 2015 14:01
varnish v4.0.0
TL;DR: increasing thread_pools lowers request rates, partly due to WRK_SumStat lock waits. Otherwise, locks around vsl_get() and vfprintf into the VSL account for 30-50% of overall time.
Machine: 2x 8-core 2.9GHz Intel CPUs.
Test:
Client threads: 200
Reqs per conn: 1000
thread_pools=1: 390,619r/s
thread_pools=2: 374,610r/s
thread_pools=8: 349,200r/s
thread_pools=16: 343,486r/s
102 __lll_lock_wait,_L_lock_858,pthread_mutex_lock,Lck__Lock,WRK_SumStat,http1_cleanup,HTTP1_Session,ses_req_pool_task,SES_pool_accept_task
102 threads piling on WRK_SumStat lock.
thread_pools=16, thread_stats_rate=5000: 343k again
136 __lll_lock_wait,_L_lock_858,pthread_mutex_lock,Lck__Lock,WRK_SumStat,http1_cleanup,HTTP1_Session,ses_req_pool_task,SES_pool_accept_task
thread_stats_rate=50000: still 343k
17 __lll_lock_wait,_L_lock_858,pthread_mutex_lock,Lck__Lock,WRK_SumStat,http1_cleanup,HTTP1_Session,ses_req_pool_task,SES_pool_accept_task
Fewer threads piling up.
Perf still shows 20% CPU in _raw_spin_lock. Might be VSL locks; let's dig in a bit.
thread_pools=1,thread_stats_rate=50000: 380,000r/s
same with thread_stats_rate set to default.
thread_pools=1,vsl_mask=-VCL_trace,-WorkThread,-Hash,-ReqHeader,-RespHeader,-VCL_call,-VCL_return,-Timestamp,-Debug: 422k-437k.
Lots of time spent in _raw_spin_lock now (27%). Less, but still significant, time spent in vprintf (13%).
threads are sitting in:
25 __lll_lock_wait,_L_lock_858,pthread_mutex_lock,vsl_get,VSL_Flush,http1_cleanup,HTTP1_Session,ses_req_pool_task,SES_pool_accept_task
5 __lll_lock_wait,_L_lock_858,pthread_mutex_lock,Lck__Lock,WRK_SumStat,http1_cleanup,HTTP1_Session,ses_req_pool_task,SES_pool_accept_task
plus:
1 __lll_lock_wait,_L_lock_858,pthread_mutex_lock,vsl_get,vslr,VSL,http1_dissect,req=0x16f24f0),ses_req_pool_task,SES_pool_accept_task
at least 50 of these, with varying req= args waiting on the same lock in vsl_get.
vsl_get is:
/*--------------------------------------------------------------------
* Reserve bytes for a record, wrap if necessary
*/
... a global lock around pulling bytes from the shmlog. So I doubt this'll go much faster unless there's a way to shrink the amount of writing into the VSL.
Let's kill even more:
param.set vsl_mask -VCL_trace,-WorkThread,-Hash,-ReqHeader,-RespHeader,-VCL_call,-VCL_return,-Timestamp,-Debug,-Begin,-ReqStart,-ReqMethod,-ReqURL,-ReqProtocol,-Hit,-RespProtocol,-RespStatus,-RespResponse,-RespUnset,-ReqAcct,-End
497k, but there's CPU idle now; the load generator is maxed.
Adding a second host isn't behaving now, will have to tune the test.
Next up... pass/miss/etc/maybe?