-
-
Save dormando/05689cdc5709881b382a to your computer and use it in GitHub Desktop.
varnish v4.0.0
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
TL:DR: increasing thread_pools slows down rates partly due to WRK_SumStat lockwaits. Otherwise, locks around vsl_get() and vfprintf into the VSL take up between 30-50% of overall time. | |
Machine 2x8core 2.9ghz intel cpu. | |
Test: | |
Client threads: 200 | |
Reqs per conn: 1000 | |
thread_pools=1: 390,619r/s | |
thread_pools=2: 374,610r/s | |
thread_pools=8: 349,200r/s | |
thread_pools=16: 343,486r/s | |
102 __lll_lock_wait,_L_lock_858,pthread_mutex_lock,Lck__Lock,WRK_SumStat,http1_cleanup,HTTP1_Session,ses_req_pool_task,SES_pool_accept_task | |
102 threads piling on WRK_SumStat lock. | |
thread_pools=16, thread_stats_rate=5000: 343k again | |
136 __lll_lock_wait,_L_lock_858,pthread_mutex_lock,Lck__Lock,WRK_SumStat,http1_cleanup,HTTP1_Session,ses_req_pool_task | |
,SES_pool_accept_task | |
thread_stats_rate=50000: still 343k | |
17 __lll_lock_wait,_L_lock_858,pthread_mutex_lock,Lck__Lock,WRK_SumStat,http1_cleanup,HTTP1_Session,ses_req_pool_task,SES_pool_accept_task | |
Fewer threads piling up. | |
Perf still shows 20% CPU in _raw_spin_lock. Might be VSL locks, lets dig in a bit. | |
thread_pools=1,thread_stats_rate=50000: 380,000r/s | |
same with thread_stats_rate set to default. | |
thread_pools=1,vsl_mask=-VCL_trace,-WorkThread,-Hash,-ReqHeader,-RespHeader,-VCL_call,-VCL_return,-Timestamp,-Debug: 422k-437k. | |
Lots of time spent in _raw_spin_lock now. (27%). Less time but still significant time spent in vprintf (13%) | |
threads are sitting in: | |
25 __lll_lock_wait,_L_lock_858,pthread_mutex_lock,vsl_get,VSL_Flush,http1_cleanup,HTTP1_Session,ses_req_pool_task,SES_pool_accept_task | |
5 __lll_lock_wait,_L_lock_858,pthread_mutex_lock,Lck__Lock,WRK_SumStat,http1_cleanup,HTTP1_Session,ses_req_pool_task,SES_pool_accept_task | |
plus: | |
1 __lll_lock_wait,_L_lock_858,pthread_mutex_lock,vsl_get,vslr,VSL,http1_dissect,req=0x16f24f0),ses_req_pool_task,SES_pool_accept_task | |
at least 50 of these, with varying req= args waiting on the same lock in vsl_get. | |
vsl_get is: | |
/*-------------------------------------------------------------------- | |
* Reserve bytes for a record, wrap if necessary | |
*/ | |
... global lock around pulling bytes from the shmlog. So I doubt this'll go much faster unless there's a way to switch amount of writing into VSL. | |
Lets kill even more: | |
param.set vsl_mask -VCL_trace,-WorkThread,-Hash,-ReqHeader,-RespHeader,-VCL_call,-VCL_return,-Timestamp,-Debug,-Begin,-ReqStart,-ReqMethod,-ReqURL,-ReqProtocol,-Hit,-RespProtocol,-RespStatus,-RespResponse,-RespUnset,-ReqAcct,-End | |
497k, but there's CPU idle now. load generator is maxed. | |
Adding a second host isn't behaving now, will have to tune the test. | |
Next up... pass/miss/etc/maybe? |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment