@9468305
Last active December 26, 2015 20:00
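Summary: the Nexus5 is an outlier and never behaves as expected; md5p performs well in every scenario; parallelism = 4 is stable in every scenario; the Huawei Honor 3X appears to be an 8-core phone, so parallelism = 8 is always fastest on it; most current phones have 4 cores, so md5 with parallelism = 4 is the best choice. Follow-up test: segmented mmap plus a parallel algorithm on large files. Open question: is the bottleneck the CPU or file I/O?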
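Benchmark data from Android phones
Test file: 460 MB
Source code: see https://gist.github.com/9468305/97dca7c470ee02a6867c
Use case: the requirement comes from https://gist.github.com/9468305/fa8f1307ea4738225fca
Approach: measure how mmap, buffered file I/O reads, and the OpenMP parallelism setting affect several digest algorithms, and compare the results across phones.
First run: mmap the whole file into memory and hash the 460 MB mapping directly.
blake2sp and blake2bp use the upstream default parallelism of 8.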
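The mmap approach above can be sketched on a desktop as follows. This is a minimal Python illustration, not the NDK/C code from the linked gist; the function name `hash_mmap` and the small temporary file are assumptions made for the example.

```python
import hashlib
import mmap
import os
import tempfile
import time


def hash_mmap(path, algo="blake2s"):
    """Map the whole file read-only and hash the mapping in one update call."""
    h = hashlib.new(algo)
    with open(path, "rb") as f:
        if os.fstat(f.fileno()).st_size == 0:
            return h.hexdigest()  # an empty file cannot be mapped
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            h.update(m)  # mmap objects expose the buffer protocol
    return h.hexdigest()


if __name__ == "__main__":
    # Small stand-in file; the notes use a 460 MB file on the device.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(os.urandom(4 * 1024 * 1024))
    start = time.perf_counter()
    digest = hash_mmap(tmp.name)
    print(f"blake2s time = {time.perf_counter() - start:.6f} seconds ({digest})")
    os.remove(tmp.name)
```

Because the hash sees the file as one contiguous buffer, there is exactly one update call no matter how large the file is.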
Huawei Honor 3X, ndk-build, thumb2, gcc -O3, 2 runs
blake2s time = 15.230977 seconds 15.149100 seconds
blake2b time = 23.496808 seconds 23.273819 seconds
blake2sp time = 3.458435 seconds 3.083542 seconds
blake2bp time = 6.946069 seconds 7.100667 seconds
Huawei Honor 3X, ndk-build, thumb2, gcc -O2
run test on device
blake2s time = 15.519730 seconds
blake2b time = 22.998304 seconds
blake2sp time = 3.352492 seconds
blake2bp time = 6.810990 seconds
Huawei Honor 3X, ndk-build, thumb2, gcc -Os
blake2s time = 15.541925 seconds
blake2b time = 22.988108 seconds
blake2sp time = 3.389670 seconds
blake2bp time = 6.830233 seconds
Huawei Honor 3X, ndk-build, arm, gcc -O3
blake2s time = 15.135739 seconds
blake2b time = 23.559449 seconds
blake2sp time = 3.393177 seconds
blake2bp time = 6.834560 seconds
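Analysis: thumb and arm builds perform the same, and so do -O2, -O3, and -Os.
Next: all following runs use thumb -O3.
Xiaomi Mi 4, -O3 thumb, balanced mode (the screen may have turned off during the run, so the blake2bp figure is unreliable)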
blake2s time = 11.874335 seconds
blake2b time = 31.524835 seconds
blake2sp time = 3.485516 seconds
blake2bp time = 20.172217 seconds
Xiaomi Mi 4, power-saving mode (CPU clocked down)
blake2s time = 21.074071 seconds
blake2b time = 48.200784 seconds
blake2sp time = 9.167828 seconds
blake2bp time = 23.293334 seconds
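Xiaomi Mi 4, performance mode (CPU frequency is not locked; it can scale up to the maximum but does not run at the maximum the whole time)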
blake2s time = 6.652548 seconds
blake2b time = 20.712190 seconds
blake2sp time = 2.550314 seconds
blake2bp time = 6.108460 seconds
Nexus5, thumb2 (16-bit instruction set), 2 runs (the Nexus5 is an oddball: 32-bit CPU + 64-bit OS, and it shows anomalies across the board)
blake2s time = 6.078702 seconds 6.857189 seconds
blake2b time = 19.730120 seconds 19.689735 seconds
blake2sp time = 8.525794 seconds 10.762844 seconds
blake2bp time = 21.908520 seconds 23.456138 seconds
Nexus5, arm (32-bit instruction set)
blake2s time = 6.817943 seconds
blake2b time = 20.754325 seconds
blake2sp time = 10.346421 seconds
blake2bp time = 23.718929 seconds
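Analysis: the parallelism setting may affect performance.
Next: set the parallelism of blake2sp and blake2bp to 4.
Nexus5, 2 runs (the phone may have entered power-saving mode partway through)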
blake2s time = 10.197930 seconds 10.092280 seconds
blake2b time = 15.541306 seconds 15.102294 seconds
blake2sp time = 3.458573 seconds 3.345040 seconds
blake2bp time = 4.747609 seconds 4.683213 seconds
Xiaomi Mi 4, 2 runs
blake2s time = 6.572716 seconds 6.555986 seconds
blake2b time = 20.772833 seconds 20.788557 seconds
blake2sp time = 2.553117 seconds 2.616197 seconds
blake2bp time = 6.034184 seconds 6.170549 seconds
Nexus5, 2 runs (screen on)
blake2s time = 5.844344 seconds 5.960309 seconds
blake2b time = 19.377594 seconds 19.664046 seconds
blake2sp time = 7.031088 seconds 7.153667 seconds
blake2bp time = 21.665652 seconds 21.794273 seconds
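Analysis: blake2b and blake2bp perform too poorly on a 32-bit OS; the parallelism setting affects the results.
Next: drop blake2b and blake2bp, and look at blake2s and blake2sp with parallelism = 2.
Xiaomi Mi 4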
blake2s time = 6.574686 seconds
blake2sp time = 3.746227 seconds
Huawei Honor 3X
blake2s time = 10.452700 seconds
blake2sp time = 6.534018 seconds
Nexus5
blake2s time = 6.045295 seconds
blake2sp time = 6.597525 seconds
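Analysis: on the Xiaomi Mi 4 and Huawei Honor 3X, parallelism 2, 4, and 8 differ little; on the Nexus5, lower parallelism gives better performance, yet it still trails the other phones (even though its CPU is not weak).
Next: read the file with C file I/O, buffer = 8 KB, without mmap.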
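The buffered-read variant can be sketched in the same way. This is a Python illustration under stated assumptions, not the C `fread` loop from the benchmark; `hash_buffered` and `timed` are hypothetical helper names.

```python
import hashlib
import os
import tempfile
import time


def hash_buffered(path, algo="blake2s", buf_size=8 * 1024):
    """Read the file in buf_size chunks and feed each chunk to the hash."""
    h = hashlib.new(algo)
    with open(path, "rb") as f:
        while True:
            chunk = f.read(buf_size)
            if not chunk:  # empty bytes object means end of file
                break
            h.update(chunk)
    return h.hexdigest()


def timed(fn, *args, **kwargs):
    """Return (result, elapsed seconds) for a single call."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(os.urandom(4 * 1024 * 1024))
    for size in (8 * 1024, 16 * 1024, 32 * 1024):
        _, elapsed = timed(hash_buffered, tmp.name, buf_size=size)
        print(f"buffer = {size // 1024} KB, time = {elapsed:.6f} seconds")
    os.remove(tmp.name)
```

The digest does not depend on the buffer size; only the number of update calls changes.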
Huawei Honor 3X, 2 runs
blake2s time = 10.288910 seconds 10.249259 seconds
blake2sp time = 4.657944 seconds 4.876143 seconds
Nexus5, 2 runs
blake2s time = 5.689329 seconds 6.569514 seconds
blake2sp time = 38.843955 seconds 36.973709 seconds
Xiaomi Mi 4
blake2s time = 7.024298 seconds 6.848578 seconds
blake2sp time = 5.210849 seconds 5.023147 seconds
Xiaomi Mi 4, I/O buffer = 16 KB (no difference from 8 KB)
blake2s time = 6.938758 seconds 6.939108 seconds
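blake2sp time = 11.459279 seconds 22.057534 seconds (screen turned off mid-run, CPU clocked down)
Xiaomi Mi 4, I/O buffer = 1 byte (the blake2sp code does not validate its size argument, so running the parallel computation on 1-byte inputs fails)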
blake2s time = 24.402502 seconds
blake2sp failed: infinite loop
Xiaomi Mi 4, I/O buffer = 32 KB
blake2s time = 6.626112 seconds 6.779171 seconds (no difference from 8 KB)
blake2sp time = 4.760458 seconds 4.642409 seconds (a gap appears here)
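Analysis: the I/O buffer size has little effect on the serial hash but a large effect on the parallel one.
Reading the blake2sp source shows that it parallelizes over each buffer passed to update, so the smaller the buffer, the higher the thread-creation overhead and the slower the run.
Mapping the whole file with mmap and hashing it in one pass sidesteps this weakness.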
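To put rough numbers on that explanation, the sketch below counts how many update calls each buffer size implies for the test file. It assumes that "460 MB" means 460 MiB and, following the reading of the source described above, that each update call starts its own parallel region; `update_calls` is a hypothetical helper.

```python
FILE_SIZE = 460 * 1024 * 1024  # assumption: the 460 MB test file is 460 MiB


def update_calls(file_size, buf_size):
    """Number of update calls needed to feed file_size bytes in buf_size chunks."""
    return -(-file_size // buf_size)  # ceiling division


if __name__ == "__main__":
    for label, buf in (("8 KB", 8 * 1024), ("16 KB", 16 * 1024), ("32 KB", 32 * 1024)):
        print(f"buffer = {label}: {update_calls(FILE_SIZE, buf)} update calls")
    # Hashing a whole-file mmap is a single update call.
    print(f"whole-file mmap: {update_calls(FILE_SIZE, FILE_SIZE)} update call")
```

Under these assumptions an 8 KB buffer means 58,880 parallel regions and a 32 KB buffer means 14,720, against one for the mmap run, which fits the direction of the measured gap.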
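Next: use a new parallel scheme that splits the whole file into as many segments as the parallelism setting, with each segment read through a file I/O buffer.
blake2sp_file source: https://gist.github.com/9468305/20068a1af16910361278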
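A minimal sketch of the segmented scheme, in Python with a thread pool rather than the OpenMP C code in the linked gist. The way the per-segment digests are combined here (hashing their concatenation) is an assumption made for illustration; the linked source may combine them differently. `hash_segment` and `hash_file_segmented` are hypothetical names.

```python
import hashlib
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def hash_segment(path, offset, length, algo, buf_size):
    """Hash `length` bytes starting at `offset`, reading through a small buffer."""
    h = hashlib.new(algo)
    with open(path, "rb") as f:  # each worker uses its own file handle
        f.seek(offset)
        remaining = length
        while remaining > 0:
            chunk = f.read(min(buf_size, remaining))
            if not chunk:
                break
            h.update(chunk)
            remaining -= len(chunk)
    return h.digest()


def hash_file_segmented(path, workers=4, algo="blake2s", buf_size=32 * 1024):
    """Split the file into `workers` segments, hash them concurrently, then
    hash the concatenated segment digests (assumed combining rule)."""
    size = os.path.getsize(path)
    seg = -(-size // workers) if size else 0  # ceiling division
    jobs = [(i * seg, max(0, min(seg, size - i * seg))) for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        digests = list(
            pool.map(lambda j: hash_segment(path, j[0], j[1], algo, buf_size), jobs)
        )
    final = hashlib.new(algo)
    for d in digests:
        final.update(d)
    return final.hexdigest()


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(os.urandom(4 * 1024 * 1024))
    print(hash_file_segmented(tmp.name, workers=4))
    os.remove(tmp.name)
```

A digest built this way depends on the segment count and is not the same value as the plain serial digest of the file, so both sides of a comparison have to agree on the parallelism setting.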
Xiaomi Mi 4, 4 runs
blake2s time = 6.881021 seconds 6.881021 seconds 6.890889 seconds 6.816782 seconds
blake2sp time = 4.752358 seconds 4.752358 seconds 6.570885 seconds 4.864570 seconds
blake2sp_file time = 2.652560 seconds 2.652560 seconds 2.652320 seconds 2.791247 seconds
Add standard MD5 (serial)
md5 time = 4.476925 seconds 4.487960 seconds
Huawei Honor 3X, 2 runs
blake2s time = 11.003261 seconds 11.111040 seconds
blake2sp time = 7.405163 seconds 6.993986 seconds
blake2sp_file time = 3.407042 seconds 4.195505 seconds
md5 time = 3.657673 seconds 3.724206 seconds
Nexus5
blake2s time = 5.720441 seconds
blake2sp time = 30.024116 seconds
blake2sp_file time = 7.610792 seconds
md5 time = 3.173777 seconds
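Analysis: serial md5 beats the blake2 variants; blake2sp_file beats blake2sp.
Next: add standard serial md5 and a file-parallel md5, and compare them; start with md5 parallelism = 8.
Add the parallel MD5 scheme md5p; source: https://gist.github.com/9468305/97dca7c470ee02a6867c
Huawei Honor 3X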
blake2s time = 10.355605 seconds 10.297682 seconds
blake2sp time = 4.699469 seconds 4.719653 seconds
blake2sp_file time = 3.587286 seconds 3.332196 seconds
md5 time = 3.927048 seconds 3.923544 seconds
md5p time = 1.160357 seconds 1.052565 seconds
Nexus5
blake2s time = 5.444591 seconds 5.516199 seconds
blake2sp time = 34.665423 seconds 36.844409 seconds
blake2sp_file time = 7.044548 seconds 7.199510 seconds
md5 time = 2.865387 seconds 3.132674 seconds
md5p time = 3.096923 seconds 3.492756 seconds
Xiaomi Mi 2S, balanced mode
blake2s time = 10.639981 seconds
blake2sp time = 11.367557 seconds
blake2sp_file time = 4.067023 seconds
md5 time = 7.117229 seconds
md5p time = 3.617916 seconds
Xiaomi Mi 2S, performance mode
blake2s time = 11.341095 seconds
blake2sp time = 7.885427 seconds
blake2sp_file time = 3.922051 seconds
md5 time = 7.437327 seconds
md5p time = 3.234458 seconds
Huawei Honor 3X
blake2s time = 11.012500 seconds
blake2sp time = 6.982329 seconds
blake2sp_file time = 3.445275 seconds
md5 time = 3.889710 seconds
md5p time = 1.171731 seconds
Huawei Honor 3X, md5 parallelism = 8
blake2s time = 10.212562 seconds
blake2sp time = 4.839898 seconds
blake2sp_file time = 3.537396 seconds
md5 time = 3.851837 seconds
md5p time = 0.819049 seconds
Huawei Honor 3X, parallelism = 8 for all algorithms
blake2s time = 10.210714 seconds
blake2sp time = 5.407414 seconds
blake2sp_file time = 1.791010 seconds
md5 time = 3.841578 seconds
md5p time = 0.606765 seconds
Xiaomi Mi 4, parallelism = 8
blake2s time = 7.481798 seconds 6.739367 seconds
blake2sp time = 10.937392 seconds 10.036826 seconds
blake2sp_file time = 2.343030 seconds 2.327568 seconds
md5 time = 4.670945 seconds 4.498684 seconds
md5p time = 1.817163 seconds 1.622922 seconds
Xiaomi Mi 4, parallelism = 4
blake2s time = 6.784404 seconds 6.810516 seconds
blake2sp time = 10.230556 seconds 9.958661 seconds
blake2sp_file time = 2.353576 seconds 2.342131 seconds
md5 time = 4.474723 seconds 4.493030 seconds
md5p time = 1.787866 seconds 1.813598 seconds
Nexus5, md5 parallelism = 4
blake2s time = 5.591858 seconds 5.504601 seconds
blake2sp time = 20.066783 seconds 20.679386 seconds
blake2sp_file time = 7.147342 seconds 7.157790 seconds
md5 time = 2.969569 seconds 2.886641 seconds
md5p time = 3.274280 seconds 3.064426 seconds
Nexus5, md5 parallelism = 8
blake2s time = 6.177822 seconds
blake2sp time = 19.658719 seconds
blake2sp_file time = 7.498131 seconds
md5 time = 3.202704 seconds
md5p time = 3.490773 seconds
Analysis: blake2sp_file beats blake2sp; md5p beats md5; with the segmented file read scheme, the parallelism setting no longer matters much.
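The md5 versus md5p comparison is easier to read as a speedup ratio. The sketch below copies three pairs of timings from the runs above (the first value where a run was repeated) and divides serial time by parallel time; the dictionary layout and the `speedup` helper are illustrative choices.

```python
# Times in seconds, copied from the runs above.
RUNS = {
    "Huawei Honor 3X, parallelism 8": {"md5": 3.841578, "md5p": 0.606765},
    "Xiaomi Mi 4, parallelism 4": {"md5": 4.474723, "md5p": 1.787866},
    "Nexus5, md5 parallelism 4": {"md5": 2.969569, "md5p": 3.274280},
}


def speedup(serial_seconds, parallel_seconds):
    """Speedup of the parallel run over the serial run (above 1 means faster)."""
    return serial_seconds / parallel_seconds


if __name__ == "__main__":
    for device, t in RUNS.items():
        print(f"{device}: md5p speedup = {speedup(t['md5'], t['md5p']):.2f}x")
```

On these figures md5p is roughly 6.3 times faster than md5 on the Huawei Honor 3X and about 2.5 times faster on the Xiaomi Mi 4, while on the Nexus5 the ratio falls just below 1, meaning md5p is slightly slower than serial md5 there.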
Next: drop blake2sp.
Nexus5
blake2s time = 5.575001 seconds
blake2sp_file time = 7.127042 seconds
md5 time = 2.874951 seconds
md5p time = 3.333800 seconds
Huawei Honor 3X, md5p parallelism = 4
blake2s time = 11.114438 seconds 10.220039 seconds
blake2sp_file time = 2.727083 seconds 1.895031 seconds
md5 time = 3.851460 seconds 3.850437 seconds
md5p time = 1.082791 seconds 0.995741 seconds
Huawei Honor 3X, md5p parallelism = 8
blake2s time = 10.171139 seconds 10.433916 seconds
blake2sp_file time = 1.989670 seconds 2.010275 seconds
md5 time = 3.873485 seconds 3.869215 seconds
md5p time = 0.618082 seconds 0.664903 seconds
Xiaomi Mi 4, md5p parallelism = 8
blake2s time = 6.755866 seconds 6.765623 seconds
blake2sp_file time = 2.260821 seconds 2.245780 seconds
md5 time = 4.471778 seconds 4.477018 seconds
md5p time = 1.802037 seconds 1.611127 seconds
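Summary:
The Nexus5 is an outlier and never behaves as expected.
md5p performs well in every scenario.
Parallelism = 4 is stable in every scenario.
The Huawei Honor 3X appears to be an 8-core phone, so parallelism = 8 is always fastest on it.
Most current phones have 4 cores, so md5 with parallelism = 4 is the best choice.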
Follow-up test:
Benchmark segmented mmap combined with a parallel algorithm on large files.
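One possible shape for that follow-up, sketched in Python: map the file once, then let each worker hash its own slice of the mapping. Nothing like this is measured in these notes; `hash_mmap_segmented` is a hypothetical name, and the combining step (hashing the concatenated segment digests) is an assumption.

```python
import hashlib
import mmap
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def hash_mmap_segmented(path, workers=4, algo="md5"):
    """Hash `workers` slices of one read-only mapping concurrently, then hash
    the concatenated slice digests (assumed combining rule)."""
    size = os.path.getsize(path)
    if size == 0:
        raise ValueError("cannot mmap an empty file")
    seg = -(-size // workers)  # ceiling division
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
        view = memoryview(m)  # slicing a memoryview does not copy the data
        try:
            def digest(i):
                return hashlib.new(algo, view[i * seg:(i + 1) * seg]).digest()

            with ThreadPoolExecutor(max_workers=workers) as pool:
                digests = list(pool.map(digest, range(workers)))
        finally:
            view.release()  # release the buffer before the mapping closes
    final = hashlib.new(algo)
    for d in digests:
        final.update(d)
    return final.hexdigest()


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(os.urandom(4 * 1024 * 1024))
    print(hash_mmap_segmented(tmp.name, workers=4))
    os.remove(tmp.name)
```

Each worker then makes a single update call on a large contiguous slice, which keeps the one-call property of the whole-file mmap run while still spreading the work across cores.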
Open question:
Is the bottleneck the CPU or file I/O?
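One way to approach this question is to time the two costs separately: a pass that only reads the file, and a pass that only hashes data already in memory. The sketch below is an assumption about how such a measurement could look, not something run for these notes; `time_read_only` and `time_hash_only` are hypothetical helpers.

```python
import hashlib
import os
import tempfile
import time


def time_read_only(path, buf_size=32 * 1024):
    """Read the file and discard the data: approximates the file I/O cost."""
    start = time.perf_counter()
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(buf_size)
            if not chunk:
                break
            total += len(chunk)
    return total, time.perf_counter() - start


def time_hash_only(data, algo="md5"):
    """Hash a buffer that is already in memory: approximates the CPU cost."""
    start = time.perf_counter()
    digest = hashlib.new(algo, data).hexdigest()
    return digest, time.perf_counter() - start


if __name__ == "__main__":
    payload = os.urandom(8 * 1024 * 1024)
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(payload)
    _, read_seconds = time_read_only(tmp.name)
    _, hash_seconds = time_hash_only(payload)
    print(f"read only = {read_seconds:.6f} seconds")
    print(f"hash only = {hash_seconds:.6f} seconds")
    os.remove(tmp.name)
```

A file that was just written or read is likely to be served from the page cache, so the read-only figure says something about storage speed only on a cold read; repeated runs mostly measure memory copies.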