@hewumars
Last active March 17, 2020 08:28

| Network | Time, batch=1 (ms) (3.4GB) | Time, batch=4 (ms) |
|---|---|---|
| ObjectDetectStageCentG320x384 | 18.99000 | |
| ObjectDetectStageV160x128 | 4.001000 | |
| VehiclePlateSegmentCH | 0.879000 | |
| VehicleDriverGeneral | 3.446000 | |
| VehiclePlateNo | 2.013000 | |
| VehiclePlateAlignCH | 1.655000 | |
| VehiclePlateNameCH | 1.576000 | |
| VehiclePlateExceptionPlate | 1.012000 | |
| VehiclePlateExceptionHead | 0.895000 | |
| VehicleLabel | 2.778000 | |
| VehicleColor | 1.679000 | |
| VehicleType | 1.083000 | |
| ObjectDetectStageT160x96 | 1.498000 | |
| Whole-graph run time | 60 ms | |

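
The batch=1 times of the 13 networks add up to about 41.5 ms, against 60 ms for the whole graph, so roughly 18.5 ms of the graph time is not covered by the per-network figures (the table does not say where it goes). A quick check, with the values copied from the table and assumed to be in ms:

```python
# Batch=1 per-network times copied from the table above (assumed to be ms).
NETWORK_MS = {
    "ObjectDetectStageCentG320x384": 18.990,
    "ObjectDetectStageV160x128": 4.001,
    "VehiclePlateSegmentCH": 0.879,
    "VehicleDriverGeneral": 3.446,
    "VehiclePlateNo": 2.013,
    "VehiclePlateAlignCH": 1.655,
    "VehiclePlateNameCH": 1.576,
    "VehiclePlateExceptionPlate": 1.012,
    "VehiclePlateExceptionHead": 0.895,
    "VehicleLabel": 2.778,
    "VehicleColor": 1.679,
    "VehicleType": 1.083,
    "ObjectDetectStageT160x96": 1.498,
}
GRAPH_TOTAL_MS = 60.0  # "whole-graph run time" row


def network_sum_ms(times: dict) -> float:
    """Sum of the individual network inference times."""
    return sum(times.values())


if __name__ == "__main__":
    total = network_sum_ms(NETWORK_MS)
    print(f"sum of networks: {total:.3f} ms")
    print(f"not covered by networks: {GRAPH_TOTAL_MS - total:.3f} ms")
```
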
Pixels Type NPU Processes(threads) Batch ARM-usage(%) NPU-usage(%) NPU-mem(M) Algo-time(ms) Decode-time(ms) Images/day CPU-usage(%) Memory(M)
200W 1-3 vehicles ascend310 1(host:18,device:46) 4 22 75(+-20) 1656 20.388*2 3.649 4237787/2 250 1585
2 22*2 80 1761*2 23.026*2 4.384 7504560/2 225*2 1602+1341
(crashed after 1000+ images) 3 22*3 115 1545*3 29.831*2 6.336 8688947/2 220*3 1484*3
The statistics above are wrong: all times must be multiplied by 2.
20190816 (sendData in main changed to cycle through 200 images) 1 4 35-45 130 1000 (for reference; memory grows) 23.736 3.649 3640040 60 517
2 4 33*2 1000*2 35.256 5.512 4901293 522
3 4 28*3 1000 52.755 6.411 4913278
20190817 (send 4 images, get 4 results back) 8 (5-10)*8 40-120 246*8 135.867 3.6 5087328 (5-10)*8 513*8
20190820 (decodeBuf no longer returned) 8 103.608 6671299
20190823 (8k memory pool) 5 4 (5-10)*8 639 64.446 6703286
1 4 27.972 3088803
8 4 246 93.261 7411458
1 8 26.217 3295571
(4k memory pool, memory management) 7 8 433 78.433 7711040 531
fp16 5 16 44.115 9792657 519
int8 6 16 574 43.941 11797571 519
int8 6 16 559.8 42.667 12149816 527
500W
700W 5 vehicles int8 6 16 569 47.719 10863617 541

| Module (single process) | Time, batch=4 | Time, batch=8 | Time, batch=16 |
|---|---|---|---|
| DvppJpegDecode | 2.878*4 | 2.873*8 | 2.859 |
| ObjectDetectStage1_v3_Input | 0.028 | 0.036 | 0.036 |
| ObjectDetectStage1_v3_PreProcess | 6.018 | 11.945 | 11.985 |
| ObjectDetectStage1_v3_Predict | 24.847 | 48.648 | 97.002 |
| ObjectDetectStage1_v3_GetLayer | 0.743 | 0.77 | 0.771 |
| ObjectDetectStage1_v3_PostProcess | 0.664*4 | 0.661*8 | 0.661 |
| ObjectDetectStage1_v3_Output | 0.011*4 | 0.011*8 | 0.01 |
| VehicleDetectStage2_Input | 0.028 | 0.034 | 0.034 |
| VehicleDetectStage2_Index | 0.005 | 0.007 | 0.009 |
| VehicleDetectStage2_PreProcess | 0.842 | 1.563 | 2.643 |
| VehicleDetectStage2_Predict | 4.516 | 7.334 | 13.754 |
| VehicleDetectStage2_GetLayer | 0.206 | 0.201 | 0.188 |
| VehicleDetectStage2_PostProcess | 0.172*4 | 0.163*8 | 0.159 |
| VehicleDetectStage2_Output | 0.003*4 | 0.003*8 | 0.021 |
| VehicleDetectStage2_Segment_Index | 0.005 | 0.009 | 0.01 |
| VehicleDetectStage2_Segment_PreProcess | 0.354 | 0.578 | 0.544 |
| VehicleDetectStage2_Segment_Predict | 0.784 | 0.947 | 1.02 |
| VehicleDetectStage2_Segment_Output | 0.002 | 0.002 | 0.002 |
| Vehicle_Input | 0.023 | 0.027 | 0.027 |
| Vehicle_Index | 0.002 | 0.005 | 0.006 |
| Vehicle_PreProcess | 0.759 | 1.526 | 2.741 |
| Vehicle_Predict | 1.555 | 2.247 | 3.327 |
| Vehicle_Classification | 0.011*4 | 0.02*8 | 0.021 |
| Vehicle_ExtractFeature | 0.008*4 | 0.016*8 | 0.018 |
| Vehicle_FeatureCode | 0.487 | 0.996 | 1.084 |
| VehicleDriver_Input | 0.025 | 0.031 | 0.028 |
| VehicleDriver_Index | 0.006 | 0.01 | 0.012 |
| VehicleDriver_PreProcess | 0.289 | 0.354 | 0.323 |
| VehicleDriver_Predict | 4.38 | 6.736 | 11.514 |
| VehicleDriver_ArgMax | 0.001*4 | 0.001*8 | 0.001 |
| VehicleDriver_Output | 0.001*4 | 0.001*8 | 0.001 |
| VehiclePlate_Input | 0.024 | 0.03 | 0.027 |
| VehiclePlateAlign_Index | 0.063 | 0.066 | 0.07 |
| VehiclePlateAlign_PreProcess | 0.644 | 0.646 | 0.934 |
| VehiclePlateAlign_Predict | 1.614 | 2.774 | 6.691 |
| VehiclePlateAlign_Output | 0.001*4 | 0.001*8 | 0.001 |
| VehiclePlate_ProcessPlate | 9.818 | 16.827 | 17.364 |
| VehiclePlate_Index | 5.442 | 10.142 | 10.578 |
| VehiclePlate_CropToMat | 0.186 | 0.152 | 0.154 |
| VehiclePlate_ImageRotate | 1.21 | 1.178 | 1.251 |
| VehiclePlate_PreProcess | 0.05 | 0.098 | 0.163 |
| VehiclePlate_Predict | 1.461 | 2.154 | 3.594 |
| VehiclePlate_Output | 0.094 | 0.172 | 0.279 |
| VehiclePlate_PostProcess | 0.054 | 0.098 | 0.1 |
| VehicleSpecial_Input | 0.019 | 0.019 | 0.018 |
| VehicleSpecial_Index | 0.018 | 0.026 | 0.027 |
| VehicleSpecial_PreProcess | 1.127 | 2.012 | 2.71 |
| VehicleSpecial_Predict | 2.053 | 3.078 | 5.366 |
| VehicleSpecial_GetLayer | 0.022 | 0.02 | 0.019 |
| VehicleSpecial_PostProcess | 0.039*4 | 0.035*8 | 0.035 |
| VehicleSpecial_Output | 0.001*4 | 0.001*8 | 0.001 |

Other (framework overhead?): 111.888 - 76.774 = 35.114

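
Several cells in the module table are written as `X*N` (for example `2.878*4`), which reads as a per-image time repeated for each of the N images in the batch, while the other cells hold a single per-batch figure. Assuming that reading, a small helper (the name `cell_to_batch_ms` is hypothetical) turns a cell into a per-batch time. Note that the batch=16 column drops the `*N` suffix on the same rows, so those cells may be per-image values; the table does not say.

```python
def cell_to_batch_ms(cell: str) -> float:
    """Convert a table cell such as '2.878*4' or '24.847' to a per-batch time.

    Assumption: 'X*N' means a per-image time X repeated N times, so the
    per-batch cost is X * N; a bare number is taken as already per-batch.
    """
    cell = cell.strip()
    if "*" in cell:
        per_item, count = cell.split("*", 1)
        return float(per_item) * int(count)
    return float(cell)


if __name__ == "__main__":
    # DvppJpegDecode and ObjectDetectStage1_v3_Predict, batch=4 column
    for cell in ("2.878*4", "24.847"):
        print(cell, "->", round(cell_to_batch_ms(cell), 3))
```
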
Pixels Type NPU Processes(threads) Batch ARM-usage(%) NPU-usage(%) NPU-mem(M) Time(ms) Images/day CPU-usage(%) Memory(M)
200W H264 1 1(6) 4 46.9279 11046733
1(5) 8 35.53565 12156806
1(4) 16 22.9951 15029288
JPEG 1(8) 16 125 5529600
H264 1(8) 16 175 3949714
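
The images-per-day column can be reproduced from the time column: a day has 86,400,000 ms, so dividing by the per-image time and multiplying by the number of parallel workers gives the count. In the rows above the multiplier is the thread count in parentheses (for example `1(6)` gives 6); in the earlier table with separate algorithm and decode times it is the process count. The meaning of the multiplier is inferred from the numbers rather than stated in the notes, and `images_per_day` is a hypothetical name.

```python
MS_PER_DAY = 24 * 60 * 60 * 1000  # 86,400,000 ms


def images_per_day(latency_ms: float, workers: int) -> float:
    """Images processed per day, assuming `workers` parallel pipelines that
    each finish one image every `latency_ms` milliseconds."""
    if latency_ms <= 0:
        raise ValueError("latency_ms must be positive")
    return MS_PER_DAY / latency_ms * workers


if __name__ == "__main__":
    # (time in ms, thread count, value listed in the table)
    rows = [
        (46.9279, 6, 11046733),
        (35.53565, 5, 12156806),
        (22.9951, 4, 15029288),
        (125, 8, 5529600),
        (175, 8, 3949714),
    ]
    for latency, workers, listed in rows:
        computed = images_per_day(latency, workers)
        print(f"{latency:>9} ms x {workers}: {computed:,.0f} (listed {listed:,})")
```

The computed values agree with the listed ones to within one image, which is consistent with the listed times being rounded.
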
20191226
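200W JPEG 4 1(9) (traffic-checkpoint images have few targets, about 50% faster than video with many targets; 1.93 targets) 8.342ms (after switching the host side to allocate buffers with DMalloc there seems to be a speed-up) single core 9.162 37721972
500w 1(7) 6.44 targets 17.235 20051842
700w 1(5) 7.43 targets 20.672 16718322
200W H264 4 1(9) (includes result return, encoding, tracking, etc.) 22.086 15648197
1(9) (result return, encoding and tracking removed; 20% faster than above) 17.918 19287894
1(9) (few targets; 35% faster than video with many targets) 14.25 24251220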
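
The "X% faster" notes in the H264 rows line up with the reduction in per-image time relative to the 22.086 ms row: (22.086 - 17.918) / 22.086 ≈ 19% and (22.086 - 14.25) / 22.086 ≈ 35%. Reading "faster" as a time reduction is an inference from the numbers, and the two function names below are illustrative. The sketch also shows that a 35% time reduction corresponds to about 1.55× throughput, so the two ways of quoting a speed-up differ noticeably.

```python
def time_reduction_pct(baseline_ms: float, new_ms: float) -> float:
    """Percentage reduction in per-image time versus a baseline."""
    return (baseline_ms - new_ms) / baseline_ms * 100.0


def speedup(baseline_ms: float, new_ms: float) -> float:
    """Throughput ratio: images per unit time, new versus baseline."""
    return baseline_ms / new_ms


if __name__ == "__main__":
    baseline = 22.086  # 200W H264 row including result return/encoding/tracking
    for label, t in (("without return/encoding/tracking", 17.918),
                     ("few targets", 14.25)):
        print(f"{label}: {time_reduction_pct(baseline, t):.1f}% less time, "
              f"{speedup(baseline, t):.2f}x throughput")
```
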
Pixels Type NPU Processes(threads) Batch AI-CPU-usage(%) ctrl-CPU-usage(%) NPU-usage(%) NPU-mem(M) Bandwidth(%) Time(ms) Images/day CPU-usage(%) Memory(M)
200W JPEG 1 1(8) 16 30-45 38-50 50 68 40-60 50.61 30 26000
JPEG 1 1(7) 16 59 45.17 22000
JPEG 1 1(6) 16 25-35 30-90 50 56 25-60 39.86
JPEG 1 1(5) 16 32-36 25-35 50 56 30-50 29.56
JPEG 1 1(3) 16 15-20 12-15 36 53 20-40 18.38
H264 1(1) 41.75 (without encoding: 40.93 / after removing AiToMat: 37.50)
H264 1 1(3) 16 15-40 3-40 20-36 56 5-54 47.47 150-250
H264 1 1(6) 16 73.64
H264 1 1(8) 16 15-30 89-95 48-50 76 50-60 147.22 (without encoding: 135.21 / after removing AiToMat: 139.59) 170-250 2500

Original preprocessing

  1. PreProcess_Avg:Avg. of 3116 loops: 0.650000 ms
  2. PreProcess_Normal_:Avg. of 4357 loops: 0.185000 ms
  3. PreProcess_PlateRot_:Avg. of 79 loops: 2.083000 ms
  4. PreProcess_Plate1v1_:Avg. of 157 loops: 0.581000 ms
  5. PreProcess_Plate1v2_:Avg. of 1081 loops: 0.865000 ms

Dvpp preprocessing

  1. PreProcess_Avg:Avg. of 3116 loops: 0.481000 ms
  2. PreProcess_Normal_:Avg. of 4357 loops: 0.179000 ms
  3. PreProcess_PlateRot_:Avg. of 79 loops: 2.120000 ms
  4. PreProcess_Plate1v1_1v2_:Avg. of 1238 loops: 0.427000 ms
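
In the Dvpp run, Plate1v1 and Plate1v2 are reported as one entry (157 + 1081 = 1238 loops), so to compare with the original run the two original averages have to be combined, weighted by their loop counts: (157 × 0.581 + 1081 × 0.865) / 1238 ≈ 0.829 ms, against 0.427 ms with Dvpp, roughly 1.9× faster. This assumes each reported figure is a plain per-loop mean; `weighted_mean_ms` is a hypothetical helper.

```python
def weighted_mean_ms(samples):
    """Combine (loop_count, avg_ms) pairs into one loop-weighted average."""
    total_loops = sum(n for n, _ in samples)
    if total_loops == 0:
        raise ValueError("no loops")
    return sum(n * avg for n, avg in samples) / total_loops


if __name__ == "__main__":
    # Original PreProcess_Plate1v1_ and PreProcess_Plate1v2_ entries
    original = weighted_mean_ms([(157, 0.581), (1081, 0.865)])
    dvpp = 0.427  # PreProcess_Plate1v1_1v2_, 1238 loops
    print(f"original plate 1v1+1v2: {original:.3f} ms")
    print(f"dvpp plate 1v1+1v2:     {dvpp:.3f} ms")
    print(f"speedup: {original / dvpp:.2f}x")
```
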