Benchmark

Backends

CPU: ncnn, ONNXRuntime, OpenVINO

GPU: ncnn, TensorRT, PPLNN

Software and hardware environment

  • Ubuntu 18.04
  • ncnn 20211208
  • CUDA 11.3
  • TensorRT 7.2.3.4
  • Docker 20.10.8
  • NVIDIA Tesla T4 Tensor Core GPU for TensorRT

Configuration

  • Static graph export
  • Batch size is 1
  • Latency is averaged over 100 images from each dataset
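
The averaging described in the configuration above could be sketched roughly as follows. This is a minimal illustration, not the profiling code used to produce the numbers in this document: `average_latency_ms` and `run_inference` are hypothetical names, and the warm-up pass is an assumption that the source does not mention. For GPU backends, the callable would also need to wait for the device to finish before returning, otherwise only the launch time is measured.

```python
import time
from statistics import mean


def average_latency_ms(run_inference, inputs, warmup=10):
    """Return the mean per-input latency in milliseconds.

    run_inference: hypothetical callable that runs one forward pass
                   on a single input (batch size 1).
    inputs:        iterable of inputs, e.g. 100 preprocessed images.
    warmup:        number of untimed calls made first (an assumption,
                   not something stated in the benchmark setup).
    """
    inputs = list(inputs)

    # Untimed warm-up so one-off initialization cost is excluded.
    for sample in inputs[:warmup]:
        run_inference(sample)

    timings = []
    for sample in inputs:
        start = time.perf_counter()
        run_inference(sample)
        timings.append((time.perf_counter() - start) * 1000.0)
    return mean(timings)


# Usage with a dummy workload standing in for a real backend.
dummy_inputs = range(100)
latency = average_latency_ms(lambda x: sum(range(1000)), dummy_inputs)
print(f"average latency: {latency:.3f} ms")
```
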

Users can obtain the speed results they need directly through model profiling. The results measured in our environment are as follows:

Speed benchmark

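mmcls (latency in ms; columns are backend / device / precision)

| model | spatial | TensorRT T4 fp32 | TensorRT T4 fp16 | TensorRT T4 int8 | TensorRT JetsonNano2GB fp32 | TensorRT JetsonNano2GB fp16 | TensorRT Jetson TX2 fp32 | PPLNN T4 fp16 | ncnn SnapDragon888 fp32 | ncnn Adreno660 fp32 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ResNet | 224x224 | 2.97 | 1.26 | 1.21 | 59.32 | 30.54 | 24.13 | 1.30 | 33.91 | 25.93 |
| ResNeXt | 224x224 | 4.31 | 1.42 | 1.37 | 88.10 | 49.18 | 37.45 | 1.36 | 133.44 | 69.38 |
| SE-ResNet | 224x224 | 3.41 | 1.66 | 1.51 | 74.59 | 48.78 | 29.62 | 1.91 | 107.84 | 80.85 |
| ShuffleNetV2 | 224x224 | 1.37 | 1.19 | 1.13 | 15.26 | 10.23 | 7.37 | 4.69 | 9.55 | 10.66 |
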
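
mmdet part1 (latency in ms; columns are backend / device / precision)

| model | spatial | TensorRT T4 fp32 | TensorRT T4 fp16 | TensorRT T4 int8 | TensorRT Jetson TX2 fp32 | PPLNN T4 fp16 |
| --- | --- | --- | --- | --- | --- | --- |
| YOLOv3 | 320x320 | 14.76 | 24.92 | 24.92 | - | 18.07 |
| SSD-Lite | 320x320 | 8.84 | 9.21 | 8.04 | 1.28 | 19.72 |
| RetinaNet | 800x1344 | 97.09 | 25.79 | 16.88 | 780.48 | 38.34 |
| FCOS | 800x1344 | 84.06 | 23.15 | 17.68 | - | - |
| FSAF | 800x1344 | 82.96 | 21.02 | 13.50 | - | 30.41 |
| Faster R-CNN | 800x1344 | 88.08 | 26.52 | 19.14 | 733.81 | 65.40 |
| Mask R-CNN | 800x1344 | 104.83 | 58.27 | - | - | 86.80 |
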
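
mmdet part2 (ncnn; columns are device / precision)

| model | spatial | SnapDragon888 fp32 | Adreno660 fp32 |
| --- | --- | --- | --- |
| MobileNetv2-YOLOv3 | 320x320 | 48.57 | 66.55 |
| SSD-Lite | 320x320 | 44.91 | 66.19 |
| YOLOX | 416x416 | 111.60 | 134.50 |
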
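
mmedit (latency in ms; columns are backend / device / precision)

| model | spatial | TensorRT T4 fp32 | TensorRT T4 fp16 | TensorRT T4 int8 | TensorRT Jetson TX2 fp32 | PPLNN T4 fp16 |
| --- | --- | --- | --- | --- | --- | --- |
| ESRGAN | 32x32 | 12.64 | 12.42 | 12.45 | - | 7.67 |
| SRCNN | 32x32 | 0.70 | 0.35 | 0.26 | 58.86 | 0.56 |
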
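
mmocr (latency in ms; columns are backend / device / precision)

| model | spatial | TensorRT T4 fp32 | TensorRT T4 fp16 | TensorRT T4 int8 | PPLNN T4 fp16 | ncnn SnapDragon888 fp32 | ncnn Adreno660 fp32 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| DBNet | 640x640 | 10.70 | 5.62 | 5.00 | 34.84 | - | - |
| CRNN | 32x32 | 1.93 | 1.40 | 1.36 | - | 10.57 | 20.00 |
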
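
mmseg (latency in ms; columns are backend / device / precision)

| model | spatial | TensorRT T4 fp32 | TensorRT T4 fp16 | TensorRT T4 int8 | TensorRT Jetson TX2 fp32 | PPLNN T4 fp16 |
| --- | --- | --- | --- | --- | --- | --- |
| FCN | 512x1024 | 128.42 | 23.97 | 18.13 | 1682.54 | 27.00 |
| PSPNet | 1x3x512x1024 | 119.77 | 24.10 | 16.33 | 1586.19 | 27.26 |
| DeepLabV3 | 512x1024 | 226.75 | 31.80 | 19.85 | - | 36.01 |
| DeepLabV3+ | 512x1024 | 151.25 | 47.03 | 50.38 | 2534.96 | 34.80 |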

Accuracy benchmark

mmcls

| model | metric | PyTorch fp32 | TorchScript fp32 | ONNX Runtime fp32 | TensorRT fp32 | TensorRT fp16 | TensorRT int8 | PPLNN fp16 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ResNet-18 | top-1 | 69.90 | 69.90 | 69.88 | 69.88 | 69.86 | 69.86 | 69.86 |
| | top-5 | 89.43 | 89.43 | 89.34 | 89.34 | 89.33 | 89.38 | 89.34 |
| ResNeXt-50 | top-1 | 77.90 | 77.90 | 77.90 | 77.90 | - | 77.78 | 77.89 |
| | top-5 | 93.66 | 93.66 | 93.66 | 93.66 | - | 93.64 | 93.65 |
| SE-ResNet-50 | top-1 | 77.74 | 77.74 | 77.74 | 77.74 | 77.75 | 77.63 | 77.73 |
| | top-5 | 93.84 | 93.84 | 93.84 | 93.84 | 93.83 | 93.72 | 93.84 |
| ShuffleNetV1 1.0x | top-1 | 68.13 | 68.13 | 68.13 | 68.13 | 68.13 | 67.71 | 68.11 |
| | top-5 | 87.81 | 87.81 | 87.81 | 87.81 | 87.81 | 87.58 | 87.80 |
| ShuffleNetV2 1.0x | top-1 | 69.55 | 69.55 | 69.55 | 69.55 | 69.54 | 69.10 | 69.54 |
| | top-5 | 88.92 | 88.92 | 88.92 | 88.92 | 88.91 | 88.58 | 88.92 |
| MobileNet V2 | top-1 | 71.86 | 71.86 | 71.86 | 71.86 | 71.87 | 70.91 | 71.84 |
| | top-5 | 90.42 | 90.42 | 90.42 | 90.42 | 90.40 | 89.85 | 90.41 |
| Vision Transformer | top-1 | 85.43 | 85.43 | - | 85.43 | 85.42 | - | - |
| | top-5 | 97.77 | 97.77 | - | 97.77 | 97.76 | - | - |


mmdet

| model | task | dataset | metric | PyTorch fp32 | TorchScript fp32 | ONNX Runtime fp32 | TensorRT fp32 | TensorRT fp16 | TensorRT int8 | PPLNN fp16 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| YOLOv3 | Object Detection | COCO2017 | box AP | 33.7 | 33.7 | - | 33.5 | 33.5 | 33.5 | - |
| SSD | Object Detection | COCO2017 | box AP | 25.5 | 25.5 | - | 25.5 | 25.5 | - | - |
| RetinaNet | Object Detection | COCO2017 | box AP | 36.5 | 36.4 | - | 36.4 | 36.4 | 36.3 | 36.5 |
| FCOS | Object Detection | COCO2017 | box AP | 36.6 | - | - | 36.6 | 36.5 | - | - |
| FSAF | Object Detection | COCO2017 | box AP | 37.4 | 37.4 | - | 37.4 | 37.4 | 37.2 | 37.4 |
| YOLOX | Object Detection | COCO2017 | box AP | 40.5 | 40.3 | - | 40.3 | 40.3 | 29.3 | - |
| Faster R-CNN | Object Detection | COCO2017 | box AP | 37.4 | 37.3 | - | 37.3 | 37.3 | 37.1 | 37.3 |
| ATSS | Object Detection | COCO2017 | box AP | 39.4 | - | - | 39.4 | 39.4 | - | - |
| Cascade R-CNN | Object Detection | COCO2017 | box AP | 40.4 | - | - | 40.4 | 40.4 | - | 40.4 |
| GFL | Object Detection | COCO2017 | box AP | 40.2 | - | 40.2 | 40.2 | 40.0 | - | - |
| RepPoints | Object Detection | COCO2017 | box AP | 37.0 | - | - | 36.9 | - | - | - |
| Mask R-CNN | Instance Segmentation | COCO2017 | box AP | 38.2 | 38.1 | - | 38.1 | 38.1 | - | 38.0 |
| | | | mask AP | 34.7 | 34.7 | - | 33.7 | 33.7 | - | - |
| Swin-Transformer | Instance Segmentation | COCO2017 | box AP | 42.7 | - | 42.7 | 42.5 | 37.7 | - | - |
| | | | mask AP | 39.3 | - | 39.3 | 39.3 | 35.4 | - | - |


mmedit

| model | task | dataset | metric | PyTorch fp32 | TorchScript fp32 | ONNX Runtime fp32 | TensorRT fp32 | TensorRT fp16 | TensorRT int8 | PPLNN fp16 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| SRCNN | Super Resolution | Set5 | PSNR | 28.4316 | 28.4120 | 28.4323 | 28.4323 | 28.4286 | 28.1995 | 28.4311 |
| | | | SSIM | 0.8099 | 0.8106 | 0.8097 | 0.8097 | 0.8096 | 0.7934 | 0.8096 |
| ESRGAN | Super Resolution | Set5 | PSNR | 28.2700 | 28.2619 | 28.2592 | 28.2592 | - | - | 28.2624 |
| | | | SSIM | 0.7778 | 0.7784 | 0.7764 | 0.7774 | - | - | 0.7765 |
| ESRGAN-PSNR | Super Resolution | Set5 | PSNR | 30.6428 | 30.6306 | 30.6444 | 30.6430 | - | - | 27.0426 |
| | | | SSIM | 0.8559 | 0.8565 | 0.8558 | 0.8558 | - | - | 0.8557 |
| SRGAN | Super Resolution | Set5 | PSNR | 27.9499 | 27.9252 | 27.9408 | 27.9408 | - | - | 27.9388 |
| | | | SSIM | 0.7846 | 0.7851 | 0.7839 | 0.7839 | - | - | 0.7839 |
| SRResNet | Super Resolution | Set5 | PSNR | 30.2252 | 30.2069 | 30.2300 | 30.2300 | - | - | 30.2294 |
| | | | SSIM | 0.8491 | 0.8497 | 0.8488 | 0.8488 | - | - | 0.8488 |
| Real-ESRNet | Super Resolution | Set5 | PSNR | 28.0297 | - | 27.7016 | 27.7016 | - | - | 27.7049 |
| | | | SSIM | 0.8236 | - | 0.8122 | 0.8122 | - | - | 0.8123 |
| EDSR | Super Resolution | Set5 | PSNR | 30.2223 | 30.2192 | 30.2214 | 30.2214 | 30.2211 | 30.1383 | - |
| | | | SSIM | 0.8500 | 0.8507 | 0.8497 | 0.8497 | 0.8497 | 0.8469 | - |


mmocr

| model | task | dataset | metric | PyTorch fp32 | TorchScript fp32 | ONNX Runtime fp32 | TensorRT fp32 | TensorRT fp16 | TensorRT int8 | PPLNN fp16 | OpenVINO fp32 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| DBNet* | TextDetection | ICDAR2015 | recall | 0.7310 | 0.7308 | 0.7304 | 0.7198 | 0.7179 | 0.7111 | 0.7304 | 0.7309 |
| | | | precision | 0.8714 | 0.8718 | 0.8714 | 0.8677 | 0.8674 | 0.8688 | 0.8718 | 0.8714 |
| | | | hmean | 0.7950 | 0.7949 | 0.7950 | 0.7868 | 0.7856 | 0.7821 | 0.7949 | 0.7950 |
| PSENet | TextDetection | ICDAR2015 | recall | 0.7526 | 0.7526 | 0.7526 | 0.7526 | 0.7520 | 0.7496 | - | 0.7526 |
| | | | precision | 0.8669 | 0.8669 | 0.8669 | 0.8669 | 0.8668 | 0.8550 | - | 0.8669 |
| | | | hmean | 0.8057 | 0.8057 | 0.8057 | 0.8057 | 0.8054 | 0.7989 | - | 0.8057 |
| PANet | TextDetection | ICDAR2015 | recall | 0.7401 | 0.7401 | 0.7401 | 0.7357 | 0.7366 | - | - | 0.7401 |
| | | | precision | 0.8601 | 0.8601 | 0.8601 | 0.8570 | 0.8586 | - | - | 0.8601 |
| | | | hmean | 0.7955 | 0.7955 | 0.7955 | 0.7917 | 0.7930 | - | - | 0.7955 |
| CRNN | TextRecognition | IIIT5K | acc | 0.8067 | 0.8067 | 0.8067 | 0.8067 | 0.8063 | 0.8067 | 0.8067 | - |
| SAR | TextRecognition | IIIT5K | acc | 0.9517 | - | 0.9287 | - | - | - | - | - |
| SATRN | TextRecognition | IIIT5K | acc | 0.9470 | 0.9487 | 0.9487 | 0.9487 | 0.9483 | 0.9483 | - | - |


mmseg

| model | dataset | metric | PyTorch fp32 | TorchScript fp32 | ONNX Runtime fp32 | TensorRT fp32 | TensorRT fp16 | TensorRT int8 | PPLNN fp16 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| FCN | Cityscapes | mIoU | 72.25 | 72.36 | - | 72.36 | 72.35 | 74.19 | 72.35 |
| PSPNet | Cityscapes | mIoU | 78.55 | 78.66 | - | 78.26 | 78.24 | 77.97 | 78.09 |
| DeepLabV3 | Cityscapes | mIoU | 79.09 | 79.12 | - | 79.12 | 79.12 | 78.96 | 79.12 |
| DeepLabV3+ | Cityscapes | mIoU | 79.61 | 79.60 | - | 79.60 | 79.60 | 79.43 | 79.60 |
| Fast-SCNN | Cityscapes | mIoU | 70.96 | 70.96 | - | 70.93 | 70.92 | 66.00 | 70.92 |
| UNet | Cityscapes | mIoU | 69.10 | - | - | 69.10 | 69.10 | 68.95 | - |
| ANN | Cityscapes | mIoU | 77.40 | - | - | 77.32 | 77.32 | - | - |
| APCNet | Cityscapes | mIoU | 77.40 | - | - | 77.32 | 77.32 | - | - |
| BiSeNetV1 | Cityscapes | mIoU | 74.44 | - | - | 74.44 | 74.43 | - | - |
| BiSeNetV2 | Cityscapes | mIoU | 73.21 | - | - | 73.21 | 73.21 | - | - |
| CGNet | Cityscapes | mIoU | 68.25 | - | - | 68.27 | 68.27 | - | - |
| EMANet | Cityscapes | mIoU | 77.59 | - | - | 77.59 | 77.60 | - | - |
| EncNet | Cityscapes | mIoU | 75.67 | - | - | 75.66 | 75.66 | - | - |
| ERFNet | Cityscapes | mIoU | 71.08 | - | - | 71.08 | 71.07 | - | - |
| FastFCN | Cityscapes | mIoU | 79.12 | - | - | 79.12 | 79.12 | - | - |
| GCNet | Cityscapes | mIoU | 77.69 | - | - | 77.69 | 77.69 | - | - |
| ICNet | Cityscapes | mIoU | 76.29 | - | - | 76.36 | 76.36 | - | - |
| ISANet | Cityscapes | mIoU | 78.49 | - | - | 78.49 | 78.49 | - | - |
| OCRNet | Cityscapes | mIoU | 74.30 | - | - | 73.66 | 73.67 | - | - |
| PointRend | Cityscapes | mIoU | 76.47 | - | - | 76.41 | 76.42 | - | - |
| Semantic FPN | Cityscapes | mIoU | 74.52 | - | - | 74.52 | 74.52 | - | - |
| STDC | Cityscapes | mIoU | 75.10 | - | - | 75.10 | 75.10 | - | - |
| STDC | Cityscapes | mIoU | 77.17 | - | - | 77.17 | 77.17 | - | - |
| UPerNet | Cityscapes | mIoU | 77.10 | - | - | 77.19 | 77.18 | - | - |
| Segmenter | ADE20K | mIoU | 44.32 | 44.29 | 44.29 | 44.29 | 43.34 | 43.35 | - |

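
mmpose

| model | task | dataset | metric | PyTorch fp32 | ONNX Runtime fp32 | TensorRT fp32 | TensorRT fp16 | PPLNN fp16 | OpenVINO fp32 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| HRNet | Pose Detection | COCO | AP | 0.748 | 0.748 | 0.748 | 0.748 | - | 0.748 |
| | | | AR | 0.802 | 0.802 | 0.802 | 0.802 | - | 0.802 |
| LiteHRNet | Pose Detection | COCO | AP | 0.663 | 0.663 | 0.663 | - | - | 0.663 |
| | | | AR | 0.728 | 0.728 | 0.728 | - | - | 0.728 |
| MSPN | Pose Detection | COCO | AP | 0.762 | 0.762 | 0.762 | 0.762 | - | 0.762 |
| | | | AR | 0.825 | 0.825 | 0.825 | 0.825 | - | 0.825 |
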

mmrotate

| model | task | dataset | metric | PyTorch fp32 | ONNX Runtime fp32 | TensorRT fp32 | TensorRT fp16 | PPLNN fp16 | OpenVINO fp32 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| RotatedRetinaNet | Rotated Detection | DOTA-v1.0 | mAP | 0.698 | 0.698 | 0.698 | 0.697 | - | - |
| Oriented RCNN | Rotated Detection | DOTA-v1.0 | mAP | 0.756 | 0.756 | 0.758 | 0.730 | - | - |
| GlidingVertex | Rotated Detection | DOTA-v1.0 | mAP | 0.732 | - | 0.733 | 0.731 | - | - |

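Notes

  • Because some datasets contain images of various resolutions in the codebase (MMDet, for example), the speed benchmark is obtained with the static configs in MMDeploy, while the accuracy benchmark is obtained with the dynamic configs.
  • Some TensorRT int8 benchmarks require an NVIDIA GPU with Tensor Cores; without them, performance drops sharply.
  • DBNet uses nearest interpolation in the model neck, and TensorRT-7 handles it with a strategy entirely different from PyTorch's. To stay compatible with TensorRT-7, we rewrote the neck to use bilinear interpolation, which improves detection performance. To obtain results that match PyTorch, TensorRT-8+ is recommended, since its interpolation method is the same as PyTorch's.
  • For mmpose models, flip_test must be set to False in the model config file.
  • Some models may suffer a large accuracy loss in fp16 mode. Adjust the model according to your specific situation.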
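
One way to spot the fp16 accuracy loss mentioned in the last note is to compare each backend's score against the PyTorch fp32 baseline and flag large gaps. The sketch below is only an illustration: `flag_precision_drops` is a hypothetical helper, the dictionary keys are made-up labels, and the 1.0-point threshold is an arbitrary assumption. The scores are box AP values copied from the mmdet accuracy table.

```python
def flag_precision_drops(results, baseline="pytorch_fp32", max_drop=1.0):
    """List (model, backend, drop) entries whose score falls more than
    `max_drop` below the baseline score for the same model.

    The threshold is an illustrative assumption, not a value taken
    from the benchmark document.
    """
    flagged = []
    for model, scores in results.items():
        base = scores[baseline]
        for backend, score in scores.items():
            if backend == baseline or score is None:
                continue  # skip the baseline itself and missing entries
            drop = base - score
            if drop > max_drop:
                flagged.append((model, backend, round(drop, 4)))
    return flagged


# Box AP on COCO2017, copied from the mmdet accuracy table.
results = {
    "Swin-Transformer": {
        "pytorch_fp32": 42.7,
        "tensorrt_fp32": 42.5,
        "tensorrt_fp16": 37.7,
    },
    "Faster R-CNN": {
        "pytorch_fp32": 37.4,
        "tensorrt_fp32": 37.3,
        "tensorrt_fp16": 37.3,
    },
}

print(flag_precision_drops(results))
```

With these inputs, only the Swin-Transformer TensorRT fp16 entry is flagged, since its box AP is 5.0 points below the PyTorch baseline while every other entry stays within 0.2 points.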