Optimized CUDA Kernels for Fast MobileNetV2 Inference
- ① Implement MobileNetV2 with PyTorch, and parse the given ONNX model with Python to analyze the network structure. --- mobilenet_v2/nn/onnx/
- ② Implement MobileNetV2 in C++ (only the sequential layer structure and weights, no forward computation), and parse the given ONNX model with Python to extract the weights. --- mobilenet_v2/nn/
- ③ Implement wrappers and tests for the cuDNN/cuBLAS primitives: Conv, Gemm, and Pool. --- mobilenet_v2/cudnn/
- Here, Gemm can either be implemented with cuBLAS or treated as a 1x1 Conv2d with cuDNN; we take the former approach (see the Gemm sketch after this list).
- ④ Implement cuDNN-accelerated MobileNetV2 from the wrappers and the C++ network implemented above. --- mobilenet_v2/cudnn/
- ⑤ Implement and optimize CUDA kernels: Conv, Gemm, and Pool. --- mobilenet_v2/fast_mobilenet/
- Here, Conv can be implemented either as Im2Col + Gemm or with the Winograd algorithm [4]; we implemented only the former (see the Im2Col sketch after this list).
- ⑥ Implement our Fast-MobileNetV2 as a whole. --- mobilenet_v2/fast_mobilenet/
- ⑦ Compare and optimize: e.g., parameter tuning, model-specific / hardware-specific optimization, ...
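As a reference for the Gemm choice in step ③: cuBLAS assumes column-major storage, so a row-major C = A·B is obtained by computing Cᵀ = Bᵀ·Aᵀ, i.e., by swapping the operand order. A minimal sketch (the function name is illustrative, not the repo's actual wrapper):

```cpp
#include <cublas_v2.h>

// Row-major C[M x N] = A[M x K] * B[K x N] on top of column-major cuBLAS:
// compute C^T = B^T * A^T by swapping the operands.
void gemmRowMajor(cublasHandle_t handle, int M, int N, int K,
                  const float* A, const float* B, float* C) {
    const float alpha = 1.0f, beta = 0.0f;
    cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                N, M, K,
                &alpha,
                B, N,   // B read as a column-major N x K matrix
                A, K,   // A read as a column-major K x M matrix
                &beta,
                C, N);  // C written as column-major N x M == row-major M x N
}
```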
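And for the Im2Col route in step ⑤, a minimal kernel sketch (NCHW, single image, square stride/padding; names are illustrative). Each thread fills one element of the [C·KH·KW] x [OH·OW] column matrix via a grid-stride loop [5]; the convolution then reduces to a Gemm with the filter matrix:

```cuda
__global__ void im2colKernel(const float* __restrict__ in, float* __restrict__ col,
                             int C, int H, int W,
                             int KH, int KW, int OH, int OW,
                             int pad, int stride) {
    int total = C * KH * KW * OH * OW;
    // Grid-stride loop [5]: correct for any launch configuration.
    for (int idx = blockIdx.x * blockDim.x + threadIdx.x;
         idx < total; idx += blockDim.x * gridDim.x) {
        int ow = idx % OW;
        int oh = (idx / OW) % OH;
        int kw = (idx / (OW * OH)) % KW;
        int kh = (idx / (OW * OH * KW)) % KH;
        int c  =  idx / (OW * OH * KW * KH);
        int h = oh * stride - pad + kh;  // input row, may fall into padding
        int w = ow * stride - pad + kw;  // input col, may fall into padding
        col[idx] = (h >= 0 && h < H && w >= 0 && w < W)
                 ? in[(c * H + h) * W + w] : 0.0f;
    }
}
```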
-
Re-implement the MobileNetV2 ONNX model with PyTorch and test inference:
(conda) >> cd mobilenet_v2/nn/onnx/
(conda) >> python pytorchMobileNetV2.py
-
Save the weights of the MobileNetV2 ONNX model to plain-text files:
(conda) >> cd mobilenet_v2/nn/weights/
(conda) >> python save_weights.py
-
Show the MobileNetV2 topology in C++ and check the loaded weights:
>> cd mobilenet_v2/nn/examples/
>> make show
>> ./show.out
>> make check
>> ./check.out
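check.out verifies the weights loaded from the plain-text files. A hypothetical sketch of the loading side (struct, field names, and file format are illustrative assumptions, not the repo's actual classes):

```cpp
#include <fstream>
#include <string>
#include <vector>

// Hypothetical layer-loading sketch: reads whitespace-separated floats
// written by save_weights.py into a flat buffer (names/format assumed).
struct Conv2d {
    int out_c, in_c, kh, kw;
    std::vector<float> weight;  // out_c * in_c * kh * kw values
    bool load(const std::string& path) {
        std::ifstream f(path);
        weight.resize((size_t)out_c * in_c * kh * kw);
        for (float& v : weight)
            if (!(f >> v)) return false;  // fail on a short or missing file
        return true;
    }
};
```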
-
Show the versions of CUDA and cuDNN:
>> cd mobilenet_v2/cudnn/
>> bash version.sh
-
Operator tests (cuDNN/cuBLAS wrappers):
>> cd mobilenet_v2/cudnn/tests/test_op/
>> make
>> ./testConv.o
>> ./testGemm.o
>> ./testPool.o
>> ./testAdd.o
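These executables exercise the step-③ wrappers. At its core, the Conv wrapper reduces to a cudnnConvolutionForward call; a minimal sketch with descriptor creation and algorithm/workspace selection omitted (the function name is illustrative):

```cpp
#include <cudnn.h>

// Core of a Conv wrapper: y = alpha * conv(x, w) + beta * y.
// Descriptor setup and algorithm/workspace selection are omitted here.
void convForward(cudnnHandle_t handle,
                 cudnnTensorDescriptor_t xDesc, const float* x,
                 cudnnFilterDescriptor_t wDesc, const float* w,
                 cudnnConvolutionDescriptor_t convDesc,
                 cudnnConvolutionFwdAlgo_t algo,
                 void* workspace, size_t workspaceBytes,
                 cudnnTensorDescriptor_t yDesc, float* y) {
    const float alpha = 1.0f, beta = 0.0f;
    cudnnConvolutionForward(handle, &alpha, xDesc, x, wDesc, w, convDesc,
                            algo, workspace, workspaceBytes,
                            &beta, yDesc, y);
}
```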
-
Network test (cuDNN MobileNetV2 vs. the ONNX model):
(conda) >> cd mobilenet_v2/cudnn/tests/test_net/
(conda) >> python generate_data.py
(conda) >> conda deactivate
>> make
>> ./testCudnnMobileNetV2.o
>> source ~/.bashrc
(conda) >> python compare_cudnn_onnx.py
-
Operator tests (hand-written CUDA kernels):
>> cd mobilenet_v2/fast_mobilenet/tests/test_op/
>> make
>> ./testConv.o
>> ./testGemm.o
>> ./testPool.o
>> ./testAdd.o
>> ./testIm2Col.o
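As one example of the pattern used for Pool, here is a minimal global-average-pooling sketch: one thread block per (N, C) plane, each thread accumulating a strided slice, followed by a shared-memory tree reduction in the spirit of [6]. It assumes blockDim.x is a power of two; names are illustrative:

```cuda
// Launch with gridDim.x = N * C, blockDim.x a power of two, and
// blockDim.x * sizeof(float) bytes of dynamic shared memory.
__global__ void globalAvgPool(const float* __restrict__ in,
                              float* __restrict__ out, int HW) {
    extern __shared__ float buf[];
    const float* plane = in + (size_t)blockIdx.x * HW;
    float sum = 0.0f;
    for (int i = threadIdx.x; i < HW; i += blockDim.x)  // strided accumulation
        sum += plane[i];
    buf[threadIdx.x] = sum;
    __syncthreads();
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {      // tree reduction [6]
        if (threadIdx.x < s) buf[threadIdx.x] += buf[threadIdx.x + s];
        __syncthreads();
    }
    if (threadIdx.x == 0) out[blockIdx.x] = buf[0] / HW;
}
```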
-
Network test (Fast-MobileNetV2 vs. the ONNX model):
(conda) >> cd mobilenet_v2/fast_mobilenet/tests/test_net/
(conda) >> python generate_data.py
(conda) >> conda deactivate
>> make
>> ./testFastMobileNetV2.o
>> source ~/.bashrc
(conda) >> python compare_fast_onnx.py
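For the step-⑦ comparisons, per-run GPU timings can be collected with CUDA events; a minimal sketch (the run callback stands in for any kernel launch or full forward pass):

```cpp
#include <cuda_runtime.h>

// Wall-clock milliseconds for one run, measured with CUDA events.
float timeMs(void (*run)()) {
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start);
    run();
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);  // wait until the timed work has finished
    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}
```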
- NVIDIA Tesla V100 GPU
- CUDA version 10.2.89
- cuDNN version 8.2.4
- Run the Python sources of this repo in an Anaconda environment; we use Python 3.9.7
- Do NOT run the CUDA sources of this repo in an Anaconda environment
- MobileNetV2: Inverted Residuals and Linear Bottlenecks [1]
- ONNX Python API
- cuDNN and cuBLAS API [2] [3]
- CUDA C++ Programming
- GPU Architecture and Compiler Optimization
[1] Sandler, Mark, et al. "MobileNetV2: Inverted Residuals and Linear Bottlenecks." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2018.
[2] NVIDIA Corporation. "NVIDIA cuDNN Documentation." Available at: https://docs.nvidia.com/deeplearning/cudnn/api/index.html
[3] NVIDIA Corporation. "NVIDIA cuBLAS Documentation." Available at: https://docs.nvidia.com/cuda/cublas/index.html
[4] Lavin, Andrew, and Scott Gray. "Fast Algorithms for Convolutional Neural Networks." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2016.
[5] Harris, Mark. "CUDA Pro Tip: Write Flexible Kernels with Grid-Stride Loops." Available at: https://developer.nvidia.com/blog/cuda-pro-tip-write-flexible-kernels-grid-stride-loops/
[6] Harris, Mark. "Optimizing Parallel Reduction in CUDA." Available at: https://vuduc.org/teaching/cse6230-hpcta-fa12/slides/cse6230-fa12--05b-reduction-notes.pdf