Skip to content

ESPNet is an efficient scalable scene segmentation model which uses Pyramid Pooling Module (PPM) as a global contextual prior for feature extraction.

Notifications You must be signed in to change notification settings

CoolLoveBoy/ESPNet

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

33 Commits
 
 
 
 
 
 

Repository files navigation

Efficient Segmentation Pyramid Network

Semantic or Pixel-wise Segmentation is the process of automatically applying a class or label, designated by a dataset, to each pixel in an image.These labels or classes could include people, car, flower, building, furniture, and etc. To efficiently apply a class or label to each pixel of an image, we introduce an efficient and scalable network, call "Efficient Segmentation Pyramid Network (ESPNet)". By exploiting the scalable feature of EfficientNet models, we design Base ESPNet S0, S1 and S2. We noticed that due to scaling ESPNet from S0 to S2, validation mIoU has increased by 4.4%. But scaling process also increases computational cost due to increase of number of parameters and FLOPS. That is why; instead of scaling up further, we introduce final ESPNet. Model Predictions are uploaded in this reposiotory. Details will be available upon acceptance of research paper.

Datasets

For this research work, we have used cityscapes benchmark datasets.

Metrics

To understand the metrics used for model performance evaluation, please refer here: https://www.cityscapes-dataset.com/benchmarks/#pixel-level-results

Transfer Learning

For performance comparison, we trained few off-line and real-time existing models under same configuration and compared their performance with ESPNet. Some existing models require the use of ImageNet pretrained models to initialize their weights. Details will be given soon.

Requirements for Project

  • TensorFlow 2.1
    • This requires CUDA >= 10.1
    • TensorRT will require an NVIDIA GPU with Tensor Cores.
    • Horovod framework (for effective utilization of resources and speed up GPUs)
  • Keras 2.3.1
  • Python >= 3.7

Results

Size of image in cityscapes dataset is 1024x2048px. But ESPNets accept 512x512px input. Therefore, all models are trained with 512x512px size of input and predicted images are of same size.

Separable UNet

Separable UNet

DeepLabV3+

DeepLabV3+

Bayesian SegNet

Bayesian SegNet

FAST-SCNN

FAST-SCNN

Base ESPNet S0

Base ESPNet

Final ESPNet

Final ESPNet

About

ESPNet is an efficient scalable scene segmentation model which uses Pyramid Pooling Module (PPM) as a global contextual prior for feature extraction.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 100.0%