This task implements the MIT Scene Parse challenge.
The underlying dataset is the ADE20K dataset. We obtain the dataset from Hugging Face.
We use the smallest (b0) version of the SegFormer model proposed in SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers.
The metric used to evaluate the model is the mean Intersection over Union (mIoU). We use the implementation from mmsegmentation. Our model achieves an mIoU of 35.63.
The search grid used to find the (currently) best hyperparameters can be found here. Since this task is very sensitive to the choice of learning rate, further tuning of the learning rate may improve this result.
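For readers unfamiliar with the metric, the following is a minimal sketch of how mIoU can be computed from predicted and ground-truth labels. It is not the mmsegmentation implementation used by this task; the function name `mean_iou`, the flat-list input format, and the `ignore_index=255` default are assumptions made for illustration only.

```python
def mean_iou(pred, target, num_classes, ignore_index=255):
    """Mean IoU over all classes that occur in pred or target.

    pred and target are flat sequences of integer class ids of equal length.
    ignore_index marks unlabeled pixels (assumed value, for illustration).
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")

    intersection = [0] * num_classes
    pred_area = [0] * num_classes
    target_area = [0] * num_classes

    for p, t in zip(pred, target):
        if t == ignore_index:
            continue  # skip unlabeled pixels
        pred_area[p] += 1
        target_area[t] += 1
        if p == t:
            intersection[p] += 1

    ious = []
    for c in range(num_classes):
        union = pred_area[c] + target_area[c] - intersection[c]
        if union > 0:  # classes absent from both pred and target are left out
            ious.append(intersection[c] / union)

    return sum(ious) / len(ious) if ious else float("nan")


# Toy example with three classes and one ignored pixel.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 255]
print(mean_iou(pred, target, num_classes=3))
```

In the toy example the per-class IoUs are 1/2, 2/3, and 1, so the mean is 13/18. The values reported in this document are percentages, that is, this quantity multiplied by 100. Benchmark implementations typically accumulate the intersection and union counts over the whole validation set before dividing, rather than averaging per-image scores.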
We compare our performance against the pretrained model found here, which achieves an mIoU of 36.12.
This can be reproduced with the following YAML:
task:
  name: segmentation
  output_dir_name: segmentation_reference
  model:
    use_pretrained_model: true
engine:
  train: false
  plot: false