
How to evaluate the Open-Vocabulary Segmentation results in Table 2? #5

Glupayy opened this issue Jan 5, 2024 · 5 comments

Glupayy commented Jan 5, 2024

Hi,

Thank you for sharing your impressive work!

I'm confused about Table 2: how are the open-vocabulary segmentation metrics calculated?
Also, could you please explain how Osprey outputs the masks used to compute these metrics?

Thanks for your help!

@CircleRadon (Owner)

Hi @Glupayy,
For open-vocabulary segmentation, all approaches take ground-truth boxes/masks as input to assess regional recognition capability. We use semantic similarity as the matching measure when computing these metrics, and we will release the evaluation code.
Note that the current version of Osprey cannot generate output masks.
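As a rough illustration of semantic-similarity matching, mapping a free-form region label to the closest category name by embedding similarity might look like the sketch below. The encoder choice here is only an assumption for illustration, not the released evaluation code:

```python
# A minimal sketch of matching a free-form region label to a fixed category
# vocabulary by semantic similarity. The encoder ("all-MiniLM-L6-v2") is an
# assumption for illustration, not necessarily what the paper used.
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("all-MiniLM-L6-v2")

def match_category(predicted_text: str, category_names: list[str]) -> int:
    """Return the index of the category name most similar to the prediction."""
    pred_emb = encoder.encode(predicted_text, convert_to_tensor=True)
    cat_embs = encoder.encode(category_names, convert_to_tensor=True)
    sims = util.cos_sim(pred_emb, cat_embs)[0]  # cosine similarity to every class
    return int(sims.argmax())                   # best-matching class index

# e.g. a free-form answer like "a red passenger car" should map to "car"
categories = ["car", "bus", "person", "traffic light"]
print(categories[match_category("a red passenger car", categories)])
```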

Glupayy commented Jan 5, 2024

Thanks for your prompt response!
I noticed that the metrics reported on Cityscapes and ADE20K-150 in Table 2 are PQ, AP, and mIoU, so I'm curious how these metrics can be calculated if Osprey cannot output masks. Could you please shed some light on this?
Thank you once again for your assistance.

@CircleRadon (Owner)

@Glupayy The ground-truth masks are used when calculating these metrics.

yeliudev commented Nov 5, 2024

Hi @CircleRadon! I have the same question as @Glupayy. How can the PQ, AP, and mIoU values be calculated given that Osprey cannot output masks? My guess is that for each sample, the PQ, AP, and mIoU can only be 1 (when the predicted label is correct) or 0 (when it is wrong), and these 'binary' scores are averaged across all samples to obtain the values in Table 2. I was wondering whether this is correct. Thanks!
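To make that guess concrete, a tiny sketch of the interpretation (purely my own reading of the discussion, not the authors' evaluation code):

```python
# A rough sketch of the "binary per-region score" reading described above:
# every region keeps its ground-truth mask and only the label is predicted,
# so the per-region score is 1.0 when the matched label is correct and 0.0
# otherwise. This is only an interpretation, not the released evaluation code.
def binary_region_scores(pred_labels, gt_labels):
    scores = [1.0 if p == g else 0.0 for p, g in zip(pred_labels, gt_labels)]
    return sum(scores) / len(scores)  # averaging reduces to label accuracy

# e.g. binary_region_scores([3, 7, 7], [3, 7, 5]) == 2/3
```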

yeliudev commented Nov 5, 2024

If this is the case, could you please share more insight into why these metrics are used rather than simply reporting accuracy? Thanks!
