Support for reranking tasks with multi-level relevance scores? #1240

omihub777 · 2024-09-26T08:26:51Z


Thank you for all your hard work.

I am trying to use a custom reranking dataset with multi-level relevance scores (e.g., relevant, less relevant, non-relevant). However, it seems AbsTaskReranking only supports binary relevance scores (positive/negative). Is there any existing support for reranking datasets with multi-level relevance scores in MTEB?

One workaround might be to convert multi-level relevance scores into binary ones (e.g., relevant and less relevant->positive, non-relevant->negative), as mentioned in #818 (comment). While this approach is simple and effective, it is not ideal because it discards the granularity of the original relevance scores. Thank you in advance!


KennethEnevoldsen · 2024-09-28T16:28:35Z

Maintainer

Not as far as I know. I am unsure whether the code supports continuous scores in the 0-1 range (I think it might). Otherwise, one solution is, as you say, to create a binary mapping, and potentially two versions of the dataset (one that treats only relevant documents as positive and one that treats both relevant and less relevant documents as positive). However, it might be better to simply open a PR that updates the reranking task to allow multi-level relevance scores.
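For illustration, the binary mapping and the two dataset variants could look roughly like the sketch below. It assumes a 0/1/2 label scale (non-relevant, less relevant, relevant) and a sample layout with `query`, `positive`, and `negative` keys; the function name `to_binary_buckets` and its `min_positive` parameter are hypothetical and not part of MTEB.

```python
from typing import Dict, List, Union

Sample = Dict[str, Union[str, List[str]]]


def to_binary_buckets(
    query: str,
    docs: List[str],
    labels: List[int],
    min_positive: int = 1,
) -> Sample:
    """Split graded candidates into positive/negative buckets.

    Assumed label scale: 2 = relevant, 1 = less relevant, 0 = non-relevant.
    Documents with a label >= min_positive become positives.
    """
    if len(docs) != len(labels):
        raise ValueError("docs and labels must have the same length")
    positive = [d for d, label in zip(docs, labels) if label >= min_positive]
    negative = [d for d, label in zip(docs, labels) if label < min_positive]
    return {"query": query, "positive": positive, "negative": negative}


docs = ["doc_a", "doc_b", "doc_c"]
labels = [2, 1, 0]

# Lenient variant: relevant and less relevant both count as positive.
lenient = to_binary_buckets("example query", docs, labels, min_positive=1)

# Strict variant: only fully relevant documents count as positive.
strict = to_binary_buckets("example query", docs, labels, min_positive=2)
```

Either variant still collapses the grades into two buckets, so the ordering between relevant and less relevant documents is not rewarded by the metric.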

@orionw knows more about this area than me so maybe he has a better idea.

2 replies

orionw Sep 28, 2024
Maintainer

Short answer: we don't currently support that for reranking but do support it for retrieval due to the trec_eval evaluation library.

Long answer: Our current reranking setting is fairly distinct from the retrieval evaluation, even though there is a lot of overlap.
Right now it has two buckets (positives and negatives) and computes metrics based on those. But we really could compute it in the same way as retrieval by having qrels and using trec_eval (basically treating it as a bunch of small retrieval tasks). We would need to provide a wrapper for backwards compatibility that converts the previous tasks into this style, but I think it would be a lot more intuitive to use in the future (as well as letting it handle multi-level relevance labels, work well for other reranking/RAG style datasets, etc.).
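To make the qrels idea concrete, here is a minimal pure-Python sketch of graded evaluation, where each query is scored as its own small retrieval task. It is not MTEB's or trec_eval's implementation: it assumes qrels shaped as `{query_id: {doc_id: grade}}`, a reranked list of doc ids per query, and nDCG with linear gain and a log2 discount. The names `dcg` and `ndcg_at_k` are hypothetical helpers.

```python
import math
from typing import Dict, List


def dcg(gains: List[float]) -> float:
    """Discounted cumulative gain with linear gain and log2 discount."""
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains))


def ndcg_at_k(doc_grades: Dict[str, int], ranking: List[str], k: int = 10) -> float:
    """nDCG@k for one query; documents missing from the qrels count as grade 0."""
    gains = [doc_grades.get(doc_id, 0) for doc_id in ranking[:k]]
    ideal_gains = sorted(doc_grades.values(), reverse=True)[:k]
    ideal = dcg(ideal_gains)
    return dcg(gains) / ideal if ideal > 0 else 0.0


# Assumed toy data: grades 2 = relevant, 1 = less relevant, 0 = non-relevant.
qrels = {"q1": {"d1": 2, "d2": 1, "d3": 0}}

# Model output: candidates for each query, ordered by reranker score.
run = {"q1": ["d2", "d1", "d3"]}

scores = {qid: ndcg_at_k(qrels[qid], ranked) for qid, ranked in run.items()}
mean_ndcg = sum(scores.values()) / len(scores)
```

Unlike the two-bucket setup, this ranking is penalized for placing the less relevant `d2` above the relevant `d1`, which is the granularity the original question asks about.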

I can probably take a look at this later next week as I've been wanting to add a reranking task and have been eyeing the format. Assuming others agree with this approach?

Answer selected by omihub777

KennethEnevoldsen Sep 29, 2024
Maintainer

Def. do. Would be a great addition
