Missing Metadata for tasks in MTEB(eng, classic) #1886

Closed
x-tabdeveloping opened this issue Jan 28, 2025 · 3 comments
Labels: leaderboard (issues related to the leaderboard)

Comments

@x-tabdeveloping
Collaborator

x-tabdeveloping commented Jan 28, 2025

Many tasks in MTEB(eng, classic) are missing metadata, which is messing with the filtering on the leaderboard.
Here's a list:

import mteb

for task in mteb.get_benchmark("MTEB(eng, classic)").tasks:
    if not task.metadata.domains:
        print(f"{task.metadata.name}.domains = {task.metadata.domains}")
    if not task.metadata.languages:
        print(f"{task.metadata.name}.languages = {task.metadata.languages}")
    if not task.metadata.type:
        print(f"{task.metadata.name}.type = {task.metadata.type}")

Output:

ArxivClusteringS2S.domains = None
AskUbuntuDupQuestions.domains = None
BIOSSES.domains = None
CQADupstackAndroidRetrieval.domains = None
CQADupstackEnglishRetrieval.domains = None
CQADupstackGamingRetrieval.domains = None
CQADupstackGisRetrieval.domains = None
CQADupstackMathematicaRetrieval.domains = None
CQADupstackPhysicsRetrieval.domains = None
CQADupstackStatsRetrieval.domains = None
CQADupstackTexRetrieval.domains = None
CQADupstackUnixRetrieval.domains = None
CQADupstackWebmastersRetrieval.domains = None
CQADupstackWordpressRetrieval.domains = None
ClimateFEVER.domains = None
FEVER.domains = None
FiQA2018.domains = None
NQ.domains = None
QuoraRetrieval.domains = None
RedditClustering.domains = None
RedditClusteringP2P.domains = None
STSBenchmark.domains = None
StackExchangeClustering.domains = None
StackExchangeClusteringP2P.domains = None
StackOverflowDupQuestions.domains = None
TwitterSemEval2015.domains = None
TwitterURLCorpus.domains = None
MSMARCO.domains = None

@x-tabdeveloping changed the title from "MSMARCO does not have domains in task metadata" to "Missing Metadata for tasks in MTEB(eng, classic)" on Jan 28, 2025
@x-tabdeveloping added the leaderboard label (issues related to the leaderboard) on Jan 28, 2025

@x-tabdeveloping
Collaborator Author

I can hotfix the leaderboard so that it does not filter out models that have None as domain, but the proper solution here would be to annotate these tasks.
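
For illustration only, here is a minimal sketch of what such a hotfix could look like. It is not the leaderboard's actual code: `TaskInfo`, `matches_domains`, and the task names other than MSMARCO are hypothetical, and the sketch assumes the filter is applied per task against a set of selected domains.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TaskInfo:
    """Hypothetical stand-in for a task's metadata; not the mteb class."""

    name: str
    domains: Optional[list[str]] = None


def matches_domains(task: TaskInfo, selected: set[str]) -> bool:
    """Return True if the task should stay visible under the domain filter."""
    # Hotfix behaviour: a task with no domain annotation (None or empty)
    # is never filtered out.
    if not task.domains:
        return True
    # Otherwise keep the task only if it shares a domain with the selection.
    return bool(selected.intersection(task.domains))


tasks = [
    TaskInfo("MSMARCO"),                     # domains missing (None)
    TaskInfo("AnnotatedTask", ["Medical"]),  # hypothetical annotated task
    TaskInfo("OtherTask", ["Legal"]),        # hypothetical annotated task
]

kept = [t.name for t in tasks if matches_domains(t, {"Medical"})]
print(kept)  # ['MSMARCO', 'AnnotatedTask']
```

The trade-off is that unannotated tasks show up under every domain selection, which is why annotating them remains the better fix.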

@imenelydiaker
Contributor

@KennethEnevoldsen @x-tabdeveloping Can I take this issue or have you already started working on it? #1867 (comment)

@isaac-chung
Collaborator

isaac-chung commented Jan 31, 2025

This snippet now prints nothing on main. Closing this now.

import mteb

for task in mteb.get_benchmark("MTEB(eng, classic)").tasks:
    if not task.metadata.domains:
        print(f"{task.metadata.name}.domains = {task.metadata.domains}")
    if not task.metadata.languages:
        print(f"{task.metadata.name}.languages = {task.metadata.languages}")
    if not task.metadata.type:
        print(f"{task.metadata.name}.type = {task.metadata.type}")
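
To keep this from regressing, the same check could be wrapped in a function that returns the gaps instead of printing them. The following is only a sketch: `find_missing_metadata` and `make_task` are hypothetical helpers, and the stand-in objects only assume that each task exposes `.metadata` with `name`, `domains`, `languages`, and `type`, as in the snippet above, so it runs without mteb installed.

```python
from types import SimpleNamespace

# Fields checked by the snippet in this issue.
REQUIRED_FIELDS = ("domains", "languages", "type")


def find_missing_metadata(tasks, fields=REQUIRED_FIELDS):
    """Return (task_name, field) pairs for every empty or missing field."""
    missing = []
    for task in tasks:
        for field in fields:
            # Same truthiness test as the snippet: None and empty values count.
            if not getattr(task.metadata, field, None):
                missing.append((task.metadata.name, field))
    return missing


def make_task(name, domains, languages, type_):
    """Build a hypothetical stand-in for a task object (not an mteb task)."""
    metadata = SimpleNamespace(
        name=name, domains=domains, languages=languages, type=type_
    )
    return SimpleNamespace(metadata=metadata)


fake_tasks = [
    make_task("MSMARCO", None, ["eng-Latn"], "Retrieval"),
    make_task("AnnotatedTask", ["Web"], ["eng-Latn"], "Retrieval"),
]

missing = find_missing_metadata(fake_tasks)
print(missing)  # [('MSMARCO', 'domains')]
```

With mteb installed, passing `mteb.get_benchmark("MTEB(eng, classic)").tasks` to the function and asserting that the result is empty would give the same signal as the snippet printing nothing.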


4 participants