fix: Filling missing metadata for leaderboard release #1895

Merged: KennethEnevoldsen merged 13 commits into main from missing-metadata-leaderboard on Jan 30, 2025


imenelydiaker (Contributor):

Related to issue #1886.

x-tabdeveloping (Collaborator) left a comment:

Made a couple of comments, as license and date are missing in some places.
I'm struggling to figure out why the tests are failing though.

mteb/tasks/Retrieval/eng/QuoraRetrieval.py (outdated, resolved)
mteb/tasks/STS/eng/BiossesSTS.py (outdated, resolved)
@@ -21,12 +21,12 @@ class BiossesSTS(AbsTaskSTS):
eval_langs=["eng-Latn"],
main_score="cosine_spearman",
date=None,
Collaborator


Do we not have any information on the dates?

Contributor Author


No, it's actually hard to infer the dates. I assumed you mainly needed the domains, so I filled those in as a priority.

x-tabdeveloping (Collaborator):

Also, remember to run linting :D

isaac-chung (Collaborator):

It looks like the tests are failing because of (old?) metadata values that don't pass Pydantic validation:

sample_creation
  Input should be 'found', 'created', 'human-translated and localized', 'human-translated', 'machine-translated', 'machine-translated and verified', 'machine-translated and localized' or 'LM-generated and verified' [type=literal_error, input_value='derived', input_type=str]
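The failure comes from a literal-typed field: sample_creation only accepts a fixed set of strings, so a leftover value such as 'derived' is rejected. Below is a minimal stdlib sketch of the same membership check, assuming task metadata is available as plain dicts. It is not mteb's or Pydantic's code; the names SampleCreation and invalid_sample_creation and the task names are hypothetical, and the allowed values are copied from the error message above. A check like this can flag stale values before running the test suite.

```python
from typing import Literal, get_args

# Allowed values copied from the validation error above (hypothetical alias name).
SampleCreation = Literal[
    "found",
    "created",
    "human-translated and localized",
    "human-translated",
    "machine-translated",
    "machine-translated and verified",
    "machine-translated and localized",
    "LM-generated and verified",
]

# get_args() extracts the literal values so we can test membership.
ALLOWED = set(get_args(SampleCreation))


def invalid_sample_creation(tasks: dict[str, dict]) -> dict[str, object]:
    """Return {task_name: bad_value} for tasks whose sample_creation is not allowed.

    None is treated as acceptable here (an assumption: the field may be optional).
    """
    bad = {}
    for name, meta in tasks.items():
        value = meta.get("sample_creation")
        if value is not None and value not in ALLOWED:
            bad[name] = value
    return bad


# Hypothetical metadata records; only the "derived" entry should be flagged.
tasks = {
    "TaskA": {"sample_creation": "found"},
    "TaskB": {"sample_creation": "derived"},
    "TaskC": {"sample_creation": None},
}
print(invalid_sample_creation(tasks))  # → {'TaskB': 'derived'}
```

Reading the allowed values from the Literal type itself, rather than keeping a second hand-written list, keeps the check in sync if the set of accepted values changes.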

imenelydiaker (Contributor Author) commented Jan 30, 2025:

@x-tabdeveloping I tried to fill in as much of the missing metadata as possible for the tasks you listed, mostly using the data we reported in the paper.

I don't have the date value for all of them, as it's hard to find or infer. I thought the date wasn't critical metadata for releasing the leaderboard (LB)? If it isn't critical, we can open a "good first issue" so that people can help fill in what's missing.
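To seed such a "good first issue", one option is to list which tasks still lack each field. The sketch below assumes metadata is available as plain dicts with "date", "domains" and "license" keys (the fields raised in this thread); the helpers missing_fields and as_checklist and the task names are hypothetical, and the field values are placeholders, not real task metadata.

```python
from collections.abc import Iterable


def missing_fields(
    tasks: dict[str, dict], fields: Iterable[str] = ("date", "domains", "license")
) -> dict[str, list[str]]:
    """Map each field to the sorted names of tasks where it is None or absent."""
    fields = list(fields)
    report: dict[str, list[str]] = {field: [] for field in fields}
    for name in sorted(tasks):
        meta = tasks[name]
        for field in fields:
            # .get() returns None both for explicit None and for a missing key.
            if meta.get(field) is None:
                report[field].append(name)
    return report


def as_checklist(report: dict[str, list[str]]) -> str:
    """Render the report as a Markdown task list for an issue description."""
    lines = []
    for field, names in report.items():
        for name in names:
            lines.append(f"- [ ] {name}: fill in {field}")
    return "\n".join(lines)


# Hypothetical records with placeholder values.
tasks = {
    "TaskA": {"date": None, "domains": ["Web"], "license": None},
    "TaskB": {"date": "known", "domains": None},  # "license" key absent
}
report = missing_fields(tasks)
print(report)  # → {'date': ['TaskA'], 'domains': ['TaskB'], 'license': ['TaskA', 'TaskB']}
print(as_checklist(report))
```

Grouping by field rather than by task lets contributors pick one kind of metadata (for example, only dates) and work through it across tasks.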

KennethEnevoldsen (Contributor):

Thanks for this fix, @imenelydiaker. I added the annotations from #1910.

KennethEnevoldsen (Contributor) left a comment:

Added a few more annotations (added "financial" where it was missing, and filled out the metadata for ArguAna's Polish translation as well). With this, I believe it is good to merge.

KennethEnevoldsen enabled auto-merge (squash) January 30, 2025 20:58
KennethEnevoldsen changed the title from "Filling missing metadata for leaderboard release" to "fix: Filling missing metadata for leaderboard release" Jan 30, 2025
KennethEnevoldsen merged commit 938e90f into main Jan 30, 2025
11 checks passed
KennethEnevoldsen deleted the missing-metadata-leaderboard branch January 30, 2025 21:05
x-tabdeveloping (Collaborator):

Thanks for the work @imenelydiaker :D


4 participants