
[Feature] Long context benchmark enhancement #201

Open
joshuayao opened this issue Nov 12, 2024 · 2 comments
joshuayao commented Nov 12, 2024

Add support for the OPEA LLM endpoint in HELMET.

@joshuayao joshuayao added the feature New feature or request label Nov 12, 2024
@joshuayao joshuayao added this to the v1.2 milestone Nov 12, 2024
@joshuayao joshuayao added this to OPEA Nov 12, 2024
minmin-intel (Collaborator) commented:
I have been trying to reproduce the HELMET results. I implemented a model class to interact with vllm-gaudi: https://github.com/minmin-intel/GenAIEval/blob/test-helmet/evals/evaluation/HELMET/model_utils.py#L204

So far, I have tested the kilt_nq dataset in the RAG category. The accuracy numbers are close to the results published in the paper at an input length of 8k, but the 64k results differ significantly when using the vLLM endpoint. Using the transformers pipeline, however, gives results similar to the paper at both 8k and 64k lengths. Debugging is in progress.
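For reference, a model class for an OpenAI-compatible vLLM endpoint typically just builds a completion payload and POSTs it to the server. The sketch below is illustrative only (class name, endpoint URL, model name, and defaults are assumptions, not the actual model_utils.py implementation):

```python
import json
import urllib.request


class VLLMEndpointModel:
    """Minimal sketch of a client for an OpenAI-compatible vLLM
    /v1/completions endpoint. Names and defaults are hypothetical."""

    def __init__(self, endpoint_url, model_name,
                 max_new_tokens=256, temperature=0.0):
        self.endpoint_url = endpoint_url.rstrip("/") + "/v1/completions"
        self.model_name = model_name
        self.max_new_tokens = max_new_tokens
        self.temperature = temperature

    def build_payload(self, prompt):
        # Temperature 0.0 gives greedy decoding, which keeps
        # benchmark runs deterministic and comparable across backends.
        return {
            "model": self.model_name,
            "prompt": prompt,
            "max_tokens": self.max_new_tokens,
            "temperature": self.temperature,
        }

    def generate(self, prompt):
        # POST the JSON payload and return the generated text.
        data = json.dumps(self.build_payload(prompt)).encode("utf-8")
        req = urllib.request.Request(
            self.endpoint_url,
            data=data,
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            body = json.load(resp)
        return body["choices"][0]["text"]


# Example payload construction (no server required for this part):
model = VLLMEndpointModel("http://localhost:8000", "meta-llama/Llama-3.1-8B")
payload = model.build_payload("What is the capital of France?")
print(payload["max_tokens"])
```

One thing worth checking when 8k results match but 64k results diverge is whether the endpoint silently truncates prompts that exceed the server's configured `--max-model-len`, since the transformers pipeline and the server may truncate differently.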

joshuayao (Author) replied:


@minmin-intel, have you made any progress on this?

@joshuayao joshuayao moved this to In progress in OPEA Jan 7, 2025