Does it really take 2 days to evaluate llama-3.1-8b at 128k length? #79

eldarkurtic · 2025-01-16T15:25:21Z

Hi,

I am attempting to reproduce the results for the Llama-3.1-8B-Instruct model by following the steps provided in the README. Everything is set up within your Docker environment, and I am using vLLM for inference. My setup includes a single H100 GPU with a batch size of 8, as specified in the example scripts.

With this configuration, the runtime for processing a 128k context length (synthetic task) is approximately 2 days. Is this runtime expected? If not, could you please share the configuration or optimizations you used to efficiently handle this context length?

hsiehjackson · 2025-01-21T17:58:42Z

Hi @eldarkurtic, I don't apply any additional optimizations when running inference. I usually run vLLM on 8 GPUs with tensor parallelism (TP=8). With that setup, the 128K length with 500 samples takes around 2 hours for Llama-3.1-8B-Instruct.
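
As a rough sanity check, the figures reported above (500 samples, about 2 hours, 8 GPUs) can be turned into per-sample and per-token rates. The sketch below is only back-of-the-envelope arithmetic. It assumes "128K" means 128,000 prompt tokens per sample, ignores generated tokens, and the single-GPU figure assumes perfectly linear scaling, which real tensor-parallel runs generally do not achieve. The function names are illustrative and not part of the benchmark code.

```python
# Back-of-the-envelope rates from the figures reported in this thread.
SAMPLES = 500              # reported sample count
CONTEXT_TOKENS = 128_000   # assumption: "128K" taken as 128,000 prompt tokens
WALL_SECONDS = 2 * 3600    # reported: around 2 hours
NUM_GPUS = 8               # reported: TP=8


def seconds_per_sample(wall_seconds: float, samples: int) -> float:
    """Average wall-clock time spent on one sample."""
    return wall_seconds / samples


def prompt_tokens_per_second(samples: int, context_tokens: int,
                             wall_seconds: float) -> float:
    """Aggregate prompt tokens processed per second across all GPUs."""
    return samples * context_tokens / wall_seconds


def naive_single_gpu_hours(wall_seconds: float, num_gpus: int) -> float:
    """Hours for the same work on one GPU, ASSUMING perfectly linear scaling."""
    return wall_seconds * num_gpus / 3600


if __name__ == "__main__":
    print(f"{seconds_per_sample(WALL_SECONDS, SAMPLES):.1f} s per sample")
    # 14.4 s per sample
    rate = prompt_tokens_per_second(SAMPLES, CONTEXT_TOKENS, WALL_SECONDS)
    print(f"{rate:.0f} prompt tokens/s across {NUM_GPUS} GPUs")
    # 8889 prompt tokens/s across 8 GPUs
    print(f"{naive_single_gpu_hours(WALL_SECONDS, NUM_GPUS):.1f} GPU-hours")
    # 16.0 GPU-hours
```

The 16-hour figure covers one 500-sample run under ideal scaling only. It does not account for batch size, memory limits on a single GPU, or the number of tasks evaluated, so it is a reference point rather than a prediction of single-GPU runtime.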

eldarkurtic · 2025-01-29T13:39:55Z

Which batch size are you using?

hsiehjackson · 2025-01-29T19:12:49Z

I am using batch size = 1.
