Handling Very Long LLM Responses #529
Replies: 2 comments 2 replies
-
Just to confirm the question - basically you are using an LLM to generate content, and the maximum number of generated tokens is hit. You want a guide showing the best strategy (or strategies) to continue generating without losing quality? In your case, is the total context window (prompt inputs) exceeded as well, or just the output tokens? The best strategies depend somewhat on the model. Anthropic lets you end with an AI message, and it directly continues that, so the "continue generating" option is fairly straightforward there.
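For example, with the Anthropic Python SDK that continuation loop can look roughly like the sketch below (the model id and token limits are placeholders; adjust for your setup):

```python
# Sketch: keep calling the API, prefilling the assistant turn with what has
# already been generated, until the model stops for a reason other than
# hitting the output-token limit.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
prompt = "Generate the node and edge statements for the attached data ..."

completed = ""
while True:
    messages = [{"role": "user", "content": prompt}]
    if completed:
        # Prefill the assistant turn; the model continues from this text.
        # The API rejects prefills that end in whitespace, hence rstrip().
        messages.append({"role": "assistant", "content": completed.rstrip()})
    response = client.messages.create(
        model="claude-3-5-sonnet-latest",  # placeholder model id
        max_tokens=4096,
        messages=messages,
    )
    completed += response.content[0].text
    if response.stop_reason != "max_tokens":
        break  # the model finished on its own

print(completed)
```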
-
The bottleneck here might be the KV cache size; you could try different KV cache optimization/compression techniques to address this.
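For example, if you are serving the model yourself, recent Hugging Face transformers releases can quantize the KV cache during generation. A rough sketch, assuming a transformers version with quantized-cache support and the `quanto` package installed (the model id and cache settings are placeholders):

```python
# Sketch: generation with a quantized KV cache so long outputs fit in memory.
# Only relevant when running the model locally, not when using a hosted API.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-3.1-8B-Instruct"  # placeholder model id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

inputs = tokenizer(
    "Generate node and edge statements for ...", return_tensors="pt"
).to(model.device)

out = model.generate(
    **inputs,
    max_new_tokens=4096,
    cache_implementation="quantized",              # store the KV cache in low precision
    cache_config={"backend": "quanto", "nbits": 4},
)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```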
-
While I completely understand encoding, text splitting, etc. for handling input into an LLM, I have no idea how to move past the output limits.
In my case, I am passing in a very large amount of data and asking the LLM to generate Nodes and Edges for a Graph database. No matter what I attempt, I keep running into the response limit and not obtaining all of the needed statements.
Is there some way to handle very long LLM responses?
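For reference, the kind of chunk-and-merge loop I have been experimenting with looks roughly like the sketch below (`call_llm` is just a stand-in for the actual API call I use):

```python
# Sketch: split the input into chunks, ask for nodes/edges per chunk, and
# merge the results so no single response has to fit everything.
import json

def call_llm(prompt: str) -> str:
    """Hypothetical wrapper around whichever LLM API is in use; returns JSON text."""
    raise NotImplementedError

def chunk(text: str, size: int = 8000) -> list[str]:
    return [text[i:i + size] for i in range(0, len(text), size)]

def extract_graph(data: str) -> dict:
    nodes, edges = {}, set()
    for piece in chunk(data):
        prompt = (
            "From the data below, return JSON with 'nodes' (id, label) and "
            "'edges' (source, target, type). Data:\n" + piece
        )
        result = json.loads(call_llm(prompt))
        for n in result.get("nodes", []):
            nodes[n["id"]] = n                          # dedupe nodes by id
        for e in result.get("edges", []):
            edges.add((e["source"], e["target"], e["type"]))
    return {
        "nodes": list(nodes.values()),
        "edges": [{"source": s, "target": t, "type": ty} for s, t, ty in edges],
    }
```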