
Extract Input/Output token usage from request. #111

Open · sukumargaonkar wants to merge 4 commits into main

Conversation

@sukumargaonkar (Author) commented:

Currently only total_tokens usage from the response body is pushed to dynamicMetadata. This PR updates that logic to include input and output token usage as well.

This PR also introduces a monitorContinuousUsageStats flag in the external process config.
The flag controls whether the external process monitors every response-body chunk for usage stats:
when true, it monitors for token usage metadata in every response-body chunk received during a streaming request (compatible with vLLM's 'continuous_usage_stats' flag);
when false, it stops monitoring after finding token usage metadata for the first time
(compatible with OpenAI's streaming responses: https://platform.openai.com/docs/api-reference/chat/streaming#chat/streaming-usage).
The flag only affects requests in streaming mode.
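For illustration, a minimal per-chunk handler driven by this flag might look like the sketch below. It assumes an OpenAI-style usage object (prompt_tokens, completion_tokens, total_tokens) inside each decoded chunk; the package, type, and function names here are hypothetical and not taken from this PR.

package tokenusagesketch

import "encoding/json"

// usage mirrors the OpenAI-style usage object carried in a response chunk.
type usage struct {
	PromptTokens     uint32 `json:"prompt_tokens"`
	CompletionTokens uint32 `json:"completion_tokens"`
	TotalTokens      uint32 `json:"total_tokens"`
}

// chunk is a minimal view of one streaming response-body chunk.
type chunk struct {
	Usage *usage `json:"usage,omitempty"`
}

// extractUsage inspects one streaming chunk for usage stats.
// With monitorContinuous=false the caller stops looking once usage has been
// seen (OpenAI emits a single usage-bearing chunk at the end of the stream);
// with monitorContinuous=true every chunk is inspected and the latest values
// win (vLLM's continuous_usage_stats behavior).
func extractUsage(body []byte, monitorContinuous bool, seen *bool) (*usage, error) {
	if !monitorContinuous && *seen {
		return nil, nil // usage already found once; skip further chunks
	}
	var c chunk
	if err := json.Unmarshal(body, &c); err != nil {
		return nil, err
	}
	if c.Usage != nil {
		*seen = true
	}
	return c.Usage, nil
}

In other words, with monitorContinuousUsageStats=true the caller would feed every chunk through such a function and keep the latest non-nil result; with false it would stop calling once usage has been seen.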


Signed-off-by: Sukumar Gaonkar <[email protected]>
@sukumargaonkar sukumargaonkar requested a review from a team as a code owner January 16, 2025 22:37
…useage

# Conflicts:
#	internal/extproc/translator/openai_awsbedrock.go
#	internal/extproc/translator/openai_openai.go
Signed-off-by: Sukumar Gaonkar <[email protected]>
Comment on lines +71 to +77
// MonitorContinuousUsageStats controls whether the external process monitors every response-body chunk for usage stats.
// When true, it monitors for token usage metadata in every response-body chunk received during a streaming request
// (compatible with vLLM's 'continuous_usage_stats' flag).
// When false, it stops monitoring after finding token usage metadata for the first time
// (compatible with OpenAI's streaming responses: https://platform.openai.com/docs/api-reference/chat/streaming#chat/streaming-usage).
// Only affects requests in streaming mode.
MonitorContinuousUsageStats bool `yaml:"monitorContinuousUsageStats,omitempty"`
@mathetake (Member), Jan 16, 2025:

Could you remove the change related to this? I think this is a separate issue, and the metadata is not cumulative, so it would basically override previous values if it's emitted in the middle of the stream.

@@ -65,7 +71,7 @@ type Translator interface {
ResponseBody(body io.Reader, endOfStream bool) (
headerMutation *extprocv3.HeaderMutation,
bodyMutation *extprocv3.BodyMutation,
usedToken uint32,
tokenUsage *TokenUsage,
@mathetake (Member), Jan 16, 2025:

I don't think these three uint64s would be worth the allocation, so could you return the value?

Suggested change
tokenUsage *TokenUsage,
tokenUsage TokenUsage,

@sukumargaonkar (Author):

Kept it as a pointer so that a nil value indicates the absence of tokenUsage data in the responseBody:
if tokenUsage == nil {...}
With a struct value, we would have to check the zero value of one of its members:
if tokenUsage.TotalTokens == 0 {...}

thoughts?
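For reference, the two return styles under discussion look roughly like this; a sketch with illustrative names, not code from the PR:

package example

// TokenUsage mirrors the struct under review.
type TokenUsage struct {
	InputTokens  uint32
	OutputTokens uint32
	TotalTokens  uint32
}

// Pointer return: nil cleanly signals "no usage metadata in this chunk",
// distinguishing absence from a legitimate all-zero usage, at the cost of
// an allocation.
func usageFromChunkPtr(found bool) *TokenUsage {
	if !found {
		return nil
	}
	return &TokenUsage{}
}

// Value return with comma-ok: avoids the allocation and still signals
// absence explicitly, instead of overloading TotalTokens == 0.
func usageFromChunkValue(found bool) (TokenUsage, bool) {
	return TokenUsage{}, found
}

The comma-ok variant would keep an explicit absence signal without the pointer allocation.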

Comment on lines +15 to +20
type TokenUsage struct {
InputTokens uint32
OutputTokens uint32
TotalTokens uint32
}

@mathetake (Member):

Any public structs/fields/methods would need comments.
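For example, the struct above with doc comments added might read as follows; a sketch of possible wording, not prescribed text:

// TokenUsage holds the token counts extracted from a response body.
type TokenUsage struct {
	// InputTokens is the number of tokens in the request input (prompt).
	InputTokens uint32
	// OutputTokens is the number of tokens generated in the response output (completion).
	OutputTokens uint32
	// TotalTokens is the sum of input and output tokens.
	TotalTokens uint32
}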

@@ -29,3 +29,4 @@ site/yarn-debug.log*
site/yarn-error.log*
site/static/.DS_Store
site/temp
/.idea/
@mathetake (Member):

unnecessary change - already in here

@mathetake (Member):

Also, you can have a global gitignore on your OS; that's where people usually put ignore rules for files that aren't generated by the project.

@mathetake (Member) commented on Jan 16, 2025:

Sorry, this conflicts with #103; I would appreciate it if you could hold off until that lands - I think that PR will supersede this one, apart from the vLLM part.

@mathetake (Member):

@sukumargaonkar thank you for waiting - #103 has landed so could you rework the PR and focus on the vllm stuff?

@mathetake mathetake self-assigned this Jan 18, 2025
@mathetake (Member):

ping
