-
Notifications
You must be signed in to change notification settings - Fork 73
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Benchmark ClickHouse queries #700
Comments
I think there's an issue with writing a full system performance test for this, I don't think it can be done without an API change. TLDR: Explanation: # Sample meter to count LLM Token Usage
meters:
- slug: tokens_total
description: AI Token Usage
eventType: prompt
aggregation: SUM
valueProperty: $.tokens
groupBy:
model: $.model # Group1
type: $.type # Group2 turns to something like CREATE MATERIALIZED VIEW openmeter.om_default_tokens_total
(
...
`model` String, -- GroupBy1
`type` String -- GroupBy2
)
...
AS SELECT
...
JSON_VALUE(data,
'$.model') AS model,
JSON_VALUE(data,
'$.type') AS type
... The JSONPath descriptions ( ALTER TABLE
openmeter.om_events
ADD
PROJECTION om_default_api_requests_total (
SELECT
-- Further pre-calculted fields can be added here to speed up execution, e.g. (empty(openmeter.om_events.validation_error) = 1)...
subject,
tumbleStart(
time,
toIntervalMinute(1)
) AS windowstart,
tumbleEnd(
time,
toIntervalMinute(1)
) AS windowend,
sumState(
cast(
JSON_VALUE(data, '$.tokens'),
'Float64'
)
) AS value,
JSON_VALUE(data, '$.model') as model,
JSON_VALUE(data, '$.type') as type
/**
* Note the missing WHERE block compared to the MATERIALIZED VIEW.
* Where blocks are not currently supported in PROJECTIONs, with their priority on the roadmap being questionable:
* https://github.com/ClickHouse/ClickHouse/issues/33678
*
* This is not necessaraly an issue as the filtering can take place in the actual query (planner still using the PROJECTION for a limited set of results), but it in itself would warrant some modifications:
* - Meters wouldn't phisically be separate on the DB level. Meters of different namespaces would use the same underlying projections so additonal caution would be needed.
* - The projections would be populated with a lot of nonsensical data. e.g: in the example above most of the events wouldn't have a `$.model` field. Clickhouse doesn't complain about per say right now but if the group by functionality would be extended in the future it could cause issues.
*/
GROUP BY
windowstart,
windowend,
subject,
model,
type
); A query utilising the above projection would look something like this: -- Simple way of forcing projeciton to be used by mirroring it's SELECT block
WITH proj AS (
subject,
tumbleStart(
time,
toIntervalMinute(1)
) AS windowstart,
tumbleEnd(
time,
toIntervalMinute(1)
) AS windowend,
sumState(
cast(
JSON_VALUE(data, '$.tokens'),
'Float64'
)
) AS value,
JSON_VALUE(data, '$.model') as model, -- Note how we need to reference the JSONPath here
JSON_VALUE(data, '$.type') as type -- And here
FROM
openmeter.om_events
-- Note re-introducing the WHERE block. This does not affect the query planning, it still uses the projection.
WHERE
(openmeter.om_events.namespace = 'default')
AND (empty(openmeter.om_events.validation_error) = 1)
AND (openmeter.om_events.type = 'prompt')
GROUP BY
windowstart,
windowend,
subject,
model,
type
) SELECT windowstart, windowend, sumMerge(value) AS value, subject, model, type
FROM proj
WHERE
(subject = ?)
AND windowstart >= ?
AND windowend <= ?
GROUP BY
windowstart,
windowend,
subject,
group1,
group2
ORDER BY
windowstart Where the JSONPath descriptors are required once again, but they are not part of the current API, and there's no simple way to contextually access them as they would not be stored anywhere: /api/v1/meters/{meterIdOrSlug}/query:
get:
description: Query meter
operationId: queryMeter
tags:
- Meters
parameters:
- $ref: "#/components/parameters/meterIdOrSlug"
- $ref: "#/components/parameters/queryFrom"
- $ref: "#/components/parameters/queryTo"
- $ref: "#/components/parameters/queryWindowSize"
- $ref: "#/components/parameters/queryWindowTimeZone"
- $ref: "#/components/parameters/querySubject"
- $ref: "#/components/parameters/queryGroupBy"
# ...
queryGroupBy:
name: groupBy
in: query
required: false
description: |
If not specified a single aggregate will be returned for each subject and time window.
`subject` is a reserved group by value.
schema:
type: array
items:
type: string Based on this I have 2 suggestions in comparing performance:
Of course I might be misunderstanding something about how the system works, but either way I think further clarification is needed on this. |
Preflight Checklist
Problem Description
The meters are currently implemented as materialized views on the events table. This has a drawback that meters have to be created first for the events to be processed:
Proposed Solution
Benchmark:
End goal
Back with data that projections can be used or not as an alternative to materialized views.
Additional Information
Seeder: https://github.com/openmeterio/openmeter/blob/main/etc/seed/seed.yaml (
make seed
)Grafana K6: https://k6.io/ (load tester)
The text was updated successfully, but these errors were encountered: