Mediapipe graph process time metric #2942
Conversation
@@ -196,6 +203,9 @@ class MediapipeGraphExecutor {
INCREMENT_IF_ENABLED(this->mediapipeServableMetricReporter->getGraphErrorMetric(executionContext));
}
MP_RETURN_ON_FAIL(status, "graph wait until done", mediapipeAbslToOvmsStatus(status.code()));
timer.stop(PROCESS);
double processTime = timer.template elapsed<std::chrono::microseconds>(PROCESS);
OBSERVE_IF_ENABLED(this->mediapipeServableMetricReporter->getProcessingTimeMetric(executionContext), processTime);
It is not clear to me whether we want to register processing times of failed executions or not @dtrawins @bstrzele
Either way, I don't think this is correct, because:
a) we do not register the time in all error return codepaths
b) we do register the time in one specific error return codepath (the `if (outputPollers.size() != outputPollersWithReceivedPacket.size())` branch)
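One way to resolve the inconsistency pointed out above would be to take the measurement from a scope guard, so every return path (success or error) produces a histogram sample. The sketch below is a hypothetical illustration, not the PR's implementation; `ScopeTimer` and its callback are made-up names standing in for the project's Timer and the `OBSERVE_IF_ENABLED(...)` call from the diff above.

```cpp
#include <chrono>
#include <functional>
#include <utility>

// Hypothetical sketch: records elapsed microseconds when the guard goes out of scope,
// so early error returns are measured the same way as the happy path.
class ScopeTimer {
public:
    explicit ScopeTimer(std::function<void(double)> observe)
        : observe(std::move(observe)), start(std::chrono::steady_clock::now()) {}
    ~ScopeTimer() {
        double elapsedUs = std::chrono::duration<double, std::micro>(
            std::chrono::steady_clock::now() - start).count();
        observe(elapsedUs);  // stand-in for OBSERVE_IF_ENABLED(getProcessingTimeMetric(...), elapsedUs)
    }

private:
    std::function<void(double)> observe;
    std::chrono::steady_clock::time_point start;
};

// Hypothetical usage inside an executor method with several early error returns:
// Status infer(...) {
//     ScopeTimer processTimer([&](double us) { /* observe histogram sample */ });
//     if (initializationFailed) return StatusCode::ERROR;  // still measured
//     ...
//     return StatusCode::OK;                                // measured here too
// }
```

Whether failed executions should be counted at all is the open question above; the alternative is to observe only right before the successful return, which excludes every error path consistently.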
docs/metrics.md
Outdated
@@ -241,6 +241,7 @@ For [MediaPipe Graphs](./mediapipe.md) execution there are 4 generic metrics whi
| counter | ovms_responses | Useful to track number of packets generated by MediaPipe graph. Keep in mind that single request may trigger production of multiple (or zero) packets, therefore tracking number of responses is complementary to tracking accepted requests. For example tracking streaming partial responses of LLM text generation graphs. |
| gauge | ovms_current_graphs | Number of graphs currently in-process. For unary communication it is equal to number of currently processing requests (each request initializes separate MediaPipe graph). For streaming communication it is equal to number of active client connections. Each connection is able to reuse the graph and decide when to delete it when the connection is closed. |
| counter | ovms_graph_error | Counts errors in MediaPipe graph execution phase. For example V3 LLM text generation fails in LLMCalculator due to missing prompt - calculator returns an error and graph cancels. |
| histogram | ovms_graph_processing_time_us | Time for which mediapipe graph was opened. |
Please explain more - is this for successful executions only? For failed ones as well? Which failures count?
docs/metrics.md
Outdated
@@ -241,6 +241,7 @@ For [MediaPipe Graphs](./mediapipe.md) execution there are 4 generic metrics whi
| counter | ovms_responses | Useful to track number of packets generated by MediaPipe graph. Keep in mind that single request may trigger production of multiple (or zero) packets, therefore tracking number of responses is complementary to tracking accepted requests. For example tracking streaming partial responses of LLM text generation graphs. |
| gauge | ovms_current_graphs | Number of graphs currently in-process. For unary communication it is equal to number of currently processing requests (each request initializes separate MediaPipe graph). For streaming communication it is equal to number of active client connections. Each connection is able to reuse the graph and decide when to delete it when the connection is closed. |
| counter | ovms_graph_error | Counts errors in MediaPipe graph execution phase. For example V3 LLM text generation fails in LLMCalculator due to missing prompt - calculator returns an error and graph cancels. |
| histogram | ovms_graph_processing_time_us | Time for which mediapipe graph was opened and has been successfully closed. |
I think this table should also have a column indicating which type of request each metric applies to. Right now everything is mixed: some metrics are for MP graphs only and some for individual models only. It could be adjusted in a separate PR.
Every metric in this table should only be relevant for MediaPipe.
But this one probably could use some kind of indicator.
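As a rough sketch of what such an indicator could look like (the "applies to" column and its values are hypothetical, not part of this PR), the table in docs/metrics.md might gain an extra column:

```markdown
| type      | name                          | applies to      | description |
|-----------|-------------------------------|-----------------|-------------|
| counter   | ovms_graph_error              | MediaPipe graph | Counts errors in MediaPipe graph execution phase. |
| histogram | ovms_graph_processing_time_us | MediaPipe graph | Time for which mediapipe graph was opened and has been successfully closed. |
```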
🛠 Summary
CVS-160188
Adds a Prometheus histogram metric `ovms_graph_processing_time_us` for tracking MediaPipe graph processing time.
It uses the same buckets as the single model execution metric.
The metric tracks the time the graph spends serving a single user request for unary calls, or the whole session time for streaming requests.
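A minimal sketch of the two measurement scopes described above, using only standard C++ and hypothetical function names (the actual implementation uses the project's Timer and the PROCESS enum shown in the diff):

```cpp
#include <chrono>

using Clock = std::chrono::steady_clock;

// Helper: microseconds elapsed since `start`.
static double elapsedUs(Clock::time_point start) {
    return std::chrono::duration<double, std::micro>(Clock::now() - start).count();
}

// Unary call: a graph is created per request, so one histogram sample covers a single
// request/response cycle (hypothetical outline, not the PR's code).
double measureUnaryCallUs() {
    auto start = Clock::now();
    // initialize graph, push input packets, wait until done, serialize the response ...
    return elapsedUs(start);
}

// Streaming call: the graph lives as long as the client connection, so one sample covers
// the whole session, potentially spanning many requests and responses.
double measureStreamSessionUs() {
    auto start = Clock::now();
    // while (the client keeps the stream open) { read request, push packets, write responses }
    return elapsedUs(start);
}
```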
🧪 Checklist