Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OpenTelemetry integration #699

Merged
merged 38 commits into from
Dec 11, 2024
Merged

Conversation

psardana
Copy link
Contributor

@psardana psardana commented Nov 11, 2024

Fixes Issue

closes #701

Changes proposed

Check List (Check all the applicable boxes)

  • I sign off on contributing this submission to open-source
  • My code follows the code style of this project.
  • My change requires changes to the documentation.
  • I have updated the documentation accordingly.
  • All new and existing tests passed.
  • This PR does not contain plagiarized content.
  • The title of my pull request is a short description of the requested changes.

Screenshots

Note to reviewers

Copy link

netlify bot commented Nov 11, 2024

Deploy Preview for opal-docs ready!

Name Link
🔨 Latest commit 4b24dc0
🔍 Latest deploy log https://app.netlify.com/sites/opal-docs/deploys/67598d96cb796400087c0a65
😎 Deploy Preview https://deploy-preview-699--opal-docs.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

psardana and others added 3 commits November 11, 2024 16:11
…ck latency of data update events

feat(prometheus_metrics.py): create data_update_latency histogram to monitor latency of data update events
@psardana psardana marked this pull request as draft November 11, 2024 15:30
…etrics to use opal_server.metrics.prometheus_metrics for better organization

chore(requirements.txt): add prometheus_client to dependencies for metrics tracking functionality
…c to track updates per topic

feat(prometheus_metrics.py): introduce data_update_count_per_topic counter for monitoring data updates by topic
… to enhance observability

fix(api.py): increment policy bundle request count and measure latency for bundle generation
fix(callbacks.py): observe size of changed directories in policy update notifications
fix(task.py): track policy update count and latency when triggering policy watcher
@psardana psardana changed the title added initial config for prometheus integration in opal server prometheus integration Nov 15, 2024
@psardana psardana marked this pull request as ready for review November 15, 2024 07:29
@danyi1212 danyi1212 self-requested a review November 17, 2024 12:27
@danyi1212
Copy link
Collaborator

danyi1212 commented Nov 18, 2024

Hey @psardana, thank you for this contribution! 💎

Can you please add documentation about the metrics and explain how to set it up?
Additionally, please add a docker-compose example with a Prometheus container collecting the metrics from OPAL, which will be good for showcasing it in docs 🙂

Notice that there are conflicts against the main branch, please make sure to rebase from master.
Also note that the tests are failing, I think due to invalid label names in the metrics. Please take a look at it 😁

Looking forward for this! 🙏

@psardana
Copy link
Contributor Author

Hey @psardana, thank you for this contribution! 💎

Can you please add documentation about the metrics and explain how to set it up? Additionally, please add a docker-compose example with a Prometheus container collecting the metrics from OPAL, which will be good for showcasing it in docs 🙂

Notice that there are conflicts against the main branch, please make sure to rebase from master. Also note that the tests are failing, I think due to invalid label names in the metrics. Please take a look at it 😁

Looking forward for this! 🙏

Thank you for the review! I have added commits for documentation, docker compose and fixed the label names.

Copy link
Collaborator

@danyi1212 danyi1212 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks very good! 🌟
I've left some comments about specific areas and improvements.

Upon review, some instrumented parts appear to align more with tracing rather than pure metrics. This led to some unnatural metrics and duplications, having separate latency - count and error metrics.

To address this, we suggest exploring OpenTelemetry, which offers native Prometheus integration alongside robust tracing capabilities.

Here's a proposed mapping of the current metrics to OpenTelemetry:

  • opal_server_data_update -> Trace
  • opal_server_policy_update -> Trace
  • opal_server_policy_bundle_request -> Trace
  • opal_server_policy_bundle_size -> Metric
  • opal_server_active_clients -> Metric
  • opal_client_data_subscriptions -> Metric
  • opal_client_data_update_trigger -> Trace
  • opal_client_data_update_apply -> Trace (new)
  • opal_client_policy_update_apply -> Trace (new)
  • opal_client_policy_store_status -> Metric

We believe this approach will provide a more comprehensive observability solution.

Please let us know your thoughts, and let's work together to enhance OPAL's observability 💎

documentation/docs/tutorials/monitoring_opal.mdx Outdated Show resolved Hide resolved
documentation/docs/tutorials/monitoring_opal.mdx Outdated Show resolved Hide resolved
packages/opal-client/opal_client/client.py Outdated Show resolved Hide resolved
packages/opal-server/opal_server/server.py Outdated Show resolved Hide resolved
packages/opal-server/opal_server/policy/bundles/api.py Outdated Show resolved Hide resolved
packages/opal-server/opal_server/policy/bundles/api.py Outdated Show resolved Hide resolved
@psardana psardana requested a review from danyi1212 November 20, 2024 11:20
@psardana psardana changed the title prometheus integration OpenTelemetry integration Nov 25, 2024
Copy link
Collaborator

@danyi1212 danyi1212 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Look very good! Kudus on the work. I've left a few small details to fix and improve, let me know if you have question and when they are ready

packages/opal-client/opal_client/client.py Outdated Show resolved Hide resolved
packages/opal-server/opal_server/policy/bundles/api.py Outdated Show resolved Hide resolved
packages/opal-server/opal_server/policy/bundles/api.py Outdated Show resolved Hide resolved
packages/opal-server/opal_server/policy/watcher/task.py Outdated Show resolved Hide resolved
packages/opal-server/opal_server/pubsub.py Outdated Show resolved Hide resolved
@psardana psardana requested a review from danyi1212 December 4, 2024 12:44
Copy link
Collaborator

@danyi1212 danyi1212 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great job! 👏💎🎉

@danyi1212 danyi1212 merged commit 141a4a6 into permitio:master Dec 11, 2024
9 of 12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add Prometheus Metrics for OPAL Server and Client
2 participants