OpenTelemetry integration #699

Merged 38 commits on Dec 11, 2024
63b1fca
added initial config for prometheus integration in opal server
psardana Nov 11, 2024
cff9251
Merge branch 'master' into prometheus_integration
psardana Nov 11, 2024
d1ac117
feat(data_update_publisher.py): add data_update_latency metric to tra…
psardana Nov 11, 2024
dcdbbf7
Merge branch 'prometheus_integration' of github-personal:psardana/opa…
psardana Nov 11, 2024
faf0cd8
refactor(api.py, data_update_publisher.py): update import paths for m…
psardana Nov 12, 2024
bc06d6b
feat(data_update_publisher.py): add data_update_count_per_topic metri…
psardana Nov 12, 2024
6dfb25d
feat(metrics): add new metrics for policy updates and bundle requests…
psardana Nov 12, 2024
af89d39
moved prometheus metrics to opal common
psardana Nov 14, 2024
6e50055
scopes and security prometheus metrics added
psardana Nov 14, 2024
f44e2bb
added client metrics endpoint and total active clients metric
psardana Nov 14, 2024
00830eb
data topic subscribed by client
psardana Nov 14, 2024
192a255
added token type in prometheus metric
psardana Nov 14, 2024
bd32ddb
added labels to the metrics for data and policy updates
psardana Nov 15, 2024
a6f39b9
added labels in token requests generations and errors
psardana Nov 15, 2024
699e9da
added more labels for prometheus metrics for scope
psardana Nov 15, 2024
140bac3
added metrics for opal client
psardana Nov 15, 2024
fa0937e
Merge branch 'master' into prometheus_integration
psardana Nov 15, 2024
e6a1096
Merge branch 'master' into prometheus_integration
psardana Nov 18, 2024
1cb9ccf
added docker compose example with prometheus
psardana Nov 18, 2024
5a9bfbb
Merge branch 'master' into prometheus_integration
psardana Nov 18, 2024
e1fec7e
fixed metric labels
psardana Nov 18, 2024
035755d
Merge branch 'master' into prometheus_integration
psardana Nov 18, 2024
78ef777
added documentation
psardana Nov 18, 2024
8a57392
Merge branch 'master' into prometheus_integration
psardana Nov 18, 2024
5bde819
added open telemetry traces and metrics
psardana Nov 20, 2024
81e989a
added metrics and traces in documentation
psardana Nov 20, 2024
d197fe8
added scope id as an attribute
psardana Nov 22, 2024
f63beb6
renamed docker compose
psardana Nov 22, 2024
52e3114
fixed how span is being used
psardana Nov 22, 2024
2db0782
added documentation
psardana Nov 22, 2024
837780e
fixed descriptions
psardana Nov 22, 2024
61fd24c
removed top level code and protected metrics end point
psardana Dec 3, 2024
7f2fab0
fixes for tracing spans
psardana Dec 4, 2024
191cebf
fix metrics end point
psardana Dec 4, 2024
ea0c770
fixed docker compose and removed logging exporter from otel
psardana Dec 4, 2024
6f07718
Merge branch 'master' into prometheus_integration
psardana Dec 4, 2024
bf897dc
Merge branch 'master' into prometheus_integration
danyi1212 Dec 11, 2024
4b24dc0
Fixed pre-commit
danyi1212 Dec 11, 2024
96 changes: 96 additions & 0 deletions docker/docker-compose-with-prometheus-and-otel.yml
@@ -0,0 +1,96 @@
services:
  broadcast_channel:
    image: postgres:alpine
    environment:
      - POSTGRES_DB=postgres
      - POSTGRES_USER=postgres
      - POSTGRES_PASSWORD=postgres
    volumes:
      - postgres_data:/var/lib/postgresql/data

  otel-collector:
    image: otel/opentelemetry-collector-contrib:0.114.0
    volumes:
      - ./docker_files/otel-collector-config.yaml:/etc/otelcol/config.yaml
    command: ["--config", "/etc/otelcol/config.yaml"]
    ports:
      - "4317:4317"
      - "8888:8888"
    networks:
      - opal-network

  prometheus:
    image: prom/prometheus:v2.45.0
    volumes:
      - ./docker_files/prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus_data:/prometheus
    ports:
      - "9090:9090"
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--web.console.libraries=/etc/prometheus/console_libraries'
      - '--web.console.templates=/etc/prometheus/consoles'
      - '--web.enable-lifecycle'
    networks:
      - opal-network
    depends_on:
      - otel-collector

  grafana:
    image: grafana/grafana:9.5.3
    ports:
      - "3000:3000"
    volumes:
      - grafana_data:/var/lib/grafana
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin
      - GF_USERS_ALLOW_SIGN_UP=false
    depends_on:
      - prometheus
    networks:
      - opal-network

  opal_server:
    image: permitio/opal-server:latest
    environment:
      - OPAL_BROADCAST_URI=postgres://postgres:postgres@broadcast_channel:5432/postgres
      - UVICORN_NUM_WORKERS=4
      - OPAL_POLICY_REPO_URL=https://github.com/permitio/opal-example-policy-repo
      - OPAL_POLICY_REPO_POLLING_INTERVAL=30
      - OPAL_DATA_CONFIG_SOURCES={"config":{"entries":[{"url":"http://opal_server:7002/policy-data","topics":["policy_data"],"dst_path":"/static"}]}}
      - OPAL_LOG_FORMAT_INCLUDE_PID=true
      - OPAL_ENABLE_OPENTELEMETRY_TRACING=true
      - OPAL_ENABLE_OPENTELEMETRY_METRICS=true
      - OPAL_OPENTELEMETRY_OTLP_ENDPOINT="otel-collector:4317"
    ports:
      - "7002:7002"
    depends_on:
      - broadcast_channel
      - otel-collector
    networks:
      - opal-network

  opal_client:
    image: permitio/opal-client:latest
    environment:
      - OPAL_SERVER_URL=http://opal_server:7002
      - OPAL_LOG_FORMAT_INCLUDE_PID=true
      - OPAL_INLINE_OPA_LOG_FORMAT=http
    ports:
      - "7766:7000"
      - "8181:8181"
    depends_on:
      - opal_server
      - otel-collector
    command: sh -c "exec ./wait-for.sh opal_server:7002 --timeout=20 -- ./start.sh"
    networks:
      - opal-network

networks:
  opal-network:
    driver: bridge
volumes:
  postgres_data:
  prometheus_data:
  grafana_data:
25 changes: 25 additions & 0 deletions docker/docker_files/otel-collector-config.yaml
@@ -0,0 +1,25 @@
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

exporters:
  prometheus:
    endpoint: "0.0.0.0:8888"
  debug:
    verbosity: detailed

processors:
  batch:

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [debug]
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [prometheus]
14 changes: 14 additions & 0 deletions docker/docker_files/prometheus.yml
@@ -0,0 +1,14 @@
global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: 'opal_server'
    static_configs:
      - targets: ['opal_server:7002']
    metrics_path: '/metrics'

  - job_name: 'opal_client'
    static_configs:
      - targets: ['opal_client:7000']
    metrics_path: '/metrics'
8 changes: 6 additions & 2 deletions documentation/docs/getting-started/configuration.mdx
@@ -25,7 +25,7 @@ Please use this table as a reference.
| LOG_FILE_COMPRESSION | | |
| LOG_FILE_SERIALIZE | Serialize log messages in file into json format (useful for log aggregation platforms) | |
| LOG_FILE_LEVEL |
| LOG_DIAGNOSE | Include diagnosis in log messages | |
| STATISTICS_ENABLED | Collect statistics about OPAL clients. | |
| STATISTICS_ADD_CLIENT_CHANNEL | The topic to update about the new OPAL clients connection. | |
| STATISTICS_REMOVE_CLIENT_CHANNEL | The topic to update about the OPAL clients disconnection. | |
@@ -40,7 +40,11 @@ Please use this table as a reference.
| AUTH_PUBLIC_KEY | | |
| AUTH_JWT_ALGORITHM | JWT algorithm. See possible values [here](https://pyjwt.readthedocs.io/en/stable/algorithms.html). | |
| AUTH_JWT_AUDIENCE | | |
| AUTH_JWT_ISSUER | | |
| ENABLE_OPENTELEMETRY_TRACING | Set if OPAL should enable tracing with OpenTelemetry | |
| ENABLE_OPENTELEMETRY_METRICS | Set if OPAL should enable metrics with OpenTelemetry | |
| OPENTELEMETRY_OTLP_ENDPOINT | The OpenTelemetry OTLP endpoint to send traces to; set only if ENABLE_OPENTELEMETRY_TRACING is enabled | |


## OPAL Server Configuration Variables

116 changes: 116 additions & 0 deletions documentation/docs/tutorials/monitoring_opal.mdx
@@ -11,6 +11,7 @@ There are multiple ways you can monitor your OPAL deployment:
- **Health-checks** - OPAL exposes HTTP health check endpoints ([See below](#health-checks))
- [**Callbacks**](/tutorials/healthcheck_policy_and_update_callbacks#-data-update-callbacks) - Using the callback webhooks feature - having OPAL-clients report their updates
- **Statistics** - Using the built-in statistics feature in OPAL ([See below](#opal-statistics))
- **OpenTelemetry Metrics and Tracing** - OPAL can expose metrics and tracing information using OpenTelemetry for monitoring ([See below](#opentelemetry-metrics-and-tracing)).

## Health checks

@@ -52,3 +53,118 @@ Available through `/pubsub_client_info` api route on the server.
### Caveats:
- When `UVICORN_NUM_WORKERS > 1`, retrieved information would only include clients connected to the replying server process.
- This is an early access feature and is likely to change. Backward compatibility is not guaranteed.

## OpenTelemetry Metrics and Tracing

OPAL supports exporting metrics and tracing information using OpenTelemetry, which can be integrated with various monitoring and observability tools.

### Enabling OpenTelemetry Metrics and Tracing

To enable OpenTelemetry metrics and tracing, you need to set the following environment variables in both OPAL server and OPAL client:

```
OPAL_ENABLE_OPENTELEMETRY_TRACING=true
OPAL_ENABLE_OPENTELEMETRY_METRICS=true
OPAL_OPENTELEMETRY_OTLP_ENDPOINT=<your-otel-collector-endpoint>
```

- `OPAL_ENABLE_OPENTELEMETRY_TRACING`: Set to `true` to enable tracing.
- `OPAL_ENABLE_OPENTELEMETRY_METRICS`: Set to `true` to enable metrics.
- `OPAL_OPENTELEMETRY_OTLP_ENDPOINT`: Set to the endpoint of the OpenTelemetry Collector.
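
As a rough illustration of how a deployment script might sanity-check these variables before starting OPAL, here is a minimal Python sketch. The `OtelSettings` class and the `read_otel_settings` and `check_otel_settings` helpers are hypothetical names introduced for this example and are not part of OPAL; the accepted truthy spellings are also an assumption, and OPAL's own parsing may differ.

```python
import os
from dataclasses import dataclass
from typing import List, Mapping, Optional


@dataclass(frozen=True)
class OtelSettings:
    """Hypothetical container for the three variables above (not an OPAL class)."""

    tracing_enabled: bool
    metrics_enabled: bool
    otlp_endpoint: Optional[str]


def _as_bool(raw: Optional[str]) -> bool:
    # Assumption: these spellings count as "enabled"; OPAL may accept others.
    return (raw or "").strip().lower() in {"1", "true", "yes", "on"}


def read_otel_settings(env: Mapping[str, str] = os.environ) -> OtelSettings:
    """Read the OpenTelemetry-related variables from an environment mapping."""
    endpoint = (env.get("OPAL_OPENTELEMETRY_OTLP_ENDPOINT") or "").strip() or None
    return OtelSettings(
        tracing_enabled=_as_bool(env.get("OPAL_ENABLE_OPENTELEMETRY_TRACING")),
        metrics_enabled=_as_bool(env.get("OPAL_ENABLE_OPENTELEMETRY_METRICS")),
        otlp_endpoint=endpoint,
    )


def check_otel_settings(settings: OtelSettings) -> List[str]:
    """Return human-readable problems; an empty list means nothing was flagged."""
    problems = []
    if settings.tracing_enabled and not settings.otlp_endpoint:
        problems.append(
            "tracing is enabled but OPAL_OPENTELEMETRY_OTLP_ENDPOINT is not set"
        )
    return problems
```

Calling `check_otel_settings(read_otel_settings())` in a startup script would surface a missing collector endpoint early, rather than leaving it to be discovered when no traces arrive.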

### Exposing Metrics and Traces

- Both the server and client will expose a `/metrics` endpoint that returns metrics in Prometheus format.
- Traces are exported to the configured OpenTelemetry Collector endpoint using OTLP over gRPC.
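
To inspect the `/metrics` output outside of Prometheus, something like the following Python sketch could be used. It assumes the endpoint returns the classic Prometheus text format (without exemplars). The helper names are hypothetical, and the optional bearer token is an assumption based on the commit in this PR that mentions protecting the metrics endpoint; the exact authentication scheme is not shown in this diff.

```python
import urllib.request
from typing import List, Optional, Tuple


def fetch_metrics_text(
    base_url: str, token: Optional[str] = None, timeout: float = 5.0
) -> str:
    """GET <base_url>/metrics and return the response body as text."""
    request = urllib.request.Request(f"{base_url.rstrip('/')}/metrics")
    if token:
        # Assumption: a bearer token is accepted if the endpoint is protected.
        request.add_header("Authorization", f"Bearer {token}")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


def parse_samples(text: str) -> List[Tuple[str, float]]:
    """Parse Prometheus text format into (series, value) pairs.

    Comment lines (# HELP, # TYPE) and blank lines are skipped, and an
    optional trailing timestamp is ignored.
    """
    samples = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        brace = line.rfind("}")
        if brace != -1:
            # Label values may contain spaces, so split after the closing brace.
            series, rest = line[: brace + 1], line[brace + 1 :].split()
        else:
            parts = line.split()
            series, rest = parts[0], parts[1:]
        if rest:
            samples.append((series, float(rest[0])))
    return samples
```

With the Docker Compose example from this PR, the server publishes port 7002 and the client's port 7000 is mapped to host port 7766, so `parse_samples(fetch_metrics_text("http://localhost:7002"))` would be a reasonable first call to try.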

### Available Metrics and Traces

Below is a list of the available metrics and traces in OPAL, along with their types, available tags (attributes), and explanations.

#### OPAL Server Metrics and Traces

##### 1) `opal_server_data_update`
- **Type**: Trace
- **Description**: Represents a data update operation in the OPAL server. This trace spans the process of publishing data updates to clients.
- **Attributes**:
  - `topics_count`: Number of topics involved in the data update.
  - `entries_count`: Number of data update entries.
  - Additional attributes related to errors or execution time.

##### 2) `opal_server_policy_update`
- **Type**: Trace
- **Description**: Represents a policy update operation in the OPAL server. This trace spans the process of checking for policy changes and notifying clients.
- **Attributes**:
  - Information about the policy repository, such as commit hashes.
  - Errors encountered during the update process.

##### 3) `opal_server_policy_bundle_request`
- **Type**: Trace
- **Description**: Represents a request for a policy bundle from a client. This trace spans the process of generating and serving the policy bundle to the client.
- **Attributes**:
  - `bundle.type`: The type of bundle (full or diff).
  - `bundle.size`: The size of the bundle in number of files or bytes.
  - `scope_id`: The scope identifier if scopes are used.

##### 4) `opal_server_policy_bundle_size`
- **Type**: Metric (Histogram)
- **Unit**: Files
- **Description**: Records the size of the policy bundles served by the OPAL server. The size is measured in the number of files included in the bundle.
- **Attributes**:
  - `type`: The type of bundle (full or diff).

##### 5) `opal_server_active_clients`
- **Type**: Metric (UpDownCounter)
- **Description**: Tracks the number of active clients connected to the OPAL server.
- **Attributes**:
  - `client_id`: The unique identifier of the client.
  - `source`: The source host and port of the client (e.g., 192.168.1.10:34567).
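
Because `opal_server_active_clients` carries per-client attributes, each distinct attribute set forms its own series, and the overall number of active clients is the sum across those series. The toy model below illustrates that behaviour in plain Python. It is not the OpenTelemetry SDK and not OPAL's implementation; `ToyUpDownCounter` is a hypothetical class written only to show the semantics.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Mapping, Tuple

AttrKey = FrozenSet[Tuple[str, str]]


class ToyUpDownCounter:
    """Plain-Python stand-in that mimics UpDownCounter bookkeeping."""

    def __init__(self) -> None:
        # One running sum per unique attribute set, as a metrics backend would keep.
        self._series: Dict[AttrKey, int] = defaultdict(int)

    def add(self, amount: int, attributes: Mapping[str, str]) -> None:
        """Add a positive or negative amount to the series for these attributes."""
        self._series[frozenset(attributes.items())] += amount

    def series(self) -> Dict[AttrKey, int]:
        """Return a copy of the per-attribute-set values."""
        return dict(self._series)

    def total(self) -> int:
        """Sum over all series, i.e. the overall number of active clients."""
        return sum(self._series.values())


active_clients = ToyUpDownCounter()
client_a = {"client_id": "a", "source": "192.168.1.10:34567"}
client_b = {"client_id": "b", "source": "192.168.1.11:40000"}

active_clients.add(1, client_a)   # client a connects
active_clients.add(1, client_b)   # client b connects
active_clients.add(-1, client_a)  # client a disconnects
```

In a query backend the same idea usually appears as an aggregation over the attribute labels, for example a sum over all series of the metric.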

#### OPAL Client Metrics and Traces

##### 1) `opal_client_data_subscriptions`
- **Type**: Metric (UpDownCounter)
- **Description**: Tracks the number of data subscriptions per client.
- **Attributes**:
  - `client_id`: The unique identifier of the client.
  - `topic`: The topic to which the client is subscribed.

##### 2) `opal_client_data_update_trigger`
- **Type**: Trace
- **Description**: Represents the operation of triggering a data update via the API in the OPAL client.
- **Attributes**:
  - `source`: The source of the trigger (e.g., API).
  - Errors encountered during the trigger.

##### 3) `opal_client_data_update_apply`
- **Type**: Trace
- **Description**: Represents the application of a data update within the OPAL client. This trace spans the process of fetching and applying data updates from the server.
- **Attributes**:
  - Execution time.
  - Errors encountered during the update.

##### 4) `opal_client_policy_update_apply`
- **Type**: Trace
- **Description**: Represents the application of a policy update within the OPAL client. This trace spans the process of fetching and applying policy updates from the server.
- **Attributes**:
  - Execution time.
  - Errors encountered during the update.

##### 5) `opal_client_policy_store_status`
- **Type**: Metric (Observable Gauge)
- **Description**: Indicates the current status of the policy store's authentication type used by the OPAL client.
- **Attributes**:
  - `auth_type`: The authentication type configured for the policy store (e.g., TOKEN, OAUTH, NONE).
- **Value**: The metric has a value of 1 when the policy store is active with the specified authentication type.

### Example

To monitor OPAL with Prometheus and Grafana, use the ready-to-use Docker Compose file `docker/docker-compose-with-prometheus-and-otel.yml` in the repository.

Run the following command to start Prometheus and Grafana:

```
docker compose -f docker/docker-compose-with-prometheus-and-otel.yml up
```

This setup will start Prometheus to scrape metrics from OPAL server and client, and Grafana to visualize the metrics.
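
Once the stack is running, the scraped data can also be read programmatically through Prometheus's HTTP API. The sketch below uses the standard instant-query endpoint (`/api/v1/query`); the helper names are hypothetical, and the address `http://localhost:9090` assumes the port mapping from the Compose file above. The names under which OPAL's metrics appear in Prometheus are not confirmed here, so the usage example queries the built-in `up` series instead.

```python
import json
import urllib.parse
import urllib.request
from typing import Dict, List, Tuple


def build_query_url(prometheus_url: str, promql: str) -> str:
    """Build the URL for a Prometheus instant query."""
    query = urllib.parse.urlencode({"query": promql})
    return f"{prometheus_url.rstrip('/')}/api/v1/query?{query}"


def extract_vector(payload: dict) -> List[Tuple[Dict[str, str], float]]:
    """Turn an instant-query JSON response into (labels, value) pairs."""
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload.get('error', 'unknown error')}")
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector result, got {data['resultType']}")
    # Each item looks like {"metric": {...labels}, "value": [timestamp, "value"]}.
    return [(item["metric"], float(item["value"][1])) for item in data["result"]]


def instant_query(
    prometheus_url: str, promql: str, timeout: float = 5.0
) -> List[Tuple[Dict[str, str], float]]:
    """Run an instant query against Prometheus and return the parsed vector."""
    url = build_query_url(prometheus_url, promql)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return extract_vector(json.load(response))
```

For example, `instant_query("http://localhost:9090", "up")` should list one entry per scrape job (`opal_server` and `opal_client` in the provided `prometheus.yml`), which is a quick way to confirm that both targets are reachable.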