Releases: cortexproject/cortex

Cortex 1.8.1

27 Apr 12:08
v1.8.1
4afaa35

1.8.1 / 2021-04-27

  • [CHANGE] Fix for CVE-2021-31232: Local file disclosure vulnerability when -experimental.alertmanager.enable-api is used. The HTTP basic auth password_file can be used as an attack vector to send any file content via a webhook. The alertmanager templates can be used as an attack vector to send any file content because the alertmanager can load any text file specified in the templates list.
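
To make the two attack vectors named above concrete, the following minimal sketch walks an already-parsed Alertmanager configuration (a plain Python dict here) and reports every password_file key and every entry in a templates list. It is not the fix shipped in Cortex; the helper name find_file_references and the sample configuration are hypothetical and only illustrate where file references can appear in a tenant-supplied config.

```python
from typing import Any, List


def find_file_references(config: Any, path: str = "") -> List[str]:
    """Return the locations of all password_file keys and templates entries.

    Hypothetical helper for illustration only; it is not Cortex code.
    """
    found: List[str] = []
    if isinstance(config, dict):
        for key, value in config.items():
            child = f"{path}.{key}" if path else str(key)
            if key == "password_file":
                found.append(child)
            elif key == "templates" and isinstance(value, list):
                found.extend(f"{child}[{i}]" for i in range(len(value)))
            else:
                found.extend(find_file_references(value, child))
    elif isinstance(config, list):
        for i, item in enumerate(config):
            found.extend(find_file_references(item, f"{path}[{i}]"))
    return found


# Sample tenant-supplied configuration (made up for this example).
sample_config = {
    "templates": ["/etc/passwd"],
    "receivers": [
        {
            "name": "hook",
            "webhook_configs": [
                {
                    "url": "http://example.invalid/hook",
                    "http_config": {
                        "basic_auth": {
                            "username": "u",
                            "password_file": "/etc/shadow",
                        }
                    },
                }
            ],
        }
    ],
}

references = find_file_references(sample_config)
print(references)
```

A check along these lines could flag a configuration for review before it is accepted through the API; upgrading to a fixed release remains the actual remedy.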

Cortex 1.7.1

27 Apr 12:08
v1.7.1
06bbda1

1.7.1 / 2021-04-27

  • [CHANGE] Fix for CVE-2021-31232: Local file disclosure vulnerability when -experimental.alertmanager.enable-api is used. The HTTP basic auth password_file can be used as an attack vector to send any file content via a webhook. The alertmanager templates can be used as an attack vector to send any file content because the alertmanager can load any text file specified in the templates list.

Cortex 1.8.0

25 Mar 13:27
v1.8.0
51662ea

Cortex 1.8.0 features 122 contributions by 35 authors. Thank you!

Highlights

  • Automatic deletion of old blocks with configurable per-tenant retention
  • Introduction of new storage options in Ruler and Alertmanager, using the bucket client from Thanos. The previous storage options will be deprecated in the next release.
  • New thanosconvert tool to migrate Thanos or Prometheus block metadata to Cortex
  • Support for @ <timestamp> in PromQL (needs to be enabled by flag)
  • Configurable per-tenant server-side encryption for S3
  • Work on sharding Alertmanager continues (not finished yet)
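
As a small illustration of the @ <timestamp> highlight above, this sketch builds a PromQL expression pinned to a fixed evaluation time. The helper name with_at_modifier and the metric name are assumptions made for the example; the only thing taken from PromQL itself is the "selector @ unix-timestamp" syntax.

```python
from datetime import datetime, timezone


def with_at_modifier(selector: str, when: datetime) -> str:
    """Append the PromQL @ modifier, pinning evaluation to a Unix timestamp."""
    if when.tzinfo is None:
        raise ValueError("use a timezone-aware datetime to avoid ambiguity")
    return f"{selector} @ {int(when.timestamp())}"


query = with_at_modifier(
    "http_requests_total", datetime(2021, 1, 4, 7, 40, tzinfo=timezone.utc)
)
print(query)  # http_requests_total @ 1609746000
```

A query like this is only accepted when the modifier has been enabled by flag, as noted in the highlight.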

Changelog

  • [CHANGE] Alertmanager: Don't expose cluster information to tenants via the /alertmanager/api/v1/status API endpoint when operating with clustering enabled. #3903
  • [CHANGE] Ingester: don't update internal "last updated" timestamp of TSDB if tenant only sends invalid samples. This affects how "idle" time is computed. #3727
  • [CHANGE] Require explicit flag -<prefix>.tls-enabled to enable TLS in gRPC clients. Previously it was enough to specify a TLS flag to enable TLS validation. #3156
  • [CHANGE] Query-frontend: removed -querier.split-queries-by-day (deprecated in Cortex 0.4.0). Please use -querier.split-queries-by-interval instead. #3813
  • [CHANGE] Store-gateway: the chunks pool controlled by -blocks-storage.bucket-store.max-chunk-pool-bytes is now shared across all tenants. #3830
  • [CHANGE] Ingester: return error code 400 instead of 429 when per-user/per-tenant series/metadata limits are reached. #3833
  • [CHANGE] Compactor: add reason label to cortex_compactor_blocks_marked_for_deletion_total metric. Source blocks marked for deletion by compactor are labelled as compaction, while blocks passing the retention period are labelled as retention. #3879
  • [CHANGE] Alertmanager: the DELETE /api/v1/alerts is now idempotent. No error is returned if the alertmanager config doesn't exist. #3888
  • [FEATURE] Experimental Ruler Storage: Add a separate set of configuration options to configure the ruler storage backend under the -ruler-storage. flag prefix. All blocks storage bucket clients and the config service are currently supported. Clients using this implementation will only be enabled if the existing -ruler.storage flags are left unset. #3805 #3864
  • [FEATURE] Experimental Alertmanager Storage: Add a separate set of configuration options to configure the alertmanager storage backend under the -alertmanager-storage. flag prefix. All blocks storage bucket clients and the config service are currently supported. Clients using this implementation will only be enabled if the existing -alertmanager.storage flags are left unset. #3888
  • [FEATURE] Added support for S3 server-side encryption using KMS. The S3 server-side encryption config can be overridden on a per-tenant basis for the blocks storage, ruler and alertmanager. The -<prefix>.s3.sse-encryption flag is deprecated; please use the following new CLI flags instead. #3651 #3810 #3811 #3870 #3886 #3906
    • -<prefix>.s3.sse.type
    • -<prefix>.s3.sse.kms-key-id
    • -<prefix>.s3.sse.kms-encryption-context
  • [FEATURE] Querier: Enable @ <timestamp> modifier in PromQL using the new -querier.at-modifier-enabled flag. #3744
  • [FEATURE] Overrides Exporter: Add overrides-exporter module for exposing per-tenant resource limit overrides as metrics. It is not included in the all target (single-binary mode), and must be explicitly enabled. #3785
  • [FEATURE] Experimental thanosconvert: introduce an experimental tool thanosconvert to migrate Thanos block metadata to Cortex metadata. #3770
  • [FEATURE] Alertmanager: It now shards the /api/v1/alerts API using the ring when sharding is enabled. #3671
    • Added -alertmanager.max-recv-msg-size (defaults to 16M) to limit the size of HTTP request body handled by the alertmanager.
    • New flags added for communication between alertmanagers:
      • -alertmanager.max-recv-msg-size
      • -alertmanager.alertmanager-client.remote-timeout
      • -alertmanager.alertmanager-client.tls-enabled
      • -alertmanager.alertmanager-client.tls-cert-path
      • -alertmanager.alertmanager-client.tls-key-path
      • -alertmanager.alertmanager-client.tls-ca-path
      • -alertmanager.alertmanager-client.tls-server-name
      • -alertmanager.alertmanager-client.tls-insecure-skip-verify
  • [FEATURE] Compactor: added blocks storage per-tenant retention support. This is configured via -compactor.retention-period, and can be overridden on a per-tenant basis. #3879
  • [ENHANCEMENT] Queries: Instrument queries that were discarded due to the configured max_outstanding_requests_per_tenant. #3894
    • cortex_query_frontend_discarded_requests_total
    • cortex_query_scheduler_discarded_requests_total
  • [ENHANCEMENT] Ruler: Add TLS and explicit basic authentication configuration options for the HTTP client the ruler uses to communicate with the alertmanager. #3752
    • -ruler.alertmanager-client.basic-auth-username: Configure the basic authentication username used by the client. Takes precedence over a URL configured username.
    • -ruler.alertmanager-client.basic-auth-password: Configure the basic authentication password used by the client. Takes precedence over a URL configured password.
    • -ruler.alertmanager-client.tls-ca-path: File path to the CA file.
    • -ruler.alertmanager-client.tls-cert-path: File path to the TLS certificate.
    • -ruler.alertmanager-client.tls-insecure-skip-verify: Boolean to disable verifying the certificate.
    • -ruler.alertmanager-client.tls-key-path: File path to the TLS key certificate.
    • -ruler.alertmanager-client.tls-server-name: Expected name on the TLS certificate.
  • [ENHANCEMENT] Ingester: exposed metric cortex_ingester_oldest_unshipped_block_timestamp_seconds, tracking the unix timestamp of the oldest TSDB block not shipped to the storage yet. #3705
  • [ENHANCEMENT] Prometheus upgraded. #3739 #3806
    • Avoid unnecessary runtime.GC() during compactions.
    • Prevent compaction loop in TSDB on data gap.
  • [ENHANCEMENT] Query-Frontend now returns server-side performance metrics using the Server-Timing header when query stats are enabled. #3685
  • [ENHANCEMENT] Runtime Config: Add a mode query parameter for the runtime config endpoint. /runtime_config?mode=diff now shows the YAML runtime configuration with all values that differ from the defaults. #3700
  • [ENHANCEMENT] Distributor: Enable downstream projects to wrap distributor push function and access the deserialized write requests before/after they are pushed. #3755
  • [ENHANCEMENT] Add flag -<prefix>.tls-server-name to require a specific server name instead of the hostname on the certificate. #3156
  • [ENHANCEMENT] Alertmanager: Remove a tenant's alertmanager instead of pausing it when we determine it is no longer needed. #3722
  • [ENHANCEMENT] Blocks storage: added more configuration options to S3 client. #3775
    • -blocks-storage.s3.tls-handshake-timeout: Maximum time to wait for a TLS handshake. 0 means no limit.
    • -blocks-storage.s3.expect-continue-timeout: The time to wait for a server's first response headers after fully writing the request headers if the request has an Expect header. 0 to send the request body immediately.
    • -blocks-storage.s3.max-idle-connections: Maximum number of idle (keep-alive) connections across all hosts. 0 means no limit.
    • -blocks-storage.s3.max-idle-connections-per-host: Maximum number of idle (keep-alive) connections to keep per-host. If 0, a built-in default value is used.
    • -blocks-storage.s3.max-connections-per-host: Maximum number of connections per host. 0 means no limit.
  • [ENHANCEMENT] Ingester: when tenant's TSDB is closed, Ingester now removes pushed metrics-metadata from memory, and removes metadata (cortex_ingester_memory_metadata, cortex_ingester_memory_metadata_created_total, cortex_ingester_memory_metadata_removed_total) and validation metrics (cortex_discarded_samples_total, cortex_discarded_metadata_total). #3782
  • [ENHANCEMENT] Distributor: cleanup metrics for inactive tenants. #3784
  • [ENHANCEMENT] Ingester: the ingester now re-emits the following TSDB metrics. #3800
    • cortex_ingester_tsdb_blocks_loaded
    • cortex_ingester_tsdb_reloads_total
    • cortex_ingester_tsdb_reloads_failures_total
    • cortex_ingester_tsdb_symbol_table_size_bytes
    • cortex_ingester_tsdb_storage_blocks_bytes
    • cortex_ingester_tsdb_time_retentions_total
  • [ENHANCEMENT] Querier: distribute workload across -store-gateway.sharding-ring.replication-factor store-gateway replicas when querying blocks and -store-gateway.sharding-enabled=true. #3824
  • [ENHANCEMENT] Distributor / HA Tracker: added cleanup of unused elected HA replicas from KV store. Added following metrics to monitor this process: #3809
    • cortex_ha_tracker_replicas_cleanup_started_total
    • cortex_ha_tracker_replicas_cleanup_marked_for_deletion_total
    • cortex_ha_tracker_replicas_cleanup_deleted_total
    • cortex_ha_tracker_replicas_cleanup_delete_failed_total
  • [ENHANCEMENT] Ruler now has a new API endpoint /ruler/delete_tenant_config that can be used to delete all ruler groups for a tenant. It is intended to be used by administrators who wish to clean up state after a user has been removed. Note that this endpoint is enabled regardless of -experimental.ruler.enable-api. #3750 #3899
  • [ENHANCEMENT] Query-frontend, query-scheduler: cleanup metrics for inactive tenants. #3826
  • [ENHANCEMENT] Blocks storage: added -blocks-storage.s3.region support to S3 client configuration. #3811
  • [ENHANCEMENT] Distributor: Remove cached subrings for inactive users when using shuffle sharding. #3849
  • [ENHANCEMENT] Store-gateway: Reduced memory used to fetch chunks at query time. #3855
  • [ENHANCEMENT] Ingester: attempt to prevent idle compaction from happening in concurrent ingesters by introducing a 25% jitter to the configu...
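
The /runtime_config?mode=diff entry in the changelog above shows only the values that differ from the defaults. The idea can be sketched as a recursive comparison of two nested mappings. This is not the Cortex implementation; the function name diff_from_defaults and the configuration keys below are made up for the example.

```python
from typing import Any, Dict


def diff_from_defaults(current: Dict[str, Any], defaults: Dict[str, Any]) -> Dict[str, Any]:
    """Return only the entries of current whose values differ from defaults."""
    changed: Dict[str, Any] = {}
    for key, value in current.items():
        default = defaults.get(key)
        if isinstance(value, dict) and isinstance(default, dict):
            # Recurse into nested sections and keep them only if something changed.
            nested = diff_from_defaults(value, default)
            if nested:
                changed[key] = nested
        elif key not in defaults or value != default:
            changed[key] = value
    return changed


# Hypothetical keys; they are not real Cortex configuration names.
defaults = {"ingestion_rate": 25000, "limits": {"max_series": 5000000, "max_label_names": 30}}
current = {"ingestion_rate": 25000, "limits": {"max_series": 1000000, "max_label_names": 30}}

config_diff = diff_from_defaults(current, defaults)
print(config_diff)  # {'limits': {'max_series': 1000000}}
```

Unchanged sections drop out entirely, which keeps the diff short enough to read when only a few overrides are in place.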

Cortex 1.8.0-rc.1

15 Mar 16:49
4aa2783
Pre-release

Changes from 1.8.0-rc.0:

  • [BUGFIX] Distributor: reverted changes done to rate limiting in #3825. #3948

Cortex 1.8.0-rc.0

08 Mar 09:55
v1.8.0-rc.0
da31295
Pre-release

This was a release candidate for 1.8.0.

Cortex 1.7.0

23 Feb 18:26
v1.7.0
3a3015e

Changelog

Cortex

Note: in this version, the blocks storage compactor runs a migration task at startup, which can take many minutes and use a lot of RAM. Turn it off after the first successful run (see -compactor.block-deletion-marks-migration-enabled below).

  • [CHANGE] FramedSnappy encoding support has been removed from Push and Remote Read APIs. This means Prometheus 1.6 support has been removed and the oldest Prometheus version supported in the remote write is 1.7. #3682
  • [CHANGE] Ruler: removed the flag -ruler.evaluation-delay-duration-deprecated which was deprecated in 1.4.0. Please use the ruler_evaluation_delay_duration per-tenant limit instead. #3694
  • [CHANGE] Removed the flags -<prefix>.grpc-use-gzip-compression which were deprecated in 1.3.0: #3694
    • -query-scheduler.grpc-client-config.grpc-use-gzip-compression: use -query-scheduler.grpc-client-config.grpc-compression instead
    • -frontend.grpc-client-config.grpc-use-gzip-compression: use -frontend.grpc-client-config.grpc-compression instead
    • -ruler.client.grpc-use-gzip-compression: use -ruler.client.grpc-compression instead
    • -bigtable.grpc-use-gzip-compression: use -bigtable.grpc-compression instead
    • -ingester.client.grpc-use-gzip-compression: use -ingester.client.grpc-compression instead
    • -querier.frontend-client.grpc-use-gzip-compression: use -querier.frontend-client.grpc-compression instead
  • [CHANGE] Querier: it's not required to set -frontend.query-stats-enabled=true in the querier anymore to enable query statistics logging in the query-frontend. The flag is now required to be configured only in the query-frontend and it will be propagated to the queriers. #3595 #3695
  • [CHANGE] Blocks storage: compactor is now required when running a Cortex cluster with the blocks storage, because it also keeps the bucket index updated. #3583
  • [CHANGE] Blocks storage: block deletion marks are now also stored in a per-tenant global markers/ location, in addition to the block location. The compactor, at startup, will copy deletion marks from the block location to the global location. This migration is required only once, so you can safely disable it via -compactor.block-deletion-marks-migration-enabled=false once the new compactor has successfully started in your cluster. #3583
  • [CHANGE] OpenStack Swift: the default value for the -ruler.storage.swift.container-name and -swift.container-name config options has changed from cortex to empty string. If you were relying on the default value, you should set it back to cortex. #3660
  • [CHANGE] HA Tracker: configured replica label is now verified against label value length limit (-validation.max-length-label-value). #3668
  • [CHANGE] Distributor: extend_writes field in YAML configuration has moved from lifecycler (inside ingester_config) to distributor_config. This doesn't affect command line option -distributor.extend-writes, which stays the same. #3719
  • [CHANGE] Alertmanager: Deprecated -cluster. CLI flags in favor of their -alertmanager.cluster. equivalent. The deprecated flags (and their respective YAML config options) are: #3677
    • -cluster.listen-address in favor of -alertmanager.cluster.listen-address
    • -cluster.advertise-address in favor of -alertmanager.cluster.advertise-address
    • -cluster.peer in favor of -alertmanager.cluster.peers
    • -cluster.peer-timeout in favor of -alertmanager.cluster.peer-timeout
  • [CHANGE] Blocks storage: the default value of -blocks-storage.bucket-store.sync-interval has been changed from 5m to 15m. #3724
  • [FEATURE] Querier: Queries can be federated across multiple tenants. The tenant IDs involved must be specified in the X-Scope-OrgID request header, separated by a | character. This is an experimental feature, which can be enabled by setting -tenant-federation.enabled=true on all Cortex services. #3250
  • [FEATURE] Alertmanager: introduced the experimental option -alertmanager.sharding-enabled to shard tenants across multiple Alertmanager instances. This feature is still under heavy development and its usage is discouraged. The following new metrics are exported by the Alertmanager: #3664
    • cortex_alertmanager_ring_check_errors_total
    • cortex_alertmanager_sync_configs_total
    • cortex_alertmanager_sync_configs_failed_total
    • cortex_alertmanager_tenants_discovered
    • cortex_alertmanager_tenants_owned
  • [ENHANCEMENT] Allow specifying JAEGER_ENDPOINT instead of sampling server or local agent port. #3682
  • [ENHANCEMENT] Blocks storage: introduced a per-tenant bucket index, periodically updated by the compactor, used to avoid full bucket scanning done by queriers, store-gateways and rulers. The bucket index is updated by the compactor during blocks cleanup, on every -compactor.cleanup-interval. #3553 #3555 #3561 #3583 #3625 #3711 #3715
  • [ENHANCEMENT] Blocks storage: introduced an option -blocks-storage.bucket-store.bucket-index.enabled to enable the usage of the bucket index in the querier, store-gateway and ruler. When enabled, the querier, store-gateway and ruler will use the bucket index to find a tenant's blocks instead of running the periodic bucket scan. The following new metrics are exported by the querier and ruler: #3614 #3625
    • cortex_bucket_index_loads_total
    • cortex_bucket_index_load_failures_total
    • cortex_bucket_index_load_duration_seconds
    • cortex_bucket_index_loaded
  • [ENHANCEMENT] Compactor: exported the following metrics. #3583 #3625
    • cortex_bucket_blocks_count: Total number of blocks per tenant in the bucket. Includes blocks marked for deletion, but not partial blocks.
    • cortex_bucket_blocks_marked_for_deletion_count: Total number of blocks per tenant marked for deletion in the bucket.
    • cortex_bucket_blocks_partials_count: Total number of partial blocks.
    • cortex_bucket_index_last_successful_update_timestamp_seconds: Timestamp of the last successful update of a tenant's bucket index.
  • [ENHANCEMENT] Ruler: Add cortex_prometheus_last_evaluation_samples to expose the number of samples generated by a rule group per tenant. #3582
  • [ENHANCEMENT] Memberlist: add status page (/memberlist) with available details about memberlist-based KV store and memberlist cluster. It's also possible to view KV values in Go struct or JSON format, or download for inspection. #3575
  • [ENHANCEMENT] Memberlist: client can now keep a size-bounded buffer with sent and received messages and display them in the admin UI (/memberlist) for troubleshooting. #3581 #3602
  • [ENHANCEMENT] Blocks storage: added block index attributes caching support to metadata cache. The TTL can be configured via -blocks-storage.bucket-store.metadata-cache.block-index-attributes-ttl. #3629
  • [ENHANCEMENT] Alertmanager: Add support for Azure blob storage. #3634
  • [ENHANCEMENT] Compactor: tenants marked for deletion will now be fully cleaned up after some delay since deletion of last block. Cleanup includes removal of remaining marker files (including tenant deletion mark file) and files under debug/metas. #3613
  • [ENHANCEMENT] Compactor: retry compaction of a single tenant on failure instead of re-running compaction for all tenants. #3627
  • [ENHANCEMENT] Querier: Implement result caching for tenant query federation. #3640
  • [ENHANCEMENT] API: Add a mode query parameter for the config endpoint: #3645
    • /config?mode=diff: Shows the YAML configuration with all values that differ from the defaults.
    • /config?mode=defaults: Shows the YAML configuration with all the default values.
  • [ENHANCEMENT] OpenStack Swift: added the following config options to OpenStack Swift backend client: #3660
    • Chunks storage: -swift.auth-version, -swift.max-retries, -swift.connect-timeout, -swift.request-timeout.
    • Blocks storage: -blocks-storage.swift.auth-version, -blocks-storage.swift.max-retries, -blocks-storage.swift.connect-timeout, -blocks-storage.swift.request-timeout.
    • Ruler: -ruler.storage.swift.auth-version, -ruler.storage.swift.max-retries, -ruler.storage.swift.connect-timeout, -ruler.storage.swift.request-timeout.
  • [ENHANCEMENT] Disabled in-memory shuffle-sharding subring cache in the store-gateway, ruler and compactor. This should reduce the memory utilisation in these services when shuffle-sharding is enabled, without significantly increasing CPU utilisation. #3601
  • [ENHANCEMENT] Shuffle sharding: optimised subring generation used by shuffle sharding. #3601
  • [ENHANCEMENT] New /runtime_config endpoint that returns the defined runtime configuration in YAML format. The returned configuration includes overrides. #3639
  • [ENHANCEMENT] Query-frontend: included the name of the parameter that failed validation in the HTTP 400 message. #3703
  • [ENHANCEMENT] Cortex now fails to start if the provided runtime config is invalid. #3707
  • [ENHANCEMENT] Alertmanager: Add flags to customize the cluster configuration: #3667
    • -alertmanager.cluster.gossip-interval: The interval between sending gossip messages. By lowering this value (more frequent) gossip messages are propagated across cluster more quickly at the expense of increased bandwidth usage.
    • -alertmanager.cluster.push-pull-interval: The interval between gossip state syncs. Setting this interval lower (more frequent) will increase convergence speeds across larger clusters at the expense of increased bandwidth usage.
  • [ENHANCEMENT] Distributor: change the error message returned when a received series has too many labels. The new message format has the series at the end, which works better with Prometheus log truncation. #3718
    • From: sample for '<series>' has <value> label names; limit <value>
    • To: series has too many labels (actual: <value>, limit: <value>) series: '<series>'
  • [ENHANCEMENT] Improve bucket index loader to...
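
For the tenant federation feature listed in the changelog above, the X-Scope-OrgID header carries several tenant IDs joined by a | character. A minimal sketch of building that header follows. The helper name federated_org_id_header and the tenant IDs are hypothetical, and the validation shown is only an assumption about what a client might reasonably check before sending.

```python
from typing import Dict, Iterable


def federated_org_id_header(tenant_ids: Iterable[str]) -> Dict[str, str]:
    """Build the X-Scope-OrgID header for a query spanning several tenants."""
    ids = list(tenant_ids)
    if not ids:
        raise ValueError("at least one tenant ID is required")
    for tenant_id in ids:
        # A | inside an ID would be indistinguishable from the separator.
        if not tenant_id or "|" in tenant_id:
            raise ValueError(f"invalid tenant ID: {tenant_id!r}")
    return {"X-Scope-OrgID": "|".join(ids)}


headers = federated_org_id_header(["team-a", "team-b"])
print(headers)  # {'X-Scope-OrgID': 'team-a|team-b'}
```

The resulting dict can be passed as the headers of a query request; the feature itself still has to be enabled with -tenant-federation.enabled=true on all Cortex services.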

Cortex 1.7.0-rc.2

19 Feb 21:42
v1.7.0-rc.2
a87f171
Pre-release
  • [BUGFIX] Handle missing samples due to large steps and single point extents. #3818 #3835

Cortex 1.7.0-rc.1

14 Feb 20:18
v1.7.0-rc.1
9bfd413
Pre-release

Changelog

  • [BUGFIX] Fix ring tokens sorting regression introduced in #3601. #3815

Cortex 1.7.0-rc.0

21 Jan 19:22
v1.7.0-rc.0
af2c64e
Pre-release

Changelog

Cortex

  • [CHANGE] FramedSnappy encoding support has been removed from Push and Remote Read APIs. This means Prometheus 1.6 support has been removed and the oldest Prometheus version supported in the remote write is 1.7. #3682
  • [CHANGE] Ruler: removed the flag -ruler.evaluation-delay-duration-deprecated which was deprecated in 1.4.0. Please use the ruler_evaluation_delay_duration per-tenant limit instead. #3694
  • [CHANGE] Removed the flags -<prefix>.grpc-use-gzip-compression which were deprecated in 1.3.0: #3694
    • -query-scheduler.grpc-client-config.grpc-use-gzip-compression: use -query-scheduler.grpc-client-config.grpc-compression instead
    • -frontend.grpc-client-config.grpc-use-gzip-compression: use -frontend.grpc-client-config.grpc-compression instead
    • -ruler.client.grpc-use-gzip-compression: use -ruler.client.grpc-compression instead
    • -bigtable.grpc-use-gzip-compression: use -bigtable.grpc-compression instead
    • -ingester.client.grpc-use-gzip-compression: use -ingester.client.grpc-compression instead
    • -querier.frontend-client.grpc-use-gzip-compression: use -querier.frontend-client.grpc-compression instead
  • [CHANGE] Querier: it's not required to set -frontend.query-stats-enabled=true in the querier anymore to enable query statistics logging in the query-frontend. The flag is now required to be configured only in the query-frontend and it will be propagated to the queriers. #3595 #3695
  • [CHANGE] Blocks storage: compactor is now required when running a Cortex cluster with the blocks storage, because it also keeps the bucket index updated. #3583
  • [CHANGE] Blocks storage: block deletion marks are now also stored in a per-tenant global markers/ location, in addition to the block location. The compactor, at startup, will copy deletion marks from the block location to the global location. This migration is required only once, so you can safely disable it via -compactor.block-deletion-marks-migration-enabled=false once the new compactor has successfully started in your cluster. #3583
  • [CHANGE] OpenStack Swift: the default value for the -ruler.storage.swift.container-name and -swift.container-name config options has changed from cortex to empty string. If you were relying on the default value, you should set it back to cortex. #3660
  • [CHANGE] HA Tracker: configured replica label is now verified against label value length limit (-validation.max-length-label-value). #3668
  • [CHANGE] Distributor: extend_writes field in YAML configuration has moved from lifecycler (inside ingester_config) to distributor_config. This doesn't affect command line option -distributor.extend-writes, which stays the same. #3719
  • [CHANGE] Alertmanager: Deprecated -cluster. CLI flags in favor of their -alertmanager.cluster. equivalent. The deprecated flags (and their respective YAML config options) are: #3677
    • -cluster.listen-address in favor of -alertmanager.cluster.listen-address
    • -cluster.advertise-address in favor of -alertmanager.cluster.advertise-address
    • -cluster.peer in favor of -alertmanager.cluster.peers
    • -cluster.peer-timeout in favor of -alertmanager.cluster.peer-timeout
  • [CHANGE] Blocks storage: the default value of -blocks-storage.bucket-store.sync-interval has been changed from 5m to 15m. #3724
  • [FEATURE] Querier: Queries can be federated across multiple tenants. The tenant IDs involved must be specified in the X-Scope-OrgID request header, separated by a | character. This is an experimental feature, which can be enabled by setting -tenant-federation.enabled=true on all Cortex services. #3250
  • [FEATURE] Alertmanager: introduced the experimental option -alertmanager.sharding-enabled to shard tenants across multiple Alertmanager instances. This feature is still under heavy development and its usage is discouraged. The following new metrics are exported by the Alertmanager: #3664
    • cortex_alertmanager_ring_check_errors_total
    • cortex_alertmanager_sync_configs_total
    • cortex_alertmanager_sync_configs_failed_total
    • cortex_alertmanager_tenants_discovered
    • cortex_alertmanager_tenants_owned
  • [ENHANCEMENT] Allow specifying JAEGER_ENDPOINT instead of sampling server or local agent port. #3682
  • [ENHANCEMENT] Blocks storage: introduced a per-tenant bucket index, periodically updated by the compactor, used to avoid full bucket scanning done by queriers, store-gateways and rulers. The bucket index is updated by the compactor during blocks cleanup, on every -compactor.cleanup-interval. #3553 #3555 #3561 #3583 #3625 #3711 #3715
  • [ENHANCEMENT] Blocks storage: introduced an option -blocks-storage.bucket-store.bucket-index.enabled to enable the usage of the bucket index in the querier, store-gateway and ruler. When enabled, the querier, store-gateway and ruler will use the bucket index to find a tenant's blocks instead of running the periodic bucket scan. The following new metrics are exported by the querier and ruler: #3614 #3625
    • cortex_bucket_index_loads_total
    • cortex_bucket_index_load_failures_total
    • cortex_bucket_index_load_duration_seconds
    • cortex_bucket_index_loaded
  • [ENHANCEMENT] Compactor: exported the following metrics. #3583 #3625
    • cortex_bucket_blocks_count: Total number of blocks per tenant in the bucket. Includes blocks marked for deletion, but not partial blocks.
    • cortex_bucket_blocks_marked_for_deletion_count: Total number of blocks per tenant marked for deletion in the bucket.
    • cortex_bucket_blocks_partials_count: Total number of partial blocks.
    • cortex_bucket_index_last_successful_update_timestamp_seconds: Timestamp of the last successful update of a tenant's bucket index.
  • [ENHANCEMENT] Ruler: Add cortex_prometheus_last_evaluation_samples to expose the number of samples generated by a rule group per tenant. #3582
  • [ENHANCEMENT] Memberlist: add status page (/memberlist) with available details about memberlist-based KV store and memberlist cluster. It's also possible to view KV values in Go struct or JSON format, or download for inspection. #3575
  • [ENHANCEMENT] Memberlist: client can now keep a size-bounded buffer with sent and received messages and display them in the admin UI (/memberlist) for troubleshooting. #3581 #3602
  • [ENHANCEMENT] Blocks storage: added block index attributes caching support to metadata cache. The TTL can be configured via -blocks-storage.bucket-store.metadata-cache.block-index-attributes-ttl. #3629
  • [ENHANCEMENT] Alertmanager: Add support for Azure blob storage. #3634
  • [ENHANCEMENT] Compactor: tenants marked for deletion will now be fully cleaned up after some delay since deletion of last block. Cleanup includes removal of remaining marker files (including tenant deletion mark file) and files under debug/metas. #3613
  • [ENHANCEMENT] Compactor: retry compaction of a single tenant on failure instead of re-running compaction for all tenants. #3627
  • [ENHANCEMENT] Querier: Implement result caching for tenant query federation. #3640
  • [ENHANCEMENT] API: Add a mode query parameter for the config endpoint: #3645
    • /config?mode=diff: Shows the YAML configuration with all values that differ from the defaults.
    • /config?mode=defaults: Shows the YAML configuration with all the default values.
  • [ENHANCEMENT] OpenStack Swift: added the following config options to OpenStack Swift backend client: #3660
    • Chunks storage: -swift.auth-version, -swift.max-retries, -swift.connect-timeout, -swift.request-timeout.
    • Blocks storage: -blocks-storage.swift.auth-version, -blocks-storage.swift.max-retries, -blocks-storage.swift.connect-timeout, -blocks-storage.swift.request-timeout.
    • Ruler: -ruler.storage.swift.auth-version, -ruler.storage.swift.max-retries, -ruler.storage.swift.connect-timeout, -ruler.storage.swift.request-timeout.
  • [ENHANCEMENT] Disabled in-memory shuffle-sharding subring cache in the store-gateway, ruler and compactor. This should reduce the memory utilisation in these services when shuffle-sharding is enabled, without significantly increasing CPU utilisation. #3601
  • [ENHANCEMENT] Shuffle sharding: optimised subring generation used by shuffle sharding. #3601
  • [ENHANCEMENT] New /runtime_config endpoint that returns the defined runtime configuration in YAML format. The returned configuration includes overrides. #3639
  • [ENHANCEMENT] Query-frontend: included the name of the parameter that failed validation in the HTTP 400 message. #3703
  • [ENHANCEMENT] Cortex now fails to start if the provided runtime config is invalid. #3707
  • [ENHANCEMENT] Alertmanager: Add flags to customize the cluster configuration: #3667
    • -alertmanager.cluster.gossip-interval: The interval between sending gossip messages. By lowering this value (more frequent) gossip messages are propagated across cluster more quickly at the expense of increased bandwidth usage.
    • -alertmanager.cluster.push-pull-interval: The interval between gossip state syncs. Setting this interval lower (more frequent) will increase convergence speeds across larger clusters at the expense of increased bandwidth usage.
  • [ENHANCEMENT] Distributor: change the error message returned when a received series has too many labels. The new message format has the series at the end, which works better with Prometheus log truncation. #3718
    • From: sample for '<series>' has <value> label names; limit <value>
    • To: series has too many labels (actual: <value>, limit: <value>) series: '<series>'
  • [ENHANCEMENT] Improve bucket index loader to handle edge case where new tenant has not had blocks uploaded to storage yet. #3717
  • [BUGFIX] Allow -querier.max-query-lookback to use the y|w|d suffixes, like the deprecated -store.max-look-back-period. #3598
  • [BUGFIX] Memberlist: Entry in the ring should now not appear again after using "Forget" ...
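
The y|w|d suffixes mentioned in the -querier.max-query-lookback bugfix above can be illustrated with a small parser. This sketch assumes the Prometheus duration convention (d = 24 hours, w = 7 days, y = 365 days) and handles only a single unit per value; the function name parse_lookback is hypothetical and this is not how Cortex parses the flag.

```python
import re
from datetime import timedelta

# Unit lengths follow the Prometheus duration convention (an assumption here).
_UNIT_DAYS = {"d": 1, "w": 7, "y": 365}
_PATTERN = re.compile(r"(\d+)([dwy])")


def parse_lookback(text: str) -> timedelta:
    """Parse a single-unit duration such as '30d', '2w' or '1y'."""
    match = _PATTERN.fullmatch(text)
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    amount, unit = int(match.group(1)), match.group(2)
    return timedelta(days=amount * _UNIT_DAYS[unit])


print(parse_lookback("2w"))  # 14 days, 0:00:00
```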

Cortex 1.6.0

29 Dec 16:41
v1.6.0
2f7d37d

Changelog

Cortex

  • [CHANGE] Query Frontend: deprecate -querier.compress-http-responses in favour of -api.response-compression-enabled. #3544
  • [CHANGE] Querier: deprecated -store.max-look-back-period. You should use -querier.max-query-lookback instead. #3452
  • [CHANGE] Blocks storage: increased -blocks-storage.bucket-store.chunks-cache.attributes-ttl default from 24h to 168h (1 week). #3528
  • [CHANGE] Blocks storage: the config option -blocks-storage.bucket-store.index-cache.postings-compression-enabled has been deprecated and postings compression is always enabled. #3538
  • [CHANGE] Ruler: gRPC message size default limits on the Ruler-client side have changed: #3523
    • limit for outgoing gRPC messages has changed from 2147483647 to 16777216 bytes
    • limit for incoming gRPC messages has changed from 4194304 to 104857600 bytes
  • [FEATURE] Distributor/Ingester: Provide ability to not overflow writes in the presence of a leaving or unhealthy ingester. This allows for more efficient ingester rolling restarts. #3305
  • [FEATURE] Query-frontend: introduced query statistics logged in the query-frontend when enabled via -frontend.query-stats-enabled=true. When enabled, the metric cortex_query_seconds_total is tracked, counting the sum of the wall time spent across all queriers while running queries (on a per-tenant basis). The metrics cortex_request_duration_seconds and cortex_query_seconds_total are different: the former tracks the request duration (e.g. the HTTP request from the client), while the latter tracks the sum of the wall time on all queriers involved in executing the query. #3539
  • [ENHANCEMENT] API: Add GZIP HTTP compression to the API responses. Compression can be enabled via -api.response-compression-enabled. #3536
  • [ENHANCEMENT] Added zone-awareness support on queries. When zone-awareness is enabled, queries will still succeed if all ingesters in a single zone fail. #3414
  • [ENHANCEMENT] Blocks storage ingester: exported more TSDB-related metrics. #3412
    • cortex_ingester_tsdb_wal_corruptions_total
    • cortex_ingester_tsdb_head_truncations_failed_total
    • cortex_ingester_tsdb_head_truncations_total
    • cortex_ingester_tsdb_head_gc_duration_seconds
  • [ENHANCEMENT] Enforced keepalive on all gRPC clients used for inter-service communication. #3431
  • [ENHANCEMENT] Added cortex_alertmanager_config_hash metric to expose hash of Alertmanager Config loaded per user. #3388
  • [ENHANCEMENT] Query-Frontend / Query-Scheduler: New component called "Query-Scheduler" has been introduced. Query-Scheduler is simply a queue of requests, moved outside of Query-Frontend. This allows Query-Frontend to be scaled separately from the number of queues. To make Query-Frontend and Querier use Query-Scheduler, they need to be started with the -frontend.scheduler-address and -querier.scheduler-address options respectively. #3374 #3471
  • [ENHANCEMENT] Query-frontend / Querier / Ruler: added -querier.max-query-lookback to limit how long back data (series and metadata) can be queried. This setting can be overridden on a per-tenant basis and is enforced in the query-frontend, querier and ruler. #3452 #3458
  • [ENHANCEMENT] Querier: added -querier.query-store-for-labels-enabled to query the store for the label names, label values and series APIs. Only works with the blocks storage engine. #3461 #3520
  • [ENHANCEMENT] Ingester: exposed -blocks-storage.tsdb.wal-segment-size-bytes config option to customise the TSDB WAL segment max size. #3476
  • [ENHANCEMENT] Compactor: concurrently run blocks cleaner for multiple tenants. Concurrency can be configured via -compactor.cleanup-concurrency. #3483
  • [ENHANCEMENT] Compactor: shuffle tenants before running compaction. #3483
  • [ENHANCEMENT] Compactor: wait for a stable ring at startup, when sharding is enabled. #3484
  • [ENHANCEMENT] Store-gateway: added -blocks-storage.bucket-store.index-header-lazy-loading-enabled to enable index-header lazy loading (experimental). When enabled, index-headers will be mmap-ed only once required by a query and will be automatically released after -blocks-storage.bucket-store.index-header-lazy-loading-idle-timeout time of inactivity. #3498
  • [ENHANCEMENT] Alertmanager: added metrics cortex_alertmanager_notification_requests_total and cortex_alertmanager_notification_requests_failed_total. #3518
  • [ENHANCEMENT] Ingester: added -blocks-storage.tsdb.head-chunks-write-buffer-size-bytes to fine-tune the TSDB head chunks write buffer size when running Cortex blocks storage. #3518
  • [ENHANCEMENT] /metrics now supports OpenMetrics output. HTTP and gRPC servers metrics can now include exemplars. #3524
  • [ENHANCEMENT] Expose gRPC keepalive policy options by gRPC server. #3524
  • [ENHANCEMENT] Blocks storage: enabled caching of meta.json attributes, configurable via -blocks-storage.bucket-store.metadata-cache.metafile-attributes-ttl. #3528
  • [ENHANCEMENT] Compactor: added a config validation check to fail fast if the compactor has been configured with invalid block range periods (each period is expected to be a multiple of the previous one). #3534
  • [ENHANCEMENT] Blocks storage: concurrently fetch deletion marks from object storage. #3538
  • [ENHANCEMENT] Blocks storage ingester: ingester can now close idle TSDB and delete local data. #3491 #3552
  • [ENHANCEMENT] Blocks storage: add option to use V2 signatures for S3 authentication. #3540
  • [ENHANCEMENT] Exported process metrics to monitor the number of memory map areas allocated. #3537
    • process_memory_map_areas
    • process_memory_map_areas_limit
  • [ENHANCEMENT] Ruler: Expose gRPC client options. #3523
  • [ENHANCEMENT] Compactor: added metrics to track on-going compaction. #3535
    • cortex_compactor_tenants_discovered
    • cortex_compactor_tenants_skipped
    • cortex_compactor_tenants_processing_succeeded
    • cortex_compactor_tenants_processing_failed
  • [ENHANCEMENT] Added new experimental API endpoints: POST /purger/delete_tenant and GET /purger/delete_tenant_status for deleting all tenant data. Only works with blocks storage. The compactor removes blocks that belong to a user marked for deletion. #3549 #3558
  • [ENHANCEMENT] Chunks storage: add option to use V2 signatures for S3 authentication. #3560
  • [BUGFIX] Query-Frontend: cortex_query_seconds_total now returns seconds, not nanoseconds. #3589
  • [BUGFIX] Blocks storage ingester: fixed some cases leading to a TSDB WAL corruption after a partial write to disk. #3423
  • [BUGFIX] Blocks storage: Fix the race between ingestion and /flush call resulting in overlapping blocks. #3422
  • [BUGFIX] Querier: fixed -querier.max-query-into-future which wasn't correctly enforced on range queries. #3452
  • [BUGFIX] Fixed float64 precision stability when aggregating metrics before exposing them. This could have led to false counter resets when querying some metrics exposed by Cortex. #3506
  • [BUGFIX] Querier: the meta.json sync concurrency done when running Cortex with the blocks storage is now controlled by -blocks-storage.bucket-store.meta-sync-concurrency instead of the incorrect -blocks-storage.bucket-store.block-sync-concurrency (default values are the same). #3531
  • [BUGFIX] Querier: fixed initialization order of querier module when using blocks storage. It now (again) waits until blocks have been synchronized. #3551

Blocksconvert

  • [ENHANCEMENT] Scheduler: ability to ignore users based on regexp, using -scheduler.ignore-users-regex flag. #3477
  • [ENHANCEMENT] Builder: Parallelize reading chunks in the final stage of building a block. #3470
  • [ENHANCEMENT] Builder: remove duplicate label names from chunk. #3547