github cortexproject/cortex v1.8.0
Cortex 1.8.0

Cortex 1.8.0 features 122 contributions by 35 authors. Thank you!

Highlights

  • Automatic deletion of old blocks with configurable per-tenant retention
  • Introduction of new storage options in Ruler and Alertmanager, using the bucket client from Thanos. The previous storage options will be deprecated in the next release.
  • New thanosconvert tool to migrate Thanos or Prometheus block metadata to Cortex
  • Support for the @ <timestamp> modifier in PromQL (must be enabled with the -querier.at-modifier-enabled flag)
  • Configurable per-tenant server-side encryption for S3
  • Work on sharding Alertmanager continues (not finished yet)

Changelog

  • [CHANGE] Alertmanager: Don't expose cluster information to tenants via the /alertmanager/api/v1/status API endpoint when operating with clustering enabled. #3903
  • [CHANGE] Ingester: don't update internal "last updated" timestamp of TSDB if tenant only sends invalid samples. This affects how "idle" time is computed. #3727
  • [CHANGE] Require explicit flag -<prefix>.tls-enabled to enable TLS in GRPC clients. Previously it was enough to specify a TLS flag to enable TLS validation. #3156
  • [CHANGE] Query-frontend: removed -querier.split-queries-by-day (deprecated in Cortex 0.4.0). Please use -querier.split-queries-by-interval instead. #3813
  • [CHANGE] Store-gateway: the chunks pool controlled by -blocks-storage.bucket-store.max-chunk-pool-bytes is now shared across all tenants. #3830
  • [CHANGE] Ingester: return error code 400 instead of 429 when per-user/per-tenant series/metadata limits are reached. #3833
  • [CHANGE] Compactor: add reason label to cortex_compactor_blocks_marked_for_deletion_total metric. Source blocks marked for deletion by compactor are labelled as compaction, while blocks passing the retention period are labelled as retention. #3879
  • [CHANGE] Alertmanager: the DELETE /api/v1/alerts endpoint is now idempotent. No error is returned if the alertmanager config doesn't exist. #3888
  • [FEATURE] Experimental Ruler Storage: Add a separate set of configuration options to configure the ruler storage backend under the -ruler-storage. flag prefix. All blocks storage bucket clients and the config service are currently supported. Clients using this implementation will only be enabled if the existing -ruler.storage flags are left unset. #3805 #3864
  • [FEATURE] Experimental Alertmanager Storage: Add a separate set of configuration options to configure the alertmanager storage backend under the -alertmanager-storage. flag prefix. All blocks storage bucket clients and the config service are currently supported. Clients using this implementation will only be enabled if the existing -alertmanager.storage flags are left unset. #3888
  • [FEATURE] Adds support for S3 server-side encryption using KMS. The S3 server-side encryption config can be overridden on a per-tenant basis for the blocks storage, ruler and alertmanager. -<prefix>.s3.sse-encryption is deprecated; please use the following new CLI flags instead. #3651 #3810 #3811 #3870 #3886 #3906
    • -<prefix>.s3.sse.type
    • -<prefix>.s3.sse.kms-key-id
    • -<prefix>.s3.sse.kms-encryption-context
  • [FEATURE] Querier: Enable @ <timestamp> modifier in PromQL using the new -querier.at-modifier-enabled flag. #3744
  • [FEATURE] Overrides Exporter: Add overrides-exporter module for exposing per-tenant resource limit overrides as metrics. It is not included in the all target (single-binary mode) and must be explicitly enabled. #3785
  • [FEATURE] Experimental thanosconvert: introduce an experimental tool thanosconvert to migrate Thanos block metadata to Cortex metadata. #3770
  • [FEATURE] Alertmanager: It now shards the /api/v1/alerts API using the ring when sharding is enabled. #3671
    • Added -alertmanager.max-recv-msg-size (defaults to 16M) to limit the size of HTTP request body handled by the alertmanager.
    • New flags added for communication between alertmanagers:
      • -alertmanager.max-recv-msg-size
      • -alertmanager.alertmanager-client.remote-timeout
      • -alertmanager.alertmanager-client.tls-enabled
      • -alertmanager.alertmanager-client.tls-cert-path
      • -alertmanager.alertmanager-client.tls-key-path
      • -alertmanager.alertmanager-client.tls-ca-path
      • -alertmanager.alertmanager-client.tls-server-name
      • -alertmanager.alertmanager-client.tls-insecure-skip-verify
  • [FEATURE] Compactor: added blocks storage per-tenant retention support. This is configured via -compactor.retention-period, and can be overridden on a per-tenant basis. #3879
  • [ENHANCEMENT] Queries: Instrument queries that were discarded due to the configured max_outstanding_requests_per_tenant. #3894
    • cortex_query_frontend_discarded_requests_total
    • cortex_query_scheduler_discarded_requests_total
  • [ENHANCEMENT] Ruler: Add TLS and explicit basic authentication configuration options for the HTTP client the ruler uses to communicate with the alertmanager. #3752
    • -ruler.alertmanager-client.basic-auth-username: Configure the basic authentication username used by the client. Takes precedence over a username configured in the URL.
    • -ruler.alertmanager-client.basic-auth-password: Configure the basic authentication password used by the client. Takes precedence over a password configured in the URL.
    • -ruler.alertmanager-client.tls-ca-path: File path to the CA file.
    • -ruler.alertmanager-client.tls-cert-path: File path to the TLS certificate.
    • -ruler.alertmanager-client.tls-insecure-skip-verify: Boolean to disable verifying the certificate.
    • -ruler.alertmanager-client.tls-key-path: File path to the TLS key.
    • -ruler.alertmanager-client.tls-server-name: Expected name on the TLS certificate.
  • [ENHANCEMENT] Ingester: exposed metric cortex_ingester_oldest_unshipped_block_timestamp_seconds, tracking the unix timestamp of the oldest TSDB block not shipped to the storage yet. #3705
  • [ENHANCEMENT] Prometheus upgraded. #3739 #3806
    • Avoid unnecessary runtime.GC() during compactions.
    • Prevent compaction loop in TSDB on data gap.
  • [ENHANCEMENT] Query-frontend: now returns server-side performance metrics in the Server-Timing header when query stats are enabled. #3685
  • [ENHANCEMENT] Runtime Config: Add a mode query parameter for the runtime config endpoint. /runtime_config?mode=diff now shows the YAML runtime configuration with all values that differ from the defaults. #3700
  • [ENHANCEMENT] Distributor: Enable downstream projects to wrap the distributor push function and access the deserialized write requests before/after they are pushed. #3755
  • [ENHANCEMENT] Add flag -<prefix>.tls-server-name to require a specific server name instead of the hostname on the certificate. #3156
  • [ENHANCEMENT] Alertmanager: Remove a tenant's alertmanager instead of pausing it once it is determined to be no longer needed. #3722
  • [ENHANCEMENT] Blocks storage: added more configuration options to S3 client. #3775
    • -blocks-storage.s3.tls-handshake-timeout: Maximum time to wait for a TLS handshake. 0 means no limit.
    • -blocks-storage.s3.expect-continue-timeout: The time to wait for a server's first response headers after fully writing the request headers if the request has an Expect header. 0 to send the request body immediately.
    • -blocks-storage.s3.max-idle-connections: Maximum number of idle (keep-alive) connections across all hosts. 0 means no limit.
    • -blocks-storage.s3.max-idle-connections-per-host: Maximum number of idle (keep-alive) connections to keep per-host. If 0, a built-in default value is used.
    • -blocks-storage.s3.max-connections-per-host: Maximum number of connections per host. 0 means no limit.
  • [ENHANCEMENT] Ingester: when a tenant's TSDB is closed, the Ingester now removes the tenant's pushed metrics metadata from memory, along with the related metadata metrics (cortex_ingester_memory_metadata, cortex_ingester_memory_metadata_created_total, cortex_ingester_memory_metadata_removed_total) and validation metrics (cortex_discarded_samples_total, cortex_discarded_metadata_total). #3782
  • [ENHANCEMENT] Distributor: cleanup metrics for inactive tenants. #3784
  • [ENHANCEMENT] Ingester: re-emit the following TSDB metrics. #3800
    • cortex_ingester_tsdb_blocks_loaded
    • cortex_ingester_tsdb_reloads_total
    • cortex_ingester_tsdb_reloads_failures_total
    • cortex_ingester_tsdb_symbol_table_size_bytes
    • cortex_ingester_tsdb_storage_blocks_bytes
    • cortex_ingester_tsdb_time_retentions_total
  • [ENHANCEMENT] Querier: when querying blocks with -store-gateway.sharding-enabled=true, distribute the workload across the store-gateway replicas, whose number is set by -store-gateway.sharding-ring.replication-factor. #3824
  • [ENHANCEMENT] Distributor / HA Tracker: added cleanup of unused elected HA replicas from the KV store. Added the following metrics to monitor this process: #3809
    • cortex_ha_tracker_replicas_cleanup_started_total
    • cortex_ha_tracker_replicas_cleanup_marked_for_deletion_total
    • cortex_ha_tracker_replicas_cleanup_deleted_total
    • cortex_ha_tracker_replicas_cleanup_delete_failed_total
  • [ENHANCEMENT] Ruler now has a new API endpoint, /ruler/delete_tenant_config, that can be used to delete all ruler groups for a tenant. It is intended for administrators who wish to clean up state after a user has been removed. Note that this endpoint is enabled regardless of -experimental.ruler.enable-api. #3750 #3899
  • [ENHANCEMENT] Query-frontend, query-scheduler: cleanup metrics for inactive tenants. #3826
  • [ENHANCEMENT] Blocks storage: added -blocks-storage.s3.region support to S3 client configuration. #3811
  • [ENHANCEMENT] Distributor: Remove cached subrings for inactive users when using shuffle sharding. #3849
  • [ENHANCEMENT] Store-gateway: Reduced memory used to fetch chunks at query time. #3855
  • [ENHANCEMENT] Ingester: attempt to prevent idle compaction from happening in concurrent ingesters by introducing a 25% jitter to the configured idle timeout (-blocks-storage.tsdb.head-compaction-idle-timeout). #3850
  • [ENHANCEMENT] Compactor: cleanup local files for users that are no longer owned by compactor. #3851
  • [ENHANCEMENT] Store-gateway: close empty bucket stores, and delete leftover local files for tenants that no longer belong to the store-gateway. #3853
  • [ENHANCEMENT] Store-gateway: added metrics to track partitioner behaviour. #3877
    • cortex_bucket_store_partitioner_requested_bytes_total
    • cortex_bucket_store_partitioner_requested_ranges_total
    • cortex_bucket_store_partitioner_expanded_bytes_total
    • cortex_bucket_store_partitioner_expanded_ranges_total
  • [ENHANCEMENT] Store-gateway: added metrics to monitor chunk buffer pool behaviour. #3880
    • cortex_bucket_store_chunk_pool_requested_bytes_total
    • cortex_bucket_store_chunk_pool_returned_bytes_total
  • [ENHANCEMENT] Alertmanager: load alertmanager configurations from object storage concurrently, and only load necessary configurations, speeding up the configuration synchronization process and executing fewer "GET object" operations against the storage when sharding is enabled. #3898
  • [ENHANCEMENT] Ingester (blocks storage): Ingester can now stream entire chunks instead of individual samples to the querier. At the moment this feature must be explicitly enabled, either with the -ingester.stream-chunks-when-using-blocks flag or the ingester_stream_chunks_when_using_blocks (boolean) field in the runtime config file. These configuration options are temporary and will be removed once the feature is stable. #3889
  • [ENHANCEMENT] Alertmanager: New endpoint /multitenant_alertmanager/delete_tenant_config to delete the configuration for the tenant identified by the X-Scope-OrgID header. This is an internal endpoint, available even if the Alertmanager API is not enabled via -experimental.alertmanager.enable-api. #3900
  • [BUGFIX] Cortex: Fixed issue where fatal errors and various log messages were not logged. #3778
  • [BUGFIX] HA Tracker: don't track a CAS operation that was intentionally aborted as an error in the cortex_kv_request_duration_seconds metric. #3745
  • [BUGFIX] Querier / ruler: do not log "error removing stale clients" if the ring is empty. #3761
  • [BUGFIX] Store-gateway: fixed a panic caused by a race condition when the index-header lazy loading is enabled. #3775 #3789
  • [BUGFIX] Compactor: fixed "could not guess file size" log when uploading blocks deletion marks to the global location. #3807
  • [BUGFIX] Prevent panic at start if the http_prefix setting doesn't have a valid value. #3796
  • [BUGFIX] Memberlist: fixed panic caused by race condition in armon/go-metrics used by memberlist client. #3725
  • [BUGFIX] Querier: return 422 (instead of 500) when a query hits the max_chunks_per_query limit with blocks storage. #3895
  • [BUGFIX] Alertmanager: Ensure that experimental /api/v1/alerts endpoints work when -http.prefix is empty. #3905
  • [BUGFIX] Chunk store: fix panic in inverted index when deleted fingerprint is no longer in the index. #3543
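
The TLS entries from #3156 make TLS in gRPC clients opt-in via -<prefix>.tls-enabled and add -<prefix>.tls-server-name. A minimal sketch of how such options could map onto Go's crypto/tls.Config follows: TLS stays off unless the explicit flag is set, even when other TLS options are filled in. ClientTLSFlags and buildTLSConfig are hypothetical, and loading the CA, certificate and key files is omitted.

```go
package main

import (
	"crypto/tls"
	"fmt"
)

// ClientTLSFlags is a hypothetical mirror of some -<prefix>.tls-* flags.
type ClientTLSFlags struct {
	Enabled            bool   // -<prefix>.tls-enabled
	ServerName         string // -<prefix>.tls-server-name
	InsecureSkipVerify bool   // -<prefix>.tls-insecure-skip-verify
}

// buildTLSConfig returns nil (plaintext) unless TLS is explicitly enabled,
// regardless of which other TLS flags are set.
func buildTLSConfig(f ClientTLSFlags) *tls.Config {
	if !f.Enabled {
		return nil
	}
	return &tls.Config{
		// When left empty, Go clients typically verify against the dialed host name.
		ServerName:         f.ServerName,
		InsecureSkipVerify: f.InsecureSkipVerify,
	}
}

func main() {
	fmt.Println(buildTLSConfig(ClientTLSFlags{ServerName: "alertmanager.example"}) == nil) // true: not enabled
	cfg := buildTLSConfig(ClientTLSFlags{Enabled: true, ServerName: "alertmanager.example"})
	fmt.Println(cfg.ServerName) // alertmanager.example
}
```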
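
The alertmanager sharding entry (#3671) routes /api/v1/alerts requests using the ring. The underlying idea is consistent hashing: hash the tenant ID and pick the instance owning the first token at or after that hash, wrapping around. The sketch below is a minimal ring under stated assumptions: the types tokenOwner and ring, the helpers newRing and ownerOf, and the choice of FNV-1a as the hash are all hypothetical, and replication is ignored.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// tokenOwner associates a ring token with the instance that owns it.
type tokenOwner struct {
	token    uint32
	instance string
}

// ring is a hypothetical, minimal hash ring with tokens sorted ascending.
type ring []tokenOwner

func newRing(tokens map[string][]uint32) ring {
	var r ring
	for inst, ts := range tokens {
		for _, t := range ts {
			r = append(r, tokenOwner{t, inst})
		}
	}
	sort.Slice(r, func(i, j int) bool { return r[i].token < r[j].token })
	return r
}

// ownerOf hashes the tenant ID (FNV-1a, an assumption) and returns the
// instance owning the first token >= hash, wrapping to the first token.
func (r ring) ownerOf(tenant string) string {
	if len(r) == 0 {
		return ""
	}
	h := fnv.New32a()
	_, _ = h.Write([]byte(tenant))
	key := h.Sum32()
	i := sort.Search(len(r), func(i int) bool { return r[i].token >= key })
	if i == len(r) {
		i = 0
	}
	return r[i].instance
}

func main() {
	r := newRing(map[string][]uint32{
		"alertmanager-0": {1 << 30, 3 << 30},
		"alertmanager-1": {2 << 30, 4<<30 - 1},
	})
	fmt.Println(r.ownerOf("tenant-a"))
}
```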
