github timescale/promscale 0.3.0

This is a major release containing important performance improvements, bug fixes, and new features. We advise all users to upgrade to benefit from the performance improvements.

At a high level, this release contains:

  • New high-availability system (experimental)
  • Major performance improvements
  • Improvements to our Helm charts
  • Bug fixes
  • Support for PostgreSQL 13

New high-availability system

This release contains a new system for handling Prometheus running in high-availability mode. It simplifies deployments by letting Prometheus and Promscale scale independently and communicate through a load balancer.

The new system works as follows: Prometheus is configured to attach external labels named cluster and __replica__ to the samples it sends. All replicas in an HA pair send the same cluster label, but each sends a unique __replica__ label. Promscale then elects a single Prometheus replica per cluster to write data at any point in time, and automatically fails over to another replica if the elected one dies.

With this HA system, any Prometheus instance can send data to any Promscale instance through a load balancer or a similar mechanism. Replica election does not depend on any mapping between Prometheus and Promscale nodes, so it remains correct regardless of how many Prometheus or Promscale instances are running. On the Promscale side, the system is enabled with the CLI flag -enable-ha=true; it also requires each Prometheus instance to be configured with the cluster and __replica__ external labels.

For this release, we are marking this new feature as experimental.

Performance improvements

  • Consolidate the series caches (saves 33% of memory)
  • Improve series creation and fetching performance (series creation is more than 2x faster)
  • Adjust the PGX PostgreSQL statement cache to grow proportionally with the number of metrics
  • Make the series cache grow automatically up to a memory limit
  • Reduce the memory used by samples
  • Fix a memory leak in Protobuf handling
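The changelog notes that the series cache became a "clockcache". As a rough illustration of that eviction policy (CLOCK, a second-chance approximation of LRU), here is a fixed-capacity Go sketch. The types, the int64 value standing in for a series id, and the Evictions counter are hypothetical, and the automatic growth up to a memory limit is not modeled.

```go
package main

// entry is one cache slot; used is the CLOCK reference bit.
type entry struct {
	key   string
	value int64 // e.g. a series id (illustrative)
	used  bool  // set on every hit, cleared by the sweeping hand
}

// ClockCache is a hypothetical fixed-capacity CLOCK cache.
type ClockCache struct {
	slots     []entry
	index     map[string]int // key -> slot position
	hand      int
	Evictions int // similar in spirit to series_cache_evictions_total
}

func NewClockCache(capacity int) *ClockCache {
	if capacity < 1 {
		capacity = 1
	}
	return &ClockCache{slots: make([]entry, 0, capacity), index: make(map[string]int, capacity)}
}

// Get returns the cached value and marks the entry as recently used.
func (c *ClockCache) Get(key string) (int64, bool) {
	i, ok := c.index[key]
	if !ok {
		return 0, false
	}
	c.slots[i].used = true
	return c.slots[i].value, true
}

// Insert adds or updates key, evicting an unreferenced entry when full.
func (c *ClockCache) Insert(key string, value int64) {
	if i, ok := c.index[key]; ok {
		c.slots[i].value, c.slots[i].used = value, true
		return
	}
	if len(c.slots) < cap(c.slots) {
		c.index[key] = len(c.slots)
		c.slots = append(c.slots, entry{key: key, value: value})
		return
	}
	// Sweep: give referenced entries a second chance by clearing their bit.
	for c.slots[c.hand].used {
		c.slots[c.hand].used = false
		c.hand = (c.hand + 1) % len(c.slots)
	}
	delete(c.index, c.slots[c.hand].key)
	c.slots[c.hand] = entry{key: key, value: value}
	c.index[key] = c.hand
	c.hand = (c.hand + 1) % len(c.slots)
	c.Evictions++
}
```

CLOCK avoids reordering a list on every hit, which is why it is a common choice for hot, read-heavy caches.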

Helm chart improvements

  • Added Helm chart support for configuring nodeSelector and tolerations for Promscale and the maintenance cronjob.
  • Updated the Promscale Helm chart to support the latest TimescaleDB-single Helm chart (0.8.2).
  • Changed the Promscale deployment upgradeStrategy from RollingUpdate to Recreate, because Promscale expects no other Promscale instance to be connected to the database during an upgrade.

Bug Fixes

  • Fix handling of UTF-8 NUL chars
  • Prevent a crash when a client does not send an expected Bearer token

Misc

  • New SQL API for deleting an entire metric
  • Update the PromQL engine to support the @ modifier and negative offsets
  • Support for PGBouncer via the -simple-protocol=true flag
  • Add support for PostgreSQL 13

Better observability of Promscale itself

We've made several improvements to the observability of Promscale. We've added the following metrics (all with the promscale_ prefix):

  • Metrics to monitor the PGX statement cache: statement_cache_elements_stored, statement_cache_per_connection_capacity, statement_cache_enabled
  • Metrics to monitor our caches: metric_name_cache_evictions_total, label_cache_evictions_total, series_cache_evictions_total, statement_cache_elements_stored
  • Metrics to monitor our data flow: metric_batcher_channel_cap, metric_batcher_channel_len, copier_channel_cap, copier_channel_len
  • Metrics to monitor our batching: metric_batcher_flush_series, copier_inserts_per_batch, copier_rows_per_batch, copier_rows_per_insert
  • Metrics to monitor DB performance: copier_insert_duration_seconds

Note that the list above includes some renamed metrics: inserts_per_batch -> copier_inserts_per_batch, rows_per_batch -> copier_rows_per_batch, and db_batch_insert_duration_seconds -> copier_insert_duration_seconds.
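If your dashboards or alerts query the old names, a plain string replacement is usually enough to update them. This Go sketch assumes the old metrics also carried the promscale_ prefix, which you should verify against your own data; RewriteQuery is a hypothetical helper. Because none of the old names is a substring of a new name, running it twice is harmless.

```go
package main

import "strings"

// renamedMetrics maps the pre-0.3.0 names to their new names.
// Assumption: both old and new names use the promscale_ prefix.
var renamedMetrics = strings.NewReplacer(
	"promscale_inserts_per_batch", "promscale_copier_inserts_per_batch",
	"promscale_rows_per_batch", "promscale_copier_rows_per_batch",
	"promscale_db_batch_insert_duration_seconds", "promscale_copier_insert_duration_seconds",
)

// RewriteQuery updates a PromQL expression (for example, from a dashboard
// panel) to use the renamed metrics. Series suffixes such as _bucket are
// preserved because only the name prefix is replaced.
func RewriteQuery(q string) string {
	return renamedMetrics.Replace(q)
}
```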

Many of these metrics are useful for debugging performance issues as they occur, but some should be monitored and alerted on.
We suggest the following alerts:

  • The rate of metric_name_cache_evictions_total, label_cache_evictions_total, and series_cache_evictions_total should be low compared to metric_name_cache_capacity, label_cache_capacity, and series_cache_capacity, respectively.
  • Channels should not be full, so the ratio copier_channel_len/copier_channel_cap should stay below one, and the 99th percentile of metric_batcher_channel_len should be less than metric_batcher_channel_cap.
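The suggested alerts reduce to simple comparisons. The Go sketch below evaluates them on values you have already collected; the function names and the maxFraction threshold are hypothetical (the notes only say evictions "should be low"), and in a real deployment you would express the same conditions as Prometheus alerting rules. Percentile uses the nearest-rank method.

```go
package main

import (
	"math"
	"sort"
)

// ChannelFillRatio returns len/cap; a value of 1 means the channel is full.
// A non-positive capacity is treated conservatively as full.
func ChannelFillRatio(length, capacity float64) float64 {
	if capacity <= 0 {
		return 1
	}
	return length / capacity
}

// Percentile returns the p-th percentile (0 < p <= 100) of samples using the
// nearest-rank method. It does not modify the input slice.
func Percentile(samples []float64, p float64) float64 {
	if len(samples) == 0 {
		return 0
	}
	s := append([]float64(nil), samples...)
	sort.Float64s(s)
	rank := int(math.Ceil(p / 100 * float64(len(s))))
	if rank < 1 {
		rank = 1
	}
	if rank > len(s) {
		rank = len(s)
	}
	return s[rank-1]
}

// ChannelsHealthy applies the channel rule: the copier channel is not full,
// and the p99 of observed metric_batcher channel lengths is below its capacity.
func ChannelsHealthy(copierLen, copierCap float64, batcherLens []float64, batcherCap float64) bool {
	return ChannelFillRatio(copierLen, copierCap) < 1 &&
		Percentile(batcherLens, 99) < batcherCap
}

// EvictionsLow compares a per-second eviction rate with the cache capacity.
// maxFraction is a hypothetical threshold chosen by the operator.
func EvictionsLow(evictionsPerSec, capacity, maxFraction float64) bool {
	if capacity <= 0 {
		return false
	}
	return evictionsPerSec/capacity < maxFraction
}
```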

New CLI flags:

  • memory-target - the amount of memory Promscale is allowed to use (defaults to 80% of auto-detected system memory).
  • series-cache-initial-size - the size of the series cache at startup (default 250000; ideally equal to the number of active series).
  • series-cache-max-bytes - Maximum size the series cache can grow to (defaults to 50% of the memory target).
  • db-statement-cache - allows users to disable using the PGX statement cache (needed if using PGBouncer)
  • enable-ha - enables the new High-Availability system (experimental, see above)
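As a worked example of the memory defaults above, the Go sketch below computes memory-target and series-cache-max-bytes from the amount of system memory. The function names are hypothetical and integer division truncates; for a host with 16 GiB of memory it gives a memory target of about 12.8 GiB and a series cache ceiling of about 6.4 GiB.

```go
package main

// DefaultMemoryTarget mirrors the documented default for memory-target:
// 80% of the detected system memory, in bytes.
func DefaultMemoryTarget(systemBytes uint64) uint64 {
	return systemBytes * 80 / 100
}

// DefaultSeriesCacheMaxBytes mirrors the documented default for
// series-cache-max-bytes: 50% of the memory target.
func DefaultSeriesCacheMaxBytes(memoryTarget uint64) uint64 {
	return memoryTarget / 2
}
```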

Notes for people upgrading from 0.2.0 and before

Three new CLI flags were added to control the size of the series cache (memory-target, series-cache-initial-size and series-cache-max-bytes). The default values should adjust the series cache to an appropriate size automatically. However, we suggest that existing users set series-cache-initial-size to a slight overestimate of the number of active series they have.

Notes for people upgrading TimescaleDB from 1.x to 2.x

After the upgrade, you should run SELECT remove_compression_policy(format('prom_data.%I', table_name), if_exists=>true) FROM _prom_catalog.metric; This removes the compression policies used by the old way of running compression jobs. Compression keeps working, using the new mechanism.

Prom-Migrator

Prom-Migrator includes a bug fix for the handling of the writer-auth flag.

Dependencies

This release works with the following dependencies:

  • Postgres: 12.x, 13.x
  • TimescaleDB: >=1.7.3 <2.0.0 (PG12), 2.x.x (PG12 & PG13)
  • Promscale Extension: 0.1.x
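The dependency matrix can be encoded as a simple compatibility check. This Go sketch is illustrative only: the version type and the Supported function are hypothetical, only the PostgreSQL major version and the TimescaleDB version are considered, and the Promscale extension requirement (0.1.x) is not checked.

```go
package main

// version is a hypothetical minimal semantic-version triple.
type version struct{ major, minor, patch int }

// less reports whether v sorts before o.
func (v version) less(o version) bool {
	if v.major != o.major {
		return v.major < o.major
	}
	if v.minor != o.minor {
		return v.minor < o.minor
	}
	return v.patch < o.patch
}

// Supported encodes the matrix above: TimescaleDB >=1.7.3 <2.0.0 on PG12,
// and TimescaleDB 2.x.x on PG12 or PG13.
func Supported(pgMajor int, tsdb version) bool {
	switch tsdb.major {
	case 2:
		return pgMajor == 12 || pgMajor == 13
	case 1:
		return pgMajor == 12 && !tsdb.less(version{1, 7, 3})
	default:
		return false
	}
}
```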

Thanks

@SiddiqueAhmad and @0xflotus for PRs improving our docs
@grzesuav for a fix for the writer-auth flag in prom-migrator

Changelog

a114e89 Add "negative-offset" in list of disabled features.
0f7c8f2 Add PG13 e2e CI tests + Fixes
3025697 Add a memory-target CLI option
09a36fa Add benchmarks, DecodeBinary() for pgsafetype and update sql test files.
72ad82f Add ci debug output
e8597d8 Add error handling to env variable parsing
fa8d182 Add flag to control series cache memory size
b937229 Add header in safe pgtype files.
5444a12 Add observability metrics
18cfd52 Add option to select protocol used by db driver.
6baff86 Add pprof profiles
c01ed35 Add series id generation methods to micro benchmarks
d500521 Add size tracking to clockcache
a082c2a Add support function test
50e0f8a Add tests to verify multiple JSON payloads support.
5765cfb Add tolerations
f8bc479 Add utility library to get the total system memory.
731f747 Adds support for handling null chars in Prom label/label-value in runtime.
ec71ca6 Adjust sql update scripts
b7be9fb Apply suggestions from code review
78dc0ee Change default series cache size
7d38401 Create metrics for state of statement cache
55a369d Disable maintenance cronjob by defaultn as we expect all users to use >= TimescaleDB 2.0 and Also added an option to configure podAffinity & podAntiAffinity rules for Promscale deployment & Maintenance cronjob.
df9918f Docs for resource usage
8e86fd1 Document remote-write config params for most use-cases.
1276897 Fix deadlock in label creation
da00822 Fix end-to-end tests on mac
1d2817d Fix error handling and add checks
f3481db Fix memory leak in Protobuf
ed739c5 Fix misleading failed samples metric calculation.
b84d263 Fix potential deadlock in series creation
ac884d6 Fix tests for new series fetching method + cleanup
b091cb8 Fix tests to support PG13
4080467 Fix: enable optional features in PromQL engine were not applied
48fa29a Fixing testhelpers test to correctly close
f86a2ae GSoC 2021 idealist.
11fda67 HA based on external labels and data time
b4fa9e6 Implement pgutf8str.TextArray as underlying type in LabelList.
ae8b2f9 Improve observability of evictions
641c491 Improve series creation and fetching
5de3a5c Improve statement cache state log
31d8717 Improve workflow ci files
740abc1 Introduce method of growing the series cache
a4f1a38 Make Series concurrency safe
ba6ae38 Make the series cache a clockcache
3032915 Make the series endpoint return sorted series
b2d89ac Misc small refactoring
b75fd18 Optimize memory usage: samples
9c58625 Optimize memory: pass pointers to Inserter channel
fff4e87 Optimize memory: unset series labels and names when setting id
0370671 Prepare for the 0.3.0 release
e7f50e2 Prepare for the next development cycle
724c48f Prepare for the next development cycle
8dfea40 Prevent crash with client does not send token when expected.
d590231 Refactor PercentageBytes to use enum
feb8b48 Refactor and rename Labels to Series
d6a7240 Refactor: Cleanup SampleBatch and pending buffer api
6f0ff53 Remove a layer of caching for series
f104830 Remove unsused SeriesCache interface
90d07c1 Rename Labels to Series
1563e0c Rename block -> slab in prom-migrator.
6429143 Rename unset ids to invalid ids
c93499d Respond to PR comments
ac8b6b6 SQL api for deletion of metric.
15ea4a7 Separate out limits CLI config from validation
02dc562 Set statement cache cap and enabled once
dad1c6f Shut down refreshSync cleanly
b442443 Small PR fixes
c4121dd Statement cache depends on metrics cache
4330f6c Switch to using a function for data inserts
0e83020 Sync call TryChangeLeader if follower > lease
3a7183c Sync with db if non-leader is ahead of lease
97f0473 Try to change leader if maxt outside lease
3024287 Update PromQL till Feb 20.
ff50c60 Update Promb till Feb 8.
954d02e Update accepted PostgreSQL version
94d8c0a Update accepted TS version
517e46b Update drop test to work with series cache
045dc79 Update ha tests to consider sync tryChangeLeader
8f3ab82 Update pkg/ha/ha_parser_test.go
4b6d561 Update prom-migrator readme
65d2242 Update promql till Match 19.
8e08b88 Update sql_schema.md
ee0d46f Update testcontainers version
9d53d35 Update upgrade_test arch to be used in other pkgs.
4f9938a Use main Prometheus branch for PromQL comparision tests.
0e77fa7 add more tests
4bc3061 add nodeSelector & tolerations fields to maintenance cronjob
bd555c9 add support to configure resource requests for maintenance cronjob
5183b55 fix: #553 - WriterAuth are ignored
0c47916 fix: small error
ebe43d6 refactor promscale helmchart to work with latest TimescaleDB single helm chart
b32f420 update Promscale deployment upgrade strategy to Recreate as expect no Promscale pod to be connected to TimescaleDB during the upgrade.

Docker images

  • docker pull timescale/promscale:0.3.0
  • docker pull timescale/promscale:0.3
  • docker pull timescale/promscale:latest
