kubecost/cost-analyzer-helm-chart v2.3.0-rc.3 on GitHub

V2.3.0-rc.3 Release Notes

Overview:

Version 2.3.0-rc.3 is a public release candidate for v2.3.0, which will be a production release focused on bug fixes and stability.
The upgrade creates a new aggregator database, which Kubecost uses to serve metrics quickly. Building it can take hours in large environments (those with $10k or more per day in Kubernetes costs). For these environments, it is possible to run a second, "parallel" Kubecost primary environment. Please reach out to us via Slack for assistance with this process.

Major:

Kubernetes Efficiency View - Easily break down the efficiency of your clusters and workloads over time.
- A link to this view has been added to the Cluster Efficiency card at the top of the Overview page.
Enterprise Integration (Postgres) - Add the ability for enterprise customers to export Kubecost data to Postgres for use with BI tools.
Custom SMTP Server Integration - Add the ability to integrate with custom SMTP servers for alerts, budgets, etc., instead of using Kubecost’s default SMTP solution.
Anomaly Detection Enhancements
- Change how anomalies are detected and make the output more actionable. Anomalies are now detected on a rolling lookback window.
- Add a user defined threshold and minimum cost filter for detecting anomalies.
- Add anomaly detection for allocations as well as cloud costs.
- When navigating to the allocations or cloud costs page by clicking an anomaly, the anomalous entry will now be highlighted, and the lookback window marked.
All Business Tier users are granted access to full Enterprise features during the transition period.
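The notes above do not specify how Kubecost scores anomalies. To illustrate the combination of a rolling lookback window, a user-defined threshold, and a minimum cost filter, here is a minimal Python sketch. It assumes a simple rule (compare each day's cost with the mean of the preceding days); the function `detect_anomalies` and its parameters `lookback_days`, `threshold`, and `min_cost` are hypothetical and are not Kubecost settings or APIs.

```python
from statistics import mean


def detect_anomalies(daily_costs, lookback_days=7, threshold=0.5, min_cost=10.0):
    """Return indices of days whose cost deviates from a rolling baseline.

    Hypothetical rule, not Kubecost's actual algorithm:
    - baseline = mean cost of the previous `lookback_days` days
    - skip the day if both its cost and the baseline are below `min_cost`
    - flag the day if |cost - baseline| / baseline exceeds `threshold`
    """
    anomalies = []
    for i in range(lookback_days, len(daily_costs)):
        baseline = mean(daily_costs[i - lookback_days:i])
        cost = daily_costs[i]
        # Minimum cost filter: ignore items whose spend is small on both sides.
        if max(cost, baseline) < min_cost:
            continue
        # A zero baseline has no meaningful relative change; skip it in this sketch.
        if baseline == 0:
            continue
        if abs(cost - baseline) / baseline > threshold:
            anomalies.append(i)
    return anomalies


# Flat $100/day history, then a spike to $180 and a return to $100.
print(detect_anomalies([100.0] * 7 + [180.0, 100.0]))  # [7]
```

In this sketch the return to $100 on the last day is not flagged, because the rolling baseline (about $111) has absorbed the spike and the relative change stays below the threshold.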

Minor:

Cross-provider CUR access - IRSA access for cloud cost integrations in Kubecost with multiple providers.
Add the ability to end a free trial with the /expireProductTrial endpoint, allowing users to see what Free tier features are available even if they have begun a free trial.
- Added a button to the Settings page which can be used to call this endpoint.
Free trials will now automatically begin when the limit of 250 monitored cores is exceeded, rather than upon install. Trials can still be started manually via the settings page.
Additional diagnostic information is available in bug reports, and is used to more accurately indicate the state of data ingestion on startup. The Helm Chart version is now also visible from the settings page.
Updated the workload field label for Budgets.
Add support for specifying label values in Assets filters.
Reduced frequency of calls to a diagnostics endpoint.

Ingestion Fixes

During this release cycle, we focused heavily on fixing data ingestion issues seen in live environments. The focus areas are listed below:

Orders-of-magnitude performance improvements in the ingestion and derivation of allocations, assets, cloud cost, network, and containerstats data.
Fix a race condition that would sometimes cause a failure when promoting “write data” to “read data”.
Fix the error ‘internal list scan offset is out of range’ during updates to the database.
Add many new diagnostic data points to assist in troubleshooting and to help ensure a healthy flow of data during the initial phases of ingesting especially large datasets.
Add the ability to automatically grow the refresh interval during the initial load of large datasets. This fixes an issue where, on large datasets, an ingestion run would be halted so that the next ingestion run could begin. Growing the refresh interval at the start allows ingestion to reach a complete state before its data is promoted and the next ingestion cycle begins.
Add the ability to get a first cut of the data faster when a full re-ingest is in progress, so the most recent time-series data can be viewed while historical data is still being ingested.
Fix the ingestion order so that daily data and the most recent data are ingested first.
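As a rough illustration of the refresh-interval growth described above, the following sketch models an initial load whose single ingestion run needs a fixed amount of processing time. Instead of halting the run when the interval elapses, the scheduler grows the interval until the run completes and its data can be promoted. The growth factor, the cap, and the function name `schedule_initial_load` are assumptions for illustration only; the release notes do not state how Kubecost grows the interval.

```python
def schedule_initial_load(work_seconds, interval=300.0, growth_factor=2.0, max_interval=3600.0):
    """Return the elapsed times (seconds) at which refresh checks happen.

    Hypothetical model: one ingestion run needs `work_seconds` of processing.
    At each check, if the run is not complete, the refresh interval grows
    (capped at `max_interval`) instead of the run being halted for the next
    cycle. The last value returned is when the run completes and its data
    would be promoted.
    """
    if interval <= 0 or max_interval <= 0 or growth_factor < 1 or work_seconds < 0:
        raise ValueError("invalid scheduling parameters")
    elapsed = 0.0
    checks = []
    while True:
        elapsed += interval
        checks.append(elapsed)
        if elapsed >= work_seconds:
            # Ingestion is complete: promote data and start the next cycle.
            return checks
        # Ingestion still running: grow the interval rather than halting the run.
        interval = min(interval * growth_factor, max_interval)


print(schedule_initial_load(2000))  # [300.0, 900.0, 2100.0]
```

With a fixed 300-second interval, a run needing 2,000 seconds would be cut off every cycle; in this model, growing the interval lets it complete at the third check (2,100 seconds).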

Fixes:

Fix Summary Allocation windowing inconsistencies between different accumulation options.
Fix an HTTP 500 error in Cluster Sizing when some nodes don’t have a valid asset type.
Fix an issue with the Savings API for clusters that contain Fargate nodes (nodes without a node type).
Fix an HTTP 500 error in the Assets Topline API when aggregating by label.
Fix an error when filtering with the “contains”/“contains prefix”/“contains suffix” operators on custom labels.
Fix an issue where multiple reports with the same profile were created on pod restart when using v1 filters in the ConfigMap.
Fix an issue with Business Tier licenses not being recognized correctly after v2.0.
Fix an issue with the /debug/orchestrator and /providerOptimization endpoints not being accessible when the core count exceeds the free tier limit and there is no valid license or trial.
Fix collections to use filters from the teams/RBAC configuration.
Fix an issue where a duplicate budget was created when creating a new budget whose criteria matched an existing budget.
Fix an issue causing negative idle when multiple clusters share the same nodes.
Fix an issue where the cloud cost and external cost processes could be initialized in the cost-model container even when a separate container was running them, causing messy logs and unneeded processing in cost-model.
Fix an issue with SMTP connection causing a panic.
Fix issues with idle calculation when aggregating by service or label.
Fix an issue with SAML configuration when the query filter is empty and the SAML filter is not.
Fix an issue where query validation for request sizing was missing valid parameters.
Fix an issue when aggregating by predefined label aliases (deployment, daemonset, etc).
Fix idle sharing of CPU, GPU, and RAM for Allocations API.
Fix aggregating by label when idle is separated.
Fix drastic differences between the Assets visualization in 2.x and in 1.108, which were caused by seemingly duplicated data for certain time periods. The cluster ID is now added during ingestion to aid in de-duplicating this time-series data.
Fix a Boundary Error in the Assets View API when the end of the window is empty.
Fix many noisy logs by logging them at the appropriate level or removing them, making state easier to understand and troubleshoot.
Fix an issue when saving a scheduled report where the next run was sometimes not appropriately set.
Fix an error when enabling .Values.saml.enabled=true and .Values.readonly=true.
Fix an issue where network insights were not visible even when enabled in the configuration.
Fix an edge case in network cost reconciliation for the Azure provider involving virtual machine scale sets.
Fix an issue where unallocated__idle was being returned in /savings/requestSizingv2.
Fix cluster sizing recommendation failures when nil objects are detected.
Fix Assets API to accurately align topline and table data.
Fix Allocations to appropriately display idle for unallocated workloads.
Fix greatly inflated node prices shown before reconciliation occurs.
Update Cloud Cost ingestion for GCP so that, when determining the ProviderID, it falls back to resource.global_name if resource.name is null. This is particularly relevant for Cloud SQL, Cloud Storage, and Cloud Logging, which very often have null resource.name values that previously resulted in unallocated ProviderID values.
Fix an issue where some time series charts used the end of a time period for their x-axis instead of the start.
Fix an issue where the UI would attempt to show hourly data for External Costs on small, recent time windows. Hourly data is not collected for External Costs at this time.
Fix an issue where idle costs were not represented in the Namespaces table of the cluster inspect page.
Fix an issue where exporting the values.yaml entries for reports would sometimes format filters incorrectly.
Fix an issue on the cluster-inspect page in which the cost for a namespace did not account for the shareTenancyCost configuration.
Fix sorting by cluster name on the Clusters page.
Fix UI issues with the Automatic Request Rightsizing Action, where no loading indicator was shown while the action was registered, and failures were not indicated to the user.
Fix an issue in which filters were not properly set when drilling into an anomaly when using multi-aggregation.
Fix an issue where Business Tier users were being blocked from usage when exceeding 250 cores monitored. Business Tier is now effectively Enterprise.
Fix an issue where the table header sort icon (up/down arrow) sometimes appeared to the left of the text instead of to the right.
Fix an issue where the “Add Cloud Provider” button would be absent from the Settings page unless at least one provider was already set up.
Fix an issue with deleting Asset reports.
Fix an issue where the diagnostics page would get stuck in a Loading state when calls to the GitHub API failed.
Fix an issue where the table would reset to the first page when opening the Assets detail dialog.
Fix several issues with Cloud Provider creation.
Fix an issue where filters such as Contains and Starts With, which are only useful when the text does not exactly match the queried item, did not allow free-form text; users were forced to click an autocomplete option.
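The GCP ProviderID fallback noted above amounts to a simple selection rule. Here is a minimal sketch, assuming each billing row has already been parsed into the two nullable fields named in the note. The function `gcp_provider_id` is hypothetical, and representing an unallocated ProviderID as an empty string is an assumption of this sketch, not Kubecost's behavior.

```python
from typing import Optional


def gcp_provider_id(resource_name: Optional[str], resource_global_name: Optional[str]) -> str:
    """Choose a ProviderID for one GCP billing row (hypothetical helper).

    Prefer resource.name; when it is null (None), fall back to
    resource.global_name. If both are null, return "" to stand in for an
    unallocated ProviderID.
    """
    if resource_name is not None:
        return resource_name
    if resource_global_name is not None:
        return resource_global_name
    return ""


# A row with a null resource.name, as is common for Cloud SQL, Cloud Storage, and Cloud Logging.
print(gcp_provider_id(None, "example-global-name"))  # example-global-name
```

Checking for `None` rather than for any falsy value mirrors the note's wording, which describes the fallback as applying when resource.name is null.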

Helm Changes:

#3456 Fix the option for a non-federated primary, allowing the primary cluster to serve the user interface without shipping local cluster metrics to S3.
#3444 Add scrape configuration for aggregator telemetry metrics.
Move product key proxies to the aggregator container.
#3440 Update cluster controller to resolve security issues.
#3437 Update kubecost-modeling to add many new anomaly detection features and bug fixes for anomaly detection and forecasting, and to address a security issue in the base image.
#3414 #3378 #3376 Move ingestion and derivation diagnostic endpoints.
#3401 Add forecastingEnabled to the /productConfigs endpoint.
#3396 Add nginx configuration for diagnostics endpoints.
#3395 Turn networkCosts off by default.
#3392 Change imagePullPolicy to IfNotPresent.
#3383 #3367 Add the ability for Custom SMTP Server connection.
#3365 Add comments specifying that all parameters must be enclosed in quotes.
#3364 Update error message with more explicit instructions.
#3361 Add commonLabels to pods.
#3238 Add Kubecost Integrations (Postgres).
#3355 Grafana dashboard consistency update.
#3352 Fix Grafana dashboard for low disk utilization link, add dashboards for network services and aggregator metrics, and reorganize the dashboards to a dedicated folder.
#3318 Add consistency around how we are templating .Values.global.additionalLabels.
#3354 Update the link for ingress examples in values.yaml.
#3348 Add a check for cluster_id when using federated storage.
#3345 Allow multi-cluster Prometheus.
#3341 Add a better warning when disabling Prometheus.
#3326 Add missing pod labels for the forecasting deployment.
#3334 Add READ_ONLY to aggregator.
#3335 Allow users to set the daily retention limit in the cloud-cost container which controls cloud cost data retention.
#3324 Disable helm-rollout-restarter if cicd=true.
#3305 Add a security context for cloud-cost.
#3292 Allow users to add Aggregator extraVolumes and extraVolumeMounts.
#3311 Replace 2.0 in helm note with 2.x.
#3307 Add /providerOptimization proxy endpoint.

Helm Fixes:

#3384 Fix bug from upstream grafana-dashboards-json-configmap.yaml.
#3333 Remove cloud cost process and secrets from the cost-model container.
#3379 Restored Replicated testing.
#3356 Fix CPU Units.
#3325 Fix kubecostFrontend.deployMethod default value.
#3316 Fix rbac logic for teams.
#3313 Bump kubecost-modeling to fix the xz vulnerability CVE-2024-3094.
#3309 Fix several deprecated environment variables.

Dependency Updates:

#3441 Bump kiwigrid/k8s-sidecar from 1.27.1 to 1.27.2.
#3440 Bump clustercontroller to 0.16.1.
#3437 Bump kubecost-modeling to 1.1.12.
#3419 Bump prometheus-operator/prometheus-config-reloader from v0.73.2 to v0.74.0.
#3408 Bump prometheus/prometheus from v2.51.2 to v2.52.0.
#3420 Bump grafana/grafana from 10.4.2 to 10.4.3.
#3418 Bump kiwigrid/k8s-sidecar from 1.26.2 to 1.27.1.
#3404 Bump kubecost-modeling to v0.1.11.
#3389 Bump kiwigrid/k8s-sidecar from 1.26.1 to 1.26.2.
#3402 Bump kubecost-modeling to v0.1.10.
#3387 Bump kubecost-modeling to v0.1.9.
#3380 Bump kubecost-modeling to v0.1.7.
#3373 Bump prom/node-exporter from v1.7.0 to v1.8.0.
#3358 Bump prometheus-operator/prometheus-config-reloader from v0.73.1 to v0.73.2.
#3375 Bump helm/kind-action from 1.9.0 to 1.10.0.
#3328 Bump prom/pushgateway from v1.6.2 to v1.8.0.
#3340 Bump prometheus-operator/prometheus-config-reloader from v0.72.0 to v0.73.1.
#3346 Bump prometheus/prometheus from v2.51.1 to v2.51.2.
#3347 Bump grafana/grafana from 10.4.1 to 10.4.2.
#3313 Bump kubecost-modeling to fix the xz vulnerability CVE-2024-3094.
