Overview
This release makes it easier than ever to monitor Kubernetes changes and remediate Prometheus alerts automatically.
Conceptually, every Robusta automation has three parts:
- Triggers that identify a problem, like a crashing pod
- Actions that gather data about the problem (e.g. fetch logs) or fix it automatically (e.g. restart a pod)
- Sinks that send notifications to Slack, MS Teams, and other destinations
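As a rough sketch of how these three parts map onto Helm values, assuming a single Slack sink: the trigger and action below are the ones introduced later in these notes, while the sink name, channel, and API key are placeholders rather than values from this release.

```yaml
# Sketch only: sink fields are illustrative placeholders.
sinksConfig:
- slack_sink:
    name: main_slack_sink              # hypothetical sink name
    slack_channel: alerts              # hypothetical channel
    api_key: "<your-slack-api-key>"    # placeholder, not a real key

customPlaybooks:
- triggers:                            # what identifies the problem
  - on_pod_oom_killed: {}
  actions:                             # what gathers data or fixes it
  - pod_graph_enricher:
      resource_type: Memory
      display_limits: true
```

Findings produced by the actions are delivered to every configured sink unless you filter them.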
This release focuses on adding new triggers and actions. In the coming weeks, we will focus on adding new sinks as well.
What's New
Run Kubernetes jobs in response to alerts
You can now create a Kubernetes job whenever a specific Prometheus alert fires. After the job is created, you will receive a notification.
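For reference, a playbook that wires an alert to a job might look roughly like the sketch below. The action name alert_handling_job and its parameters are recalled from the Robusta documentation and are an assumption here, so check them against the version you run. The alert name, image, and command are placeholders.

```yaml
# Sketch only: verify the action name and parameters in the Robusta docs.
customPlaybooks:
- triggers:
  - on_prometheus_alert:
      alert_name: KubePodCrashLooping   # placeholder alert name
  actions:
  - alert_handling_job:                 # assumed action name
      image: busybox                    # placeholder image
      command:
      - sh
      - -c
      - echo "handling alert"           # placeholder command
      notify: true                      # assumed parameter: send a notification about the job
```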
Enrich OOM Kills with extra data
Jump-start your investigation of OOM kills with extra data delivered right in your messaging app. By default, this also sends graphs of memory usage for easier troubleshooting.
Finally, we've added a new Robusta trigger, on_pod_oom_killed, for custom automations. For example:
customPlaybooks:
- triggers:
  - on_pod_oom_killed: {}
  actions:
  - pod_graph_enricher:
      resource_type: Memory
      display_limits: true
Automate the response to failed Kubernetes jobs
This release adds a widely requested feature: a new trigger for failed Kubernetes jobs. You can use it to send a notification whenever specific jobs fail or to take automated action.
customPlaybooks:
- triggers:
  - on_job_failure:
      namespace_prefix: robusta
  actions:
  - create_finding:
      title: "Job $name on namespace $namespace failed"
      aggregation_key: "Job Failure"
  - job_events_enricher: {}
Above you can also see the new create_finding action. This can be used to customize the message for Robusta notifications.
Launch self-hosting beta
The Robusta SaaS platform is now available for self-hosting via our commercial plans. Contact support@robusta.dev if you're interested.
As always, open source Robusta can be used without the SaaS platform, in which case everything already runs in-cluster.
Most users should continue to use the cloud version of the Robusta UI instead of self-hosting.
Support for additional community requests
We've added a --dry-run flag to robusta playbooks trigger in response to a request by Subramanyeswara Bhavirisetty.
We've also added support for running debug pods as specific service accounts in response to a request by @SamAlex0808.
Friendly reminder: we love hearing from users! Let us know what you like and what we can improve.
Breaking changes
This isn't new, but Robusta can fetch and render graphs from Grafana in response to events in your cluster. For example, 15 minutes after a new deployment is rolled out, you can send a report with important graphs to your messaging app.
Supporting this feature adds some memory and CPU overhead even when it is unused. Therefore, it is now disabled by default.
If you use this feature, simply set grafanaRenderer.enableContainer: true in your Helm values.
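Written out as nested YAML in a Helm values file, the dotted setting above corresponds to the following fragment. Merge it into your existing values rather than replacing them.

```yaml
# Re-enable the Grafana renderer container (disabled by default as of this release).
grafanaRenderer:
  enableContainer: true
```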
ADOPTERS.md
We will soon be adding an ADOPTERS.md file listing Robusta users. Let us know if we can list your company there!
Additional Changes
- update kubewatch image - change cluster role api family by @arikalon1 in #417
- simple documentation fixes after user testing by @Sheeproid in #412
- Enhancement of docs by @Shubh28698 in #406
- Fix babysitter messages for creations and deletions by @aantn in #410
- Minor fixes for sink errors by @aantn in #408
- if the current state isnt oomkilled send the correct state by @Avi-Robusta in #413
- 2 small fixes: by @arikalon1 in #402
- Table heading uniformity by @Shubh28698 in #420
- better explaining sinks and Prometheus integration by @Sheeproid in #422
- use get_pod with selector to discover robusta runner. by @RoiGlinik in #404
- allow findings without kubernetes subject to add namespace filter, support statefulsets. by @RoiGlinik in #425
- adding a motion graphics triggers actions sinks gif by @Sheeproid in #426
- can generate auth token with a signing key in env var by @Sheeproid in #429
- fail robusta installation when no sinks are defined by @Avi-Robusta in #434
Full Changelog: 0.9.17...0.10.0