New Features & Enhancements

Introduces experiment probes to enable declarative specification of entry/exit (success) criteria via the chaosengine. This release supports the Command, Kubernetes & HTTP probe types that can be configured in SoT (Start of Test), EoT (End of Test) & Edge execution modes. With this, users can reuse generic experiments to test a variety of app-specific/context-specific chaos scenarios.
Enhances the chaosresult status schema to include the ProbeSuccessPercentage score that gives an overview of the app/infra resilience to a specific chaos experiment run
Refines the operational modes of litmus: introduces namespaced operator support in the helm charts for multi-developer/shared-cluster use cases with dedicated namespaces (such as in the Okteto Cloud), and updates the admin & standard modes to watch engine resources in the litmus namespace & across namespaces, respectively
Adds functionality to look for target applications in the chaosengine resource namespace if the target namespace is not explicitly specified.
Validates/prevents malformed application labels in the chaosengine
Improves the ChaosEngine status schema to hold more info (experiment pod names, runner names) that can aid other tools/abstractions running the experiment to derive/parse useful info for further reuse (logs extraction, for ex.)
Adds Microsoft Azure Kubernetes Service (AKS) as a supported platform for the generic experiment suite.
Adds a new chaos experiment to scale pods and test node autoscaling functionality
Adds the libraries for the execution of AWS chaos using chaostoolkit, orchestrated by Litmus.
Adds support for the specification of host file mounts in chaos experiments
Allows setting polling intervals and timeouts for status checks via chaosengine to aid tuning execution for slower environments
Removes dependencies on multiple experiment “helper” (auxiliary) images and makes the litmus go-runner self-sufficient in handling the required chaos business logic. This eases maintenance, especially in the case of air-gapped environments / downstream projects that build the litmus components in their respective CI/CD pipelines.
Enhances the experiment to “fail fast” upon failed app checks in cases where containers are terminated
Upgrades the ansible-runner to use python3
Enhances the developer experience for litmus chaos experiments by using the Okteto CLI to develop & test experiment business logic in-cluster rather than repeating image-build/job-run cycles
Updates the scaffold utils to generate the experiment bootstrap code based on the latest developments in the experiment structure.
Adds chaos-instrumented grafana dashboards for the sock-shop application along with details on setting up monitoring for chaos experiment runs.
Adds pre-defined/usable workflows for repeatable execution of node resource chaos in the chaos-charts repo
Pushes the technical preview / pre-alpha version of the litmus-portal (available on the master branch).
Refactors the litmus-e2e repo/code-structure to simplify the addition of new BDD tests (modularization, removal of bash utils, formatted errors, klog usage, scenario coverage parameters)
Adds additional stages in litmus-e2e GitLab pipelines to execute both the go-based & ansible-based chaos experiments
Improves github-actions based comment-triggered e2e runs with log details
Features a completely revamped & improved ChaosHub
Improves the project wiki with more information for users and developers (architecture docs, video tutorials, charters for the Litmus Special Interest Groups)
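
The ProbeSuccessPercentage score in the chaosresult can be pictured as the share of configured probes that passed during a run. These notes do not state the formula, so the sketch below is an assumption for illustration only; the function name `probe_success_percentage` and its list-of-booleans input are hypothetical and not part of the Litmus API.

```python
def probe_success_percentage(probe_results):
    """Return the percentage of probes that passed (assumed formula).

    probe_results: list of booleans, one per probe; True means the probe passed.
    With no probes configured, this sketch returns 100.0 (an assumption).
    """
    if not probe_results:
        return 100.0
    passed = sum(1 for ok in probe_results if ok)
    return 100.0 * passed / len(probe_results)


# Hypothetical run: four probes, one of which failed.
score = probe_success_percentage([True, True, False, True])
print(score)
```

A single number like this is convenient for comparing runs of the same experiment, though it hides which probe failed; the per-probe details would still need to be read from the chaosresult.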

Major Bug Fixes

Patches the chaosengine with the right (‘stopped’) status and fixes the event to provide the right reason in cases where app filtering is unsuccessful. This will allow a re-apply of the engine to re-trigger the application.
Adds a check to factor-in cordoned (SchedulingDisabled) status of nodes in kubelet & docker-service kill experiments.
Provides the tc_image used in network chaos experiments as an experiment tunable instead of hardcoding it, in order to support users with internal image registries.
Decides experiment termination based on chaos container status over that of chaos pod objects to support operations in a service-mesh environment (istio, linkerd) where all pods (including chaos resources) are injected with sidecars. Without this, the experiment runs forever due to the proxy sidecars.
Sets the restart policy of the experiment jobs to Never instead of OnFailure to prevent repeated re-execution for certain experiment failure conditions.
Fixes the incorrect eventType for chaos events in cases of failures & skipped executions.
Fixes the go-based pod-cpu-hog & pod-memory-hog experiments to execute the chaos processes (commands) in the target container by passing them as args to a shell instance (/bin/sh -c) to account for targets which may run with different entrypoints.
Fixes permission issues on the infra helm chart resulting in failed metrics collection
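
The pod-cpu-hog & pod-memory-hog fix above relies on handing the chaos command to a shell as a single argument. A minimal sketch of that argv shape follows; the helper `wrap_in_shell` and the echoed string are hypothetical, and the command is run locally purely for illustration, whereas the experiments execute it inside the target container.

```python
import subprocess


def wrap_in_shell(command):
    """Build an argv that runs `command` through /bin/sh -c.

    The whole command string is passed as one argument, so it runs the
    same way regardless of the entrypoint the container image defines.
    """
    return ["/bin/sh", "-c", command]


# Hypothetical stand-in for a chaos process; assumes /bin/sh exists locally.
argv = wrap_in_shell("echo chaos-process-started")
result = subprocess.run(argv, capture_output=True, text=True, check=True)
print(result.stdout.strip())
```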

Breaking Changes

Stops support for the ansible-runner/executor (EoL) (Not to be confused with the ansible-based chaos experiments)
Removes the following repositories:
- litmuschaos/pages: The operator manifests are available via gh-pages, sourced from litmuschaos/litmus.
- litmuschaos/chaos-helm: The experiments helm chart is now also part of the litmus-helm repo.
- litmuschaos/community: The demo procedures & community info are now available in the litmus-demo & litmus repos, respectively.

Installation

kubectl apply -f https://litmuschaos.github.io/litmus/litmus-operator-v1.7.0.yaml

Verify your installation

Verify if the chaos operator is running
kubectl get pods -n litmus
Verify if chaos CRDs are installed
kubectl get crds | grep chaos

For more details refer to the documentation at Docs

helm/litmuschaos/litmus 1.7.0 on Artifact Hub
