New Features & Enhancements
-
Introduces upgraded pod-cpu-hog & pod-memory-hog experiments that inject stress-ng based chaos stressors into the target container's PID namespace (non-exec model).
-
Supports multi-arch images for the chaos-scheduler controller
-
Supports CIDR ranges, in addition to destination IPs/hostnames, in the network chaos experiments
-
Refactors the litmus-python repository structure to match the litmus-go & litmus-ansible repos. Introduces a sample python-based pod-delete experiment with the same flow/constructs as its go-equivalent to help establish a common flow for future additions. Also adds a BYOC folder/category to hold non-litmus native experiment patterns.
-
Refactors the litmus-ansible repo to remove the stale experiments (which have been migrated to and improved in litmus-go). Retains and improves the samples to help establish a common flow for future additions
-
Adds GCP chaos experiments (GCP VM stop, GPD detach) in technical-preview mode
Major Bug Fixes
-
Fixes erroneous logs in the chaos-operator seen while attempting to remove the finalizer on the chaosengine
-
Fixes a condition where the chaos revert information is present in both annotations as well as the status of chaosresult CR (the inject/revert status is typically maintained/updated as an annotation on the chaosresult before it is updated into the status and cleared/removed from annotations)
-
Removes the hardcoded experiment job entrypoint; the entrypoint is now picked from the ChaosExperiment CR's .spec.definition.command
-
Fixes a scheduler bug that interpreted a minChaosInterval specified in hours (e.g., 1h) as minutes
-
Improves the scheduler reconcile so that it no longer floods the logs on every reconcile, irrespective of the minChaosInterval
-
Enables the scheduler to start chaos injection immediately upon application of the ChaosSchedule CR, without waiting for the first minChaosInterval period to elapse (applies to repeat mode with only the minChaosInterval specified)
-
Handles edge/boundary conditions where the chaos StartTime is behind the CreationTimeStamp of the ChaosSchedule, or where the next iteration of chaos as per minChaosInterval is beyond the EndTime
-
Adds a check to ignore chaos pods (operator, runner, experiment/helper/probe pods) and blacklist them from being chaos candidates (esp. needed when appinfo.applabel is configured with exclusion patterns such as !keys OR <key> notin <value>)
-
Removes hostIPC, hostNetwork permissions for pod stress chaos experiments
-
Fixes an incorrect env key for TOTAL_CHAOS_DURATION in pod-dns experiments
-
Fixes a regression introduced in 1.13.6 wherein the experiment expected the parent workloads (deployment, statefulset, et al.) to carry the labels specified in appinfo.applabel, in addition to the pods, even when .spec.annotationCheck was set to false in the ChaosEngine. Prior to this, the parent workloads needed to have the label only when .spec.annotationCheck was set to true. The earlier behavior has been restored.
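The exclusion patterns mentioned in the chaos-pod check above follow Kubernetes label-selector syntax, where !key matches every pod that lacks the key and <key> notin (<values>) also matches pods that have no such key at all. That is why a broad exclusion selector could sweep in operator, runner, or helper pods. The sketch below is one way to preview which pods a given selector would match; the function name list_selected_pods is hypothetical and not part of Litmus.

```shell
# List the pods in a namespace that match a label selector.
# Usage: list_selected_pods <namespace> <selector>
# (hypothetical helper, shown only to preview what a selector matches)
list_selected_pods() {
  kubectl get pods -n "$1" -l "$2" -o name
}
```

For instance, list_selected_pods default '!app' would list every pod in the default namespace without an app label key, and list_selected_pods default 'tier notin (db)' would list pods whose tier is not db, including pods with no tier label. The namespace and label names here are placeholders.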
Limitations
-
Chaos abort (via .spec.engineState set to stop OR via chaosengine deletion) operation is known to have an issue with the namespace scoped chaos-operator in 1.13.8, i.e., an operator running with WATCH_NAMESPACE env set to a specific value and using role permissions. In such cases, the finalizer on the ChaosEngine needs to be removed manually and the resource deleted to ensure the operator functions properly.
This is not necessary for cluster scoped operators (the default mode of usage), where the WATCH_NAMESPACE env is set to an empty string to cover all namespaces and the operator leverages clusterrole permissions.
The fix for correcting the behavior of namespace scoped operators will be added in the next patch.
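The manual workaround described above could be scripted roughly as follows. This is a minimal sketch, assuming kubectl access to the cluster; the function name remove_stuck_engine is hypothetical, and the engine name and namespace are supplied by the caller.

```shell
# Remove the finalizer from a stuck ChaosEngine, then delete the resource.
# Usage: remove_stuck_engine <engine-name> <namespace>
# (hypothetical helper; not part of Litmus)
remove_stuck_engine() {
  # A merge patch with an empty list clears metadata.finalizers.
  kubectl patch chaosengine "$1" -n "$2" --type merge \
    -p '{"metadata":{"finalizers":[]}}' || return 1
  # Delete the engine; --ignore-not-found covers the case where a pending
  # deletion already completed once the finalizer was cleared.
  kubectl delete chaosengine "$1" -n "$2" --ignore-not-found
}
```

A call such as remove_stuck_engine nginx-chaos default (both values are placeholders) would clear the finalizer on that engine and then delete it.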
Installation
kubectl apply -f https://litmuschaos.github.io/litmus/litmus-operator-v1.13.8.yaml
Verify your installation
-
Verify if the chaos operator is running
kubectl get pods -n litmus
-
Verify if chaos CRDs are installed
kubectl get crds | grep chaos
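The two verification steps above can be combined into one scripted check. This is a sketch under a few assumptions: the function name verify_litmus is hypothetical, the 120s timeout is an arbitrary choice, and the three CRD names listed (chaosengines, chaosexperiments, chaosresults in the litmuschaos.io group) are the ones this check expects from the operator manifest.

```shell
# Check that the pods in the operator namespace are Ready and that the
# chaos CRDs exist.
# Usage: verify_litmus [namespace]   (defaults to "litmus")
verify_litmus() {
  ns="${1:-litmus}"
  # Block until every pod in the namespace reports Ready, or time out.
  kubectl wait --for=condition=Ready pods --all -n "$ns" --timeout=120s || return 1
  # Each lookup exits non-zero if the CRD is missing.
  for crd in chaosengines chaosexperiments chaosresults; do
    kubectl get crd "$crd.litmuschaos.io" >/dev/null || return 1
  done
  echo "litmus installation looks healthy in namespace $ns"
}
```

On a fresh install the namespace should contain only the operator pod; if completed experiment pods are also present there, the wait step can time out even though the operator itself is fine.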
For more details refer to the documentation at Docs