New Features & Enhancements
- Moves the Litmus Portal to beta-2 phase with the following improvements:
  - Ability to disable workflow schedules
  - Support for configuration of private Git repositories as a source for experiments & predefined workflows (private MyHub)
  - Allows the full set of CRUD operations on the embedded ChaosHub/MyHub
  - Improves the chaos visualization via horizontal/vertical workflow views and proper formatting of logs for the workflow nodes
- Enhances the ChaosExperiment CRD to accept HostPath volume type input.
- Removes the limitation that only a single workload (amongst those sharing the labels) can be annotated for chaos.
- Enhances the httpProbe to perform POST operations with the payload described in the ChaosEngine or via a file mounted as a configmap.
- Simplifies node resource chaos experiments to accept resources in absolute units (mebibytes) along with relative percentage inputs.
- Makes the termination mode configurable for the container-kill experiment (defaults to SIGKILL).
- Adds more details to experiment logs around annotated workloads & filtered pod targets.
- Improves the disk-fill chaos experiment to use the helper pod approach for injection, instead of running a dummy pod with a sleep command into which multiple exec operations are performed.
- Adds unit tests in the chaos-operator & chaos-runner repos.
- Improves e2e tests (PRs/Commits) in the litmus-go repo (pod chaos with combinations of pods_affected_perc & sequence env, annotation on multiple workloads, etc.).
- Updates the litmus-sdk based on recent changes to experiment templates.
Major Bug Fixes
- Ensures that different helper pods within an experiment instance are labeled with unique values (for fixed keys) so that they can be queried for status. Without this, the helper pods were filtered by common labels, resulting in incorrect validation. This was especially likely when multiple instances of the same experiment were executed in parallel.
- Reflects the correct verdict of the experiment upon failure and abort, along with improved events in the Kafka & Cassandra chaos experiments.
- Ensures smooth re-run of network chaos on a target with a residual tc rule from the previous instance of chaos injection (RTNETLINK answers: File exists).
- Fixes the console-spamming log messages on chaos-exporter, which were seen until the ChaosResult/Engine resources were created.
Major Known Issues & Limitations
Issue:
Forced removal of the experiment helper pods (where applicable, notably in network chaos experiments), either manually or due to Kubernetes eviction, can cause the chaos revert operation at the end of the chaos duration to fail or not run at all. The application under test (AUT) then continues to be subjected to chaos unless manually recovered.
Workaround:
The experiment pod logs show when the helper operations have failed. In that case, the AUT pod(s) can be deleted so that they are rescheduled with a new network namespace. This applies only to applications managed by a higher-level controller such as a deployment, statefulset, or daemonset.
Fix:
This is being actively worked on (retry mechanism for chaos revert initiated in case of failed/missing helper pods) and should be available in a subsequent release.
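The workaround for the helper pod issue above can be sketched with two small shell functions. This is only an illustration: the function names and all pod/namespace arguments are placeholders, not part of Litmus, and the ownership check is an added safety assumption so that a bare pod (which would not be recreated) is not deleted.

```shell
# Sketch only: function names and arguments are placeholders, not Litmus tooling.

# Print recent experiment pod logs so failed helper operations can be spotted.
show_experiment_logs() {
  experiment_pod="$1"
  namespace="$2"
  kubectl logs "$experiment_pod" -n "$namespace" --tail=100
}

# Delete an AUT pod so its controller recreates it with a new network namespace.
# Refuses to delete a pod that has no owner reference, since nothing would
# reschedule it.
recreate_aut_pod() {
  aut_pod="$1"
  namespace="$2"
  owner_kind="$(kubectl get pod "$aut_pod" -n "$namespace" \
    -o jsonpath='{.metadata.ownerReferences[*].kind}')"
  if [ -z "$owner_kind" ]; then
    echo "pod $aut_pod has no owning controller; not deleting" >&2
    return 1
  fi
  kubectl delete pod "$aut_pod" -n "$namespace"
}
```

For example, `show_experiment_logs <experiment-pod> <namespace>` followed by `recreate_aut_pod <aut-pod> <namespace>`, with the actual names substituted.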
Issue:
The pod-cpu-hog & pod-memory-hog experiments that run in the pod-exec mode (typically used when users don't want to mount the runtime's socket files on their pods) using the default lib can fail, in spite of chaos being injected successfully. This happens when the target's image lacks certain default utils that are used to detect the chaos processes and kill them (revert chaos) at the end of the chaos duration.
Workaround:
Users can work out the commands needed to find and kill the chaos processes and pass them to the experiment via the env variable CHAOS_KILL_COMMAND.
Alternatively, they can make use of the pumba chaoslib, which uses external containers with the SYS_ADMIN docker capability to inject/revert the chaos while mounting the runtime socket file. Note that this is supported only on docker at this point.
Fix:
This is being actively worked on (native litmus chaoslib that can inject stress processes without the exec requirement for docker/containerd/crio) and should be available in a subsequent release.
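As an illustration of the CHAOS_KILL_COMMAND workaround above, the sketch below finds processes by name using only /proc and shell builtins, which may help when the target image has no ps, pgrep, or pkill. It rests on several assumptions: `find_pids_by_name` is a hypothetical helper, the name of the stress process must be checked against what the experiment actually starts, the target is a Linux container with /proc mounted, and the experiment is assumed to run the env value through a shell.

```shell
# Sketch only: find_pids_by_name is a hypothetical helper, not part of Litmus.

# Print the PID of every process whose command name equals $1.
# Note: the kernel truncates names in /proc/<pid>/comm to 15 characters.
find_pids_by_name() {
  target_name="$1"
  for comm_file in /proc/[0-9]*/comm; do
    # Skip entries that vanished or are unreadable.
    [ -r "$comm_file" ] || continue
    read -r proc_name < "$comm_file" || continue
    if [ "$proc_name" = "$target_name" ]; then
      pid="${comm_file#/proc/}"
      printf '%s\n' "${pid%/comm}"
    fi
  done
}
```

A kill step could then look like `kill $(find_pids_by_name <stress-process-name>)`. To pass it through an env variable, the loop would need to be collapsed into a single command string.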
Installation
kubectl apply -f https://litmuschaos.github.io/litmus/litmus-operator-v1.13.0.yaml
Verify your installation
- Verify if the chaos operator is running
kubectl get pods -n litmus
- Verify if chaos CRDs are installed
kubectl get crds | grep chaos
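For scripted checks, the CRD verification can be made explicit. The following is a minimal sketch, assuming the three Litmus CRDs are chaosengines, chaosexperiments, and chaosresults in the litmuschaos.io group; `check_litmus_crds` is a hypothetical helper name.

```shell
# Sketch only: check_litmus_crds is a hypothetical helper, not part of Litmus.

# Report each expected CRD; return non-zero if any of them is missing.
check_litmus_crds() {
  missing=0
  for crd in chaosengines chaosexperiments chaosresults; do
    if kubectl get crd "${crd}.litmuschaos.io" >/dev/null 2>&1; then
      echo "found: ${crd}.litmuschaos.io"
    else
      echo "missing: ${crd}.litmuschaos.io"
      missing=1
    fi
  done
  return "$missing"
}
```

Calling `check_litmus_crds` after the install step gives an exit status that a CI job can act on.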
For more details, refer to the Litmus documentation (Docs).