In this beginner-friendly blog, we’ll walk through the GCP VM Disk Loss experiment for LitmusChaos. The experiment detaches a non-boot persistent disk from a GCP VM instance for a specified duration and then re-attaches it to the same instance. Its broader objective is to extend the principles of cloud-native chaos engineering to non-Kubernetes targets, so that the resiliency of every kind of target, Kubernetes or not, can be tested as part of a single chaos workflow across the business.
At the time of writing, the experiment is available only as a technical preview in the chaos hub, and it is expected to become a regular part of the hub in upcoming releases. We can still access and execute it without any problem, as I am about to show you in this blog.
Before we begin with the steps of the experiment, let’s check the pre-requisites for performing this experiment:
Browse and log in to your Litmus portal. You should be on the dashboard.
Select ChaosHubs. Here you’d be able to see the default Chaos Hub.
Choose to Edit the default Chaos Hub and instead of the
Click Submit Now. Now you’d be able to access all the experiments, even those under the technical preview. To confirm that the experiments have been added successfully, click on Chaos Hub.
You should see the GCP Experiments listed here. Now we are all set to begin the steps of the experiment.
We’d be using the experiment docs to help us with a few steps.
In this demo, we will inject chaos into two non-boot persistent storage
test-disk-1, attached to the VM instances
test-instance-1 respectively, the disks are
located in the zones
us-central1-b respectively, and
belong to the GCP project “Litmus GCP Instance Delete” with the ID of
Please notice that both the disks are initially attached to their
respective VM instances before the chaos is injected. Now that we have
our disks ready, we can set up our experiment. Before scheduling the chaos
experiment, we need to make the GCP Service Account credentials available
to Litmus, so that the disks can be detached and later re-attached as
part of the experiment. To do that, we’ll create a Kubernetes secret
manifest, secret.yaml, as follows:
apiVersion: v1
kind: Secret
metadata:
  name: cloud-secret
type: Opaque
stringData:
  type: "service_account"
  project_id: "litmus-gcp-instance-delete"
  private_key_id: "9e0jacc5e0abb74f3426df51c0ca5065904c6beb"
  private_key: -----BEGIN PRIVATE KEY-----\nMIIEvgIBADANBgkqhkiG9w0BAQEJAASCBKgwggSkAgEAAoIBAQD1JSTjKKN5CCGF\nUsWnaCHfFOReX6wDT+toYz065z5t4cYq3wb/RUGJz4q6n0Z>
  client_email: "email@example.com"
  client_id: "123476663820197864518297"
  auth_uri: "https://accounts.google.com/o/oauth2/auth"
  token_uri: "https://oauth2.googleapis.com/token"
  auth_provider_x509_cert_url: "https://www.googleapis.com/oauth2/v1/certs"
  client_x509_cert_url: "https://www.googleapis.com/robot/v1/metadata/x509/experiment-demo%40litmus-gcp-instance-delete.iam.gserviceaccount.com"
The format of this secret is also available in the experiment docs. Make
sure the name of the secret is
cloud-secret and replace the respective
fields of the secret with your own service account credentials. Once done,
apply the secret in the
litmus namespace using the command:
kubectl apply -f secret.yaml -n litmus
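If you already have the service account key as a JSON file downloaded from GCP, the secret manifest can be generated instead of edited by hand. The following is only a minimal sketch: the function name render_secret and the field list are my own assumptions based on the secret shown in this blog, and it relies on the fact that a JSON-quoted string is also a valid YAML double-quoted string.

```python
import json

# Fields copied from the service account key into the secret.
# Assumption: this list mirrors the keys of the secret shown in this blog.
SECRET_FIELDS = [
    "type",
    "project_id",
    "private_key_id",
    "private_key",
    "client_email",
    "client_id",
    "auth_uri",
    "token_uri",
    "auth_provider_x509_cert_url",
    "client_x509_cert_url",
]


def render_secret(key, name="cloud-secret"):
    """Render a Kubernetes Secret manifest from a parsed service account key.

    `key` is the dict obtained by loading the key JSON file.
    """
    missing = [field for field in SECRET_FIELDS if field not in key]
    if missing:
        raise ValueError(f"service account key is missing fields: {missing}")

    lines = [
        "apiVersion: v1",
        "kind: Secret",
        "metadata:",
        f"  name: {name}",
        "type: Opaque",
        "stringData:",
    ]
    for field in SECRET_FIELDS:
        # json.dumps quotes and escapes the value (including the newlines in
        # the private key), which YAML reads as a double-quoted scalar.
        lines.append(f"  {field}: {json.dumps(key[field])}")
    return "\n".join(lines) + "\n"
```

To use it, load your key file with json.load, pass the resulting dict to render_secret, write the returned text to secret.yaml, and apply it with the kubectl command above.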
Once the secret is applied, we’re all set to schedule our experiment from the Litmus portal. In Dashboard, click on the Schedule a Workflow button. In the workflow creation page, choose the self-agent and click Next.
In the Choose a Workflow page, select “Create a new workflow using the experiments from MyHub” and select Chaos Hub in the dropdown. Then click Next.
In the Workflow Settings page, fill in the workflow name and description of your choice. Click Next.
In the Tune Workflow page, click on “Add a new experiment” and choose
Click Done. Notice that the experiment has been added to the experiment graph diagram. Now click on “Edit YAML”. Here we will edit the workflow manifest to specify the experiment resource details.
Scroll down to the manifest of the
Notice that the name of the secret that we had previously created is being
passed to the
ChaosExperiment to be mounted at the path
Scroll further down and similarly fill in the relevant experiment details
in the manifest of the
ChaosEngine as follows:
Please note that the zone of each target disk must be listed in
DISK_ZONES in the same order as the disks appear in
DISK_VOLUME_NAMES. Similarly, the
device name of each target disk must be listed in
the same order as in
DISK_VOLUME_NAMES. If you like, feel free to
modify the parameters of the experiment such as the
TOTAL_CHAOS_DURATION, etc. As you would have noticed, some of the
experiment tunables are common for both the
ChaosExperiment, and the values of
ChaosExperiment are overridden by
those of the
ChaosEngine when the two
manifests differ. Once you have made the changes, click Save Changes. We’ve now
specified all the experiment details and are ready to go to the next step.
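The ordering rule and the override rule described above are easy to get wrong, so here is a small sketch of both. It is not how Litmus implements them internally; the helper names (split_csv, pair_targets, effective_env) and the sample values are hypothetical, and I am assuming the tunables are given as comma-separated strings.

```python
def split_csv(value):
    """Split a comma-separated tunable into a list of trimmed entries."""
    return [item.strip() for item in value.split(",") if item.strip()]


def pair_targets(disk_volume_names, disk_zones, device_names):
    """Pair each disk with its zone and device name by position.

    Raises ValueError when the three lists do not have the same length,
    which usually means an entry was forgotten or misplaced.
    """
    disks = split_csv(disk_volume_names)
    zones = split_csv(disk_zones)
    devices = split_csv(device_names)
    if not (len(disks) == len(zones) == len(devices)):
        raise ValueError(
            f"expected equal counts, got {len(disks)} disks, "
            f"{len(zones)} zones and {len(devices)} device names"
        )
    return list(zip(disks, zones, devices))


def effective_env(experiment_env, engine_env):
    """Values from the ChaosEngine win over those from the ChaosExperiment."""
    return {**experiment_env, **engine_env}


# Hypothetical values, only to show how the positions line up.
targets = pair_targets("disk-a,disk-b", "zone-x,zone-y", "device-a,device-b")
env = effective_env(
    {"TOTAL_CHAOS_DURATION": "30"},   # from the ChaosExperiment
    {"TOTAL_CHAOS_DURATION": "60"},   # from the ChaosEngine
)
```

Here targets pairs disk-a with zone-x and device-a, and disk-b with zone-y and device-b, while env keeps the ChaosEngine value for TOTAL_CHAOS_DURATION.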
In the Reliability Score, we will use the default score of 10. Click Next.
In Schedule, click Schedule Now, then click Next. On the Verify and Commit page, verify all the details and, once satisfied, click Finish. We’ve successfully scheduled our chaos experiment.
Click on Go to Workflow and choose the workflow that we just created. Here we can observe the different steps of the workflow execution including chaos experiment installation, chaos injection, and chaos revert.
You can also confirm from the GCP Console that the chaos injection has taken place and that the disks have been detached from their respective VM instances.
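If you prefer a scripted check over the console, the instance description returned by gcloud compute instances describe with --format=json lists the attached disks under a disks field, each with a source URL ending in the disk name. The sketch below works on that parsed JSON; the helper names are hypothetical, and the sample instance is made up for illustration.

```python
def attached_disk_names(instance):
    """Return the names of the disks attached to a described VM instance.

    `instance` is the parsed JSON of an instance description. The disk name
    is taken from the last path segment of each disk's `source` URL.
    """
    names = set()
    for disk in instance.get("disks", []):
        source = disk.get("source", "")
        if source:
            names.add(source.rstrip("/").rsplit("/", 1)[-1])
    return names


def is_attached(instance, disk_name):
    """True when `disk_name` is currently attached to the instance."""
    return disk_name in attached_disk_names(instance)


# Made-up instance description with a boot disk and one data disk.
sample_instance = {
    "name": "example-instance",
    "disks": [
        {"boot": True, "source": "https://example.invalid/zones/z/disks/boot-disk"},
        {"boot": False, "source": "https://example.invalid/zones/z/disks/data-disk"},
    ],
}
```

During the chaos window is_attached should return False for a target disk, and once the experiment has reverted the chaos it should return True again.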
We can also switch to the Table View to follow the experiment logs as the experiment proceeds through its steps.
Once completed, the workflow graph should have executed all the steps successfully.
We got a 100% Resiliency Score. We can also check the
verdict, which should say the experiment has passed. The Probe Success
Percentage should be 100%, as all our disks re-attached successfully post chaos.
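To see how the weight of 10 and the Probe Success Percentage combine into this score, here is a rough sketch. It assumes, as I understand the Litmus documentation, that the Resiliency Score is the weight-averaged Probe Success Percentage over all experiments in the workflow; the function name is hypothetical.

```python
def resiliency_score(results):
    """Weighted average of probe success percentages.

    `results` is a list of (weight, probe_success_percentage) pairs,
    one pair per experiment in the workflow.
    """
    total_weight = sum(weight for weight, _ in results)
    if total_weight == 0:
        raise ValueError("at least one experiment needs a non-zero weight")
    weighted = sum(weight * success for weight, success in results)
    return weighted / total_weight


# Our workflow: a single experiment with weight 10 and 100% probe success.
score = resiliency_score([(10, 100.0)])
```

With a single experiment the weight cancels out, so the score equals the Probe Success Percentage; weights only matter once a workflow contains several experiments.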
Again, you can check in the GCP Console whether the disks have been re-attached.
We can also perform post-chaos analysis of the experiment results in the Analytics section.
In this blog, we saw how to perform the GCP VM Disk Loss chaos experiment using Litmus Chaos 2.0. It is only one of the many non-Kubernetes experiments in LitmusChaos, which also cover AWS, Azure, VMware, and more, and which aim to make Litmus a complete chaos engineering toolset for every enterprise, regardless of its technology stack.
Come join me at the Litmus community to contribute your bit in developing
chaos engineering for everyone. To join the Litmus community:
Step 1: Join the Kubernetes slack using the following link:
Step 2: Join the
#litmus channel on the Kubernetes slack or use this
link after joining the Kubernetes slack:
Show your ❤️ with a ⭐ on our GitHub. To learn more about Litmus, check out the Litmus documentation. Thank you! 🙏