In this blog, we’ll look at how to integrate LitmusChaos with GCP IAM using Workload Identity, so that GCP experiments can run in a keyless manner when Google Kubernetes Engine (GKE) is the execution plane.
To execute LitmusChaos GCP experiments, one needs to authenticate with GCP using a service account before trying to access the target resources. Usually, there is only one way of providing the service account credentials to the experiment: a service account key. If you’re using a GKE cluster, however, a keyless medium of authentication is available as well. You therefore have two ways of providing the service account credentials to your GKE cluster: a service account key stored as a Kubernetes secret, or Workload Identity.
An IAM service account is an identity that an application uses to make calls to Google APIs. With the key-based approach, you might create individual IAM service accounts for the users who execute GCP experiments to enforce role-based access control, then download the keys and store them as Kubernetes secrets that you rotate manually. Not only is this time-consuming, but service account keys are also long-lived: they expire only after ten years (or when you manually rotate them). An unaccounted-for key could therefore give an attacker extended access in the event of a breach or compromise. Because of this potential blind spot and the management cost of key inventory and rotation, storing service account keys as secrets is not an optimal way of authenticating GKE workloads.
Workload Identity allows you to restrict the possible “blast radius” of a breach or compromise while enforcing the principle of least privilege across your environment. It accomplishes this by automating workload authentication best practices, eliminating the need for workarounds, and making it simple to implement recommended security best practices.
You can enable Workload Identity on clusters and node pools using the Google Cloud CLI or the Google Cloud Console. Workload Identity must be enabled at the cluster level before you can enable Workload Identity on node pools. Workload Identity can be enabled for an existing cluster as well as a new cluster.
To enable Workload Identity on a new cluster using the console, choose to create a GKE cluster and, aside from the usual configuration, enable Workload Identity under Security.
You can also use the gcloud tool to create the Kubernetes cluster with Workload Identity enabled using the following command:

    gcloud container clusters create CLUSTER_NAME \
        --region=COMPUTE_REGION \
        --workload-pool=PROJECT_ID.svc.id.goog
Replace the following:
- CLUSTER_NAME: the name of your new cluster.
- COMPUTE_REGION: the Compute Engine region of your cluster. For zonal clusters, use --zone=COMPUTE_ZONE.
- PROJECT_ID: your Google Cloud project ID.

You can enable Workload Identity on an existing Standard cluster by using the gcloud CLI or the Cloud Console. Existing node pools are unaffected, but any new node pools in the cluster use Workload Identity. In the Cloud Console, modify the same setting as shown above. Alternatively, you can use the following command:

    gcloud container clusters update CLUSTER_NAME \
        --region=COMPUTE_REGION \
        --workload-pool=PROJECT_ID.svc.id.goog

Replace the following:
- CLUSTER_NAME: the name of your existing cluster.
- COMPUTE_REGION: the Compute Engine region of your cluster. For zonal clusters, use --zone=COMPUTE_ZONE.
- PROJECT_ID: your Google Cloud project ID.

Assuming that you already have LitmusChaos installed in your GKE cluster, along with the Kubernetes service account you want to use for your GCP experiments, execute the following steps.
Get the credentials for your cluster:

    gcloud container clusters get-credentials CLUSTER_NAME

Replace CLUSTER_NAME with the name of your cluster that has Workload Identity enabled.
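
Before going further, it can be worth confirming that the cluster really reports a workload pool. The following is a minimal sketch, not part of the LitmusChaos tooling: the function names `expected_workload_pool` and `check_workload_pool` are hypothetical, and it assumes that `gcloud` is installed, authenticated, and pointed at the right project.

```shell
# Hypothetical helper: prints the workload pool name expected for a project.
# $1 = PROJECT_ID
expected_workload_pool() {
  printf '%s.svc.id.goog\n' "$1"
}

# Hypothetical helper: succeeds only if the cluster reports the expected pool.
# $1 = CLUSTER_NAME, $2 = COMPUTE_REGION, $3 = PROJECT_ID
# For zonal clusters, swap --region for --zone.
check_workload_pool() {
  actual_pool="$(gcloud container clusters describe "$1" \
    --region="$2" \
    --format='value(workloadIdentityConfig.workloadPool)')"
  [ "$actual_pool" = "$(expected_workload_pool "$3")" ]
}
```

A call such as `check_workload_pool CLUSTER_NAME COMPUTE_REGION PROJECT_ID && echo "Workload Identity is enabled"` prints the message only when the pool matches. An empty result from `gcloud` means that Workload Identity is not enabled on that cluster.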
Create an IAM service account for the experiments, or use an existing one. If you use Config Connector, apply the IAMServiceAccount object for your selected service account. To create a new IAM service account using the gcloud CLI, run the following command:

    gcloud iam service-accounts create GSA_NAME \
        --project=GSA_PROJECT

Replace the following:
- GSA_NAME: the name of the new IAM service account.
- GSA_PROJECT: the project ID of the Google Cloud project for your IAM service account.

Alternatively, you can also use the Cloud Console UI to create a new GCP IAM Service Account.
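
The remaining steps all refer to the service account by its email address, `GSA_NAME@GSA_PROJECT.iam.gserviceaccount.com`. As a small convenience sketch, the address can be built once and reused. The values `litmus-gcp-chaos` and `my-gcp-project` and the helper names are placeholders chosen for illustration, not names taken from LitmusChaos.

```shell
# Placeholder values (assumptions); substitute your own.
GSA_NAME="litmus-gcp-chaos"
GSA_PROJECT="my-gcp-project"

# Hypothetical helper: builds the service account email from name and project.
gsa_email() {
  printf '%s@%s.iam.gserviceaccount.com\n' "$1" "$2"
}

GSA_EMAIL="$(gsa_email "$GSA_NAME" "$GSA_PROJECT")"

# Hypothetical helper: succeeds if the account exists.
# Assumes gcloud is installed and authenticated.
gsa_exists() {
  gcloud iam service-accounts describe "$1" >/dev/null 2>&1
}
```

After the account has been created, `gsa_exists "$GSA_EMAIL" && echo "found $GSA_EMAIL"` is a quick way to catch a typo in the name or project before granting roles.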
Grant the IAM service account the roles it needs to act on the target resources:

    gcloud projects add-iam-policy-binding PROJECT_ID \
        --member "serviceAccount:GSA_NAME@GSA_PROJECT.iam.gserviceaccount.com" \
        --role "ROLE_NAME"

Replace the following:
- PROJECT_ID: your Google Cloud project ID.
- GSA_NAME: the name of your IAM service account.
- GSA_PROJECT: the project ID of the Google Cloud project of your IAM service account.
- ROLE_NAME: the IAM role to assign to your service account, like roles/spanner.viewer.

Allow the Kubernetes service account to impersonate the IAM service account by adding an IAM policy binding between the two:

    gcloud iam service-accounts add-iam-policy-binding GSA_NAME@GSA_PROJECT.iam.gserviceaccount.com \
        --role roles/iam.workloadIdentityUser \
        --member "serviceAccount:PROJECT_ID.svc.id.goog[NAMESPACE/KSA_NAME]"

Replace the following:
- PROJECT_ID: your Google Cloud project ID.
- GSA_NAME: the name of your IAM service account.
- GSA_PROJECT: the project ID of the Google Cloud project of your IAM service account.
- KSA_NAME: the name of the Kubernetes service account to be used for LitmusChaos GCP experiments. If you’re using ChaosCenter, then by default the litmus-admin service account is used for all the experiments.
- NAMESPACE: the namespace in which the Kubernetes service account to be used for LitmusChaos GCP experiments is present. If you’re using ChaosCenter, then by default litmus is the namespace.

Annotate the Kubernetes service account with the email address of the IAM service account:

    kubectl annotate serviceaccount KSA_NAME \
        --namespace NAMESPACE \
        iam.gke.io/gcp-service-account=GSA_NAME@GSA_PROJECT.iam.gserviceaccount.com

Replace the following:
- KSA_NAME: the name of the Kubernetes service account to be used for LitmusChaos GCP experiments. If you’re using ChaosCenter, then by default the litmus-admin service account is used for all the experiments.
- NAMESPACE: the namespace in which the Kubernetes service account to be used for LitmusChaos GCP experiments is present. If you’re using ChaosCenter, then by default litmus is the namespace.
- GSA_NAME: the name of your IAM service account.
- GSA_PROJECT: the project ID of the Google Cloud project of your IAM service account.

When creating a new workflow with GCP experiments, edit the manifest YAML and add the following value to the ChaosEngine manifest field .spec.experiments[].spec.components.nodeSelector to schedule the experiment pod on nodes that use Workload Identity:
iam.gke.io/gke-metadata-server-enabled: "true"
For example, if you’re adding the GCP VM Instance Stop experiment, this selector goes under the components of that experiment’s entry in the ChaosEngine.
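
A minimal sketch of how that portion of the ChaosEngine might look is given here as a shell function that prints the YAML, so it can be diffed against a real manifest. The function name `chaosengine_snippet` and the engine name are hypothetical, the namespace and service account assume the ChaosCenter defaults mentioned earlier, and the experiment's `env` entries are omitted. Only the `nodeSelector` entry is the change described above.

```shell
# Hypothetical helper: prints a partial ChaosEngine for illustration only.
chaosengine_snippet() {
  cat <<'EOF'
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: gcp-vm-instance-stop-engine   # hypothetical name
  namespace: litmus                   # ChaosCenter default
spec:
  engineState: active
  chaosServiceAccount: litmus-admin   # ChaosCenter default
  experiments:
    - name: gcp-vm-instance-stop
      spec:
        components:
          # Schedule the experiment pod on nodes that use Workload Identity.
          nodeSelector:
            iam.gke.io/gke-metadata-server-enabled: "true"
EOF
}
```

Running `chaosengine_snippet` prints the fragment. In a real workflow the same `nodeSelector` key sits alongside the experiment's existing `env` list under `components`.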
Remove cloud-secret at .spec.definition.secrets in the ChaosExperiment manifest, as we are not using a secret to provide our GCP Service Account credentials.
For example, if you’re adding the GCP VM Instance Stop experiment, you’d remove the cloud-secret entry from the secrets list of its ChaosExperiment.
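
As a rough sketch of that removal, the function below prints the kind of block to delete, and a second function checks whether a manifest file still references the secret. Both function names are hypothetical, and the `mountPath` value is an assumption about the experiment's default manifest, so check it against your own copy.

```shell
# Hypothetical helper: prints the block to delete from the ChaosExperiment.
cloud_secret_block() {
  cat <<'EOF'
spec:
  definition:
    secrets:
      - name: cloud-secret
        mountPath: /tmp/   # assumed mount path
EOF
}

# Hypothetical helper: succeeds if the manifest file still references cloud-secret.
# $1 = path to a ChaosExperiment (or workflow) manifest
references_cloud_secret() {
  grep -q 'name: cloud-secret' "$1"
}
```

For a manifest saved locally under some name of your choosing, `references_cloud_secret FILE && echo "cloud-secret is still referenced"` flags a leftover entry before the workflow is submitted.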
Now you can run your GCP experiments with keyless authentication provided by GCP through Workload Identity.
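
If an experiment fails to authenticate, the two pieces most worth re-checking are the annotation on the Kubernetes service account and the IAM binding on the IAM service account. The sketch below checks both. The function names are hypothetical, and it assumes that `kubectl` and `gcloud` are installed and authenticated against the right cluster and project.

```shell
# Hypothetical helper: builds the Workload Identity member string.
# $1 = PROJECT_ID, $2 = NAMESPACE, $3 = KSA_NAME
wi_member() {
  printf 'serviceAccount:%s.svc.id.goog[%s/%s]\n' "$1" "$2" "$3"
}

# Hypothetical helper: checks the annotation and the IAM policy member.
# $1 = PROJECT_ID, $2 = NAMESPACE, $3 = KSA_NAME, $4 = IAM service account email
verify_keyless_setup() {
  annotation="$(kubectl get serviceaccount "$3" --namespace "$2" \
    -o jsonpath='{.metadata.annotations.iam\.gke\.io/gcp-service-account}')"
  if [ "$annotation" != "$4" ]; then
    echo "annotation mismatch: got '$annotation'" >&2
    return 1
  fi
  # Only confirms the member appears in the policy, not which role it holds.
  if ! gcloud iam service-accounts get-iam-policy "$4" --format=json \
      | grep -F -q "$(wi_member "$1" "$2" "$3")"; then
    echo "member not found in the IAM policy of $4" >&2
    return 1
  fi
}
```

With the ChaosCenter defaults, a call would look like `verify_keyless_setup PROJECT_ID litmus litmus-admin GSA_NAME@GSA_PROJECT.iam.gserviceaccount.com`. A silent success means both pieces are in place.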
To conclude, we set up a keyless authentication medium for GCP experiments running in a GKE cluster that belongs to the target GCP environment. We saw how to leverage Workload Identity on the GKE node pools to assign an IAM identity to our Kubernetes workloads, which was then used to authenticate the LitmusChaos GCP experiments.