My LFX mentorship experience contributing to Cluster API GCP

In my previous blog post, I shared how I got selected for the LFX mentorship. In this post, I am going to write about my experience contributing to Cluster API GCP.

Mentorship Project Description

The mentorship was about adding GPU support to CAPG (Cluster API Provider GCP). As of now, the GPUs supported on Google Cloud Platform are NVIDIA GPUs. We started by planning a road map of the steps required to add GPU support. The first thing we decided to do was create a GPU driver-enabled OS image that can take advantage of the GPUs in the VM. For that, we created this PR. In it we mostly added Packer config files so that the OS image is built with the NVIDIA GPU drivers.

The next thing we did was make changes to the CAPG API so that we can declare the fields required to create VMs with GPUs in GCP. After that, we added validations and webhooks for the new API changes so that incoming requests are validated properly. Finally, we added unit tests and end-to-end tests so that we have fully tested software in the main branch. Here is the PR we created in the CAPG repo that has all the changes mentioned above.

After all our hard work, we successfully created VMs with GPUs using Cluster API in GCP.

My experience overall

A few months back, I never thought I would do LFX and contribute to such big projects. What kept me motivated and contributing was the awesome community and the projects themselves. In the beginning, to get me familiar with the project, my mentors gave me the task of spinning up a normal Kubernetes managed cluster in GCP using Cluster API and reading the documentation. Throughout the mentorship, my mentors Dims, Richard, and Carlos helped me overcome all kinds of challenges to complete the task, and they also gave me the motivation and enthusiasm to push my boundaries and learn new things every day. This mentorship not only helped me become a better developer in Cloud Native technologies but also helped me become a better thinker in terms of solving real-world engineering problems. In short, my overall experience with the LFX mentorship was fabulous and wonderful. And last but not least, all of the above would have been incomplete without my co-mentee Subhasmita.

Future Scope

After this project, I started taking on other open source issues in CAPG and also started contributing to CAPI. I will keep contributing to CNCF projects, and hopefully I will work on more such big and significant features in the future.

Becoming Kubernetes & Kubernetes SIG member

Another great thing that happened to me was that I recently became a member of the Kubernetes and Kubernetes SIGs organizations. Thanks to Carlos, Nabarun, Richard, and Dims for giving me their +1.

Also, if you have any queries regarding Cluster API GCP or Cluster API, feel free to join the Kubernetes Slack and then join the #cluster-api-gcp and #cluster-api channels. And feel free to ping me @aniruddha on Slack if you have any questions.


Check for Kubernetes deployment with client-go library

For the past couple of days, I have been tinkering with the client-go library. It provides the interfaces and methods you need to manipulate Kubernetes cluster resources from your Go code. After exploring for a while, I started working on a side project that checks deployments: if a deployment doesn’t have a certain environment variable, it deletes the deployment; otherwise it keeps it as it is.


In this blog, I am not going to cover how to set up a Go project.

First, create a directory named app and create another directory inside it called service. Now create a file named init.go inside the service directory.

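package service

import (
	"log"
	"os"
	"path/filepath"

	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/clientcmd"
)

// Init builds the Kubernetes clientset. It prefers the in-cluster config and
// falls back to a kubeconfig file when running outside a cluster.
func Init() *kubernetes.Clientset {
	config, err := rest.InClusterConfig()
	if err != nil {
		// Absolute path to the default kubeconfig location for this user.
		kubeconfig := filepath.Join("/home", "aniruddha", ".kube", "config")
		if envvar := os.Getenv("KUBECONFIG"); len(envvar) > 0 {
			kubeconfig = envvar
		}

		config, err = clientcmd.BuildConfigFromFlags("", kubeconfig)
		if err != nil {
			log.Fatalf("kubeconfig can't be loaded: %v\n", err)
		}
	}

	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		log.Fatalf("error getting config client: %v\n", err)
	}

	return clientset
}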
In the above code example, we call InClusterConfig first, which returns a config object containing the common attributes that can be passed to a Kubernetes client on initialization. This works when the program runs inside a pod. If it fails, we fall back to the file named in the KUBECONFIG environment variable or, if that is unset, to the kubeconfig in its default location on most Linux systems.
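The fallback order for the kubeconfig file can be looked at in isolation. Below is a minimal sketch using only the standard library; the helper name kubeconfigPath is hypothetical and not part of client-go, and like the code above it treats KUBECONFIG as a single path.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// kubeconfigPath returns the kubeconfig file to load when the program is not
// running inside a cluster. envValue is the value of $KUBECONFIG and home is
// the user's home directory; both are parameters so the logic is easy to test.
func kubeconfigPath(envValue, home string) string {
	if envValue != "" {
		return envValue
	}
	return filepath.Join(home, ".kube", "config")
}

func main() {
	home, err := os.UserHomeDir()
	if err != nil {
		fmt.Println("cannot determine home directory:", err)
		return
	}
	fmt.Println(kubeconfigPath(os.Getenv("KUBECONFIG"), home))
}
```

Using os.UserHomeDir instead of a hardcoded user name is a design choice of this sketch; it keeps the same default location but works for any user.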

Once we have the config, it’s time to initialize a client. We do that with the NewForConfig method. It returns a clientset that contains a client for each API group. For example, pods are accessed through the CoreV1 group of the clientset.

Check for deployments

Create another directory under the app dir named client.

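package client

import (
	"fmt"
	"log"
	"time"

	"k8s.io/client-go/informers"
	"k8s.io/client-go/tools/cache"
)

const (
	// ENVNAME is the env variable every deployment must define. The original
	// value was lost from this listing, so this value is only a placeholder.
	ENVNAME = "REQUIRED_ENV"
)

// CheckDeploymentEnv watches deployments and starts a goroutine that checks
// them each time a new deployment is added.
func (c *Client) CheckDeploymentEnv(ns string) {
	informerFactory := informers.NewSharedInformerFactory(c.C, 30*time.Second)

	deploymentInformer := informerFactory.Apps().V1().Deployments()
	deploymentInformer.Informer().AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc: func(obj interface{}) {
			log.Println("Deployment added. Let's start checking!")

			ch := make(chan error, 1)
			done := make(chan bool)

			go c.check(ns, ch, done)

		loop:
			for {
				select {
				case err := <-ch:
					log.Fatalf("error checking envvar: %v", err)
				case <-done:
					break loop
				}
			}
		},
	})

	// The end of the original listing was lost; starting the informers and
	// blocking like this is an assumption.
	stop := make(chan struct{})
	informerFactory.Start(stop)
	informerFactory.WaitForCacheSync(stop)
	<-stop
}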

In the CheckDeploymentEnv method, we first create a shared informer factory with NewSharedInformerFactory. It gives us back an interface for retrieving various resources from a local cache of the cluster. We can then handle events such as add, update, and delete in the cluster and take action accordingly.
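The add handler waits on two channels inside a for/select loop and leaves it with a labeled break. Here is a minimal standalone sketch of that pattern using only the standard library; waitForCheck and errMissingEnv are hypothetical names used for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// errMissingEnv is a hypothetical error used only for this illustration.
var errMissingEnv = errors.New("envvar missing")

// waitForCheck blocks until the worker reports an error or signals completion.
func waitForCheck(ch <-chan error, done <-chan bool) error {
	var result error
loop:
	for {
		select {
		case err := <-ch:
			result = err
			break loop
		case <-done:
			break loop
		}
	}
	return result
}

func main() {
	ch := make(chan error, 1)
	done := make(chan bool)

	// A worker that finishes without error.
	go func() { done <- true }()
	fmt.Println(waitForCheck(ch, done))

	// A worker that reports an error.
	ch <- errMissingEnv
	fmt.Println(waitForCheck(ch, done))
}
```

The label matters because a plain break inside a select only leaves the select statement, not the surrounding for loop.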

Then we add another function in the same file as above.

// check deletes every deployment in the namespace that lacks ENVNAME.
func (c *Client) check(namespace string, ch chan error, done chan bool) {
	deployments, err := ListDeploymentWithNamespace(namespace, c.C)
	if err != nil {
		ch <- fmt.Errorf("list deployment: %s", err.Error())
		return
	}

	for _, deployment := range deployments.Items {
		var envSet bool
		for _, cntr := range deployment.Spec.Template.Spec.Containers {
			for _, env := range cntr.Env {
				if env.Name == ENVNAME {
					log.Printf("Deployment name: %s has envvar. All set to go!", deployment.Name)
					envSet = true
				}
			}
		}
		if !envSet {
			log.Printf("No envvar name %s - Deleting deployment with name %s\n", ENVNAME, deployment.Name)
			err = DeleteDeploymentWithNamespce(namespace, deployment.Name, c.C)
			if err != nil {
				ch <- err
				return
			}
		}
	}
	done <- true
}

Here we list the deployments (covered next) and, for every deployment, check its containers for the env variable; if the variable is missing, we delete the deployment. If everything succeeds we pass true to the done channel; otherwise we pass the error to the other channel.
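The decision itself does not depend on the API server, so it can be sketched and tested with plain structs. In the sketch below, envVar and container are simplified, hypothetical stand-ins for the corev1.EnvVar and corev1.Container types, and REQUIRED_ENV is a placeholder variable name.

```go
package main

import "fmt"

// envVar and container are simplified, hypothetical stand-ins for the
// corev1.EnvVar and corev1.Container types used by client-go.
type envVar struct{ Name, Value string }

type container struct {
	Name string
	Env  []envVar
}

// hasEnv reports whether any container defines an env variable called name.
func hasEnv(containers []container, name string) bool {
	for _, c := range containers {
		for _, e := range c.Env {
			if e.Name == name {
				return true
			}
		}
	}
	return false
}

func main() {
	withEnv := []container{{Name: "app", Env: []envVar{{Name: "REQUIRED_ENV", Value: "1"}}}}
	withoutEnv := []container{{Name: "app"}}

	fmt.Println(hasEnv(withEnv, "REQUIRED_ENV"))    // true: keep the deployment
	fmt.Println(hasEnv(withoutEnv, "REQUIRED_ENV")) // false: delete the deployment
}
```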

Deployment Handler

Create another file named deployment.go in the client directory.

package client

import (
	"fmt"
	"log"

	v1 "k8s.io/api/apps/v1"
	"k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// ListDeploymentWithNamespace lists the deployments in the given namespace.
func ListDeploymentWithNamespace(ns string, clientset *kubernetes.Clientset) (*v1.DeploymentList, error) {
	deployment, err := clientset.AppsV1().Deployments(ns).List(ctx, metav1.ListOptions{})
	if err != nil {
		return nil, err
	}
	return deployment, nil
}

// DeleteDeploymentWithNamespce deletes the named deployment in the given
// namespace. A deployment that no longer exists is not treated as an error.
func DeleteDeploymentWithNamespce(ns, name string, clientset *kubernetes.Clientset) error {
	err := clientset.AppsV1().Deployments(ns).Delete(ctx, name, metav1.DeleteOptions{})
	if err != nil {
		if errors.IsNotFound(err) {
			log.Printf("Deployment doesn't exist with name %s\n", name)
			return nil
		}
		return fmt.Errorf("delete Deployment: %v", err)
	}
	log.Printf("Deployment deleted with name: %v\n", name)

	return nil
}

Here we have two functions, one for listing the deployments and another for deleting them. We get the resources directly from the clientset, which means we query the Kubernetes API server, unlike before, where we read from the local in-memory cache.

Now create another file named client.go in the client directory and use the code below.

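package client

import (
	"context"

	"k8s.io/client-go/kubernetes"
)

var (
	// ctx is shared by all API calls in this package.
	ctx = context.TODO()
)

// Client wraps the Kubernetes clientset used by the checks.
type Client struct {
	C *kubernetes.Clientset
}

// NewClient returns a new Client.
func NewClient() *Client {
	return &Client{}
}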


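package main

import (
	"flag"
	"log"

	// Placeholder import paths: replace example.com/app with your own module path.
	"example.com/app/client"
	"example.com/app/service"
)

func main() {
	var nameSpace string

	flag.StringVar(&nameSpace, "ns", "test-ns",
		"namespace name on which the checking is going to take place")
	flag.Parse()

	log.Printf("Checking Pods for namespace %s\n", nameSpace)
	c := client.NewClient()
	c.C = service.Init()
	c.CheckDeploymentEnv(nameSpace)
}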

Here we are just taking the namespace from the flag and calling all the necessary functions mentioned in the entire article.

Run the app in the Kubernetes cluster

In order to run the app in the cluster, we have to set up a ClusterRole & ClusterRoleBinding for the default service account of the pod.

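apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: pod-namespace-clusterrole
rules:
  - apiGroups: ["apps"]
    resources: ["deployments"]
    verbs: ["list", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: pod-namespace-clusterrolebinding
subjects:
  - kind: ServiceAccount
    name: default
    namespace: default
roleRef:
  kind: ClusterRole
  name: pod-namespace-clusterrole
  apiGroup: rbac.authorization.k8s.io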

Then you have to build the project and make a Docker image out of it using the docker build, docker tag & docker push commands. Then create the deployment YAML template shown below and apply it.

apiVersion: apps/v1
kind: Deployment
metadata:
  creationTimestamp: null
  labels:
    app: client
  name: client
spec:
  replicas: 1
  selector:
    matchLabels:
      app: client
  strategy: {}
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: client
    spec:
      containers:
      - image: <YOUR DOCKER IMAGE>
        name: client-app
        resources: {}
status: {}

Here is my GitHub URL for the project –

You can find how to run the project in the README of the mentioned GitHub URL above.

What is RBAC in Kubernetes?

RBAC stands for Role-Based Access Control. It allows us to define user privileges in the Kubernetes cluster, restricting users from performing unwanted operations. We describe access rights such as who is allowed to create, update, and delete resources.

Why do we need it?

  • To make the cluster more secure.
  • To scale our cluster to various development teams and avoid conflict between them.


In the RBAC API there are 4 main types of objects –

  • Role – It defines permissions on resources within a single namespace.
  • RoleBinding – It maps a Role to users, groups, or service accounts.
  • ClusterRole – It defines permissions on resources cluster-wide.
  • ClusterRoleBinding – It maps a ClusterRole to users, groups, or service accounts.


Now we are going to create the objects mentioned above and see how these all work.

ClusterRole & ClusterRoleBinding

First, we are going to create a service account

kubectl create serviceaccount bob

Now write the two YAML manifests below for the ClusterRole & ClusterRoleBinding –

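apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: bob
rules:
  - apiGroups:
      - ''
      - apps
    resources:
      - pods
      - pods/status
      - namespaces
      - deployments
    verbs:
      - get
      - list
      - watch
      - create
      - update
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: bob-binding
subjects:
  - kind: ServiceAccount
    name: bob
    namespace: default
roleRef:
  kind: ClusterRole
  name: bob
  apiGroup: rbac.authorization.k8s.io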

Here we first create a service account and then define a cluster role that can get, list, watch, create, and update pods, deployments, and namespaces.

Later we create a cluster role binding that will map the cluster role to the service account.

Role & RoleBinding

Let’s create a namespace first.

apiVersion: v1
kind: Namespace
metadata:
  name: application
  labels:
    name: alice

Then define the below Role and RoleBinding

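apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: application
  name: alice
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "watch", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: alice-binding
  namespace: application
subjects:
- kind: User
  name: alice
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: alice
  apiGroup: rbac.authorization.k8s.io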

Here we create a namespace and then define a role that allows get, watch, and list operations on the pods in the application namespace.

Then the role binding maps that role to the user alice within the namespace.
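To see how a rule is evaluated, here is a simplified model of RBAC rule matching in Go. The rule type and the allowed function are hypothetical and only illustrate the idea; the real authorizer lives in the API server and handles more cases (resource names, non-resource URLs, subresources). Permissions are purely additive, so a request is denied unless some rule grants it.

```go
package main

import "fmt"

// rule is a simplified, hypothetical model of an RBAC policy rule.
type rule struct {
	APIGroups []string
	Resources []string
	Verbs     []string
}

// contains reports whether list holds want or the wildcard "*".
func contains(list []string, want string) bool {
	for _, item := range list {
		if item == want || item == "*" {
			return true
		}
	}
	return false
}

// allowed reports whether any rule grants verb on resource in apiGroup.
func allowed(rules []rule, apiGroup, resource, verb string) bool {
	for _, r := range rules {
		if contains(r.APIGroups, apiGroup) && contains(r.Resources, resource) && contains(r.Verbs, verb) {
			return true
		}
	}
	return false
}

func main() {
	// The same permissions as the Role named alice above.
	rules := []rule{{
		APIGroups: []string{""}, // "" is the core API group (pods, namespaces)
		Resources: []string{"pods"},
		Verbs:     []string{"get", "watch", "list"},
	}}

	fmt.Println(allowed(rules, "", "pods", "list"))            // true
	fmt.Println(allowed(rules, "", "pods", "delete"))          // false: verb not granted
	fmt.Println(allowed(rules, "apps", "deployments", "list")) // false: API group not granted
}
```

The last call shows why the API group matters: deployments live in the apps group, so a rule that lists only the core group does not cover them.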

Create a managed cluster using Cluster API Provider for Google Cloud Platform (CAPG)

In the previous blog, I explained how to create and manage Kubernetes clusters locally with Cluster API using the Docker infrastructure provider.

In this blog, I will explain how to create and manage k8s clusters with Cluster API in Google Cloud.

Note – Throughout the blog, I will use Kubernetes version 1.22.9. It is recommended to use the same Kubernetes version as the OS image created by the image builder. You can check the version in kubernetes.json and use that.

Step 1 –

  • Create the kind cluster –
kind create cluster --image kindest/node:v1.22.9 --wait 5m

Step 2 –

Follow image builder for GCP steps and build an image.

Step 3 –

  • Export the following env variables – (reference)
export GCP_B64ENCODED_CREDENTIALS=$( cat /path/to/gcp-credentials.json | base64 | tr -d '\n' )

export GCP_REGION="us-east4"
export IMAGE_ID=projects/$GCP_PROJECT/global/images/<IMAGE ID>
export GCP_NODE_MACHINE_TYPE=n1-standard-2
export GCP_NETWORK_NAME=default
export CLUSTER_NAME=test

Step 4 –

Set up the network. In this example we are using the default network, so we create a router and a NAT so that our workload cluster has internet access.

gcloud compute routers create "${CLUSTER_NAME}-myrouter" --project="${GCP_PROJECT}" --region="${GCP_REGION}" --network="default"

gcloud compute routers nats create "${CLUSTER_NAME}-mynat" --project="${GCP_PROJECT}" --router-region="${GCP_REGION}" --router="${CLUSTER_NAME}-myrouter" --nat-all-subnet-ip-ranges --auto-allocate-nat-external-ips

Step 5 –

  • Initialize the infrastructure
clusterctl init --infrastructure gcp
  • Generate the workload cluster config and apply it
clusterctl generate cluster $CLUSTER_NAME --kubernetes-version v1.22.9 > workload-test.yaml

kubectl apply -f workload-test.yaml
  • View the cluster and its resources
$ clusterctl describe cluster $CLUSTER_NAME
NAME                                                               READY  SEVERITY  REASON                 SINCE  MESSAGE
/test                                                              False  Info      WaitingForKubeadmInit  5s
├─ClusterInfrastructure - GCPCluster/test
└─ControlPlane - KubeadmControlPlane/test-control-plane            False  Info      WaitingForKubeadmInit  5s
  └─Machine/test-control-plane-x57zs                               True                                    31s
    └─MachineInfrastructure - GCPMachine/test-control-plane-7xzw2
  • Check the status of the control plane
$ kubectl get kubeadmcontrolplane
test-control-plane   test                                           1                  1         1             2m9s   v1.22.9

Note – The control plane won’t be ready until the next step, when I install the CNI (Container Network Interface).

Step 6 –

  • Get the kubeconfig for the workload cluster
$ clusterctl get kubeconfig $CLUSTER_NAME > workload-test.kubeconfig
  • Apply the CNI
kubectl --kubeconfig=./workload-test.kubeconfig \
  apply -f
  • Wait a bit and you should see this when getting the kubeadmcontrolplane
$ kubectl get kubeadmcontrolplane
test-control-plane   test      true          true                   1          1       1         0             6m33s   v1.22.9

$ kubectl get nodes --kubeconfig=./workload-test.kubeconfig
NAME                       STATUS   ROLES                  AGE   VERSION
test-control-plane-7xzw2   Ready    control-plane,master   62s   v1.22.9

Step 7 –

  • Edit the MachineDeployment in workload-test.yaml. It has 0 replicas; set the number of replicas you want for your nodes (in this case, we used 2). Then apply workload-test.yaml.
$ kubectl apply -f workload-test.yaml
  • After a few minutes, you should see something like this –
$ clusterctl describe cluster $CLUSTER_NAME
NAME                                                               READY  SEVERITY  REASON  SINCE  MESSAGE
/test                                                              True                     15m
├─ClusterInfrastructure - GCPCluster/test
├─ControlPlane - KubeadmControlPlane/test-control-plane            True                     15m
│ └─Machine/test-control-plane-x57zs                               True                     19m
│   └─MachineInfrastructure - GCPMachine/test-control-plane-7xzw2
  └─MachineDeployment/test-md-0                                    True                     10m
    └─2 Machines...                                                True                     13m    See test-md-0-68bd55744b-qpk67, test-md-0-68bd55744b-tsgf6

$ kubectl get nodes --kubeconfig=./workload-test.kubeconfig
NAME                       STATUS   ROLES                  AGE   VERSION
test-control-plane-7xzw2   Ready    control-plane,master   21m   v1.22.9
test-md-0-b7766            Ready    <none>                 17m   v1.22.9
test-md-0-wsgpj            Ready    <none>                 17m   v1.22.9

Yay! Now we have a Kubernetes cluster in GCP with 1 control plane node and 2 worker nodes.

Step 8 –

Delete what you have created –

$ kubectl delete cluster $CLUSTER_NAME

$ gcloud compute routers nats delete "${CLUSTER_NAME}-mynat" --project="${GCP_PROJECT}" \
    --router-region="${GCP_REGION}" --router="${CLUSTER_NAME}-myrouter"

$ gcloud compute routers delete "${CLUSTER_NAME}-myrouter" --project="${GCP_PROJECT}" \
    --region="${GCP_REGION}"

$ kind delete cluster

Learning: Kubernetes – Deployments & StatefulSet


Deployments are the way we manage pods in k8s. We specify the information about the pods, such as which image version to use and how many replicas of the pod there will be.

  • Properties
    • The spec.selector specifies which pods the deployment manages.
    • When we update a deployment, it performs a rolling update: it creates new pods and deletes old pods gradually. By default, it makes sure that at least 75% of the desired number of pods are available and that at most 125% of the desired number of pods exist at any time.
  • Rolling back to a previous version
    • To roll back to the previous revision, we use kubectl rollout undo deployment/nginx-deployment
    • To roll back to a specific earlier revision, we use kubectl rollout undo deployment/nginx-deployment --to-revision=2
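The rolling update bounds come from the deployment’s maxSurge and maxUnavailable settings, which both default to 25%. The sketch below works through the arithmetic for percentage values, assuming the documented rounding (maxSurge rounds up, maxUnavailable rounds down). The function name rollingBounds is hypothetical, and the sketch ignores absolute values and other edge cases.

```go
package main

import "fmt"

// rollingBounds returns the minimum number of available pods and the maximum
// total number of pods during a rolling update, for percentage-based
// maxSurge and maxUnavailable values.
func rollingBounds(replicas, maxSurgePct, maxUnavailablePct int) (minAvailable, maxTotal int) {
	surge := (replicas*maxSurgePct + 99) / 100        // percentage rounded up
	unavailable := replicas * maxUnavailablePct / 100 // percentage rounded down
	return replicas - unavailable, replicas + surge
}

func main() {
	minAvailable, maxTotal := rollingBounds(4, 25, 25)
	fmt.Println(minAvailable, maxTotal) // 3 5

	minAvailable, maxTotal = rollingBounds(10, 25, 25)
	fmt.Println(minAvailable, maxTotal) // 8 13
}
```

With 4 replicas and the defaults, at least 3 pods stay available and at most 5 pods exist at any moment during the update.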


Just as we manage stateless applications with Deployments, we manage stateful applications with StatefulSets.

  • Properties
    • The pods of a StatefulSet are not created or deleted all at the same time; by default they are created and deleted one at a time, in order.
    • The pods can’t be addressed interchangeably; each one has its own identity.
    • The replicas here are not identical.
    • Each pod gets a unique identifier in increasing order, and it keeps that identifier when it is rescheduled.
    • Each pod has its own persistent storage.
    • In a typical replicated database running as a StatefulSet, there is a master pod that is the only one allowed to change data.
    • All the slave pods sync with the master pod in order to achieve data consistency.
    • When a new pod joins, it first clones all the data from one of the slave pods and after that starts to sync.
  • StatefulSets are valuable for applications that require one or more of the following.
    • Stable, unique network identifiers.
    • Stable, persistent storage.
    • Ordered, graceful deployment and scaling.
    • Ordered, automated rolling updates.
  • Data Persistence
    • If a pod dies, all the data stored inside it is lost. To counter this, we attach a persistent volume to every pod.
    • The storage holds all the synchronized data along with the pod’s state data.
    • When a pod gets replaced, the persistent volume gets reattached to the new pod and the state of the pod is resumed.
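The stable identities mentioned above follow fixed naming patterns. The sketch below builds them with plain string formatting. The function names are hypothetical, the example values (web, nginx, www) are illustrative, and the DNS name assumes a headless service and the default cluster domain cluster.local.

```go
package main

import "fmt"

// podName returns the stable name of the pod with the given ordinal.
func podName(statefulSet string, ordinal int) string {
	return fmt.Sprintf("%s-%d", statefulSet, ordinal)
}

// podDNS returns the stable DNS name of a pod behind a headless service,
// assuming the default cluster domain "cluster.local".
func podDNS(statefulSet string, ordinal int, service, namespace string) string {
	return fmt.Sprintf("%s.%s.%s.svc.cluster.local", podName(statefulSet, ordinal), service, namespace)
}

// claimName returns the name of the PersistentVolumeClaim created for a pod
// from a volume claim template.
func claimName(template, statefulSet string, ordinal int) string {
	return fmt.Sprintf("%s-%s", template, podName(statefulSet, ordinal))
}

func main() {
	fmt.Println(podName("web", 0))                    // web-0
	fmt.Println(podDNS("web", 0, "nginx", "default")) // web-0.nginx.default.svc.cluster.local
	fmt.Println(claimName("www", "web", 0))           // www-web-0
}
```

Because the claim name is derived from the pod name, a replacement pod with the same ordinal finds the same claim and gets its volume reattached.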

Learning: Kubernetes – Persistent Volume & Persistent Volume Claim

Volume – A volume in Kubernetes can be thought of as a directory that can be accessed by the containers in a pod. A volume helps persist the data even if a container in the pod restarts.

  • PV
    • A Persistent Volume (PV) is a piece of storage in the cluster.
    • It is a cluster-level resource like a node and doesn’t belong to any namespace.
    • It is either provisioned manually by an administrator or provisioned dynamically by Kubernetes using a StorageClass.
  • PVC
    • A PersistentVolumeClaim (PVC) is a request for storage by a user that can be fulfilled by a PV.
    • Persistent Volumes and PersistentVolumeClaim are independent of Pod lifecycles and preserve data through restarting, rescheduling, and even deleting Pods.
  • Access Modes
    • ReadWriteOnce – The volume can be mounted as read-write by a single node. Multiple pods running on that same node can access the volume.
    • ReadOnlyMany – The volume can be mounted as read-only by many nodes.
    • ReadWriteMany – The volume can be mounted as read-write by many nodes.
    • ReadWriteOncePod – The volume can be mounted as read-write by a single pod in the whole cluster.

Learning: Kubernetes – Container Runtime Interface & Garbage Collection

Container Runtime Interface

The Container Runtime Interface (CRI) is the primary protocol for the communication between the kubelet and Container Runtime.

Container Runtime – It is the software that runs & manages containers on a host operating system. There are a number of container runtimes available, such as Docker, runC, and containerd.

To abstract over all the container runtimes supported by Kubernetes, the community introduced the CRI (Container Runtime Interface), which is what the kubelet uses to talk to the container runtime.

The kubelet talks to the container runtime over gRPC, where the kubelet is the client and the runtime’s CRI implementation is the server.

Garbage Collection

Garbage collection is the term k8s uses for the mechanisms that clean up cluster resources.

  • Owner & Dependents – In k8s some objects depend on others. For example, a ReplicaSet is the owner of its pods. Each dependent records its owner in its metadata.ownerReferences field, so k8s can clean up the related objects when the owner is deleted.
  • Cascading Deletion – When an owner object is deleted, k8s also deletes its dependents, i.e. objects whose owner no longer exists, like the pods left after deleting a ReplicaSet.
    • Foreground Cascading Deletion –
      • The object we are trying to delete goes into a deletion-in-progress state.
      • The Kubernetes API server sets the object’s metadata.deletionTimestamp field to the time the object was marked for deletion.
      • The Kubernetes API server also sets the metadata.finalizers field to foregroundDeletion.
      • After the object enters the in-progress state, the controller deletes all the dependents and then removes the owner object.
    • Background Cascading Deletion –
      • Here k8s deletes the owner object immediately.
      • Then the controller cleans up the dependent objects in the background.
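The difference between the two modes is the order in which objects disappear. The sketch below is a deliberately simplified model with one owner and one level of dependents; deletionOrder is a hypothetical function, not part of any Kubernetes library.

```go
package main

import "fmt"

// deletionOrder returns the order in which an owner and its dependents are
// removed. It is a simplified model: one owner, one level of dependents.
func deletionOrder(owner string, dependents []string, foreground bool) []string {
	order := make([]string, 0, len(dependents)+1)
	if foreground {
		// Foreground: dependents are deleted first, the owner goes last.
		order = append(order, dependents...)
		return append(order, owner)
	}
	// Background: the owner is deleted immediately, dependents follow.
	order = append(order, owner)
	return append(order, dependents...)
}

func main() {
	pods := []string{"pod-a", "pod-b"}
	fmt.Println(deletionOrder("replicaset", pods, true))
	fmt.Println(deletionOrder("replicaset", pods, false))
}
```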

Learning: Kubernetes – Why We Need Pod Abstraction Above Containers

I have discussed pods in the previous blog. In short, a container is a standard unit of software that packages up code and all its dependencies in an isolated environment that has its own file system.

As nodes are VMs or physical machines, we could have run containers directly on them without the pod abstraction. But that would cause a major problem in managing the cluster, and that problem is networking.

A containerized application listens on a specific port, and two applications can’t occupy the same port on the same host. So if you need two containers of the same application running on a node, the second one has to run on a different port, and managing the connections between them becomes very messy.
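The port conflict is easy to reproduce with the Go standard library. In this minimal sketch (the function name secondListenFails is hypothetical), a second listener on an address that is already in use is rejected by the operating system.

```go
package main

import (
	"fmt"
	"net"
)

// secondListenFails opens a listener on a free local port and reports whether
// a second listener on the same address is rejected.
func secondListenFails() (bool, error) {
	// Port 0 asks the OS to pick any free port.
	first, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return false, err
	}
	defer first.Close()

	second, err := net.Listen("tcp", first.Addr().String())
	if err != nil {
		return true, nil // typically "address already in use"
	}
	second.Close()
	return false, nil
}

func main() {
	rejected, err := secondListenFails()
	if err != nil {
		fmt.Println("could not open the first listener:", err)
		return
	}
	fmt.Println("second listener rejected:", rejected)
}
```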

That is why Kubernetes solves the problem with the pod abstraction. Each pod has its own network namespace, which means each pod has its own virtual ethernet interface and IP address. It’s like the pod is a small VM inside the node. Now each pod can run the application container on the same port without conflict, because the containers are running in self-contained, isolated environments.

Now suppose a pod has more than one container (the main container and a helper container). Then the containers inside the pod communicate with each other using localhost.

Learning: Kubernetes – Controller & Cloud Controller


A k8s controller is like a thermostat in a room. The thermostat checks the current temperature and maintains the desired temperature by turning the switch off & on. In the same way, a k8s controller checks the current state and verifies whether it is equal to the desired state. If the current state does not match the desired state, it makes the necessary changes to bring it to the desired state.
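One step of such a control loop can be sketched as a pure function. This is only an illustration of the compare-and-act idea using replica counts; reconcile is a hypothetical name, and real controllers watch the API server and act on full objects.

```go
package main

import "fmt"

// reconcile compares the current state with the desired state and returns the
// action a controller would take to close the gap.
func reconcile(desired, current int) string {
	switch {
	case current < desired:
		return fmt.Sprintf("create %d pod(s)", desired-current)
	case current > desired:
		return fmt.Sprintf("delete %d pod(s)", current-desired)
	default:
		return "nothing to do"
	}
}

func main() {
	fmt.Println(reconcile(3, 1)) // create 2 pod(s)
	fmt.Println(reconcile(3, 5)) // delete 2 pod(s)
	fmt.Println(reconcile(3, 3)) // nothing to do
}
```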

Types of the controller –

  • ReplicaSet – It is responsible for maintaining the desired number of pods.
  • Deployment – Deployment is the most common way to get your app on Kubernetes. It maintains a ReplicaSet with the desired configuration.
  • StatefulSet – A StatefulSet is used to manage stateful applications with persistent storage.
  • Job – A Job creates one or more short-lived Pods and expects them to successfully terminate.
  • CronJob – A CronJob creates Jobs on a schedule.
  • DaemonSet – A DaemonSet ensures that all (or some) Nodes run a copy of a Pod. As nodes are added to the cluster, Pods are added to them.

Cloud Controller

The cloud controller manager lets you link your cluster into your cloud provider’s API, and separates out the components that interact with that cloud platform from components that only interact with your cluster.

  • Different functions
    • Node Controller –
      • It updates node objects when new servers are created in the cloud.
      • It annotates and labels the node object with cloud-specific information.
      • It obtains the node’s hostname & network addresses.
      • It checks the node’s health. If the node has been deleted from the cloud, it also removes the node from the k8s cluster.
    • Route Controller – It configures the routes properly so the nodes on the k8s cluster can communicate with each other.
    • Service Controller – It interacts with the cloud provider’s API to set up a load balancer and other infrastructure components.

Learning: Kubernetes – ConfigMaps & Secrets

Every application has configuration data like API keys, the DB URL, the DB user, the DB password, etc. Yes, you can hardcode this data in your application, but after some time it becomes unmanageable. You need some kind of dynamic solution where you define all this data once and every component of your cluster can access it.

Suppose our application’s DB URL changes. We would need to change every place where the URL is used, which means changing the code inside the application and rebuilding it. To handle this situation we use ConfigMaps and Secrets for storing the application’s required data.

ConfigMaps – A ConfigMap is an API object used to store non-confidential data in key-value pairs. Pods can consume ConfigMaps as environment variables, command-line arguments, or as configuration files in a volume.

Ex – A ConfigMap can contain the URL of the database, the user name, and other non-credential data.
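From the application’s point of view, a ConfigMap consumed as environment variables is read like any other env variable. Here is a minimal sketch; the key DB_URL, the fallback URL, and the helper configValue are hypothetical.

```go
package main

import (
	"fmt"
	"os"
)

// configValue returns the value found by lookup for key, or fallback when the
// key is unset or empty. In main, lookup is os.LookupEnv.
func configValue(lookup func(string) (string, bool), key, fallback string) string {
	if v, ok := lookup(key); ok && v != "" {
		return v
	}
	return fallback
}

func main() {
	// DB_URL is a hypothetical key that a ConfigMap could inject into the pod.
	fmt.Println(configValue(os.LookupEnv, "DB_URL", "postgres://localhost:5432/app"))
}
```

Because the value comes from the environment, changing the DB URL means updating the ConfigMap and restarting the pods, not rebuilding the application.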

Secrets – A Secret is similar to a ConfigMap, except that a Secret is used for sensitive information such as credentials. One of the main differences is that you have to explicitly tell kubectl to show you the contents of a Secret.

  • Its values are encoded in base64, which is an encoding and not encryption.
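A quick way to see that base64 is not a protection mechanism is to encode and decode a value yourself. This sketch uses only the standard library; encodeSecret and decodeSecret are hypothetical helper names.

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// encodeSecret returns the base64 form of a plain value, which is the form
// found in the data field of a Secret manifest.
func encodeSecret(plain string) string {
	return base64.StdEncoding.EncodeToString([]byte(plain))
}

// decodeSecret reverses encodeSecret. Anyone who can read the Secret can do this.
func decodeSecret(encoded string) (string, error) {
	raw, err := base64.StdEncoding.DecodeString(encoded)
	if err != nil {
		return "", err
	}
	return string(raw), nil
}

func main() {
	encoded := encodeSecret("password")
	fmt.Println(encoded) // cGFzc3dvcmQ=

	plain, err := decodeSecret(encoded)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Println(plain) // password
}
```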