Create a Runtime SDK extension for Cluster API

Just as admission controllers let you hook into requests against workload cluster objects (creation, update, deletion) and validate or mutate them, a Runtime Extension lets you hook into various cluster lifecycle events and make the necessary changes.

NOTE – The feature is currently experimental; to enable it you have to set the environment variable EXP_RUNTIME_SDK=true.

In general, the extension works as a webhook and can be written in any language of your preference, but to leverage the types and helpers that upstream CAPI already provides, we are going to use Go here.
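
To give a feel for the pattern before we build the real thing, here is a tiny, hypothetical sketch of a lifecycle hook handler for the BeforeClusterCreate hook. Our extension below implements BeforeClusterDelete and AfterControlPlaneInitialized instead, but every lifecycle hook handler follows the same typed request/response shape; the ExtensionHandler struct here is only a placeholder, the real one later in this post also carries a Kubernetes client.

package handler

import (
	"context"

	runtimehooksv1 "sigs.k8s.io/cluster-api/exp/runtime/hooks/api/v1alpha1"
	ctrl "sigs.k8s.io/controller-runtime"
)

// ExtensionHandler is a placeholder for this sketch; the real handler defined later also holds a client.
type ExtensionHandler struct{}

// DoBeforeClusterCreate receives a typed request describing the Cluster and fills in a typed
// response that Cluster API acts on before creating the cluster.
func (e *ExtensionHandler) DoBeforeClusterCreate(ctx context.Context, request *runtimehooksv1.BeforeClusterCreateRequest, response *runtimehooksv1.BeforeClusterCreateResponse) {
	log := ctrl.LoggerFrom(ctx)
	log.Info("BeforeClusterCreate is called", "cluster", request.Cluster.GetName())

	// Returning a Success status lets the cluster creation proceed.
	response.Status = runtimehooksv1.ResponseStatusSuccess
}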

Here we are going to create a Runtime SDK extension that hooks into both AfterControlPlaneInitialized & BeforeClusterDelete and operates on ConfigMaps. Let’s create a project named runtimesdk and create a main.go file where we are doing the following –

  • Initializing the necessary command-line flags.
  • Creating a Go profiler server.
  • Getting a client for interacting with the Kubernetes API server.
  • Getting the handlers that we are going to implement next.
  • Initializing the webhook server.
  • Registering the BeforeClusterDelete and AfterControlPlaneInitialized hooks with the webhook server.
  • Running the webhook server.
package main

import (
	"flag"
	"net/http"
	"os"

	handler "github.com/aniruddha2000/runtime-sdk/handlers"
	"github.com/spf13/pflag"
	cliflag "k8s.io/component-base/cli/flag"
	"k8s.io/component-base/logs"
	logsv1 "k8s.io/component-base/logs/api/v1"
	"k8s.io/klog/v2"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"

	runtimecatalog "sigs.k8s.io/cluster-api/exp/runtime/catalog"
	runtimehooksv1 "sigs.k8s.io/cluster-api/exp/runtime/hooks/api/v1alpha1"
	"sigs.k8s.io/cluster-api/exp/runtime/server"
)

var (
	// catalog contains all information about RuntimeHooks.
	catalog = runtimecatalog.New()

	// Flags.
	profilerAddress string
	webhookPort     int
	webhookCertDir  string
	logOptions      = logs.NewOptions()
)

func init() {
	// Adds to the catalog all the RuntimeHooks defined in cluster API.
	_ = runtimehooksv1.AddToCatalog(catalog)
}

// InitFlags initializes the flags.
func InitFlags(fs *pflag.FlagSet) {
	// Initialize logs flags using Kubernetes component-base machinery.
	logsv1.AddFlags(logOptions, fs)

	// Add test-extension specific flags
	fs.StringVar(&profilerAddress, "profiler-address", "",
		"Bind address to expose the pprof profiler (e.g. localhost:6060)")

	fs.IntVar(&webhookPort, "webhook-port", 9443,
		"Webhook Server port")

	fs.StringVar(&webhookCertDir, "webhook-cert-dir", "/tmp/k8s-webhook-server/serving-certs/",
		"Webhook cert dir, only used when webhook-port is specified.")
}

func main() {
	// Creates a logger to be used during the main func.
	setupLog := ctrl.Log.WithName("main")

	// Initialize and parse command line flags.
	InitFlags(pflag.CommandLine)
	pflag.CommandLine.SetNormalizeFunc(cliflag.WordSepNormalizeFunc)
	pflag.CommandLine.AddGoFlagSet(flag.CommandLine)
	pflag.Parse()

	// Validates logs flags using Kubernetes component-base machinery and applies them
	if err := logsv1.ValidateAndApply(logOptions, nil); err != nil {
		setupLog.Error(err, "unable to start extension")
		os.Exit(1)
	}

	// Add the klog logger in the context.
	ctrl.SetLogger(klog.Background())

	// Initialize the golang profiler server, if required.
	if profilerAddress != "" {
		klog.Infof("Profiler listening for requests at %s", profilerAddress)
		go func() {
			klog.Info(http.ListenAndServe(profilerAddress, nil))
		}()
	}

	// Create a http server for serving runtime extensions
	webhookServer, err := server.New(server.Options{
		Catalog: catalog,
		Port:    webhookPort,
		CertDir: webhookCertDir,
	})
	if err != nil {
		setupLog.Error(err, "error creating webhook server")
		os.Exit(1)
	}

	// Lifecycle Hooks
	restConfig, err := ctrl.GetConfig()
	if err != nil {
		setupLog.Error(err, "error getting config for the cluster")
		os.Exit(1)
	}

	client, err := client.New(restConfig, client.Options{})
	if err != nil {
		setupLog.Error(err, "error creating client to the cluster")
		os.Exit(1)
	}

	lifecycleExtensionHandlers := handler.NewExtensionHandlers(client)

	// Register extension handlers.
	if err := webhookServer.AddExtensionHandler(server.ExtensionHandler{
		Hook:        runtimehooksv1.BeforeClusterDelete,
		Name:        "before-cluster-delete",
		HandlerFunc: lifecycleExtensionHandlers.DoBeforeClusterDelete,
	}); err != nil {
		setupLog.Error(err, "error adding handler")
		os.Exit(1)
	}

	if err := webhookServer.AddExtensionHandler(server.ExtensionHandler{
		Hook:        runtimehooksv1.AfterControlPlaneInitialized,
		Name:        "before-cluster-create",
		HandlerFunc: lifecycleExtensionHandlers.DoAfterControlPlaneInitialized,
	}); err != nil {
		setupLog.Error(err, "error adding handler")
		os.Exit(1)
	}

	// Setup a context listening for SIGINT.
	ctx := ctrl.SetupSignalHandler()

	// Start the https server.
	setupLog.Info("Starting Runtime Extension server")
	if err := webhookServer.Start(ctx); err != nil {
		setupLog.Error(err, "error running webhook server")
		os.Exit(1)
	}
}

Now it’s time to create the handlers for each hook. Let’s create a file handlers/hooks.go, where we are doing the following –

  • DoAfterControlPlaneInitialized –
    • Check whether a ConfigMap with the expected name is present in the Cluster’s namespace.
    • If not, create one; otherwise do nothing and let the request pass.
  • DoBeforeClusterDelete –
    • Check whether that ConfigMap is present in the Cluster’s namespace.
    • If it is, delete it before the workload cluster gets deleted; otherwise let the request pass.
package handler

import (
	"context"
	"fmt"

	"github.com/pkg/errors"
	corev1 "k8s.io/api/core/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/klog/v2"
	clusterv1 "sigs.k8s.io/cluster-api/api/v1beta1"
	runtimehooksv1 "sigs.k8s.io/cluster-api/exp/runtime/hooks/api/v1alpha1"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

type ExtensionHandler struct {
	client client.Client
}

func NewExtensionHandlers(client client.Client) *ExtensionHandler {
	return &ExtensionHandler{
		client: client,
	}
}

func (e *ExtensionHandler) DoBeforeClusterDelete(ctx context.Context, request *runtimehooksv1.BeforeClusterDeleteRequest, response *runtimehooksv1.BeforeClusterDeleteResponse) {
	log := ctrl.LoggerFrom(ctx)
	log.Info("DoBeforeClusterDelete is called")
	log.Info("Namespace:", request.Cluster.GetNamespace(), "ClusterName: ", request.Cluster.GetName())

	// Your implementation
	configMapName := fmt.Sprintf("%s-test-extension-hookresponse", request.Cluster.GetName())
	ok, err := e.checkConfigMap(ctx, &request.Cluster, configMapName)
	if err != nil {
		response.Status = runtimehooksv1.ResponseStatusFailure
		response.Message = err.Error()
		return
	}
	if ok {
		if err := e.deleteConfigMap(ctx, &request.Cluster, configMapName); err != nil {
			response.Status = runtimehooksv1.ResponseStatusFailure
			response.Message = err.Error()
			return
		}
	}
}

func (e *ExtensionHandler) DoAfterControlPlaneInitialized(ctx context.Context, request *runtimehooksv1.AfterControlPlaneInitializedRequest, response *runtimehooksv1.AfterControlPlaneInitializedResponse) {
	log := ctrl.LoggerFrom(ctx)
	log.Info("DoAfterControlPlaneInitialized is called")
	log.Info("Namespace:", request.Cluster.GetNamespace(), "ClusterName: ", request.Cluster.GetName())

	// Your implementation
	configMapName := fmt.Sprintf("%s-test-extension-hookresponse", request.Cluster.GetName())
	ok, err := e.checkConfigMap(ctx, &request.Cluster, configMapName)
	if err != nil {
		response.Status = runtimehooksv1.ResponseStatusFailure
		response.Message = err.Error()
		return
	}
	if !ok {
		if err := e.createConfigMap(ctx, &request.Cluster, configMapName); err != nil {
			response.Status = runtimehooksv1.ResponseStatusFailure
			response.Message = err.Error()
			return
		}
	}
}

func (e *ExtensionHandler) checkConfigMap(ctx context.Context, cluster *clusterv1.Cluster, configMapName string) (bool, error) {
	log := ctrl.LoggerFrom(ctx)
	log.Info("Checking for ConfigMap", configMapName)

	configMap := &corev1.ConfigMap{}
	nsName := client.ObjectKey{Namespace: cluster.GetNamespace(), Name: configMapName}
	if err := e.client.Get(ctx, nsName, configMap); err != nil {
		if apierrors.IsNotFound(err) {
			log.Info("ConfigMap not found")
			return false, nil
		}
		log.Error(err, "ConfigMap not found with an error")
		return false, errors.Wrapf(err, "failed to read the ConfigMap %s", klog.KRef(cluster.Namespace, configMapName))
	}
	log.Info("ConfigMap found")
	return true, nil
}

func (e *ExtensionHandler) createConfigMap(ctx context.Context, cluster *clusterv1.Cluster, configMapName string) error {
	log := ctrl.LoggerFrom(ctx)
	log.Info("Creating ConfigMap")

	configMap := e.getConfigMap(cluster, configMapName)
	if err := e.client.Create(ctx, configMap); err != nil {
		log.Error(err, "failed to create ConfigMap")
		return errors.Wrapf(err, "failed to create the ConfigMap %s", klog.KRef(cluster.Namespace, configMapName))
	}
	log.Info("configmap created successfully")
	return nil
}

func (e *ExtensionHandler) deleteConfigMap(ctx context.Context, cluster *clusterv1.Cluster, configMapName string) error {
	log := ctrl.LoggerFrom(ctx)
	log.Info("Deleting ConfigMap")

	if err := e.client.Delete(ctx, &corev1.ConfigMap{
		ObjectMeta: metav1.ObjectMeta{
			Name:      configMapName,
			Namespace: cluster.GetNamespace(),
		},
	}); err != nil {
		log.Error(err, "failed to delete ConfigMap")
		return err
	}
	return nil
}

func (e *ExtensionHandler) getConfigMap(cluster *clusterv1.Cluster, configMapName string) *corev1.ConfigMap {
	return &corev1.ConfigMap{
		ObjectMeta: metav1.ObjectMeta{
			Name:      configMapName,
			Namespace: cluster.GetNamespace(),
		},
		Data: map[string]string{
			"AfterControlPlaneInitialized-preloadedResponse": `{"Status": "Success"}`,
		},
	}
}
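
If you want to exercise these handlers without a real cluster, here is a minimal test sketch (something like handlers/hooks_test.go) using controller-runtime’s fake client. It assumes the module path github.com/aniruddha2000/runtime-sdk from the imports above and only covers the AfterControlPlaneInitialized path; it is not part of the original project, just one way to check the ConfigMap logic.

package handler_test

import (
	"context"
	"testing"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime"
	clusterv1 "sigs.k8s.io/cluster-api/api/v1beta1"
	runtimehooksv1 "sigs.k8s.io/cluster-api/exp/runtime/hooks/api/v1alpha1"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/client/fake"

	handler "github.com/aniruddha2000/runtime-sdk/handlers"
)

func TestDoAfterControlPlaneInitializedCreatesConfigMap(t *testing.T) {
	// Register the core/v1 types so the fake client understands ConfigMaps.
	scheme := runtime.NewScheme()
	_ = corev1.AddToScheme(scheme)

	fakeClient := fake.NewClientBuilder().WithScheme(scheme).Build()
	h := handler.NewExtensionHandlers(fakeClient)

	request := &runtimehooksv1.AfterControlPlaneInitializedRequest{
		Cluster: clusterv1.Cluster{
			ObjectMeta: metav1.ObjectMeta{Name: "test-cluster", Namespace: "default"},
		},
	}
	response := &runtimehooksv1.AfterControlPlaneInitializedResponse{}

	h.DoAfterControlPlaneInitialized(context.Background(), request, response)

	if response.Status == runtimehooksv1.ResponseStatusFailure {
		t.Fatalf("handler returned failure: %s", response.Message)
	}

	// The handler should have created <cluster-name>-test-extension-hookresponse.
	cm := &corev1.ConfigMap{}
	key := client.ObjectKey{Namespace: "default", Name: "test-cluster-test-extension-hookresponse"}
	if err := fakeClient.Get(context.Background(), key, cm); err != nil {
		t.Fatalf("expected ConfigMap to be created: %v", err)
	}
}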

Implement the Kubernetes manifests

This is the most interesting part, and a few bits and pieces need to be taken care of, such as –

  • Webhooks in the Kubernetes ecosystem must be served over TLS. For that, we are going to use cert-manager to automate issuing a self-signed certificate.
  • The extension must be registered with Cluster API through an ExtensionConfig resource.
  • Don’t forget about RBAC: if you operate on resources (like our ConfigMaps here), make sure you define permissions for them.

NOTE – For this example, we are doing everything in the runtimesdk namespace.

Let’s start with certificate.yaml

  • Creating a self-signed certificate using a selfSigned Issuer.
  • Defining the DNS names for the certificate based on the Service name:
    • <service_name>.<namespace>.svc
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: runtime-sdk-selfsigned-issuer
  namespace: runtimesdk
spec:
  selfSigned: {}

---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: serving-cert
  namespace: runtimesdk
spec:
  dnsNames:
    - test-runtime-sdk-svc.runtimesdk.svc
    - test-runtime-sdk-svc.runtimesdk.svc.cluster.local
    - localhost
  issuerRef:
    kind: Issuer
    name: runtime-sdk-selfsigned-issuer
  secretName: test-runtime-sdk-svc-cert

service.yaml

  • Defining a ClusterIP Service that selects the extension Deployment. The Service exposes port 443 (the standard HTTPS port) and forwards traffic to the webhook’s container port 9443.
apiVersion: v1
kind: Service
metadata:
  name: test-runtime-sdk-svc
  namespace: runtimesdk
spec:
  type: ClusterIP
  selector:
    app: test-runtime-sdk
  ports:
    - port: 443
      targetPort: 9443

deployment.yaml

  • Build your Docker image and push it to your registry (the Dockerfile is shown later in this post).
  • Mount the serving certificate Secret as a volume and point the container at it with the --webhook-cert-dir argument.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: test-runtime-sdk
  namespace: runtimesdk
spec:
  selector:
    matchLabels:
      app: test-runtime-sdk
  template:
    metadata:
      labels:
        app: test-runtime-sdk
    spec:
      serviceAccountName: test-runtime-sdk-sa
      containers:
        - name: test-runtime-sdk
          image: <image_name>:<image_tag>
          imagePullPolicy: Always
          args:
            - --webhook-cert-dir=/var/run/webhook/serving-cert/
          resources:
            limits:
              memory: "128Mi"
              cpu: "500m"
          ports:
            - containerPort: 9443
          volumeMounts:
            - mountPath: /var/run/webhook/serving-cert
              name: serving-cert
      volumes:
        - name: serving-cert
          secret:
            secretName: test-runtime-sdk-svc-cert

Service Account, ClusterRole, ClusterRoleBinding –

  • Create your own service account.
  • Grant get, list, create, and delete permissions on ConfigMaps.
  • Bind the ClusterRole to the service account with a ClusterRoleBinding.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: test-runtime-sdk-sa
  namespace: runtimesdk

---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: test-runtime-sdk-role
rules:
  - apiGroups:
      - ""
    resources:
      - configmaps
    verbs:
      - get
      - list
      - create
      - delete

---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: test-runtime-sdk-role-rolebinding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: test-runtime-sdk-role
subjects:
  - kind: ServiceAccount
    name: test-runtime-sdk-sa
    namespace: runtimesdk

Lastly, the most important piece, the ExtensionConfig –

  • Inject the CA bundle from the serving-cert Secret via the annotation.
  • Specify where the Runtime Extension is deployed (Service name, namespace, and port).
  • Specify, via the namespaceSelector, which namespaces’ Clusters can use the Runtime Extension.
apiVersion: runtime.cluster.x-k8s.io/v1alpha1
kind: ExtensionConfig
metadata:
  annotations:
    runtime.cluster.x-k8s.io/inject-ca-from-secret: runtimesdk/test-runtime-sdk-svc-cert
  name: test-runtime-sdk-extensionconfig
spec:
  clientConfig:
    service:
      name: test-runtime-sdk-svc
      namespace: runtimesdk # Note: this assumes the test extension gets deployed in the runtimesdk namespace
      port: 443
  namespaceSelector:
    matchExpressions:
      - key: kubernetes.io/metadata.name
        operator: In
        values:
          - default # Note: this assumes the test extension is used by Clusters in the default namespace only

You can define the Dockerfile like this –

FROM golang:alpine3.17 as builder
WORKDIR /src
COPY . .
RUN --mount=type=cache,target=/root/.cache/go-build \
    --mount=type=cache,target=/go/pkg/mod \
    go build -o runtime-sdk

FROM alpine
WORKDIR /app
COPY --from=builder /src/runtime-sdk /app/runtime-sdk
ENTRYPOINT ["/app/runtime-sdk"]

Let’s run the App in a Kind CAPD Cluster

  • Export the necessary environment variables and create a kind cluster –
$ cat > cluster.env << EOF
export CLUSTER_TOPOLOGY=true
export EXP_RUNTIME_SDK=true
export SERVICE_CIDR=["10.96.0.0/12"]
export POD_CIDR=["192.168.0.0/16"]
export SERVICE_DOMAIN="k8s.test"
EOF

$ source cluster.env

$ cat > kind-cluster-with-extramounts.yaml <<EOF
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
networking:
  ipFamily: dual
name: extension-config-test
nodes:
- role: control-plane
  extraMounts:
    - hostPath: /var/run/docker.sock
      containerPath: /var/run/docker.sock
EOF

$ kind create cluster --config kind-cluster-with-extramounts.yaml
Creating cluster "extension-config-test" ...
 ✓ Ensuring node image (kindest/node:v1.27.1) 🖼
 ✓ Preparing nodes 📦  
 ✓ Writing configuration 📜 
 ✓ Starting control-plane 🕹️ 
 ✓ Installing CNI 🔌 
 ✓ Installing StorageClass 💾 
Set kubectl context to "kind-extension-config-test"
You can now use your cluster with:

kubectl cluster-info --context kind-extension-config-test

Thanks for using kind! 😊
  • Create the runtimesdk namespace & initialize the management cluster –
$ kubectl create ns runtimesdk

$ clusterctl init --infrastructure docker
Fetching providers
Installing cert-manager Version="v1.11.1"
Waiting for cert-manager to be available...
Installing Provider="cluster-api" Version="v1.4.2" TargetNamespace="capi-system"
Installing Provider="bootstrap-kubeadm" Version="v1.4.2" TargetNamespace="capi-kubeadm-bootstrap-system"
Installing Provider="control-plane-kubeadm" Version="v1.4.2" TargetNamespace="capi-kubeadm-control-plane-system"
Installing Provider="infrastructure-docker" Version="v1.4.2" TargetNamespace="capd-system"

Your management cluster has been initialized successfully!

You can now create your first workload cluster by running the following:

  clusterctl generate cluster [name] --kubernetes-version [version] | kubectl apply -f -
  • Now apply all of the manifests we created. There are two things you should check –
    • The extension Deployment logs.
    • The status of the ExtensionConfig.
$ k apply -f runtime-sdk/manifests/config/
extensionconfig.runtime.cluster.x-k8s.io/test-runtime-sdk-extensionconfig created
issuer.cert-manager.io/runtime-sdk-selfsigned-issuer created
certificate.cert-manager.io/serving-cert created
deployment.apps/test-runtime-sdk created
serviceaccount/test-runtime-sdk-sa created
clusterrole.rbac.authorization.k8s.io/test-runtime-sdk-role created
clusterrolebinding.rbac.authorization.k8s.io/test-runtime-sdk-role-rolebinding created
service/test-runtime-sdk-svc created
$ k get pods -n runtimesdk
NAME                                READY   STATUS    RESTARTS   AGE
test-runtime-sdk-5bc665d7b9-725hl   1/1     Running   0          12m

$ k logs -n runtimesdk test-runtime-sdk-5bc665d7b9-725hl --follow
I0524 07:30:59.714901       1 main.go:130] "main: Starting Runtime Extension server"
I0524 07:30:59.715180       1 server.go:149] "controller-runtime/webhook: Registering webhook" path="/hooks.runtime.cluster.x-k8s.io/v1alpha1/beforeclusterdelete/before-cluster-delete"
I0524 07:30:59.715261       1 server.go:149] "controller-runtime/webhook: Registering webhook" path="/hooks.runtime.cluster.x-k8s.io/v1alpha1/aftercontrolplaneinitialized/before-cluster-create"
I0524 07:30:59.715314       1 server.go:149] "controller-runtime/webhook: Registering webhook" path="/hooks.runtime.cluster.x-k8s.io/v1alpha1/discovery"
I0524 07:30:59.715340       1 server.go:217] "controller-runtime/webhook/webhooks: Starting webhook server"
I0524 07:30:59.716380       1 certwatcher.go:131] "controller-runtime/certwatcher: Updated current TLS certificate"
I0524 07:30:59.716757       1 certwatcher.go:85] "controller-runtime/certwatcher: Starting certificate watcher"
I0524 07:30:59.716918       1 server.go:271] "controller-runtime/webhook: Serving webhook server" host="" port=9443

The logs show that our extension server is running. Now let’s check the status of the ExtensionConfig,

$ k describe extensionconfig test-runtime-sdk-extensionconfig -n runtimesdk
Name:         test-runtime-sdk-extensionconfig
Namespace:    
Labels:       <none>
Annotations:  runtime.cluster.x-k8s.io/inject-ca-from-secret: runtimesdk/test-runtime-sdk-svc-cert
API Version:  runtime.cluster.x-k8s.io/v1alpha1
Kind:         ExtensionConfig
Metadata:
  Creation Timestamp:  2023-05-24T07:21:49Z
  Generation:          2
  Resource Version:    3939
  UID:                 62af95a7-d924-46f6-9c5a-4ba3f4407749
Spec:
  Client Config:
    Ca Bundle:  LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSURIakNDQWdhZ0F3SUJBZ0lSQUx0b1VxQzlEdHBIVTl2TkJrU0xmV0l3RFFZSktvWklodmNOQVFFTEJRQXcKQURBZUZ3MHlNekExTWpRd056SXhORGxhRncweU16QTRNakl3TnpJeE5EbGFNQUF3Z2dFaU1BMEdDU3FHU0liMwpEUUVCQVFVQUE0SUJEd0F3Z2dFS0FvSUJBUURBMUl0Mm1OdVdJMmpRUlY1cHRWTDZ3cGFHdWhObG9GWHV2b1poCkwzWHJRcktiWmRaRnJUbGlZSTI4TXlxVmhSNGh2U2MzVXp5TS8rUjdYVURCT01BNkFZeEtacXg0a3VPRk1ITXkKcUhDTTNuZTZUUCsxUS9CQkRWelMvdk9tRzdnNlF1V3VyMmFtbW4zeTI4dUpWZ0hVaUZQaHZLVHE4U0J4LzY0NQo3bEluQWVpSWVrc3JqTHFJRlFka3NnSlAvbUxSTjI4RTNPL0tVTEp5RWxsakxIelZZcmVXck5rUEh6OGVmZmFECmtmSnMxTTN0NFh3c1Jyd09QQXliUmtGcTNJbENpNEoyL3EyZHZTRlRXdy9EelRuSkE1OEt6N003MlN6aXlJRnkKM1U3ajRISkVqbG9paGU2dlJtUUxEZm5wV0xEdXhvbVJpdURMWU14dHU5VkxweEdIQWdNQkFBR2pnWkl3Z1k4dwpEZ1lEVlIwUEFRSC9CQVFEQWdXZ01Bd0dBMVVkRXdFQi93UUNNQUF3YndZRFZSMFJBUUgvQkdVd1k0SWpkR1Z6CmRDMXlkVzUwYVcxbExYTmtheTF6ZG1NdWNuVnVkR2x0WlhOa2F5NXpkbU9DTVhSbGMzUXRjblZ1ZEdsdFpTMXoKWkdzdGMzWmpMbkoxYm5ScGJXVnpaR3N1YzNaakxtTnNkWE4wWlhJdWJHOWpZV3lDQ1d4dlkyRnNhRzl6ZERBTgpCZ2txaGtpRzl3MEJBUXNGQUFPQ0FRRUFFSUsvOFJqeFBiYy80T2I4MWY4Z2h2dVN3Z0Y0V0dkK3dONVZpSndICngzVm5GWGJ6d1YvMHZreEJ5SDhFR2xLcnRjcTNVMDFvZ0taQVRadW9DYWxLVjZvUHYvNklNbXR4WHMzMk5EeWoKamwvU3FHOXJlMFhRMXBYa2xIVHpIMk9ha0ozWjZ1TUMxSzgrWS9YRUJMYzZibjhYSXpad3N5VDJkZ0RJeTkrNQpkMjZqek9EejZ4Y2h2TzBSNm1ZK2psazJpMzdwSHRiZWxrOExFeE9ObmFNWlZvWWIrYmtRWXZ5MEZQdEhsZ0NnClQycVBWQ3FISmV2cWxIakk3UFQ4YmVlNFVKcHc1Rld4L0FjbU9qd3BjTkZWbkMwaFFtZmNTazNvb2Z4bTViem0KUTd1d1ZaSzBmWDFaVjJvWGNrZEtPMUluNnZpVkpWSzRESzV3MXh3MnBMWHhGUT09Ci0tLS0tRU5EIENFUlRJRklDQVRFLS0tLS0K
    Service:
      Name:       test-runtime-sdk-svc
      Namespace:  runtimesdk
      Port:       443
  Namespace Selector:
    Match Expressions:
      Key:       kubernetes.io/metadata.name
      Operator:  In
      Values:
        default
Status:
  Conditions:
    Last Transition Time:  2023-05-24T07:32:44Z
    Status:                True
    Type:                  Discovered
  Handlers:
    Failure Policy:  Fail
    Name:            before-cluster-delete.test-runtime-sdk-extensionconfig
    Request Hook:
      API Version:    hooks.runtime.cluster.x-k8s.io/v1alpha1
      Hook:           BeforeClusterDelete
    Timeout Seconds:  10
    Failure Policy:   Fail
    Name:             before-cluster-create.test-runtime-sdk-extensionconfig
    Request Hook:
      API Version:    hooks.runtime.cluster.x-k8s.io/v1alpha1
      Hook:           AfterControlPlaneInitialized
    Timeout Seconds:  10
Events:               <none>

If you look closely, the CA bundle has been injected correctly from the annotated Secret, and both hooks show up under Handlers in the status.

  • Create a Workload Cluster Now –
$ clusterctl generate cluster extension-config-test --flavor development \
--kubernetes-version v1.27.1 \
--control-plane-machine-count=1 \
--worker-machine-count=1 \
> manifests/capi/capi-quickstart.yaml

$ k apply -f manifests/capi/capi-quickstart.yaml
clusterclass.cluster.x-k8s.io/quick-start created
dockerclustertemplate.infrastructure.cluster.x-k8s.io/quick-start-cluster created
kubeadmcontrolplanetemplate.controlplane.cluster.x-k8s.io/quick-start-control-plane created
dockermachinetemplate.infrastructure.cluster.x-k8s.io/quick-start-control-plane created
dockermachinetemplate.infrastructure.cluster.x-k8s.io/quick-start-default-worker-machinetemplate created
kubeadmconfigtemplate.bootstrap.cluster.x-k8s.io/quick-start-default-worker-bootstraptemplate created
cluster.cluster.x-k8s.io/extension-config-test created

Let’s check the logs and the ConfigMaps to see whether the extension created anything,

I0524 07:40:49.405854       1 hooks.go:52] "DoAfterControlPlaneInitialized is called"
I0524 07:40:49.406022       1 hooks.go:53] "Namespace:" default="ClusterName: " extension-config-test="(MISSING)"
I0524 07:40:49.406093       1 hooks.go:74] "Checking for ConfigMap" extension-config-test-test-extension-hookresponse="(MISSING)"
I0524 07:40:49.421562       1 hooks.go:80] "ConfigMap not found"
I0524 07:40:49.421596       1 hooks.go:92] "Creating ConfigMap"
I0524 07:40:49.437841       1 hooks.go:99] "configmap created successfully"
$ k get configmaps
NAME                                                DATA   AGE
extension-config-test-test-extension-hookresponse   1      76s
kube-root-ca.crt                                    1      26m

Yep, our ConfigMap is there now. Let’s test the delete path,

$ k delete -f manifests/capi/capi-quickstart.yaml

$ k logs -n runtimesdk test-runtime-sdk-5bc665d7b9-725hl --follow
I0524 07:44:08.266319       1 hooks.go:30] "DoBeforeClusterDelete is called"
I0524 07:44:08.266347       1 hooks.go:31] "Namespace:" default="ClusterName: " extension-config-test="(MISSING)"
I0524 07:44:08.268351       1 hooks.go:74] "Checking for ConfigMap" extension-config-test-test-extension-hookresponse="(MISSING)"
I0524 07:44:08.288940       1 hooks.go:86] "ConfigMap found"
I0524 07:44:08.289163       1 hooks.go:105] "Deleting ConfigMap"
$ k get configmaps
NAME               DATA   AGE
kube-root-ca.crt   1      29m

So now the ConfigMap is gone as well. Everything is working fine 😉

Thanks for reading 🙂

Feedback is welcome!

Being a part of the first ever Release Team of Cluster API as CI Signal/Bug Triage/Automation Shadow

Cluster API 1.4 just got released, and I consider myself very lucky to have been able to help with the release as a CI Signal shadow and to work with the fantastic folks in the community! If this sounds interesting to you but you’ve no clue how to get started, keep reading; I’ll try to answer most of the questions people have when getting started with the release team.

What Is the Cluster API Release Team?

For each new release of Cluster API, there is a team of community members who are responsible for the day-to-day release work: for example, watching CI signal in testgrid, writing and publishing release notes, and highlighting release-blocking issues to the maintainers.

The release team is led by a Release Lead and a few shadows, and is divided into subgroups, each with its own lead and shadows. The subgroups are –

  • Communication Team.
  • CI Signal/Bug Triage/Automation Team.

My Personal Experience With the Release Team

I started with the Cluster API 1.4 release team as a CI Signal shadow. The CI Signal team is responsible for monitoring CI throughout the release cycle, reporting test failures to the team weekly, and making sure they get fixed before the release. The team was very helpful in onboarding me as a complete beginner and clarified every doubt I had regarding the release. I am still new to fixing test failures, and that’s my improvement goal for the future.

Now I am continuing to work in the 1.5 release team in CI Signal.

Thanks to the entire release team for having me and helping me whenever I had any doubts.

Why Should you Consider Applying?

Joining as a shadow allows you to directly get mentored by the role lead and the broader release team. Not only do you end up learning about the release process, but you also get to learn a lot about how the open-source K8s project is structured.

How do I apply?

The Cluster API release team doesn’t have an official shadow selection process like upstream Kubernetes has. Anybody who is an active member of the community and has the enthusiasm to learn new things can come and help the team. A good way to start is to contribute to Cluster API, join the weekly release meetings, and get involved in the next release cycle.

How I got selected for the LFX Mentorship Program

LFX Mentorship (previously known as Community Bridge) is a platform developed by the Linux Foundation, which promotes and accelerates the adoption, innovation, and sustainability of open-source software.

LFX Mentorship is actively used by the Cloud Native Computing Foundation (CNCF) as a mentorship platform across CNCF projects.

Program Schedule

2022 — Fall Term — September 1st – Nov 30th

2022 — Summer Term — June 1st – August 31st (My Term)

2022 — Spring Term — March 1st – May 31st

How to Apply

You have to write a cover letter covering why you are interested in the project, any relevant previous work you have done, what you expect from the project, and so on.

Tip: Start contributing early, talk to the maintainers about your interest in the program, and start discussing the issue/feature you are going to work on.

My Project

My project is the Cluster API Provider for GCP (CAPG). It is a CNCF project that helps manage Kubernetes clusters on Google Cloud Platform. Other providers, such as the Cluster API Provider for AWS (CAPA) and the Cluster API Provider for Azure (CAPZ), already support taking advantage of GPUs in their clusters, but CAPG doesn’t yet, so my co-mentee Subhasmita and I will work on adding GPU support to CAPG.

My Mentors

My Co-Mentee

Well, my journey would be a little monotonous if I didn’t have a co-mentee. It makes the work more interesting: when we are both stuck on something, we hop on a call and discuss it. We also divide the weekly work between us and teach each other what we have learned.

How It All Started

I didn’t plan to do LFX from the beginning. I started my journey with CAPG for GSoC ’22: I applied for the same project and the same feature there, but the project didn’t get selected for GSoC, so all the applications to it were rejected as well. I talked to the maintainer Richard and asked whether I could still work on the GPU feature, as I was very interested in it. He told me there was still hope through the LFX Mentorship, he opened an application there, and I applied. And then I got selected for the LFX Mentorship 🎉

How It Is Going

I was a little worried about how I would work on a big project like this, with thousands of lines of code, when the largest project I had written was maybe 500 lines. But I am amazed at how easy the maintainers made my journey: they onboarded me with an introduction to the project over a couple of weeks, gave me small tasks to try things out, and encouraged me to ask questions whenever I got stuck.

Next Steps:

I will start the GPU work next week with Subhasmita and keep contributing to the project in the future.

Create a managed cluster using Cluster API Provider for Google Cloud Platform (CAPG)

In the previous blog, I explained how to create and manage Kubernetes clusters with Cluster API locally using the Docker infrastructure provider.

In this blog, I will explain how to create and manage a Kubernetes cluster with Cluster API on Google Cloud.

Note – Throughout this blog I will use Kubernetes version 1.22.9. It is recommended to use the Kubernetes version that your OS image was built for by the image-builder; you can check kubernetes.json in the image-builder repository and use that version.

Step 1 –

  • Create the kind cluster –
kind create cluster --image kindest/node:v1.22.9 --wait 5m

Step 2 –

Follow the image-builder for GCP steps and build an image.

Step 3 –

  • Export the following env variables – (reference)
export GCP_PROJECT_ID=<YOUR PROJECT ID>
export GOOGLE_APPLICATION_CREDENTIALS=<PATH TO GCP CREDENTIALS>
export GCP_B64ENCODED_CREDENTIALS=$( cat /path/to/gcp-credentials.json | base64 | tr -d '\n' )

export CLUSTER_TOPOLOGY=true
export GCP_REGION="us-east4"
export GCP_PROJECT="<YOU GCP PROJECT NAME>"
export KUBERNETES_VERSION=1.22.9
export IMAGE_ID=projects/$GCP_PROJECT/global/images/<IMAGE ID>
export GCP_CONTROL_PLANE_MACHINE_TYPE=n1-standard-2
export GCP_NODE_MACHINE_TYPE=n1-standard-2
export GCP_NETWORK_NAME=default
export CLUSTER_NAME=test

Step 4 –

Set up the network. In this example we are using the default network, so we will create a Cloud Router and a Cloud NAT so that our workload cluster nodes have internet access.

gcloud compute routers create "${CLUSTER_NAME}-myrouter" --project="${GCP_PROJECT}" --region="${GCP_REGION}" --network="default"

gcloud compute routers nats create "${CLUSTER_NAME}-mynat" --project="${GCP_PROJECT}" --router-region="${GCP_REGION}" --router="${CLUSTER_NAME}-myrouter" --nat-all-subnet-ip-ranges --auto-allocate-nat-external-ips

Step 5 –

  • Initialize the infrastructure
clusterctl init --infrastructure gcp
  • Generate the workload cluster config and apply it
clusterctl generate cluster $CLUSTER_NAME --kubernetes-version v1.22.9 > workload-test.yaml

kubectl apply -f workload-test.yaml
  • View the cluster and its resources
$ clusterctl describe cluster $CLUSTER_NAME
NAME                                                               READY  SEVERITY  REASON                 SINCE  MESSAGE
/test                                                              False  Info      WaitingForKubeadmInit  5s
├─ClusterInfrastructure - GCPCluster/test
└─ControlPlane - KubeadmControlPlane/test-control-plane            False  Info      WaitingForKubeadmInit  5s
  └─Machine/test-control-plane-x57zs                               True                                    31s
    └─MachineInfrastructure - GCPMachine/test-control-plane-7xzw2
  • Check the status of the control plane
$ kubectl get kubeadmcontrolplane
NAME                 CLUSTER   INITIALIZED   API SERVER AVAILABLE   REPLICAS   READY   UPDATED   UNAVAILABLE   AGE    VERSION
test-control-plane   test                                           1                  1         1             2m9s   v1.22.9

Note – The control plane won’t be ready until the next step, when I install the CNI (Container Network Interface).

Step 6 –

  • Get the kubeconfig for the workload cluster
$ clusterctl get kubeconfig $CLUSTER_NAME > workload-test.kubeconfig
  • Apply the CNI
kubectl --kubeconfig=./workload-test.kubeconfig \
  apply -f https://docs.projectcalico.org/v3.20/manifests/calico.yaml
  • Wait a bit and you should see this when getting the kubeadmcontrolplane
$ kubectl get kubeadmcontrolplane
NAME                 CLUSTER   INITIALIZED   API SERVER AVAILABLE   REPLICAS   READY   UPDATED   UNAVAILABLE   AGE     VERSION
test-control-plane   test      true          true                   1          1       1         0             6m33s   v1.22.9


$ kubectl get nodes --kubeconfig=./workload-test.kubeconfig
NAME                       STATUS   ROLES                  AGE   VERSION
test-control-plane-7xzw2   Ready    control-plane,master   62s   v1.22.9

Step 7 –

  • Edit the MachineDeployment in workload-test.yaml: it has 0 replicas by default, so set the number of worker nodes you want (in this case we used 2), then apply workload-test.yaml again.
$ kubectl apply -f workload-test.yaml
  • After a few minutes, you should see something like this –
$ clusterctl describe cluster $CLUSTER_NAME
NAME                                                               READY  SEVERITY  REASON  SINCE  MESSAGE
/test                                                              True                     15m
├─ClusterInfrastructure - GCPCluster/test
├─ControlPlane - KubeadmControlPlane/test-control-plane            True                     15m
│ └─Machine/test-control-plane-x57zs                               True                     19m
│   └─MachineInfrastructure - GCPMachine/test-control-plane-7xzw2
└─Workers
  └─MachineDeployment/test-md-0                                    True                     10m
    └─2 Machines...                                                True                     13m    See test-md-0-68bd55744b-qpk67, test-md-0-68bd55744b-tsgf6

$ kubectl get nodes --kubeconfig=./workload-test.kubeconfig
NAME                       STATUS   ROLES                  AGE   VERSION
test-control-plane-7xzw2   Ready    control-plane,master   21m   v1.22.9
test-md-0-b7766            Ready    <none>                 17m   v1.22.9
test-md-0-wsgpj            Ready    <none>                 17m   v1.22.9

Yaaa! Now we have a Kubernetes cluster on GCP with 1 control plane node and 2 worker nodes.

Step 8 –

Delete what you have created –

$ kubectl delete cluster $CLUSTER_NAME

$ gcloud compute routers nats delete "${CLUSTER_NAME}-mynat" --project="${GCP_PROJECT}" \
    --router-region="${GCP_REGION}" --router="${CLUSTER_NAME}-myrouter"

$ gcloud compute routers delete "${CLUSTER_NAME}-myrouter" --project="${GCP_PROJECT}" \
    --region="${GCP_REGION}"

$ kind delete cluster

What is Kubernetes Cluster API and Setup a Local Cluster API using Docker

I came across the term Cluster API while I was contributing to Flatcar Linux, but I didn’t know much about it then. In recent days I have been tinkering with Kubernetes and started learning what Cluster API is and what it does. Cluster API, or CAPI, is a project from a Kubernetes Special Interest Group (SIG) that uses Kubernetes-style APIs and patterns to automate cluster lifecycle management for platform operators.
In general terms, it is the project that helps you manage your Kubernetes clusters no matter where they run, including on various cloud providers. That matters because a Kubernetes cluster involves a lot of components: hardware, software, services, networking, storage, and so on.

Motivation

I wrote this blog with the motivation of setting Cluster API up locally and contributing to the project. In recent days I have come across a lot of core Computer Science subjects like computer networking and database management systems, and I am amazed to see how they interconnect with distributed systems.
I am still very new to operating on the various cloud providers, but in the near future I am willing to learn those things and run Kubernetes there.
I also want to participate in GSoC, work on this particular project, and improve CAPG by adding more features and supporting GKE.

Setting up CAPI locally with Docker

Requirements – You need Docker, kind, and kubectl installed on your system before starting (clusterctl is installed in Step 2 below).

Step 1 –

Infrastructure Provider – the component that provisions the compute and other resources needed to spin up a cluster. We are going to use Docker as our infrastructure here.

  • Create a kind config file for allowing the Docker provider to access Docker on the host:
cat > kind-cluster-with-extramounts.yaml <<EOF
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
  extraMounts:
    - hostPath: /var/run/docker.sock
      containerPath: /var/run/docker.sock
EOF
  • Then create a kind cluster using the config file we just created –
kind create cluster --config kind-cluster-with-extramounts.yaml

Step 2 –

Now install the clusterctl tool, which manages the lifecycle of a CAPI management cluster –

  • Installation on Linux – (for other operating systems, see the ref)
$ curl -L https://github.com/kubernetes-sigs/cluster-api/releases/download/v0.4.0/clusterctl-linux-amd64 -o clusterctl
$ chmod +x ./clusterctl
$ sudo mv ./clusterctl /usr/local/bin/clusterctl
$ clusterctl version

Step 3 –

Now it’s time to use clusterctl to transform the kind cluster into a management cluster with the clusterctl init command. The command accepts a list of providers.

Management Cluster – A Management cluster is a Kubernetes cluster that manages the lifecycle of Workload Clusters. A Management Cluster is also where one or more Infrastructure Providers run, and where resources such as Machines are stored.

  • I am using Docker as my infrastructure, so I will use the command below –
clusterctl init --infrastructure docker

Step 4 –

Now it’s time to create a workload cluster.

Workload Cluster – A workload cluster is a cluster created by a ClusterAPI controller, which is not a bootstrap cluster, and is meant to be used by end-users.

  • Now we use clusterctl generate cluster to generate a YAML file to create a workload cluster.
clusterctl generate cluster test-workload-cluster --flavor development \
--kubernetes-version v1.21.2 \
--control-plane-machine-count=3 \
--worker-machine-count=3 \
> test-workload-cluster.yaml
  • Now apply the file to create the workload cluster –
kubectl apply -f test-workload-cluster.yaml

Step 5 –

Now we verify our workload cluster and access it.

  • Get the status of the cluster
kubectl get cluster
  • View the cluster and its resources
clusterctl describe cluster test-workload-cluster
  • Check the status of the control plane
kubectl get kubeadmcontrolplane

Note – The control plane won’t be ready until the next step, when I install the CNI (Container Network Interface).

Step 6 –

Now it’s time to set up the CNI solution –

  • First get the workload cluster kubeconfig
clusterctl get kubeconfig test-workload-cluster > test-workload-cluster.kubeconfig
  • We will use Calico as an example.
kubectl --kubeconfig=./test-workload-cluster.kubeconfig apply -f https://docs.projectcalico.org/v3.18/manifests/calico.yaml
  • After some time the nodes should be up and running.
kubectl --kubeconfig=./test-workload-cluster.kubeconfig get nodes

Step 7 –

Now for the last phase, deleting the resources –

  • Delete the workload cluster
kubectl delete cluster test-workload-cluster
  • Delete the management cluster
kind delete cluster