Recently, I set up VMware Tanzu Basic on my home lab (4 Intel NUCs). This write-up documents my experience with the setup and with testing the TKG Extensions 1.2.0. Installing the necessary infrastructure and tools is already very well documented. I added some links to the official documentation for installing the TKG Extensions as well, and I mention only the pitfalls and lessons from my own installation.
If you want to achieve a very clean setup, you should get or create your own certificate authority. I created my own using the KeyStore Explorer and followed the write-up by Eric Shanks for the first steps within vCenter. If you plan to use your own Docker registry, you will benefit from this investment later on. I will go into the details of how to use the CA within the workload clusters in a later blog post, but I leave you with a link to the Tanzu documentation if you want to start right away. The workload clusters are highly customizable, but this comes at the price of higher complexity.
Networking (Home Lab)
I had quite a few issues with networking, because the additional NICs on USB were not running reliably. I followed the lead of Samuel Legrand and installed the USB Network Native Driver for ESXi (I took the /etc/rc.local.d/local.sh from here). This worked like a charm and I got additional 1 Gbit links over USB.
To create a second L2 network, I installed pfSense on vSphere and used a second switch. This worked well initially, but then I realized that the second network over USB started to lose connectivity to certain IPs. This was quite a weird effect, which I could overcome by creating a second VDS without Network I/O Control for it.
For my first tests I chose vSphere networking with HAProxy as the load balancer. I used this quick start guide and also checked the official documentation for Tanzu on vSphere. For my installation I used a certificate signed by my CA and did a customized setup. I started from my home network, 192.168.0.0/24, on which my ESXi hosts and vCenter run directly, and carved out a VIP range for HAProxy (192.168.0.144/28). The workload networks are on a 10.0.0.0/20 network with pfSense as the gateway and routing device. This way I have enough IPs for multiple workload networks.
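To make the address planning above a bit more tangible, here is a small sketch in plain POSIX shell that prints the first address, the last address and the size of a CIDR block. The helper names (ip_to_int, int_to_ip, cidr_range) are my own and not part of any Tanzu tooling; it only does the arithmetic and does not check anything against a real network.

```shell
#!/bin/sh
# Convert a dotted-quad IPv4 address to a single integer.
ip_to_int() {
  # Split the address on dots into $1..$4.
  set -- $(echo "$1" | tr '.' ' ')
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Convert an integer back to dotted-quad notation.
int_to_ip() {
  echo "$(( ($1 >> 24) & 255 )).$(( ($1 >> 16) & 255 )).$(( ($1 >> 8) & 255 )).$(( $1 & 255 ))"
}

# Print "<first address> <last address> <number of addresses>" for a CIDR block.
cidr_range() {
  addr=${1%/*}
  prefix=${1#*/}
  size=$(( 1 << (32 - prefix) ))
  base=$(ip_to_int "$addr")
  first=$(( base - base % size ))   # align to the start of the block
  last=$(( first + size - 1 ))
  echo "$(int_to_ip "$first") $(int_to_ip "$last") $size"
}

cidr_range 192.168.0.144/28   # prints: 192.168.0.144 192.168.0.159 16
cidr_range 10.0.0.0/20        # prints: 10.0.0.0 10.0.15.255 4096
```

So the VIP range covers 192.168.0.144 to 192.168.0.159, and the /20 leaves room for, say, sixteen /24 workload networks.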
After creating a namespace and a successful log in on the CLI (see quick start), I deployed my first cluster:
apiVersion: run.tanzu.vmware.com/v1alpha1
kind: TanzuKubernetesCluster
metadata:
  annotations:
    tkg/plan: dev
  labels:
    tkg.tanzu.vmware.com/cluster-name: s01
  name: s01
  namespace: niceneasy
spec:
  distribution:
    version: 1.18.5+vmware.1-tkg.1.c40d30d
  settings:
    network:
      cni:
        name: antrea
      pods:
        cidrBlocks:
        - 100.96.0.0/11
      serviceDomain: tph.local
      services:
        cidrBlocks:
        - 100.64.0.0/13
    storage:
      classes:
      - vsan-default-storage-policy
      defaultClass: vsan-default-storage-policy
  topology:
    controlPlane:
      class: best-effort-medium
      count: 1
      storageClass: vsan-default-storage-policy
    workers:
      class: best-effort-medium
      count: 3
      storageClass: vsan-default-storage-policy
and deployed it by calling
kubectl apply -f cluster.yaml
I have not yet found out where to access the custom resource definition for TanzuKubernetesCluster directly, and I did not get access to the ones deployed in the vSphere management cluster. I did, however, find a ytt template in the ~/.tkg folder (.tkg/providers/infrastructure-tkg-service-vsphere/v1.0.0/ytt/base-template.yaml). Alternatively, you can use the OpenAPI capabilities of the Kubernetes API server by calling
curl -k https://<api-server>/openapi/v2 -H "Authorization: Bearer <token>" > swagger.json
You can get the URL and the token by examining the output of
kubectl config view
This is quite a nice feature of k8s because it also makes it possible to generate client libraries for Kubernetes extensions. I will explain this in a later blog post.
I changed the service domain for the cluster to “tph.local”; the default value is “cluster.local”.
The service domain ends up in /etc/resolv.conf of each pod. For pods in the namespace “tanzu-system-monitoring”, it creates the following entry:
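If you do not want to copy URL and token by hand, the lookup can be scripted. The following is only a sketch under a few assumptions: the current kubectl context uses a bearer token stored in the kubeconfig, the function names are mine, and the grep in list_definitions is a crude text match on JSON keys rather than a real JSON parser. I use --raw because, as far as I know, newer kubectl versions redact tokens in the plain "kubectl config view" output.

```shell
#!/bin/sh
# API server URL of the current context.
api_server() {
  kubectl config view --minify --raw -o jsonpath='{.clusters[0].cluster.server}'
}

# Bearer token of the current context (assumes token-based auth in the kubeconfig).
api_token() {
  kubectl config view --minify --raw -o jsonpath='{.users[0].user.token}'
}

# Download the OpenAPI v2 document to a file (default: swagger.json).
# -k skips TLS verification, as in the curl call above.
fetch_openapi() {
  curl -sk "$(api_server)/openapi/v2" \
    -H "Authorization: Bearer $(api_token)" \
    -o "${1:-swagger.json}"
}

# List JSON keys in a file whose name contains the given text.
# $1 = file, $2 = text to look for, e.g. TanzuKubernetesCluster
list_definitions() {
  grep -o "\"[A-Za-z0-9_.-]*$2[A-Za-z0-9_.-]*\":" "$1" | tr -d '":' | sort -u
}

# Usage (needs a logged-in kubectl context):
#   fetch_openapi swagger.json
#   list_definitions swagger.json TanzuKubernetesCluster
```

Whether the TanzuKubernetesCluster schema shows up in the result depends on which API server you query and on whether the CRD publishes a schema, so treat an empty result as "not published here" rather than as an error.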
search tanzu-system-monitoring.svc.tph.local svc.tph.local tph.local
This setting can have an impact on Helm charts or manifest templates in which the service domain cannot be customized (as in the TKG Extensions).
After the cluster has been successfully deployed, you have to log in again, but this time with the cluster name added:
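To illustrate why a hard-coded “cluster.local” hurts, here is a tiny sketch of how in-cluster service names are put together. The service name prometheus-server is only a placeholder I made up for the example, and the two helper functions are mine.

```shell
#!/bin/sh
# Build the fully qualified in-cluster DNS name of a service.
# $1 = service, $2 = namespace, $3 = service domain (default: cluster.local)
service_fqdn() {
  echo "$1.$2.svc.${3:-cluster.local}"
}

# Show the names a resolver derives from a short name and a search list.
# $1 = short name, remaining arguments = search domains from /etc/resolv.conf
expand_search() {
  name=$1
  shift
  for domain in "$@"; do
    echo "$name.$domain"
  done
}

# What a template with a hard-coded default domain would use:
service_fqdn prometheus-server tanzu-system-monitoring
# What actually exists in a cluster with serviceDomain tph.local:
service_fqdn prometheus-server tanzu-system-monitoring tph.local
# Candidates for the short name, given the search line shown above:
expand_search prometheus-server tanzu-system-monitoring.svc.tph.local svc.tph.local tph.local
```

The short name works in both cases because the search list takes care of it; only the fully qualified name with the wrong domain fails to resolve.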
kubectl vsphere login --server=<your endpoint> --vsphere-username email@example.com --insecure-skip-tls-verify --tanzu-kubernetes-cluster-namespace niceneasy --tanzu-kubernetes-cluster-name s01
Now you can switch the kubernetes context to your newly created cluster. Hint: take a look at the k8s utilities kubectx and kubens. bash-completion and k9s are further recommendations from my side…
Admission Controller – Pod Security Policies
Tanzu on vSphere has the pod security policy admission controller enabled. I have adapted the installation procedure to use the default pod security policies installed with VMware Tanzu. Although the cluster admin should have the right to deploy workloads directly, it is best practice to define the PSP explicitly per application. As I didn’t want to check for each pod whether privileged access is really needed, I simply allowed privileged access for all of them (time-saving, but not suited for production).
It is possible to deactivate the admission controller, but I leave it enabled for now and just adapt the extensions to the out-of-the-box default of Tanzu Basic on vSphere.
Download the extensions and unzip them to your working directory.
Cert-Manager creates and renews certificates for Kubernetes components.
First I checked the file 03-cert-manager.yaml and found the service accounts “cert-manager”, “cert-manager-cainjector” and “cert-manager-webhook”. A ClusterRoleBinding is created with the following command (the namespace “cert-manager” does not have to exist yet):
kubectl create clusterrolebinding cert-manager --clusterrole=psp:vmware-system-privileged --serviceaccount=cert-manager:cert-manager --serviceaccount=cert-manager:cert-manager-cainjector --serviceaccount=cert-manager:cert-manager-webhook
Now you can simply cd into the cert-manager directory and deploy everything with
kubectl apply -f .
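Looking up the service accounts by hand gets tedious with larger manifests, so here is a rough sketch that prints the ServiceAccount names found in a multi-document YAML file and assembles the matching kubectl command. Both function names are mine, and the awk script is simple text matching that assumes the usual layout (kind: and metadata: at column 0, name: indented by two spaces). It prints the command instead of running it, so you can check it first.

```shell
#!/bin/sh
# Print the names of all ServiceAccount objects in a multi-document YAML file.
# $1 = manifest file
list_service_accounts() {
  awk '
    function flush() {
      if (kind == "ServiceAccount" && name != "") print name
      kind = ""; name = ""
    }
    /^---/                { flush(); in_meta = 0; next }   # document separator
    /^kind:/              { kind = $2; in_meta = 0; next }
    /^metadata:/          { in_meta = 1; next }
    /^[^ #]/              { in_meta = 0 }                  # any other top-level key
    in_meta && /^  name:/ { name = $2 }                    # metadata.name only
    END                   { flush() }
  ' "$1"
}

# Print a "kubectl create clusterrolebinding" command for all service accounts found.
# $1 = binding name, $2 = namespace of the service accounts, $3 = manifest file
psp_binding_command() {
  cmd="kubectl create clusterrolebinding $1 --clusterrole=psp:vmware-system-privileged"
  for sa in $(list_service_accounts "$3"); do
    cmd="$cmd --serviceaccount=$2:$sa"
  done
  echo "$cmd"
}

# Usage:
#   psp_binding_command cert-manager cert-manager 03-cert-manager.yaml
```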
Contour Ingress Controller
kubectl create clusterrolebinding contour-privileged --clusterrole=psp:vmware-system-privileged --serviceaccount=tanzu-system-ingress:contour --serviceaccount=tanzu-system-ingress:envoy
ytt --ignore-unknown-comments -f common/ -f ingress/contour/ -v infrastructure_provider="vsphere" | kubectl apply -f-
Test it with the samples, which are well explained in the README.md, but do not forget to add a service account and create a ClusterRoleBinding. If you forget this, you will see an error like
Error creating: pods "<pod name>" is forbidden: unable to validate against any pod security policy: 
if you use
kubectl describe replicaset <service name>
So adapt accordingly:
kubectl create sa sample
kubectl create clusterrolebinding sample --clusterrole=psp:vmware-system-privileged --serviceaccount=test-ingress:sample
cd ingress/examples/common
-> add the service account to 02-deployments.yaml
If the tests are successful, you have an ingress controller that lets you use one IP on the load balancer and route the traffic by hostname to the target k8s services.
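Before deploying, you can ask the API server whether a service account is allowed to use a pod security policy. This is a small sketch around kubectl auth can-i. The function name is mine, and I assume the policy behind the cluster role psp:vmware-system-privileged is called vmware-system-privileged.

```shell
#!/bin/sh
# Ask whether a service account may use a pod security policy.
# $1 = namespace, $2 = service account, $3 = PSP name (default: vmware-system-privileged)
can_use_psp() {
  kubectl auth can-i use "podsecuritypolicy/${3:-vmware-system-privileged}" \
    --as="system:serviceaccount:$1:$2" -n "$1"
}

# Usage (needs a logged-in kubectl context):
#   can_use_psp test-ingress sample
```

kubectl answers with yes or no. If it says no for the sample service account, the binding is missing or points to a different namespace.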
daniele@ubuntu-dt:~/dev/tkg-extensions$ kubectl get svc -n tanzu-system-ingress
NAME      TYPE           CLUSTER-IP      EXTERNAL-IP     PORT(S)                      AGE
contour   ClusterIP      100.68.64.67    <none>          8001/TCP                     3d4h
envoy     LoadBalancer   100.70.251.33   192.168.0.149   80:30761/TCP,443:30788/TCP   3d4h
With this command I found the external IP of my Contour ingress (the envoy service), 192.168.0.149. Because the traffic is routed by hostname, each FQDN used over this ingress needs a DNS entry (an A record pointing to this IP) or an entry in /etc/hosts on the client for further testing.
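For quick tests without touching DNS, the following sketch may help. It reads the external IP from the envoy service and prints either an /etc/hosts line or a curl command that pins a hostname to that IP via curl's --resolve option. The function names and the hostname app1.example.com are placeholders of mine; nothing is written to /etc/hosts automatically.

```shell
#!/bin/sh
# External IP of the envoy LoadBalancer service.
ingress_ip() {
  kubectl get svc envoy -n tanzu-system-ingress \
    -o jsonpath='{.status.loadBalancer.ingress[0].ip}'
}

# Print an /etc/hosts line mapping one or more hostnames to an IP.
# $1 = IP, remaining arguments = hostnames
hosts_entry() {
  ip=$1
  shift
  echo "$ip $*"
}

# Print a curl command that resolves the hostname to the given IP for this one request.
# $1 = IP, $2 = hostname, $3 = port (default: 80)
curl_resolve_command() {
  echo "curl --resolve $2:${3:-80}:$1 http://$2/"
}

hosts_entry 192.168.0.149 app1.example.com
curl_resolve_command 192.168.0.149 app1.example.com

# Usage with a live cluster:
#   hosts_entry "$(ingress_ip)" app1.example.com
```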