Release Notes

This topic contains release notes for Tanzu Kubernetes Grid Integrated Edition (TKGI) v1.10.

Warning: Before installing or upgrading to Tanzu Kubernetes Grid Integrated Edition v1.10, review the Breaking Changes below.

TKGI v1.10.5

Release Date: August 17, 2021

Product Snapshot

Release Details
Version v1.10.5
Release date August 17, 2021
Component Version
Antrea v0.11.1
cAdvisor v0.39.1
CNS for vSphere v2.0.0, v1.0.2
CoreDNS v1.7.0_vmware.13
Docker Linux: v20.10.7
Windows: v19.03.17
etcd v3.4.13
Harbor v2.2.3
Kubernetes v1.19.13
Metrics Server v0.3.6
NCP v3.1.0.2
Percona XtraDB Cluster (PXC) v0.37.0
UAA v74.5.24
Velero v1.4.2
VMware Cloud Foundation (VCF) v4.2.0
Wavefront Wavefront Collector: v1.2.6
Wavefront Proxy: v9.2
Compatibilities Versions
Ops Manager See VMware Tanzu Network*.
NSX-T See VMware Product Interoperability Matrices.
vSphere
Windows stemcells v2019.37 or later
Xenial stemcells See VMware Tanzu Network.

* Do not use TKGI with Ops Manager v2.10.15 or with Ops Manager v2.10.17 and vSphere CPIv2. For more information, see v2.10.15 and v2.10.17 in Ops Manager v2.10 Release Notes.

Upgrade Path

The supported upgrade paths to TKGI v1.10.5 are from Tanzu Kubernetes Grid Integrated Edition v1.10.0 and later or from Tanzu Kubernetes Grid Integrated Edition v1.9.0 and later.

Features

This section describes new features and changes in Tanzu Kubernetes Grid Integrated Edition v1.10.5.

Component Updates

The following components have been updated:

  • Bumps CoreDNS to v1.7.0_vmware.13.
  • Bumps Docker Windows to v19.03.17.
  • Bumps Harbor to v2.2.3.
  • Bumps Kubernetes to v1.19.13.
  • Bumps Percona XtraDB Cluster (PXC) to v0.37.0.
  • Bumps UAA to v74.5.24.

Known Issues

Except where noted, the known issues in Tanzu Kubernetes Grid Integrated Edition v1.10.4 are also in Tanzu Kubernetes Grid Integrated Edition v1.10.5. See the TKGI v1.10.4 Known Issues below.

Warning: Do not use TKGI with Ops Manager v2.10.15 or with Ops Manager v2.10.17 and vSphere CPIv2. For more information, see v2.10.15 and v2.10.17 in Ops Manager v2.10 Release Notes.


TKGI Management Console v1.10.5

Release Date: August 17, 2021

Product Snapshot

Note: Tanzu Kubernetes Grid Integrated Edition Management Console provides an opinionated installation of TKGI. The supported versions may differ from or be more limited than what is generally supported by TKGI.

Element Details
Version v1.10.5
Release date August 17, 2021
Installed Tanzu Kubernetes Grid Integrated Edition version v1.10.5
Installed Ops Manager version v2.10.16
Installed Kubernetes version v1.19.13
Installed Harbor Registry version v2.2.3
Linux stemcell v621.136
Windows stemcells v2019.37 or later


Upgrade Path

The supported upgrade path to Tanzu Kubernetes Grid Integrated Edition Management Console v1.10.5 is from Tanzu Kubernetes Grid Integrated Edition v1.10.0 and later or from Tanzu Kubernetes Grid Integrated Edition v1.9.0 and later.

Features

Tanzu Kubernetes Grid Integrated Edition Management Console v1.10.5 updates include:

Known Issues

Except where noted, the known issues in Tanzu Kubernetes Grid Integrated Edition Management Console v1.10.4 are also in Tanzu Kubernetes Grid Integrated Edition Management Console v1.10.5. See the TKGI Management Console v1.10.4 Known Issues below.


TKGI v1.10.4

Release Date: July 1, 2021

Product Snapshot

Release Details
Version v1.10.4
Release date July 1, 2021
Component Version
Antrea v0.11.1
cAdvisor v0.39.1
CNS for vSphere v2.0.0, v1.0.2
CoreDNS v1.7.0_vmware.9
Docker Linux: v20.10.7
Windows: v19.03.14
etcd v3.4.13
Harbor v2.2.2
Kubernetes v1.19.10
Metrics Server v0.3.6
NCP v3.1.0.2
Percona XtraDB Cluster (PXC) v0.35.0
UAA v74.5.23
Velero v1.4.2
VMware Cloud Foundation (VCF) v4.2.0
Wavefront Wavefront Collector: v1.2.6
Wavefront Proxy: v9.2
Compatibilities Versions
Ops Manager See VMware Tanzu Network*.
NSX-T See VMware Product Interoperability Matrices.
vSphere
Windows stemcells v2019.34 or later
Xenial stemcells See VMware Tanzu Network.

* Do not use TKGI with Ops Manager v2.10.15 or with Ops Manager v2.10.17 and vSphere CPIv2. For more information, see v2.10.15 and v2.10.17 in Ops Manager v2.10 Release Notes.

Upgrade Path

The supported upgrade paths to TKGI v1.10.4 are from Tanzu Kubernetes Grid Integrated Edition v1.10.0 and later or from Tanzu Kubernetes Grid Integrated Edition v1.9.0 and later.

Features

This section describes new features and changes in Tanzu Kubernetes Grid Integrated Edition v1.10.4.

Component Updates

The following components have been updated:

  • Bumps cAdvisor to v0.39.1.
  • Bumps Docker Linux to v20.10.7.
  • Bumps Harbor to v2.2.2.
  • Bumps NCP to v3.1.0.2.
  • Bumps PXC to v0.35.0.
  • Bumps UAA to v74.5.23.
  • Bumps Xenial stemcell to v621.130.

Known Issues

Except where noted, the known issues in Tanzu Kubernetes Grid Integrated Edition v1.10.3 are also in Tanzu Kubernetes Grid Integrated Edition v1.10.4. See the TKGI v1.10.3 Known Issues below.

Warning: Do not use TKGI with Ops Manager v2.10.15 or with Ops Manager v2.10.17 and vSphere CPIv2. For more information, see v2.10.15 and v2.10.17 in Ops Manager v2.10 Release Notes.


“drain-cluster-windows” Errand Does Not Exist

This issue is fixed in TKGI v1.10.5.

Symptom

When you delete a Linux cluster, the drain-cluster-windows errand reports the following error:

Errand 'drain-cluster-windows' doesn't exist

Explanation

When you delete a cluster, the following errands run:

  • drain-cluster
  • drain-cluster-windows

Both errands run regardless of whether the cluster is a Linux or Windows cluster.

On a Linux cluster, there are no Windows workers, so the drain-cluster-windows errand does not exist. The log shows the following:

Task 572
Task 572 | 11:39:54 | Preparing deployment: Preparing deployment
Task 572 | 11:39:54 | Deprecation: Global 'properties' are deprecated. Please define 'properties' at the job level.
Task 572 | 11:39:56 | Preparing deployment: Preparing deployment (00:00:02)
 L Error: Errand 'drain-cluster-windows' doesn't exist
Task 572 | 11:39:56 | Error: Errand 'drain-cluster-windows' doesn't exist
Task 572 Started Tue Jun 29 11:39:54 UTC 2021
Task 572 Finished Tue Jun 29 11:39:56 UTC 2021
Task 572 Duration 00:00:02
Task 572 error

On a Windows cluster, both the drain-cluster and drain-cluster-windows errands exist, and no error messages are generated.

Workaround

No workaround is necessary. The error message is informational, accurate, and does not negatively impact deleting the cluster.


TKGI Management Console v1.10.4

Release Date: July 1, 2021

Product Snapshot

Note: Tanzu Kubernetes Grid Integrated Edition Management Console provides an opinionated installation of TKGI. The supported versions may differ from or be more limited than what is generally supported by TKGI.

Element Details
Version v1.10.4
Release date July 1, 2021
Installed Tanzu Kubernetes Grid Integrated Edition version v1.10.4
Installed Ops Manager version v2.10.14
Installed Kubernetes version v1.19.10
Installed Harbor Registry version v2.2.2
Linux stemcell v621.131
Windows stemcells v2019.34 or later


Upgrade Path

The supported upgrade path to Tanzu Kubernetes Grid Integrated Edition Management Console v1.10.4 is from Tanzu Kubernetes Grid Integrated Edition v1.10.0 and later or from Tanzu Kubernetes Grid Integrated Edition v1.9.0 and later.

Features

Tanzu Kubernetes Grid Integrated Edition Management Console v1.10.4 updates include:

Known Issues

Except where noted, the known issues in Tanzu Kubernetes Grid Integrated Edition Management Console v1.10.3 are also in Tanzu Kubernetes Grid Integrated Edition Management Console v1.10.4. See the TKGI Management Console v1.10.3 Known Issues below.


TKGI v1.10.3

Release Date: June 17, 2021

Product Snapshot

Release Details
Version v1.10.3
Release date June 17, 2021
Component Version
Antrea v0.11.1
cAdvisor v0.35.0
CSI Driver for vSphere v2.0.0, v1.0.2
CoreDNS v1.7.0_vmware.8
Docker Linux: v19.03.14
Windows: v19.03.14
etcd v3.4.13
Harbor v2.2.1
Kubernetes v1.19.10
Metrics Server v0.3.6
NCP v3.1.0.1
Percona XtraDB Cluster (PXC) v0.33.0
UAA v74.5.22
Velero v1.4.2
VMware Cloud Foundation (VCF) v4.2.0
Wavefront Wavefront Collector: v1.2.6
Wavefront Proxy: v9.2
Compatibilities Versions
Ops Manager See VMware Tanzu Network*.
NSX-T See VMware Product Interoperability Matrices.
vSphere
Windows stemcells v2019.34 or later
Xenial stemcells See VMware Tanzu Network.

* Do not use TKGI with Ops Manager v2.10.15 or with Ops Manager v2.10.17 and vSphere CPIv2. For more information, see v2.10.15 and v2.10.17 in Ops Manager v2.10 Release Notes.

Upgrade Path

The supported upgrade paths to TKGI v1.10.3 are from Tanzu Kubernetes Grid Integrated Edition v1.10.0 and later or from Tanzu Kubernetes Grid Integrated Edition v1.9.0 and later.

Features

This section describes new features and changes in Tanzu Kubernetes Grid Integrated Edition v1.10.3.

Known Issues

Except where noted, the known issues in Tanzu Kubernetes Grid Integrated Edition v1.10.2 are also in Tanzu Kubernetes Grid Integrated Edition v1.10.3. See the TKGI v1.10.2 Known Issues below.

Warning: Do not use TKGI with Ops Manager v2.10.15 or with Ops Manager v2.10.17 and vSphere CPIv2. For more information, see v2.10.15 and v2.10.17 in Ops Manager v2.10 Release Notes.


TKGI Management Console v1.10.3

Release Date: June 17, 2021

Product Snapshot

Note: Tanzu Kubernetes Grid Integrated Edition Management Console provides an opinionated installation of TKGI. The supported versions may differ from or be more limited than what is generally supported by TKGI.

Element Details
Version v1.10.3
Release date June 17, 2021
Installed Tanzu Kubernetes Grid Integrated Edition version v1.10.3
Installed Ops Manager version v2.10.11
Installed Kubernetes version v1.19.10
Installed Harbor Registry version v2.2.1
Linux stemcell v621.125
Windows stemcells v2019.34 or later


Upgrade Path

The supported upgrade path to Tanzu Kubernetes Grid Integrated Edition Management Console v1.10.3 is from Tanzu Kubernetes Grid Integrated Edition v1.10.0 and later or from Tanzu Kubernetes Grid Integrated Edition v1.9.0 and later.

Features

Tanzu Kubernetes Grid Integrated Edition Management Console v1.10.3 updates include:

Known Issues

Except where noted, the known issues in Tanzu Kubernetes Grid Integrated Edition Management Console v1.10.2 are also in Tanzu Kubernetes Grid Integrated Edition Management Console v1.10.3. See the TKGI Management Console v1.10.2 Known Issues below.


TKGI v1.10.2 - Withdrawn

Warning: This release has been removed from VMware Tanzu Network because of the issue: Nodes with More Than 30 Stateful Pods Are in a NotReady State.

Release Date: May 21, 2021

Product Snapshot

Release Details
Version v1.10.2
Release date May 21, 2021
Component Version
Antrea v0.11.1
cAdvisor v0.35.0
CSI Driver for vSphere v2.0.0, v1.0.2
CoreDNS v1.7.0+vmware.8
Docker Linux: v19.03.14
Windows: v19.03.14
etcd v3.4.13
Harbor v2.2.1
Kubernetes v1.19.10
Metrics Server v0.3.6
NCP v3.1.0.1
Percona XtraDB Cluster (PXC) v0.33.0
UAA v74.5.22
Velero v1.4.2
VMware Cloud Foundation (VCF) v4.2.0
Wavefront Wavefront Collector: v1.2.6
Wavefront Proxy: v9.2
Compatibilities Versions
Ops Manager See VMware Tanzu Network*.
NSX-T See VMware Product Interoperability Matrices.
vSphere
Windows stemcells v2019.34 or later
Xenial stemcells See VMware Tanzu Network.

* Do not use TKGI with Ops Manager v2.10.15 or with Ops Manager v2.10.17 and vSphere CPIv2. For more information, see v2.10.15 and v2.10.17 in Ops Manager v2.10 Release Notes.

Upgrade Path

The supported upgrade paths to TKGI v1.10.2 are from Tanzu Kubernetes Grid Integrated Edition v1.9.0 and later.

Warning: If you have nodes with more than 30 stateful Pods or nodes with a total of more than 50 Pods, do not upgrade to TKGI v1.10.2. For more information, see Nodes with More Than 30 Stateful Pods Are in a NotReady State below.

Features

This section describes new features and changes in Tanzu Kubernetes Grid Integrated Edition v1.10.2.

Component Updates

The following components have been updated:

  • Bumps Kubernetes to v1.19.10.

Known Issues

Except where noted, the known issues in Tanzu Kubernetes Grid Integrated Edition v1.10.1 are also in Tanzu Kubernetes Grid Integrated Edition v1.10.2. See the TKGI v1.10.1 Known Issues below.

Warning: Do not use TKGI with Ops Manager v2.10.15 or with Ops Manager v2.10.17 and vSphere CPIv2. For more information, see v2.10.15 and v2.10.17 in Ops Manager v2.10 Release Notes.


Nodes with More Than 30 Stateful Pods Are in a NotReady State

This issue is fixed in TKGI v1.10.3.

Warning: If you have nodes with more than 30 stateful Pods or nodes with a total of more than 50 Pods, do not upgrade to TKGI v1.10.2.

Symptom

Nodes that have more than 50 Pods or more than 30 stateful Pods are in a NotReady state, and you see the following errors in kubelet logs:

Error syncing pod ... skipping: rpc error: code = DeadlineExceeded desc = context deadline exceeded
skipping pod synchronization - PLEG is not healthy: pleg was last seen active...
skipping pod synchronization - PLEG is not healthy: pleg was last seen active...
failed to collect filesystem stats - rootDiskErr: could not stat .. to get inode usage: stat ...: no such file or directory, extraDiskErr: could not stat

Explanation

The runc component introduced instability on nodes running more than 50 Pods or more than 30 stateful Pods. For more information, see loading seccomp filter: invalid argument in the runc GitHub repository.


TKGI Management Console v1.10.2 - Withdrawn

Warning: This release has been removed from VMware Tanzu Network because of the issue: Nodes with More Than 30 Stateful Pods Are in a NotReady State.

Release Date: May 21, 2021

Product Snapshot

Note: Tanzu Kubernetes Grid Integrated Edition Management Console provides an opinionated installation of TKGI. The supported versions may differ from or be more limited than what is generally supported by TKGI.

Element Details
Version v1.10.2
Release date May 21, 2021
Installed Tanzu Kubernetes Grid Integrated Edition version v1.10.2
Installed Ops Manager version v2.10.11
Installed Kubernetes version v1.19.10
Installed Harbor Registry version v2.2.1
Linux stemcell v621.125
Windows stemcells v2019.34 or later


Upgrade Path

The supported upgrade path to Tanzu Kubernetes Grid Integrated Edition Management Console v1.10.2 is from Tanzu Kubernetes Grid Integrated Edition v1.10.0 and later or from Tanzu Kubernetes Grid Integrated Edition v1.9.0 and later.

Warning: If you have nodes with more than 30 stateful Pods or nodes with a total of more than 50 Pods, do not upgrade to TKGI v1.10.2. For more information, see Nodes with More Than 30 Stateful Pods Are in a NotReady State above.

Features

Tanzu Kubernetes Grid Integrated Edition Management Console v1.10.2 updates include:

Known Issues

Except where noted, the known issues in Tanzu Kubernetes Grid Integrated Edition Management Console v1.10.1 are also in Tanzu Kubernetes Grid Integrated Edition Management Console v1.10.2. See the TKGI Management Console v1.10.1 Known Issues below.

Warning: If you have nodes with more than 30 stateful Pods or nodes with a total of more than 50 Pods, do not upgrade to TKGI v1.10.2. For more information, see Nodes with More Than 30 Stateful Pods Are in a NotReady State above.


TKGI v1.10.1

Release Date: April 12, 2021

Product Snapshot

Release Details
Version v1.10.1
Release date April 12, 2021
Component Version
Kubernetes v1.19.9
CoreDNS v1.7.0+vmware.8
Docker Linux: v19.03.14
Windows: v19.03.14
etcd v3.4.13
Metrics Server v0.3.6
NCP v3.1.0.1
Percona XtraDB Cluster (PXC) v0.33.0
UAA v74.5.22
Compatibilities Versions
VMware Cloud Foundation (VCF) v4.2.0
Ops Manager See VMware Tanzu Network*.
Xenial stemcells See VMware Tanzu Network.
Windows stemcells v2019.34 or later
NSX-T See VMware Product Interoperability Matrices.
vSphere
CSI Driver for vSphere v2.0.0, v1.0.2
Harbor v2.1.3
Velero v1.4.2

* Do not use TKGI with Ops Manager v2.10.15 or with Ops Manager v2.10.17 and vSphere CPIv2. For more information, see v2.10.15 and v2.10.17 in Ops Manager v2.10 Release Notes.

Upgrade Path

The supported upgrade paths to TKGI v1.10.1 are from Tanzu Kubernetes Grid Integrated Edition v1.9.0 and later.

Features

This section describes new features and changes in Tanzu Kubernetes Grid Integrated Edition v1.10.1.

Component Updates

The following components have been updated:

  • Bumps Kubernetes to v1.19.9.

Known Issues

Except where noted, the known issues in Tanzu Kubernetes Grid Integrated Edition v1.10.0 are also in Tanzu Kubernetes Grid Integrated Edition v1.10.1. See the TKGI v1.10.0 Known Issues below.

Warning: Do not use TKGI with Ops Manager v2.10.15 or with Ops Manager v2.10.17 and vSphere CPIv2. For more information, see v2.10.15 and v2.10.17 in Ops Manager v2.10 Release Notes.


Workloads Using Dynamic PVs Must be Removed Before Deleting a Cluster

This issue is fixed in TKGI v1.10.2.

Symptom

Your tkgi delete-cluster operation hangs while draining a worker VM containing a Pod bound to one or more dynamic persistent volumes (PVs).

Workaround

Before deleting the cluster, remove all workloads.

If you have already attempted to delete a cluster and your workloads with dynamic PVs have stopped, remove the dynamic PVs attached to the worker VMs in your cluster. For information about removing dynamic PVs from a worker VM, see Workloads using dynamic PersistentVolumes (PVs) must be removed before deleting a cluster in the Knowledge Base.
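
Before you run tkgi delete-cluster, you can check whether any workloads still have dynamically provisioned volumes bound. The commands below are a minimal sketch using standard kubectl:

# List PersistentVolumeClaims and PersistentVolumes that are still bound
kubectl get pvc --all-namespaces
kubectl get pv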


TKGI Management Console v1.10.1

Release Date: April 12, 2021

Product Snapshot

Note: Tanzu Kubernetes Grid Integrated Edition Management Console provides an opinionated installation of TKGI. The supported versions may differ from or be more limited than what is generally supported by TKGI.

Element Details
Version v1.10.1
Release date April 12, 2021
Installed Tanzu Kubernetes Grid Integrated Edition version v1.10.1
Installed Ops Manager version v2.10.8
Installed Kubernetes version v1.19.9
Installed Harbor Registry version v2.1.3
Linux stemcell v621.117
Windows stemcells v2019.31 or later


Upgrade Path

The supported upgrade path to Tanzu Kubernetes Grid Integrated Edition Management Console v1.10.1 is from Tanzu Kubernetes Grid Integrated Edition v1.10.0 and later or from Tanzu Kubernetes Grid Integrated Edition v1.9.0 and later.

Features

Tanzu Kubernetes Grid Integrated Edition Management Console v1.10.1 updates include:

Known Issues

Except where noted, the known issues in Tanzu Kubernetes Grid Integrated Edition Management Console v1.10.0 are also in Tanzu Kubernetes Grid Integrated Edition Management Console v1.10.1. See the TKGI Management Console v1.10.0 Known Issues below.


TKGI v1.10.0

Release Date: January 28, 2021

Product Snapshot

Release Details
Version v1.10.0
Release date January 28, 2021
Component Version
Kubernetes v1.19.6
CoreDNS v1.7.0+vmware.5
Docker Linux: v19.03.14
Windows: v19.03.14
etcd v3.4.13
Metrics Server v0.3.6
NCP v3.1.0.1
Percona XtraDB Cluster (PXC) v0.31.0
UAA v74.5.21
Compatibilities Versions
VMware Cloud Foundation (VCF) v4.2.0
Ops Manager See VMware Tanzu Network*.
Xenial stemcells See VMware Tanzu Network.
Windows stemcells v2019.29 or later
NSX-T See VMware Product Interoperability Matrices.
vSphere
CSI Driver for vSphere v2.0.0, v1.0.2
Harbor v2.1.1
Velero v1.4.2

* Do not use TKGI with Ops Manager v2.10.15 or with Ops Manager v2.10.17 and vSphere CPIv2. For more information, see v2.10.15 and v2.10.17 in Ops Manager v2.10 Release Notes.

Upgrade Path

The supported upgrade paths to TKGI v1.10.0 are from Tanzu Kubernetes Grid Integrated Edition v1.9.0 and later.

Features

This section describes new features and changes in VMware Tanzu Kubernetes Grid Integrated Edition v1.10.0.

Cluster-Specific Proxy Settings (NSX-T and AWS)

You can configure proxy settings specific to individual TKGI clusters, overriding the global settings in the TKGI tile > Networking pane. For more information, see Configure Cluster Proxies.

Supports the Antrea CNI

Tanzu Kubernetes Grid Integrated Edition now provides the option to use the Antrea Container Network Interface (CNI) as the CNI for new TKGI-provisioned clusters. For more information about using Antrea as your CNI, see About Upgrading from the Flannel CNI to the Antrea CNI in About Tanzu Kubernetes Grid Integrated Edition Upgrades.

NSX-T Certificate Rotation

You can now rotate TKGI-provisioned Kubernetes cluster NSX-T TLS certificates using a TKGI CLI command. For more information, see Certificate Rotation.
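
The exact command name and flags are documented in Certificate Rotation; the line below is only an assumed shape of the CLI invocation, not confirmed syntax:

# Assumed command shape; confirm the exact syntax in the Certificate Rotation topic
tkgi rotate-certificates CLUSTER-NAME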

Apply Persistent Node Labels and Node Taints Using Compute Profiles

On vSphere and vSphere with NSX-T, Tanzu Kubernetes Grid Integrated Edition supports applying persistent labels and taints to a Kubernetes node using Compute Profiles. For more information, see node_pools Block in Creating and Managing Compute Profiles with the CLI (vSphere).
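
As an illustrative sketch only, a compute profile that applies labels and taints to a node pool might be written to a file as shown below; the field names (node_labels, node_taints) and the taint string format are assumptions, so follow the linked node_pools Block documentation for the supported schema:

# Illustrative only: the field names and taint format below are assumptions.
cat > labeled-pool-profile.json <<'EOF'
{
  "name": "labeled-pool-profile",
  "description": "Example compute profile with node labels and taints",
  "parameters": {
    "cluster_customization": {
      "node_pools": [
        {
          "name": "pool-1",
          "instances": 3,
          "node_labels": { "environment": "dev" },
          "node_taints": [ "dedicated=dev:NoSchedule" ]
        }
      ]
    }
  }
}
EOF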

Windows Worker Kubernetes Clusters Support Active Directory

Windows Server with Active Directory can now control access to TKGI Windows worker-based Kubernetes clusters through integration with group Managed Service Account (gMSA). For more information, see Authenticate Windows Clusters with Active Directory.

TKGI-Defined Wavefront Alerts Removed from the Tile

TKGI v1.10 removes the following configuration options from the Wavefront integration in the tile:

  • Create pre-defined Wavefront alerts errand
  • Delete pre-defined Wavefront alerts errand
  • Wavefront Alert Recipient

If you want to enable pre-defined Wavefront alerts for TKGI v1.10, configure your alert targets in Wavefront. For a list of available alerts, see Predefined Alerts for the Integration.

If you enabled the Create pre-defined Wavefront alerts errand and Wavefront Alert Recipient in an earlier version of TKGI and you upgrade your environment to v1.10, you will continue to receive the TKGI-defined alerts. If you plan to configure alert targets for TKGI in Wavefront, uninstall these alerts in the Wavefront UI by following the instructions below:

  1. In Wavefront, navigate to Integrations > VMware Tanzu™ Kubernetes Grid™ Integrated Edition.
  2. Click the Alerts tab and then click Uninstall All.

Component Updates

The following components have been updated:

  • Bumps Kubernetes to v1.19.6.
  • Bumps Xenial stemcell to v621.94.
  • Bumps NCP to v3.1.0.17170700.
  • Bumps PXC to v0.31.0.
  • Bumps UAA to v74.5.21.

Breaking Changes

TKGI v1.10.0 has the following breaking changes.

TKGI v1.10 Is Not Compatible with NSX-T v2.5.2 or Earlier

TKGI v1.10 is not compatible with NSX-T v2.5.2 or earlier. If you are deploying TKGI v1.10 to NSX-T, your NSX-T version must be NSX-T v3.0.1 or later. For more information about upgrading NSX-T and TKGI, see Upgrade Order for Tanzu Kubernetes Grid Integrated Edition Environments on vSphere and Upgrading Tanzu Kubernetes Grid Integrated Edition (NSX-T Networking).

Swap Is Disabled by Default

Swap is now disabled on all worker nodes. In previous versions of Tanzu Kubernetes Grid Integrated Edition, swap was enabled, but upstream Kubernetes does not support this setting. You cannot enable swap through the TKGI CLI, and manually configuring swap is not permitted.
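
To confirm the setting on a cluster, you can check a worker VM directly. This sketch assumes BOSH CLI access and uses a placeholder deployment name; no output from swapon means swap is off:

# DEPLOYMENT-NAME is the BOSH deployment name of the cluster
bosh -d DEPLOYMENT-NAME ssh worker/0 -c 'sudo swapon --show'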

Cluster Creation, Update and Upgrade Failure Error Messages No Longer Truncated

Tanzu Kubernetes Grid Integrated Edition v1.10 includes improved error messages for cluster creation, update, and upgrade failures. Previously, error messages greater than 128 bytes were truncated. In TKGI v1.10, logged cluster creation, update, and upgrade failure error messages are no longer truncated.

On NSX-T v3.1 the Transport Zone and Edge Transport Node Switch Names Must be Identical

This issue is fixed in TKGI v1.10.2 and TKGI MC v1.10.2.

The NSX-T Transport Zone and Edge Transport Node switch names must be identical if TKGI v1.10 is installed on NSX-T v3.1. For more information, see Configuring NSX-T Data Center v3.1 Transport Zones and Edge Node Switches for Tanzu Kubernetes Grid Integrated Edition.

Kubectl Authentication Requires a Certificate Generated With a SAN

Kubernetes now requires the certificate used for Kubernetes authentication to be generated with a Subject Alternative Name (SAN). If you are using OIDC for Kubernetes authentication, the certificate serving the OIDC provider must include a SAN.

If you provide Kubernetes with a certificate generated without a SAN, kubectl commands will fail with an error: “You must be logged in to the server (Unauthorized)”.
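
For illustration, the following generates a self-signed certificate that includes a SAN. It assumes OpenSSL 1.1.1 or later for the -addext flag, and oidc.example.com is a placeholder for your OIDC provider's FQDN:

# oidc.example.com is a placeholder hostname
openssl req -x509 -newkey rsa:2048 -nodes -days 365 \
  -keyout oidc.key -out oidc.crt \
  -subj "/CN=oidc.example.com" \
  -addext "subjectAltName=DNS:oidc.example.com"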

Known Issues

TKGI v1.10.0 has the following known issues:

Warning: Do not use TKGI with Ops Manager v2.10.15 or with Ops Manager v2.10.17 and vSphere CPIv2. For more information, see v2.10.15 and v2.10.17 in Ops Manager v2.10 Release Notes.


Pods Stop After Upgrading From NSX-T v3.0.2 to v3.1.0 on vSphere 7.0 and 7.0.1

Symptom

Your TKGI-provisioned Pods stop after upgrading from NSX-T v3.0.2 to NSX-T v3.1.0 on vSphere 7.0 and 7.0.1.

Explanation

For information, see Issue 2603550: Some VMs are vMotioned and lose network connectivity during UA nodes upgrade in the VMware NSX-T Data Center 3.1.1 Release Notes.

Workaround

To avoid the loss of network connectivity during UA node upgrade, ensure DRS is set to manual mode during your upgrade from NSX-T v3.0.2 to v3.1.0.

If you upgraded to NSX-T v3.1.0 with DRS in automation mode, run the following on the affected Pods’ master VMs to restore Pod connectivity:

monit restart ncp
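
If you prefer to run the restart on the control plane VMs of an affected cluster without an interactive session, a minimal sketch using BOSH SSH (the deployment name is a placeholder) is:

# DEPLOYMENT-NAME is the BOSH deployment name of the affected cluster
bosh -d DEPLOYMENT-NAME ssh master -c 'sudo /var/vcap/bosh/bin/monit restart ncp'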

For more information on upgrading NSX-T v3.0.2 to NSX-T v3.1.0, see Upgrade NSX-T Data Center to v3.0 or v3.1.


Error: Could Not Execute “Apply-Changes” in Azure Environment

Symptom

After clicking Apply Changes on the TKGI tile in an Azure environment, you experience an error ’…could not execute “apply-changes”…’ with either of the following descriptions:

  • {“errors”:{“base”:[“undefined method 'location’ for nil:NilClass”]}}
  • FailedError.new(“Resource Groups in region ’#{location}’ do not support Availability Zones”))

For example:

INFO | 2020-09-21 03:46:49 +0000 | Vessel::Workflows::Installer#run | Install product (apply changes)
2020/09/21 03:47:02 could not execute "apply-changes": installation failed to trigger: request failed: unexpected response from /api/v0/installations:
HTTP/1.1 500 Internal Server Error
Transfer-Encoding: chunked
Cache-Control: no-cache, no-store
Connection: keep-alive
Content-Type: application/json; charset=utf-8
Date: Mon, 21 Sep 2020 17:51:50 GMT
Expires: Fri, 01 Jan 1990 00:00:00 GMT
Pragma: no-cache
Referrer-Policy: strict-origin-when-cross-origin
Server: Ops Manager
Strict-Transport-Security: max-age=31536000; includeSubDomains
X-Content-Type-Options: nosniff
X-Download-Options: noopen
X-Frame-Options: SAMEORIGIN
X-Permitted-Cross-Domain-Policies: none
X-Request-Id: f5fc99c1-21a7-45c3-7f39
X-Runtime: 9.905591
X-Xss-Protection: 1; mode=block

44
{"errors":{"base":["undefined method `location' for nil:NilClass"]}}
0

Explanation

The Azure CPI endpoint used by Ops Manager has been changed and your installed version of Ops Manager is not compatible with the new endpoint.

Workaround

Run the following Ops Manager CLI command:

om --skip-ssl-validation --username USERNAME --password PASSWORD --target https://OPSMAN-API curl --silent --path /api/v0/staged/director/verifiers/install_time/IaasConfigurationVerifier -x PUT -d '{ "enabled": false }'

Where:

  • USERNAME is the account to use to run Ops Manager API commands.
  • PASSWORD is the password for the account.
  • OPSMAN-API is the IP address for the Ops Manager API.

For more information, see Error 'undefined method location’ is received when running Apply Change on Azure in the VMware Tanzu Knowledge Base.
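
For example, a hypothetical invocation with placeholder credentials and an example Ops Manager address looks like this:

om --skip-ssl-validation --username admin --password 'example-password' \
  --target https://203.0.113.10 curl --silent \
  --path /api/v0/staged/director/verifiers/install_time/IaasConfigurationVerifier \
  -x PUT -d '{ "enabled": false }'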


VMware vRealize Operations Does Not Support Windows Worker-Based Kubernetes Clusters

VMware vRealize Operations (vROPs) does not support Windows worker-based Kubernetes clusters and cannot be used to manage TKGI-provisioned Windows workers.


TKGI Wavefront Requires Manual Installation for Windows Workers

To monitor Windows-based worker node clusters with a Wavefront collector and proxy, you must first install Wavefront on the clusters manually, using Helm. For instructions, see the Wavefront section of the Monitoring Windows Worker Clusters and Nodes topic.
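
The linked topic describes the supported procedure. Purely as a hedged sketch, a Helm-based install typically follows the pattern below, where the chart name, repository URL, and value keys are assumptions and the URL, token, and cluster name are placeholders:

# Chart name, repo URL, and value keys are assumptions; follow the linked topic for supported values.
helm repo add wavefront https://wavefronthq.github.io/helm/
helm install wavefront wavefront/wavefront \
  --namespace wavefront --create-namespace \
  --set wavefront.url=https://YOUR-INSTANCE.wavefront.com \
  --set wavefront.token=YOUR-API-TOKEN \
  --set clusterName=YOUR-CLUSTER-NAME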


Pinging Windows Worker Kubernetes Clusters Does Not Work

TKGI-provisioned Windows worker-based Kubernetes clusters inherit a Kubernetes limitation that prevents outbound ICMP communication from workers. As a result, pinging Windows workers does not work.

For information about this limitation, see Limitations > Networking in the Windows in Kubernetes documentation.


Velero Does Not Support Backing Up Stateful Windows Workloads

You can use Velero to back up stateless TKGI-provisioned Windows workloads only; Velero cannot be used to back up stateful Windows applications. For more information, see Velero on Windows in Basic Install in the Velero documentation.


Tanzu Mission Control Integration Not Supported on GCP

TKGI on Google Cloud Platform (GCP) does not support Tanzu Mission Control (TMC) integration, which is configured in the Tanzu Kubernetes Grid Integrated Edition tile > the Tanzu Mission Control (Experimental) pane.

If you intend to run TKGI v1.10 on GCP, skip this pane when configuring the Tanzu Kubernetes Grid Integrated Edition tile.


TMC Data Protection Feature Requires Privileged TKGI Containers

The TMC Data Protection feature requires privileged TKGI containers. For more information, see Plans in the Installing TKGI topic for your IaaS.


Windows Worker Kubernetes Clusters with Group Managed Service Account Do Not Support Compute Profiles

Windows worker-based Kubernetes clusters integrated with group Managed Service Account (gMSA) cannot be managed using compute profiles.


Windows Worker Kubernetes Clusters on Flannel Do Not Support Compute Profiles

On vSphere with NSX-T networking you can use compute profiles with both Linux and Windows worker‑based Kubernetes clusters. On vSphere with Flannel networking, you can apply compute profiles only to Linux clusters.


TKGI Does Not Support Managing Pre-TKGI v1.9 Compute Profiles

Compute profiles created in TKGI v1.8 and earlier have a different format from current compute profiles.

TKGI v1.10 does not support resizing, updating, upgrading, or managing compute profiles using tkgi CLI compute profile commands, on a cluster that has a compute profile created in TKGI v1.8 and earlier.


TKGI CLI Does Not Prevent Reducing the Control Plane Node Count

TKGI CLI does not prevent accidentally reducing a cluster’s control plane node count using a compute profile.

Warning: Reducing a cluster’s control plane node count can destroy the cluster. Do not scale out or scale in existing master nodes by reconfiguring the TKGI tile or by using a compute profile. Reducing a cluster’s number of control plane nodes may remove a master node and cause the cluster to become inactive.


Windows Cluster Nodes Not Deleted After VM Deleted

Symptom

After you delete a VM using the management console of your infrastructure provider, you notice a Windows worker node that had been on that VM is now in a notReady state.

Solution

  1. To identify the leftover node:

    kubectl get no -o wide
    
  2. Locate nodes on the returned list that are in a notReady state and have the same IP address as another node in the list.

  3. To manually delete a notReady node:

    kubectl delete node NODE-NAME
    

    Where NODE-NAME is the name of the node in the notReady state.


502 Bad Gateway After OIDC Login

Symptom

You experience a “502 Bad Gateway” error from the NSX load balancer after you log in to OIDC.

Explanation

A large response header has exceeded your NSX-T load balancer maximum response header size. The default maximum response header size is 10,240 characters and should be resized to 50,000.

Workaround

If you experience this issue, manually reconfigure your NSX-T request_header_size and response_header_size to 50,000 characters. For information about configuring NSX-T default header sizes, see OIDC Response Header Overflow in the Knowledge Base.


Difficulty Changing Proxy for Windows Workers

You must configure a global proxy in the Tanzu Kubernetes Grid Integrated Edition tile > Networking pane before you create any Windows workers that use the proxy.

You cannot change the proxy configuration for Windows workers in an existing cluster.


Character Limitations in HTTP Proxy Password

For vSphere with NSX-T, the HTTP Proxy password field does not support the following special characters: & or ;.


Error After Modifying Your Harbor Storage Configuration

Symptom

You receive the following error after modifying your existing Harbor installation’s storage configuration:

Error response from daemon: manifest for ... not found: manifest unknown: manifest unknown

Explanation

Harbor does not support modifying an existing Harbor installation’s storage configuration.

Workaround

To modify your Harbor storage configuration, re-install Harbor. Before starting Harbor, configure the new Harbor installation with the desired configuration.


Unexplained Errors After Interrupting a Log Stream When Using Antrea Networking

Symptom

While using Antrea networking, you observe unexplainable errors after you interrupt a log stream started using kubectl logs -f POD-NAME. The errors could include any of the following:

  • kubectl returns the error: “Error from server (TooManyRequests): the server has received too many”.
  • kube-apiserver returns an http code 429.

Explanation

When using Antrea networking, there is a chance that konnectivity-agent will become unstable after you interrupt your kubectl log stream.

Workaround

To resolve the issue:

  1. Log in to the master VM:

    bosh -d DEPLOYMENT-NAME ssh master/0
    
  2. Change to root:

    sudo -i
    
  3. Restart proxy-server:

    monit restart proxy-server
    
  4. Wait for proxy-server restart:

    monit summary
    


Ingress Controller Statefulset Fails to Start After Resizing Worker Nodes

Symptom

Permissions are removed from your cluster’s files and processes after resizing the persistent disk during a cluster upgrade. The ingress controller statefulset fails to start.

Explanation

When resizing a persistent disk, BOSH migrates the data from the old disk to the new disk but does not copy the files' extended attributes.

Workaround

To resolve the problem, complete the steps in Ingress controller statefulset fails to start after resize of worker nodes with permission denied in the VMware Tanzu Knowledge Base.


Azure Default Security Group Is Not Automatically Assigned to Cluster VMs

Symptom

You experience issues when configuring a load balancer for a multi-master Kubernetes cluster or creating a service of type LoadBalancer. Additionally, in the Azure portal, the VM > Networking page does not display any inbound and outbound traffic rules for your cluster VMs.

Explanation

As part of configuring the Tanzu Kubernetes Grid Integrated Edition tile for Azure, you enter Default Security Group in the Kubernetes Cloud Provider pane. When you create a Kubernetes cluster, Tanzu Kubernetes Grid Integrated Edition automatically assigns this security group to each VM in the cluster. However, on Azure the automatic assignment may not occur.

As a result, your inbound and outbound traffic rules defined in the security group are not applied to the cluster VMs.

Workaround

If you experience this issue, manually assign the default security group to each VM NIC in your cluster.
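
You can assign the group in the Azure portal, or, as a sketch with the Azure CLI, where the resource group, NIC, and security group names are placeholders:

# Attach the default security group to one cluster VM NIC; repeat for each NIC in the cluster
az network nic update \
  --resource-group RESOURCE-GROUP \
  --name VM-NIC-NAME \
  --network-security-group DEFAULT-SECURITY-GROUP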


One Plan ID Longer than Other Plan IDs

Symptom

One of your plan IDs is one character longer than your other plan IDs.

Explanation

In TKGI, each plan has a unique plan ID. A plan ID is normally a UUID consisting of 32 alphanumeric characters and 4 hyphens. However, the Plan 4 ID consists of 33 alphanumeric characters and 4 hyphens.

Solution

You can safely configure and use Plan 4. The length of the Plan 4 ID does not affect the functionality of Plan 4 clusters.

If you require all plan IDs to have identical length, do not activate or use Plan 4.


FQDNs in TKGI API Commands Cannot Contain Uppercase Letters

This issue is fixed in TKGI v1.10.1.

Symptom

TKGI CLI commands fail with error:

Error: An error occurred in the PKS API when processing

Explanation

BOSH DNS, which TKGI uses for its internal DNS, does not support uppercase letters in FQDNs.

Workaround

Use only lowercase letters in FQDNs that you assign to your TKGI API VM in TKGI.


The TKGI API FQDN Must Not Include Trailing Whitespace

Symptom

Your TKGI logs include the following error:

'uaa'. Errors are:- Error filling in template 'uaa.yml.erb' (line 59: Client redirect-uri is invalid: uaa.clients.pks_cli.redirect-uri Client redirect-uri is invalid: uaa.clients.pks_cluster_client.redirect-uri)

Explanation

The TKGI API fully-qualified domain name (FQDN) for your cluster contains leading or trailing whitespace.

Workaround

Do not include whitespace in the TKGI tile API Hostname (FQDN) field.


TKGI CLI get-credentials Returns an "od-broker" Error

This issue is fixed in TKGI v1.10.1.
The probability of encountering this issue has been reduced in TKGI v1.10.0.

Symptom

The TKGI CLI get-credentials command occasionally returns the error "od-broker is processing a request for the same instance… please try again later" during periods of intermittent latency.


Certain Linux Nodes Are Unable to Complete the Drain Process during a TKGI Upgrade

This issue is fixed in TKGI v1.10.1.

Symptom

You may experience one or more of the following:

  • Your cluster upgrades are interrupted when upgrading the second node of a three-node Pod.
  • Pod eviction fails when either bosh stop or tkgi upgrade-cluster initiates kubectl drain.

Explanation

The OVS drain finishes and removes Pod networking before kubelet has completely drained the Pod. Draining a worker node triggers the removal of a container interface on the node, blocking all network traffic from the Pod.


TKGI CLI Resize and Update Cluster Commands Remove Network Profile CNI Configuration from a Cluster

This issue is fixed in TKGI v1.10.1.

Symptom

In TKGI v1.10.0, the network profile CNI configuration is dropped when updating the cluster using the tkgi resize or tkgi update-cluster CLI commands.

Solution

To restore a CNI network profile configuration:

  1. Upgrade TKGI to v1.10.1 or later.
  2. Upgrade your TKGI-provisioned cluster to TKGI v1.10.1 or later.
  3. To restore the network profile CNI configuration to your cluster, modify the cluster using either of the following:

    • The TKGI CLI resize command:

      tkgi resize CLUSTER-NAME
      

      Where CLUSTER-NAME is the name of your cluster.

    • The TKGI CLI update-cluster command:

      tkgi update-cluster CLUSTER-NAME --network-profile PROFILE-NAME
      

      Where:

      • CLUSTER-NAME is the name of your cluster.
      • PROFILE-NAME is the name of the network profile previously applied to the cluster.


Your Cluster Returns the Error 'PodCIDR is Empty for Node'

This issue is fixed in TKGI v1.10.1.

Symptom

While using Antrea as your CNI, your cluster experiences connectivity issues, such as the overlay network failing to communicate with worker nodes. Your logs include errors, such as “PodCIDR is empty for Node…”, even though podCIDR is correctly configured and all Pods in your cluster have a running status.

Explanation

There is a stale IP address in a Pod in your cluster and the current version of Antrea cannot remove it.

Workaround

To remove a stale IP address:

  1. To locate a stale IP address:
    1. Verify the Antrea gateway interface has more than one IP address.
    2. If it does have more than one IP address, review the logs for when the IP addresses are initialized. The last Antrea gateway interface IP address in the log is the only valid IP address for the interface. The preceding IP address(es) are stale and should be removed.
  2. To delete a stale IP address, run the following:

    ip addr del IP-ADDRESS dev GATEWAY-NAME
    

    Where:

    • IP-ADDRESS is the IP address to be removed.
    • GATEWAY-NAME is the name of your Antrea gateway.

    For example:

    ip addr del 10.200.1.1/24 dev antrea-gw0
    


Network Profiles Do Not Support the failover_mode Parameter

This issue is fixed in TKGI v1.10.1.

Symptom

When you attempt to resize or update a cluster, the operation fails with logged errors similar to the following:

Error processing update parameters: Unexpected field 'failover_mode'

Explanation

The failover_mode network profile parameter has been incorrectly flagged as an unsupported parameter.

Workaround

To remove the failover_mode parameter from a network profile:

  1. Create a copy of the network profile configuration file used by your cluster.
  2. Remove the failover_mode parameter from the copy of the configuration file. For example, remove:

    "failover_mode":"PREEMPTIVE",
    
  3. Update the network profile for the cluster using the revised configuration file.

You can now resize your cluster.


Windows Pods Are Unable to Complete the Drain Process during a TKGI Upgrade or Update

This issue is fixed in TKGI v1.10.1.

Symptom

While upgrading or updating a Windows worker node, the procedure logs errors similar to the following and never completes:

run cordon_node
node/95eefa70-2028-23vv-8b22-420ece043e21 cordoned
Successfully cordon_node
run drain_kubelet
node/95eefa70-2028-23vv-8b22-420ece043e21 already cordoned
node/95eefa70-2028-23vv-8b22-420ece043e21 drained

Successfully drain_kubelet
checking for attached PVs...
read non-k8s disk info from file
2 disks still attached
2 disks still attached
2 disks still attached

Explanation

The drain process was unsuccessful because Windows Host Compute Service (HCS) created a virtual ephemeral disk for each container, and DaemonSet Pods cannot be drained. For more information, see How Daemon Pods are scheduled in the Kubernetes documentation.


The TKGI API Does Not Import the Current TKGI CA Certificates

This issue is fixed in TKGI v1.10.2.

Symptom

After rotating your TKGI CA certificates, you notice the following:

  • When using the TKGI API, SSL authentication returns the following error and fails:

    None of the TrustManagers trust this certificate chain executing POST https://100.104.252.16:8443/oauth/token
    
  • The keystore file at /var/vcap/jobs/pks-api/config/cacerts.jks contains only a single, stale certificate.

Explanation

The keystore file used by the TKGI API has not been updated with current CA certificates.


Database Cluster Stops After a DB Instance Is Stopped

Symptom

After you stop one instance in a multiple-instance database cluster, the cluster stops, or communication between the remaining databases times out, and the entire cluster becomes unreachable.

The following might be in your UAA log:

WSREP has not yet prepared node for application use

Explanation

The database cluster is unable to recover automatically because a member is no longer available to reconcile quorum.


Windows Workloads With Attached Dynamic PVs Must Be Removed before Deleting a Cluster

This issue is fixed in TKGI v1.10.4.

Symptom

Your tkgi delete-cluster operation hangs while draining a Windows worker VM containing a Pod bound to one or more dynamic persistent volumes (PVs).

Workaround

Before deleting your Windows cluster, remove all workloads.


NSX-T Pre-Check Errand Fails Due to Edge Node CPU Memory Configuration

This issue is fixed in TKGI v1.10.2.

Symptom

You have configured your NSX-T Edge Node VM as medium size, and the NSX-T Pre-Check Errand fails with the following error: “ERROR: NSX-T Precheck failed due to Edge Node … memory is less than 8GB”.

Explanation

The NSX-T Pre-Check Errand has failed because the NSX-T Edge Node has less than 8GB of available memory.


NSX-T Pre-Check Errand Fails Due to Edge Node CPU Count Configuration

This issue is fixed in TKGI v1.8.0 and later and was incorrectly included in the TKGI v1.10 Release Notes.

Symptom

You have configured your NSX-T Edge Node VM as medium size, and the NSX-T Pre-Check Errand fails with the following error: “ERROR: NSX-T Precheck failed due to Edge Node … no of cpu cores is less than 8”.

Explanation

The NSX-T Pre-Check Errand is erroneously returning the “cpu cores is less than 8” error.

Solution

You can safely configure your NSX-T Edge Node VMs as medium size and ignore the error.


Fluent Bit ClusterLogSink Returns 'TCP Connection Failed' While CEIP Telemetry Services Are Disabled

Symptom

Fluent Bit ClusterLogSink logs the errors “TCP connection failed” and “no upstream connections available” after you disable CEIP Telemetry services:

[2020/10/14 10:38:50] [error] [io] TCP connection failed: telemetry.pks.internal:24224 (Connection timed out)
[2020/10/14 10:38:50] [error] [out_fw] no upstream connections available
[2020/10/14 10:41:08] [error] [io] TCP connection failed: telemetry.pks.internal:24224 (Connection timed out)
[2020/10/14 10:41:08] [error] [out_fw] no upstream connections available
[2020/10/14 10:43:51] [error] [io] TCP connection failed: telemetry.pks.internal:24224 (Connection timed out)
[2020/10/14 10:43:51] [error] [out_fw] no upstream connections available
[2020/10/14 10:43:51] [ warn] [engine] Task cannot be retried: task_id=0 thread_id=2 output=forward.0

Explanation

Fluent Bit intermittently attempts to connect to the Telemetry Server during ClusterLogSink. The connection attempts fail while the CEIP Telemetry services are disabled, and Fluent Bit logs the failed connection attempts.


Cluster Fails to Restart

This issue is fixed in TKGI v1.10.4.

Symptom

A stopped cluster fails to restart and you see the following error in the NSX Container Plug-in (NCP) logs:

Response body {'httpStatus': 'BAD_REQUEST', 'module_name': 'nsx-search', 'error_code': 60513, 'error_message': 'The result set is too large. Please refine the search criteria.'}

Explanation

This problem occurs on clusters with a network policy with more than twelve port-protocol fields.

Workaround

Modify the cluster network policy to have twelve or fewer port-protocol fields.
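
For example, you can list the cluster's network policies and edit the offending policy with standard kubectl commands; the policy and namespace names are placeholders:

# Find the network policy, then trim its ports list to twelve or fewer port-protocol entries
kubectl get networkpolicy --all-namespaces
kubectl edit networkpolicy POLICY-NAME -n NAMESPACE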


You Cannot Use vRealize Log Insight (vRLI) to Monitor NCP

This issue is fixed in TKGI v1.10.4.

Symptom

vRLI monitoring does not include NCP stdout or stderr.

Explanation

A bug in the vRLI configuration prevents inclusion of NCP stdout and stderr in vRLI.

Workaround

To write NCP logs for a cluster to vRLI:

  1. Confirm you have admin access to the master node.
  2. Open /var/vcap/jobs/fluentd/config/config.d/nsx-t.conf.
  3. Replace:

    expression /^(?<num>[\w]+) (?<time>[^ ]+) (?<nsx_uuid>[^ ]+) (?<nsx_component>[^ ]+) (?<pid>[^ ]+) - \[.* level="(?<severity>[^\"]+)".*\] (?<message>.+)/
    

    with:

    expression /^(?<time>[^ ]+) (?<nsx_uuid>[^ ]+) (?<nsx_component>[^ ]+) (?<pid>[^ ]+) - \[.* level="(?<severity>[^\"]+)".*\] (?<message>.+)/
    

Note: This workaround affects only the cluster you modified and does not persist.


Creating Two Windows Clusters at the Same Time Fails

Symptom

The first time that you try to create two Windows clusters at the same time, the creation of one of the clusters fails. If you run tkgi cluster CLUSTER-NAME to examine the last action taken on the cluster, you see the following:

Last Action: Create
Last Action State: failed
Last Action Description: Instance provisioning failed: There was a problem completing your request.
. . .
operation: create, error-message: Failed to acquire lock
. . .
locking task id is 111, description: 'create deployment'

Explanation

This is a known issue that occurs the first time that you create two Windows clusters concurrently.

Workaround

Recreate the failed cluster. This issue only occurs the first time that you create two Windows clusters concurrently.


The TKGI API VM fluentd.stdout.log Grows While vRealize Log Insight is Disabled

This issue is fixed in TKGI v1.10.4.

Symptom

While vRealize Log Insight (vRLI) is disabled, the log for fluentd.stdout on the TKGI API VM grows quite large and rolls over frequently.

Explanation

The fluentd job on the TKGI API VM automatically starts when vRLI is disabled. This fluentd job runs continuously, frequently writing inaccessible hostname errors to stdout for the non-existent vRLI endpoint. The fluentd stdout on the TKGI API VM is logged in: /var/vcap/sys/log/pks-vrli-control-plane-fluentd/fluentd.stdout.log.


Wavefront Collector is Duplicated or Not Collecting from TKGI Namespaces

This issue is fixed in TKGI v1.10.5.

Symptom

After upgrading TKGI to TKGI v1.10.0 or later, you observe that two Wavefront collectors are running instead of one. If one of the collectors stops, the remaining collector alternates between collecting metrics as itself and as the stopped collector. The pks.kubernetes.daemonset. and pks.kubernetes.deployment. namespaces may stop reporting.

Explanation

Two instances of the Wavefront collector are currently installed and collecting metrics: a v1.20 DaemonSet collector and a v1.26 Deployment collector. Only the v1.26 Deployment collector instance should be running.

Workaround

You must remove the v1.20 DaemonSet Wavefront collector and leave the v1.26 Deployment collector running.

To remove the DaemonSet Wavefront collector:

  1. Save the existing DaemonSet definition as wf_daemonset.yaml so you can restore the DaemonSet Wavefront collector later if needed:

    kubectl get daemonset/wavefront-collector -n pks-system -o yaml > wf_daemonset.yaml
    

  2. To delete the Wavefront Collector DaemonSet:

    kubectl delete daemonset wavefront-collector -n pks-system
    

If you later decide that you need to restore the DaemonSet collector, you can restore it using the DaemonSet YAML file you saved earlier.

  1. To restore the DaemonSet collector:

    kubectl apply -f wf_daemonset.yaml
    


Tags Applied Using tkgi cli Commands Are Removed during tkgi update-cluster

Symptom

The tags you applied to a cluster using the tkgi cli --tags parameter are removed whenever you run tkgi update-cluster.

Workaround

Specify the --tags parameter when updating a cluster using tkgi update-cluster:

  1. To confirm the existing tags on a cluster:

    tkgi cluster CLUSTER-NAME
    

    Where CLUSTER-NAME is the name of the cluster to update.

    Note the tags listed under Tags:.

  2. When running tkgi update-cluster, include the --tags parameter, specifying all of the existing tags on the cluster.
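
For example, a hedged invocation of step 2, assuming the existing tags on the cluster are team:dev and env:test (the comma-separated key:value form shown is an assumption; confirm the exact syntax in the TKGI CLI documentation):

# Re-specify every existing tag, plus any new ones, each time you run update-cluster
tkgi update-cluster CLUSTER-NAME --tags "team:dev,env:test"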


TKGI MC Unable to Manage TKGI after Restoring the TKGI Control Plane from Backup

Symptom

After you restore Ops Manager and the TKGI API VM from backup, TKGI functions normally, but your TKGI MC tabs include the following error: “…product ‘pivotal-container service’ is not deployed…”.

Explanation

TKGI MC is associated with an Ops Manager with a specific name. If you give the restored Ops Manager a different name, your TKGI MC will not recognize the restored Ops Manager and cannot manage it.


SAML Authentication Requests Are Always Signed

Symptom

After you disable the SAML Identity Provider Sign Authentication Requests setting on the Tanzu Kubernetes Grid Integrated Edition tile, SAML IdP authentication requests continue to be signed.


BOSH Director Logs the Error 'Duplicate vm extension name 'disk_enable_uuid’’

Symptom

After you uninstall TKGI, then reinstall TKGI in the same environment, BOSH Director logs errors similar to the following:

.../gems/bosh-director-0.0.0/lib/bosh/director/deployment_plan/cloud_manifest_parser.rb:120:in `parse_vm_extensions': Duplicate vm extension name 'disk_enable_uuid' (Bosh::Director::DeploymentDuplicateVmExtensionName)

Explanation

The pivotal-container-service cloud-config was not removed when you uninstalled the TKGI tile, and it remained active. When you reinstalled the TKGI tile, an additional pivotal-container-service cloud-config was created, causing the metrics_server to fall into a crash-loop state.

Workaround

For more information, see “Duplicate vm extension name 'disk_enable_uuid’” error when metrics_server runs on Director VM in Tanzu Kubernetes Grid Integrated Edition in the VMware Tanzu Community Knowledge Base.


Node Hostnames Are Not Resolved for BYO DNS in Azure Environments

Symptom

If you use a bring-your-own DNS configuration in Azure environments, node hostnames are not resolved on your TKGI clusters, and kubectl returns a “no such host” error:

Error from server: Get "...": dial tcp: lookup ... on ... no such host

Explanation

On Azure, the Azure CPI uses BOSH agent_id GUIDs as the VM name. Additionally, the Azure Kubernetes cloud provider requires that the VM name match the hostname and that the hostname be resolvable without using search domains.

TKGI on Azure relies on Dynamic DNS updates via DHCP to allow resolvable K8s node names. Environments with a bring-your-own DNS server configuration must provide an alternative method for updating DNS entries.


TKGI Management Console v1.10.0

Release Date: January 28, 2021

Product Snapshot

Note: Tanzu Kubernetes Grid Integrated Edition Management Console provides an opinionated installation of TKGI. The supported versions may differ from or be more limited than what is generally supported by TKGI.

Element Details
Version v1.10.0
Release date January 28, 2021
Installed Tanzu Kubernetes Grid Integrated Edition version v1.10.0
Installed Ops Manager version v2.10.5
Installed Kubernetes version v1.19.6
Installed Harbor Registry version v2.1.2
Linux stemcell v621.97
Windows stemcells v2019.29 or later

Upgrade Path

The supported upgrade path to Tanzu Kubernetes Grid Integrated Edition Management Console v1.10.0 is from Tanzu Kubernetes Grid Integrated Edition v1.9.0 and later.

Features

Tanzu Kubernetes Grid Integrated Edition Management Console v1.10.0 updates include:

  • [Bug Fix] Fixes an issue in which the TKGI Management Console regenerates certificates if the NSX Manager admin password changes.
  • [Bug Fix] The link to the discontinued Kubernetes Dashboard is no longer provided on the TKGI Management Console.
  • [BETA] Supports high availability (HA) mode in Tanzu Kubernetes Grid Integrated Edition Management Console. You can now scale the number of VM instances for the following Tanzu Kubernetes Grid Integrated Edition control plane jobs:

    • Tanzu Kubernetes Grid Integrated Edition API and UAA
    • Tanzu Kubernetes Grid Integrated Edition database
  • Adds support for Antrea CNI when deploying to vSphere without NSX-T networking.

  • Adds support for a No-NAT with virtual switch (VSS/VDS) topology.

  • Adds support for changing the compute profile after cluster creation.

  • Adds support for adding labels and taints to nodes when creating compute profiles.

  • Enforces the vSphere standard for passwords when creating local user accounts.

Known Issues

The Tanzu Kubernetes Grid Integrated Edition Management Console v1.10.0 has the following known issues:


Management Console UI Does Not Open If the Management Console Uses Custom Certificates

This issue is fixed in TKGI MC v1.10.1.

Symptom

If you configure Tanzu Kubernetes Grid Integrated Edition Management console with custom certificates, the management console interface fails to open. This is caused by the failure of the script /etc/vmware/pks-appliance-tls.sh on the management console VM.

Workaround

  1. Use SSH to log in to the management console VM.
  2. Open /etc/vmware/pks-appliance-tls.sh in a text editor.
  3. Replace line 32 with the following code:
    sed -i '/^$/d' $file
  4. Reboot the management console VM.


vRealize Log Insight Integration Does Not Support HTTPS Connections

Symptom

The Tanzu Kubernetes Grid Integrated Edition Management Console integration to vRealize Log Insight does not support connections to the HTTPS port on the vRealize Log Insight server.

Workaround

  1. Use SSH to log in to the Tanzu Kubernetes Grid Integrated Edition Management Console appliance VM.
  2. Open the file /lib/systemd/system/pks-loginsight.service in a text editor.
  3. Add -e LOG_SERVER_ENABLE_SSL_VERIFY=false.
  4. Set -e LOG_SERVER_USE_SSL=true.

    The resulting file should look like the following example:

    ExecStart=/bin/docker run --privileged --restart=always --network=pks
    -v /var/log/journal:/var/log/journal
    --name=pks-loginsight
    -e TYPE=gear2-vm
    -e LOG_SERVER_HOST=${LOGINSIGHT_HOST}
    -e LOG_SERVER_PORT=${LOGINSIGHT_PORT}
    -e LOG_SERVER_ENABLE_SSL_VERIFY=false
    -e LOG_SERVER_USE_SSL=true
    -e LOG_SERVER_AGENT_ID=${LOGINSIGHT_ID}
    pksoctopus/vrli-journald:v07092019
    
  5. Save the file and run systemctl daemon-reload.

  6. To restart the vRealize Log Insight service, run systemctl restart pks-loginsight.service.

Tanzu Kubernetes Grid Integrated Edition Management Console can now send logs to the HTTPS port on the vRealize Log Insight server.


vSphere HA causes Management Console ovfenv Data Corruption

Symptom

If you enable vSphere HA on a cluster, if the TKGI Management Console appliance VM is running on a host in that cluster, and if the host reboots, vSphere HA recreates a new TKGI Management Console appliance VM on another host in the cluster. Due to an issue with vSphere HA, the ovfenv data for the newly created appliance VM is corrupted and the new appliance VM does not boot up with the correct network configuration.

Workaround

  1. In the vSphere Client, right-click the appliance VM and select Power > Shut Down Guest OS.
  2. Right-click the appliance again and select Edit Settings.
  3. Select VM Options and click OK.
  4. Verify under Recent Tasks that a Reconfigure virtual machine task has run on the appliance VM.
  5. Power on the appliance VM.


Base64 encoded file arguments are not decoded in Kubernetes profiles

Symptom

Some file arguments in Kubernetes profiles are base64 encoded. When the management console displays the Kubernetes profile, some file arguments are not decoded.

Workaround

To decode an encoded file argument, run echo "$content" | base64 --decode.


Network profiles not immediately selectable

Symptom

If you create network profiles and then try to apply them in the Create Cluster page, the new profiles are not available for selection.

Workaround

Log out of the management console and log back in again.


Real-Time IP information not displayed for network profiles

Symptom

In the cluster summary page, only the default IP pool, Pod IP block, and node IP block values are displayed, rather than the real-time values from the associated network profile.


Error After Modifying Your Harbor Storage Configuration

Symptom

You receive the following error after modifying your existing Harbor installation’s storage configuration:

Error response from daemon: manifest for ... not found: manifest unknown: manifest unknown

Explanation

Harbor does not support modifying an existing Harbor installation’s storage configuration.

Workaround

To modify your Harbor storage configuration, re-install Harbor. Before starting Harbor, configure the new Harbor installation with the desired configuration.


Windows Stemcells Must be Re-Imported After Upgrading Ops Manager

Symptom

After upgrading Ops Manager, your Management Console does not recognize a Windows stemcell imported when using the prior version of Ops Manager.

Workaround

If your Management Console does not recognize a Windows stemcell after upgrading Ops Manager:

  1. Re-import your previously imported Windows stemcell.
  2. Apply Changes to TKGI MC.


Management Console Deletes Custom Workload Configurations

This issue is fixed in TKGI MC v1.10.2.

Symptom

Your Management Console deletes the custom workload configurations that you have added to a Plan Add-ons - Use with caution field.


Please send any feedback you have to pks-feedback@pivotal.io.