Release Notes
This topic contains release notes for Tanzu Kubernetes Grid Integrated Edition (TKGI) v1.10.
Warning: Before installing or upgrading to Tanzu Kubernetes Grid Integrated Edition v1.10, review the Breaking Changes below.
TKGI v1.10.0
Release Date: January 28, 2021
Product Snapshot
Release | Details |
---|---|
Version | v1.10.0 |
Release date | January 28, 2021 |

Component | Version |
---|---|
Kubernetes | v1.19.6 |
CoreDNS | v1.7.0+vmware.5 |
Docker | Linux: v19.03.14, Windows: v19.03.14 |
etcd | v3.4.13 |
Metrics Server | v0.3.6 |
NCP | v3.1.0.1 |
Percona XtraDB Cluster (PXC) | v0.31.0 |
UAA | v74.5.21 |

Compatibilities | Versions |
---|---|
Ops Manager | Ops Manager v2.10.4 or later, or v2.9.15 or later. End of general support for Ops Manager v2.9 is January 31, 2021. |
Xenial stemcells | See VMware Tanzu Network. |
Windows stemcells | v2019.29+ |
vSphere | See VMware Product Interoperability Matrices. |
VMware Cloud Foundation (VCF) | v4.2.0 |
CNS for vSphere | v1.0.2, v2.0.0 |
NSX-T | v3.0.1, v3.0.2, v3.1.0 |
Harbor | v2.1.1 |
Velero | v1.4.2 |
Upgrade Path
The supported upgrade paths to TKGI v1.10.0 are from Tanzu Kubernetes Grid Integrated Edition v1.9.0 and later.
Features
This section describes new features and changes in VMware Tanzu Kubernetes Grid Integrated Edition v1.10.0.
(Beta) Supports high availability (HA) mode in Tanzu Kubernetes Grid Integrated Edition. You can now scale the number of VM instances for the following Tanzu Kubernetes Grid Integrated Edition control plane jobs:
- Tanzu Kubernetes Grid Integrated Edition API and UAA
- Tanzu Kubernetes Grid Integrated Edition database
To configure HA (beta) for new TKGI v1.10 installations, see Resource Config in the Installing topic for your IaaS. To configure HA (beta) for Tanzu Kubernetes Grid Integrated Edition v1.10 upgrades from v1.9, see Upgrading Enterprise PKS or Upgrading Enterprise PKS with NSX-T and follow the instructions in (Optional) Scale to TKGI High Availability Mode.
Warning: HA mode is a beta feature. Do not scale your TKGI API or TKGI Database to more than one instance in production environments.
Removes Wavefront Alert Recipient, Create pre-defined Wavefront alerts errand, and Delete pre-defined Wavefront alerts errand from the Tanzu Kubernetes Grid Integrated Edition tile. For more information, see TKGI-Defined Wavefront Alerts Removed from the Tile below.
[Enhancement] Reduces the possibility of the TKGI CLI get-credentials command returning the error “od-broker is processing a request for the same instance… please try again later” during periods of intermittent latency.
[Enhancement] Swap is now disabled by default. For more information, see Swap Is Disabled by Default in Breaking Changes below.
[Enhancement] Improves error messages for cluster creation, update and upgrade failures. For more information, see Cluster Creation, Update and Upgrade Failure Error Messages No Longer Truncated in Breaking Changes below.
[Security Fix] Passes additional CIS Kubernetes Benchmarks. See TKGI Cluster Benchmarks for details.
[Bug Fix] Fixes Your TKGI Cluster Fails to Start After Changing Your Worker Node’s Compute Profile AZ.
[Bug Fix] Fixes TKGI Upgrade or Install Fails with Error “x509: certificate relies on legacy Common Name field”.
[Bug Fix] Fixes Cluster Creation Fails During the ‘Creating Load Balancer’ Step.
[Bug Fix] Fixes Network Profile Required with Compute Profile.
Cluster-Specific Proxy Settings (NSX-T and AWS)
You can configure proxy settings specific to individual TKGI clusters, overriding the global settings in the TKGI tile > Networking pane. For more information, see Configure Cluster Proxies.
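For illustration, a minimal network profile sketch for per-cluster proxies; the parameter names below (http_proxy, https_proxy, no_proxy) and their placement are assumptions, so confirm the exact schema in Configure Cluster Proxies:

{
  "name": "cluster-proxy",
  "description": "Cluster-specific proxy settings",
  "parameters": {
    "http_proxy": "http://proxy.example.com:8080",
    "https_proxy": "http://proxy.example.com:8080",
    "no_proxy": "127.0.0.1,localhost,10.0.0.0/8"
  }
}

A profile like this would then be created with tkgi create-network-profile PATH-TO-JSON and referenced through the --network-profile flag of tkgi create-cluster.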
Supports the Antrea CNI
Tanzu Kubernetes Grid Integrated Edition now provides the option to use the Antrea Container Network Interface (CNI) as the CNI for new TKGI-provisioned clusters. For more information about using Antrea as your CNI, see About Upgrading from the Flannel CNI to the Antrea CNI in About Tanzu Kubernetes Grid Integrated Edition Upgrades.
NSX-T Certificate Rotation
You can now rotate TKGI-provisioned Kubernetes cluster NSX-T TLS certificates using a TKGI CLI command. For more information, see Certificate Rotation.
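As a hedged sketch (confirm the exact command name and flags in Certificate Rotation), rotation is run from the TKGI CLI, for example:

tkgi rotate-certificates CLUSTER-NAME --only-nsx

Where CLUSTER-NAME is the name of your cluster. The --only-nsx flag shown here, limiting rotation to the NSX-T certificates, is an assumption.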
Apply Persistent Node Labels and Node Taints Using Compute Profiles
On vSphere and vSphere with NSX-T, Tanzu Kubernetes Grid Integrated Edition supports applying persistent labels and taints to a Kubernetes node using Compute Profiles.
For more information, see the node_pools block in Creating and Managing Compute Profiles with the CLI (vSphere).
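For illustration only, a sketch of a compute profile that applies labels and taints to a node pool; the field names inside node_pools (node_labels, node_taints) and their value formats are assumptions, so treat the linked topic as authoritative:

{
  "name": "labeled-pool-profile",
  "description": "Node pool with persistent labels and taints",
  "parameters": {
    "cluster_customization": {
      "node_pools": [
        {
          "name": "pool-1",
          "instances": 3,
          "node_labels": "environment=prod",
          "node_taints": "workload=critical:NoSchedule"
        }
      ]
    }
  }
}

A profile like this would be created with tkgi create-compute-profile PATH-TO-JSON and applied with the --compute-profile flag of tkgi create-cluster.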
Windows Worker Kubernetes Clusters Support Active Directory
Windows Server with Active Directory can now control access to TKGI Windows worker-based Kubernetes clusters through integration with group Managed Service Account (gMSA). For more information, see Authenticate Windows Clusters with Active Directory.
TKGI-Defined Wavefront Alerts Removed from the Tile
TKGI v1.10 removes the following configuration options from the Wavefront integration in the tile:
- Create pre-defined Wavefront alerts errand
- Delete pre-defined Wavefront alerts errand
- Wavefront Alert Recipient
If you want to enable pre-defined Wavefront alerts for TKGI v1.10, configure your alert targets in Wavefront. For a list of available alerts, see Predefined Alerts for the Integration.
If you enabled the Create pre-defined Wavefront alerts errand and Wavefront Alert Recipient in an earlier version of TKGI and you upgrade your environment to v1.10, you will continue to receive the TKGI-defined alerts.
Component Updates
The following components have been updated:
- Bumps Kubernetes to v1.19.6.
- Bumps Xenial stemcell to v621.94.
- Bumps NCP to v3.1.0.17170700.
- Bumps PXC to v0.31.0.
- Bumps UAA to v74.5.21.
Breaking Changes
TKGI v1.10.0 has the following breaking changes.
TKGI v1.10 Is Not Compatible with NSX-T v2.5.2 or Earlier
TKGI v1.10 is not compatible with NSX-T v2.5.2 or earlier. If you are deploying TKGI v1.10 to NSX-T, your NSX-T version must be NSX-T v3.0.1 or later. For more information about upgrading NSX-T and TKGI, see Upgrade Order for Tanzu Kubernetes Grid Integrated Edition Environments on vSphere and Upgrading Tanzu Kubernetes Grid Integrated Edition (NSX-T Networking).
Swap Is Disabled by Default
Swap is now disabled on all worker nodes. In previous versions of Tanzu Kubernetes Grid Integrated Edition, swap was enabled, but upstream Kubernetes does not support running with swap enabled. You cannot enable swap through the TKGI CLI, and manually configuring swap is not permitted.
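To verify the setting on a given worker node, you can check from BOSH; a minimal sketch, reusing the DEPLOYMENT-NAME placeholder used elsewhere in these notes:

bosh -d DEPLOYMENT-NAME ssh worker/0 -c 'sudo swapon --show'

Empty output indicates that no swap devices are active on the node.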
Cluster Creation, Update and Upgrade Failure Error Messages No Longer Truncated
Tanzu Kubernetes Grid Integrated Edition v1.10 includes improved error messages for cluster creation, update, and upgrade failures. Previously, error messages longer than 128 bytes were truncated. In TKGI v1.10, these logged error messages are no longer truncated.
Known Issues
TKGI v1.10.0 has the following known issues:
Pods Stop After Upgrading From NSX-T v3.0.2 to v3.1.0 on vSphere 7.0 and 7.0.1
Symptom
Your TKGI-provisioned Pods stop after upgrading from NSX-T v3.0.2 to NSX-T v3.1.0 on vSphere 7.0 and 7.0.1.
Explanation
For information, see Issue 2603550: Some VMs are vMotioned and lose network connectivity during UA nodes upgrade in the VMware NSX-T Data Center 3.1.1 Release Notes.
Workaround
To avoid the loss of network connectivity during UA node upgrade, ensure DRS is set to manual mode during your upgrade from NSX-T v3.0.2 to v3.1.0.
If you upgraded to NSX-T v3.1.0 with DRS in automated mode, restore Pod connectivity by running the following command on the master VMs of the affected clusters:
monit restart ncp
For more information on upgrading NSX-T v3.0.2 to NSX-T v3.1.0, see Upgrade NSX-T Data Center to v3.0 or v3.1.
Error: Could Not Execute “Apply-Changes” in Azure Environment
Symptom
After clicking Apply Changes on the TKGI tile in an Azure environment, you experience an error '...could not execute "apply-changes"...' with either of the following descriptions:
- {"errors":{"base":["undefined method `location' for nil:NilClass"]}}
- FailedError.new("Resource Groups in region '#{location}' do not support Availability Zones"))
For example:
INFO | 2020-09-21 03:46:49 +0000 | Vessel::Workflows::Installer#run | Install product (apply changes)
2020/09/21 03:47:02 could not execute "apply-changes": installation failed to trigger: request failed: unexpected response from /api/v0/installations:
HTTP/1.1 500 Internal Server Error
Transfer-Encoding: chunked
Cache-Control: no-cache, no-store
Connection: keep-alive
Content-Type: application/json; charset=utf-8
Date: Mon, 21 Sep 2020 17:51:50 GMT
Expires: Fri, 01 Jan 1990 00:00:00 GMT
Pragma: no-cache
Referrer-Policy: strict-origin-when-cross-origin
Server: Ops Manager
Strict-Transport-Security: max-age=31536000; includeSubDomains
X-Content-Type-Options: nosniff
X-Download-Options: noopen
X-Frame-Options: SAMEORIGIN
X-Permitted-Cross-Domain-Policies: none
X-Request-Id: f5fc99c1-21a7-45c3-7f39
X-Runtime: 9.905591
X-Xss-Protection: 1; mode=block
44
{"errors":{"base":["undefined method `location' for nil:NilClass"]}}
0
Explanation
The Azure CPI endpoint used by Ops Manager has been changed and your installed version of Ops Manager is not compatible with the new endpoint.
Workaround
Run the following Ops Manager CLI command:
om --skip-ssl-validation --username USERNAME --password PASSWORD --target https://OPSMAN-API curl --silent --path /api/v0/staged/director/verifiers/install_time/IaasConfigurationVerifier -x PUT -d '{ "enabled": false }'
Where:
- USERNAME is the account to use to run Ops Manager API commands.
- PASSWORD is the password for the account.
- OPSMAN-API is the IP address for the Ops Manager API.
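Note: this call disables the IaasConfigurationVerifier. If you later want to restore it, the same om curl call with '{ "enabled": true }' as the payload should re-enable the verifier; that follow-up step is an inference from the API payload, not part of the documented workaround.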
For more information, see Error 'undefined method location' is received when running Apply Change on Azure in the VMware Tanzu Knowledge Base.
VMware vRealize Operations Does Not Support Windows Worker-Based Kubernetes Clusters
VMware vRealize Operations (vROps) does not support Windows worker-based Kubernetes clusters and cannot be used to manage TKGI-provisioned Windows workers.
TKGI Wavefront Requires Manual Installation for Windows Workers
To monitor Windows-based worker node clusters with a Wavefront collector and proxy, you must first install Wavefront on the clusters manually, using Helm. For instructions, see the Wavefront section of the Monitoring Windows Worker Clusters and Nodes topic.
Pinging Windows Worker Kubernetes Clusters Does Not Work
TKGI-provisioned Windows worker-based Kubernetes clusters inherit a Kubernetes limitation that prevents outbound ICMP communication from workers. As a result, pinging Windows workers does not work.
For information about this limitation, see Limitations > Networking in the Windows in Kubernetes documentation.
Velero Does Not Support Backing Up Stateful Windows Workloads
You can use Velero to back up stateless TKGI-provisioned Windows workloads only; Velero cannot be used to back up stateful Windows applications. For more information, see Velero on Windows in Basic Install in the Velero documentation.
Tanzu Mission Control Integration Not Supported on GCP
TKGI on Google Cloud Platform (GCP) does not support Tanzu Mission Control (TMC) integration, which is configured in the Tanzu Kubernetes Grid Integrated Edition tile > Tanzu Mission Control (Experimental) pane.
If you intend to run TKGI v1.10 on GCP, skip this pane when configuring the Tanzu Kubernetes Grid Integrated Edition tile.
TMC Data Protection Feature Requires Privileged TKGI Containers
The TMC Data Protection feature requires that TKGI containers run as privileged containers. For more information, see Plans in the Installing TKGI topic for your IaaS.
Windows Worker Kubernetes Clusters with Group Managed Service Account Do Not Support Compute Profiles
Windows worker-based Kubernetes clusters integrated with group Managed Service Account (gMSA) cannot be managed using Compute Profiles.
Windows Worker Kubernetes Clusters on Flannel Do Not Support Compute Profiles
On vSphere with NSX-T networking, you can use compute profiles with both Linux and Windows worker-based Kubernetes clusters. On vSphere with Flannel networking, you can apply compute profiles only to Linux clusters.
TKGI Does Not Support Managing Pre-TKGI v1.9 Compute Profiles
Compute profiles created in TKGI v1.8 and earlier have a different format from current compute profiles.
TKGI v1.10 does not support resizing, updating, upgrading, or managing a cluster using tkgi CLI compute profile commands if the cluster has a compute profile created in TKGI v1.8 or earlier.
TKGI CLI Does Not Prevent Reducing the Control Plane Node Count
TKGI CLI does not prevent accidentally reducing a cluster’s control plane node count using a compute profile.
Warning: Reducing a cluster’s control plane node count can destroy the cluster. Do not scale in or scale out existing control plane (master) nodes by reconfiguring the TKGI tile or by using a compute profile. Reducing a cluster’s number of control plane nodes might remove a master node and cause the cluster to become inactive.
Compute Profile Dropped From Clusters During BOSH Upgrade
If a cluster created using a compute profile is upgraded using tkgi upgrade-cluster, the cluster’s compute profile is dropped.
Windows Cluster Nodes Not Deleted After VM Deleted
Symptom
After you delete a VM using your IaaS’s management console, you notice that a Windows worker node that had been on that VM is now in a notReady state.
Solution
To identify the leftover node, run:
kubectl get no -o wide
Locate nodes in the returned list that are in a notReady state and have the same IP address as another node in the list.
To manually delete a notReady node, run:
kubectl delete node NODE-NAME
Where NODE-NAME is the name of the node in the notReady state.
502 Bad Gateway After OIDC Login
Symptom
You experience a “502 Bad Gateway” error from the NSX load balancer after you log in to OIDC.
Explanation
A large response header has exceeded your NSX-T load balancer’s maximum response header size. The default maximum response header size is 10,240 characters, which can be too small for some OIDC responses.
Workaround
If you experience this issue, manually reconfigure your NSX-T request_header_size and response_header_size to 50,000 characters.
For information about configuring NSX-T default header sizes, see OIDC Response Header Overflow in the Knowledge Base.
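As an illustration only, these values belong to the load balancer’s HTTP application profile, which the NSX-T policy API can update; the API path and payload below are assumptions, so follow the Knowledge Base article for the supported steps:

curl -k -u 'admin:PASSWORD' -X PATCH https://NSX-MANAGER/policy/api/v1/infra/lb-app-profiles/PROFILE-ID -H 'Content-Type: application/json' -d '{"resource_type":"LBHttpProfile","request_header_size":50000,"response_header_size":50000}'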
NSX-T Pre-Check Errand Fails Due to Edge Node Configuration
Symptom
You have configured your NSX-T Edge Node VM as medium size, and the NSX-T Pre-Check Errand fails with the following error:
“ERROR: NSX-T Precheck failed due to Edge Node … no of cpu cores is less than 8”.
Explanation
The NSX-T Pre-Check Errand is erroneously returning the “cpu cores is less than 8” error.
Solution
You can safely configure your NSX-T Edge Node VMs as medium size and ignore the error.
Difficulty Changing Proxy for Windows Workers
You must configure a global proxy in the Tanzu Kubernetes Grid Integrated Edition tile > Networking pane before you create any Windows workers that use the proxy.
You cannot change the proxy configuration for Windows workers in an existing cluster.
Character Limitations in HTTP Proxy Password
For vSphere with NSX-T, the HTTP Proxy password field does not support the following special characters: & or ;.
Error After Modifying Your Harbor Storage Configuration
Symptom
You receive the following error after modifying your existing Harbor installation’s storage configuration:
Error response from daemon: manifest for ... not found: manifest unknown: manifest unknown
Explanation
Harbor does not support modifying an existing Harbor installation’s storage configuration.
Workaround
To modify your Harbor storage configuration, re-install Harbor. Before starting Harbor, configure the new Harbor installation with the desired configuration.
Unexplained Errors After Interrupting a Log Stream When Using Antrea Networking
Symptom
While using Antrea networking, you observe unexplained errors after you interrupt a log stream started using kubectl logs -f POD-NAME. The errors can include any of the following:
- kubectl returns the error: “Error from server (TooManyRequests): the server has received too many”.
- kube-apiserver returns an HTTP 429 response code.
Explanation
When using Antrea networking, there is a chance that konnectivity-agent becomes unstable after you interrupt your kubectl log stream.
Workaround
To resolve the issue:
- Log in to the master VM:
bosh -d DEPLOYMENT-NAME ssh master/0
- Change to root:
sudo -i
- Restart proxy-server:
monit restart proxy-server
- Wait for proxy-server to restart:
monit summary
Ingress Controller Statefulset Fails to Start After Resizing Worker Nodes
Symptom
Permissions are removed from your cluster’s files and processes after resizing the persistent disk during a cluster upgrade. The ingress controller statefulset fails to start.
Explanation
When resizing a persistent disk, BOSH migrates the data from the old disk to the new disk but does not copy the files’ extended attributes.
Workaround
To resolve the problem, complete the steps in Ingress controller statefulset fails to start after resize of worker nodes with permission denied in the VMware Tanzu Knowledge Base.
One Plan ID Longer than Other Plan IDs
Symptom
One of your plan IDs is one character longer than your other plan IDs.
Explanation
In TKGI, each plan has a unique plan ID. A plan ID is normally a UUID consisting of 32 alphanumeric characters and 4 hyphens. However, the Plan 4 ID consists of 33 alphanumeric characters and 4 hyphens.
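For reference, a standard UUID such as 123e4567-e89b-12d3-a456-426614174000 is 36 characters in total (32 alphanumeric characters plus 4 hyphens), so the Plan 4 ID is 37 characters in total.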
Solution
You can safely configure and use Plan 4. The length of the Plan 4 ID does not affect the functionality of Plan 4 clusters.
If you require all plan IDs to have identical length, do not activate or use Plan 4.
TKGI Management Console 1.10.0
Release Date: January 28, 2021
Features
Tanzu Kubernetes Grid Integrated Edition Management Console v1.10.0 updates include:
- [Bug Fix] Fixes an issue where the TKGI Management Console regenerates certificates if the NSX Manager admin password changes.
- [BETA] Supports high availability (HA) mode in Tanzu Kubernetes Grid Integrated Edition Management Console. You can now scale the number of VM instances for the following Tanzu Kubernetes Grid Integrated Edition control plane jobs:
  - Tanzu Kubernetes Grid Integrated Edition API and UAA
  - Tanzu Kubernetes Grid Integrated Edition database
- Adds support for the Antrea CNI when deploying to vSphere without NSX-T networking.
- Adds support for a No-NAT with virtual switch (VSS/VDS) topology.
- Adds support for changing the compute profile after cluster creation.
- Adds support for adding labels and taints to nodes when creating compute profiles.
- Enforces the vSphere standard for passwords when creating local user accounts.
Product Snapshot
Note: Tanzu Kubernetes Grid Integrated Edition Management Console provides an opinionated installation of TKGI. The supported versions may differ from or be more limited than what is generally supported by TKGI.
Element | Details |
---|---|
Version | v1.10.0 |
Release date | January 28, 2021 |
Installed Tanzu Kubernetes Grid Integrated Edition version | v1.10.0 |
Installed Ops Manager version | 2.10.5 |
Installed Kubernetes version | 1.19.6 |
Compatible NSX-T versions | v3.0.1.2, v3.0.2, v3.1 |
Installed Harbor Registry version | 2.1.2 |
Linux stemcell | 621.97 |
Windows stemcells | >=2019.29 |
Upgrade Path
The supported upgrade path to Tanzu Kubernetes Grid Integrated Edition Management Console v1.10.0 is from Tanzu Kubernetes Grid Integrated Edition v1.9.0 and later.
Known Issues
The Tanzu Kubernetes Grid Integrated Edition Management Console v1.10.0 has the following known issues:
Management console UI does not open if the management console uses custom certificates
Symptom
If you configure Tanzu Kubernetes Grid Integrated Edition Management Console with custom certificates, the management console interface fails to open. This is caused by the failure of the script /etc/vmware/pks-appliance-tls.sh on the management console VM.
Workaround
- Use SSH to log in to the management console VM.
- Open /etc/vmware/pks-appliance-tls.sh in a text editor.
- Replace line 32 with the following code:
sed -i '/^$/d' $file
- Reboot the management console VM.
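Note: the sed command above deletes all empty lines, in place, from the file referenced by the script’s $file variable.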
vRealize Log Insight Integration Does Not Support HTTPS Connections
Symptom
The Tanzu Kubernetes Grid Integrated Edition Management Console integration to vRealize Log Insight does not support connections to the HTTPS port on the vRealize Log Insight server.
Workaround
- Use SSH to log in to the Tanzu Kubernetes Grid Integrated Edition Management Console appliance VM.
- Open the file /lib/systemd/system/pks-loginsight.service in a text editor.
- Add -e LOG_SERVER_ENABLE_SSL_VERIFY=false.
- Set -e LOG_SERVER_USE_SSL=true.
The resulting file should look like the following example:
ExecStart=/bin/docker run --privileged --restart=always --network=pks -v /var/log/journal:/var/log/journal --name=pks-loginsight -e TYPE=gear2-vm -e LOG_SERVER_HOST=${LOGINSIGHT_HOST} -e LOG_SERVER_PORT=${LOGINSIGHT_PORT} -e LOG_SERVER_ENABLE_SSL_VERIFY=false -e LOG_SERVER_USE_SSL=true -e LOG_SERVER_AGENT_ID=${LOGINSIGHT_ID} pksoctopus/vrli-journald:v07092019
- Save the file and run systemctl daemon-reload.
- To restart the vRealize Log Insight service, run systemctl restart pks-loginsight.service.
Tanzu Kubernetes Grid Integrated Edition Management Console can now send logs to the HTTPS port on the vRealize Log Insight server.
vSphere HA causes Management Console ovfenv Data Corruption
Symptom
If you enable vSphere HA on a cluster, the TKGI Management Console appliance VM is running on a host in that cluster, and the host reboots, vSphere HA recreates the TKGI Management Console appliance VM on another host in the cluster. Due to an issue with vSphere HA, the ovfenv data for the newly created appliance VM is corrupted, and the new appliance VM does not boot up with the correct network configuration.
Workaround
- In the vSphere Client, right-click the appliance VM and select Power > Shut Down Guest OS.
- Right-click the appliance again and select Edit Settings.
- Select VM Options and click OK.
- Verify under Recent Tasks that a Reconfigure virtual machine task has run on the appliance VM.
- Power on the appliance VM.
Base64 encoded file arguments are not decoded in Kubernetes profiles
Symptom
Some file arguments in Kubernetes profiles are base64 encoded. When the management console displays the Kubernetes profile, some file arguments are not decoded.
Workaround
Run the following command, where $content is the base64 encoded file argument value:
echo "$content" | base64 --decode
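As a quick check of the command itself, the following sample value decodes to kube-apiserver:
echo "a3ViZS1hcGlzZXJ2ZXI=" | base64 --decode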
Network profiles not immediately selectable
Symptom
If you create network profiles and then try to apply them in the Create Cluster page, the new profiles are not available for selection.
Workaround
Log out of the management console and log back in again.
Real-Time IP information not displayed for network profiles
Symptom
On the cluster summary page, only the default IP pool, pod IP block, and node IP block values are displayed, rather than the real-time values from the associated network profile.
Workaround
None
Error After Modifying Your Harbor Storage Configuration
Symptom
You receive the following error after modifying your existing Harbor installation’s storage configuration:
Error response from daemon: manifest for ... not found: manifest unknown: manifest unknown
Explanation
Harbor does not support modifying an existing Harbor installation’s storage configuration.
Workaround
To modify your Harbor storage configuration, re-install Harbor. Before starting Harbor, configure the new Harbor installation with the desired configuration.
Please send any feedback you have to pks-feedback@pivotal.io.