Release Notes

This topic contains release notes for Tanzu Kubernetes Grid Integrated Edition (TKGI) v1.9.

TKGI v1.9.0

Release Date: September 29, 2020

Product Snapshot

Release Details

  Version: v1.9.0
  Release date: September 29, 2020

Component Versions

  Kubernetes: v1.18.8
  CoreDNS: v1.6.7+vmware.3
  Docker: Linux v19.03.5; Windows v19.03.11
  etcd: v3.4.3
  Metrics Server: v0.3.6
  NCP: v3.0.2.1
  Percona XtraDB Cluster (PXC): v0.28.0
  UAA: v74.5.18

Compatibilities

  Ops Manager: v2.9.9 or later, or v2.10.1 or later. Windows worker support on vSphere with NSX-T requires Ops Manager v2.10.1 or later.
  Xenial stemcells: See VMware Tanzu Network*
  Windows stemcells: v2019.24+
  Backup and Restore SDK: v1.18.0
  vSphere: v7.0, v6.7, v6.5
  VMware Cloud Foundation (VCF): v4.1, v4.0
  CNS for vSphere: v1.0.2, v2.0
  NSX-T: v3.0.2, v3.0.1.2 (v3.0.1 EP2), v2.5.2
  Harbor: v2.1, v2.0.1, v1.10.3
  Velero: v1.4.2

* See Kubernetes Clusters With Xenial Stemcell v621.85 Fail After Upgrading TKGI on vSphere With NSX-T to TKGI v1.9 and Later.

Upgrade Path

The supported upgrade paths to TKGI v1.9.0 are from Tanzu Kubernetes Grid Integrated Edition v1.8.0 and later.

Features

This section describes new features and changes in VMware Tanzu Kubernetes Grid Integrated Edition v1.9.0.

Windows Workers on NSX-T

TKGI v1.9.0 supports clusters with Windows-based worker nodes on vSphere with NSX-T networking only. TKGI v1.9.0 continues to support Windows workloads on vSphere with Flannel networking as a beta feature.

Cluster Certificate Rotation Support

For secure communication, TKGI clusters use TLS certificates that are unique to each cluster. TKGI v1.9.0 integrates with the CredHub Maestro CLI to enable expiry checks and rotation for these cluster-specific certificates, including the additional certificates that clusters use with NSX-T networking.

See Rotating Cluster Certificates for how to check and rotate cluster-specific certificates, and TKGI Certificates for managing all certificates used by TKGI.

PKS CLI Supports Certificates Trusted by the Local System

The PKS CLI now trusts the certificates in the local system CA store, such as the macOS keychain. When logging in to the PKS CLI, you no longer need to specify the --skip-ssl-validation or --ca-cert command-line arguments.
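
For example, assuming the CA that signed your TKGI API certificate is already present in the local system trust store (TKGI-API, USERNAME, and PASSWORD are placeholders):

    # Previously, the CA certificate had to be supplied explicitly:
    # tkgi login -a TKGI-API -u USERNAME -p PASSWORD --ca-cert /PATH/TO/ROOT-CA.pem
    # Now the login succeeds without it if the CA is in the system CA store:
    tkgi login -a TKGI-API -u USERNAME -p PASSWORD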

Compute Profile CLI Support and Improvements (vSphere)

Compute profiles let developers customize cluster topology, node sizing, and other compute resource options, overriding configurations that a cluster inherits from its Plan.

TKGI v1.9.0 redesigns and improves compute profile functionality, and adds TKGI CLI options for creating, managing, and using compute profiles.

With NSX-T networking you can use compute profiles with Linux- and Windows-worker clusters. With Flannel networking you can only apply compute profiles to Linux clusters.

For more information, see Creating and Managing Compute Profiles with the CLI (vSphere).
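
The following is a minimal sketch of the CLI workflow; the profile and cluster names are illustrative, and the exact command syntax and profile JSON schema are documented in the linked topic:

    # Create a compute profile from a JSON definition file
    tkgi create-compute-profile PATH-TO-COMPUTE-PROFILE.json
    # List the available compute profiles
    tkgi compute-profiles
    # Create a cluster that uses the profile
    tkgi create-cluster my-cluster --external-hostname my-cluster.example.com \
        --plan small --compute-profile my-compute-profile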

Velero Support and Bundling for Backup and Restore

TKGI v1.9.0 includes support for Velero, an open-source, community-standard tool for backing up and restoring Kubernetes workloads, including both stateless workloads and stateful workloads that use persistent volumes. Velero is the preferred backup solution for workloads running on TKGI clusters, and you can download it from your TKGI downloads page on https://my.vmware.com.

For more information, see Backing Up and Restoring Tanzu Kubernetes Grid Integrated Edition.
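
As a brief illustration of the Velero workflow (the backup and namespace names are examples; see the linked topic for TKGI-specific setup):

    # Back up all resources in a namespace, including persistent volumes
    velero backup create my-backup --include-namespaces my-namespace
    # Check the status of the backup
    velero backup describe my-backup
    # Restore the namespace from the backup
    velero restore create --from-backup my-backup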

Resizable Persistent Volume Support on vSphere 7.0

TKGI supports creating resizable persistent volumes on clusters created on vSphere 7.0 with CNS v2.0.

For more information, see Cloud Native Storage (CNS) on vSphere.

Tagging Support on AWS and GCP

TKGI supports tagging from the CLI on Amazon Web Services (AWS) and Google Cloud Platform (GCP). This makes the --tags option to tkgi create-cluster and other commands available on all supported infrastructures.

For more information, see Tagging Clusters.
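
For example, the following creates a cluster with two tags; the cluster name, hostname, plan, and tag values are illustrative:

    tkgi create-cluster my-cluster --external-hostname my-cluster.example.com \
        --plan small --tags "env:test,owner:team-a"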

New Telegraf Configuration Fields

TKGI supports modifying the Telegraf Agent configuration. For more information, see Configure Telegraf in the Tile in Configuring Telegraf in TKGI.

Telemetry Changes

  • Telemetry Enhanced Participation Level Change: The TKGI Customer Experience Improvement Program (CEIP) and Telemetry Program has been streamlined to bring the Enhanced participation level closer to the Standard participation level. For descriptions of the participation levels, see Telemetry.

  • Telemetry Database Removal: The legacy Telemetry DB has been removed from the TKGI Database.

Component Updates

The following components have been updated:

  • Bumps Kubernetes to v1.18.8+vmware.1.
  • Bumps CoreDNS to v1.6.7+vmware.3.
  • Bumps NCP to v3.0.2.1.

Known Issues

TKGI v1.9.0 has the following known issues:

Error: Could Not Execute “Apply-Changes” in Azure Environment

Symptom

After clicking Apply Changes on the TKGI tile in an Azure environment, you experience an error '…could not execute "apply-changes"…' with either of the following descriptions:

  • {"errors":{"base":["undefined method `location' for nil:NilClass"]}}
  • FailedError.new("Resource Groups in region '#{location}' do not support Availability Zones"))

For example:

INFO | 2020-09-21 03:46:49 +0000 | Vessel::Workflows::Installer#run | Install product (apply changes)
2020/09/21 03:47:02 could not execute "apply-changes": installation failed to trigger: request failed: unexpected response from /api/v0/installations:
HTTP/1.1 500 Internal Server Error
Transfer-Encoding: chunked
Cache-Control: no-cache, no-store
Connection: keep-alive
Content-Type: application/json; charset=utf-8
Date: Mon, 21 Sep 2020 17:51:50 GMT
Expires: Fri, 01 Jan 1990 00:00:00 GMT
Pragma: no-cache
Referrer-Policy: strict-origin-when-cross-origin
Server: Ops Manager
Strict-Transport-Security: max-age=31536000; includeSubDomains
X-Content-Type-Options: nosniff
X-Download-Options: noopen
X-Frame-Options: SAMEORIGIN
X-Permitted-Cross-Domain-Policies: none
X-Request-Id: f5fc99c1-21a7-45c3-7f39
X-Runtime: 9.905591
X-Xss-Protection: 1; mode=block

44
{"errors":{"base":["undefined method `location' for nil:NilClass"]}}
0

Explanation

The Azure CPI endpoint used by Ops Manager has been changed and your installed version of Ops Manager is not compatible with the new endpoint.

Workaround

Run the following Ops Manager CLI command:

om --skip-ssl-validation --username USERNAME --password PASSWORD --target https://OPSMAN-API curl --silent --path /api/v0/staged/director/verifiers/install_time/IaasConfigurationVerifier -x PUT -d '{ "enabled": false }'

Where:

  • USERNAME is the account to use to run Ops Manager API commands.
  • PASSWORD is the password for the account.
  • OPSMAN-API is the IP address for the Ops Manager API.

For more information, see Error 'undefined method location' is received when running Apply Change on Azure in the VMware Tanzu Knowledge Base.

VMware vRealize Operations Does Not Support Windows Worker-Based Kubernetes Clusters

VMware vRealize Operations (vROps) does not support Windows worker-based Kubernetes clusters and cannot be used to manage TKGI-provisioned Windows workers.

TKGI Wavefront Does Not Work for Windows Workers

The Wavefront collector and proxy do not support monitoring of clusters with Windows-based worker nodes. For alternative ways to set up in-cluster monitoring, see Monitoring Workers and Workloads.

Pinging Windows Workers Does Not Work

TKGI-provisioned Windows workers inherit a Kubernetes limitation that prevents outbound ICMP communication from workers. As a result, pinging Windows workers does not work.

For information about this limitation, see Limitations > Networking in the Windows in Kubernetes documentation.

Velero Does Not Support Backing Up Stateful Windows Workloads

You can use Velero to back up stateless TKGI-provisioned Windows workloads. Velero cannot back up stateful Windows applications. For more information, see Velero on Windows in Basic Install in the Velero documentation.

TMC Integration Not Supported on GCP

TKGI on Google Cloud Platform (GCP) does not support Tanzu Mission Control integration, which is configured in the Tanzu Kubernetes Grid Integrated Edition tile > Tanzu Mission Control (Experimental) pane.

If you intend to run TKGI v1.9 on GCP, skip this pane when configuring the Tanzu Kubernetes Grid Integrated Edition tile.

Compute Profile CLI Commands Do Not Support Pre-v1.9 Profiles

Compute profiles created in TKGI v1.8 and earlier have a different format from v1.9 compute profiles, as described in Compute Profile CLI Support and Improvements (vSphere).

You cannot use the v1.9 tkgi CLI compute profile commands to manage compute profiles created in prior versions of TKGI. To perform compute profile operations on pre-v1.9 profiles and the clusters that use them, use the curl commands and JSON structure described in Using Compute Profiles in the TKGI v1.8 documentation.

Compute profiles created in prior versions of TKGI, and the clusters that use them, continue to work in v1.9.

Compute Profiles Not Supported for Windows-Worker Clusters on Flannel

On vSphere with Flannel networking, you can only apply compute profiles to Linux clusters. On vSphere with NSX-T networking you can use compute profiles with both Linux- and Windows-worker clusters.

TKGI CLI Does Not Prevent Reducing the Control Plane Node Count

TKGI CLI does not prevent accidentally reducing a cluster’s control plane node count using a compute profile.

Warning: Reducing a cluster’s control plane node count can destroy the cluster. Do not attempt to use a compute profile to reduce a cluster’s number of control plane nodes, control_plane.instances.

Compute Profile Dropped From Clusters During BOSH Upgrade

If a cluster created using a compute profile is upgraded using tkgi upgrade-cluster, the cluster’s compute profile will be dropped.

Windows Nodes With Workloads and an emptyDir Volume are Unable to Drain

Node draining fails when you scale down a Windows cluster that has a deployed Windows workload with an emptyDir volume.

Solution

Run kubectl drain NODE-NAME --delete-local-data, where NODE-NAME is the node that failed to drain, and then restart the cluster scale-down.
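
For example, assuming an affected node named my-windows-worker-0 (add --ignore-daemonsets if DaemonSet-managed Pods block the drain):

    # Identify the node that failed to drain
    kubectl get nodes -o wide
    # Evict the workload, discarding its emptyDir data
    kubectl drain my-windows-worker-0 --delete-local-data --ignore-daemonsets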

In-Cluster DNS Lookup Fails in Windows Clusters

Symptom

DNS lookup fails for Windows Pods that do not use a fully qualified domain name (FQDN) to look up services and Pods within their namespace or cluster.

Explanation

DNS lookup fails because a Primary DNS suffix has not been configured on the Windows Pod. A Windows Pod configured without a Primary DNS suffix must use a fully qualified domain name (FQDN) to look up addresses within its namespace and cluster.

Solution

To configure a Primary DNS suffix, use a hook that dynamically injects the Primary DNS setting into your Windows Pods. For more information, see In-Cluster DNS lookup requires a Fully Qualified Domain Name (FQDN) in Windows Clusters in the VMware Tanzu Community Knowledge Base.
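
As an illustration of the general idea only (this is not the exact hook described in the KB article; the deployment and namespace names are hypothetical), you can inject cluster DNS search suffixes into a Pod spec so that short names resolve:

    # Add DNS search suffixes to the Pod template of a hypothetical deployment
    kubectl patch deployment my-windows-app --namespace my-namespace --type merge \
        -p '{"spec":{"template":{"spec":{"dnsConfig":{"searches":["my-namespace.svc.cluster.local","svc.cluster.local","cluster.local"]}}}}}'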

Windows Cluster Creation Fails for Certain Compute Profile Configurations

Windows cluster creation does not support using a compute profile with two or more worker instance groups.

Windows Cluster Nodes Not Deleted After VM Deleted

Symptom

After you delete a VM using your IaaS's management console, you notice that a Windows worker node that had been on that VM is now in a NotReady state.

Solution

  1. To identify the leftover node:

    kubectl get no -o wide
    
  2. Locate nodes in the returned list that are in a NotReady state and have the same IP address as another node in the list.

  3. To manually delete a NotReady node:

    kubectl delete node NODE-NAME
    

    Where NODE-NAME is the name of the node in the NotReady state.

502 Bad Gateway After OIDC Login

Symptom

You experience a “502 Bad Gateway” error from the NSX load balancer after you log in to OIDC.

Explanation

A large response header has exceeded your NSX-T load balancer's maximum response header size. The default maximum response header size is 10,240 characters; resize it to 50,000.

Workaround

If you experience this issue, manually reconfigure your NSX-T request_header_size and response_header_size to 50,000 characters. For information about configuring NSX-T default header sizes, see OIDC Response Header Overflow in the Knowledge Base.
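
The following is a sketch only, assuming the NSX-T Policy API load balancer HTTP profile endpoint; the manager address, profile ID, and credentials are placeholders, and you should follow the exact procedure in the KB article:

    # Increase the request and response header size limits to 50,000 characters
    curl -k -u 'admin:PASSWORD' -X PATCH \
        "https://NSX-MANAGER/policy/api/v1/infra/lb-app-profiles/PROFILE-ID" \
        -H 'Content-Type: application/json' \
        -d '{"resource_type":"LBHttpProfile","request_header_size":50000,"response_header_size":50000}'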

NSX-T Pre-Check Errand Fails Due to Edge Node Configuration

Symptom

You have configured your NSX-T Edge Node VM as medium size, and the NSX-T Pre-Check Errand fails with the following error: "ERROR: NSX-T Precheck failed due to Edge Node … no of cpu cores is less than 8".

Explanation

The NSX-T Pre-Check Errand is erroneously returning the "cpu cores is less than 8" error.

Solution

You can safely configure your NSX-T Edge Node VMs as medium size and ignore the error.

Difficulty Changing Proxy for Windows Workers

You must configure a global proxy in the Tanzu Kubernetes Grid Integrated Edition tile > Networking pane before you create any Windows workers that use the proxy.

You cannot change the proxy configuration for Windows workers in an existing cluster.

Character Limitations in HTTP Proxy Password

For vSphere with NSX-T, the HTTP Proxy password field does not support the following special characters: & or ;.

Kubernetes Clusters With Xenial Stemcell v621.85 Fail After Upgrading TKGI on vSphere With NSX-T to TKGI v1.9 and Later

Symptom

After you upgrade TKGI on vSphere with NSX-T to v1.9 or later, TKGI-provisioned Kubernetes clusters with Xenial Stemcell v621.85 fail to start. The cluster start failure log includes the following:

Error: Action Failed get_task: Task ... result: Compiling package openvswitch: 
Running packaging script: Running packaging script: Command exited with 2;

Xenial Stemcell v621.85 is installed by default when installing Ops Manager v2.8.15.

Workaround

If you experience this issue, manually revert the stemcell to an earlier compatible version. For information about reverting stemcells, see How to revert a stemcell in TKGI to prevent OVS compilation issues in the Knowledge Base.

One Plan ID Longer than Other Plan IDs

Symptom

One of your plan IDs is one character longer than your other plan IDs.

Explanation

In TKGI, each plan has a unique plan ID. A plan ID is normally a UUID consisting of 32 alphanumeric characters and 4 hyphens. However, the Plan 4 ID consists of 33 alphanumeric characters and 4 hyphens.

Solution

You can safely configure and use Plan 4. The length of the Plan 4 ID does not affect the functionality of Plan 4 clusters.

If you require all plan IDs to have identical length, do not activate or use Plan 4.

TKGI Management Console 1.9.0

Release Date: September 29, 2020

Features

Tanzu Kubernetes Grid Integrated Edition Management Console v1.9.0 updates include:

Product Snapshot

Note: Tanzu Kubernetes Grid Integrated Edition Management Console provides an opinionated installation of TKGI. The supported versions may differ from or be more limited than what is generally supported by TKGI.

  Version: v1.9.0
  Release date: September 29, 2020
  Installed Tanzu Kubernetes Grid Integrated Edition version: v1.9.0
  Installed Ops Manager version: v2.10.1
  Installed Kubernetes version: v1.18.8+vmware.1
  Compatible NSX-T versions: v3.0.2, v3.0.1, v3.0.1.1, v2.5.2
  Installed Harbor Registry version: v2.0.2
  Windows stemcells: v2019.24+

Upgrade Path

The supported upgrade path to Tanzu Kubernetes Grid Integrated Edition Management Console v1.9.0 is from Tanzu Kubernetes Grid Integrated Edition v1.8.0 and later.

Known Issues

The Tanzu Kubernetes Grid Integrated Edition Management Console v1.9.0 has the following known issues:

vRealize Log Insight Integration Does Not Support HTTPS Connections

Symptom

The Tanzu Kubernetes Grid Integrated Edition Management Console integration with vRealize Log Insight does not support connections to the HTTPS port on the vRealize Log Insight server.

Workaround

  1. Use SSH to log in to the Tanzu Kubernetes Grid Integrated Edition Management Console appliance VM.
  2. Open the file /lib/systemd/system/pks-loginsight.service in a text editor.
  3. Add -e LOG_SERVER_ENABLE_SSL_VERIFY=false.
  4. Set -e LOG_SERVER_USE_SSL=true.

    The resulting file should look like the following example:

    ExecStart=/bin/docker run --privileged --restart=always --network=pks \
    -v /var/log/journal:/var/log/journal \
    --name=pks-loginsight \
    -e TYPE=gear2-vm \
    -e LOG_SERVER_HOST=${LOGINSIGHT_HOST} \
    -e LOG_SERVER_PORT=${LOGINSIGHT_PORT} \
    -e LOG_SERVER_ENABLE_SSL_VERIFY=false \
    -e LOG_SERVER_USE_SSL=true \
    -e LOG_SERVER_AGENT_ID=${LOGINSIGHT_ID} \
    pksoctopus/vrli-journald:v07092019
    
  5. Save the file and run systemctl daemon-reload.

  6. To restart the vRealize Log Insight service, run systemctl restart pks-loginsight.service.

Tanzu Kubernetes Grid Integrated Edition Management Console can now send logs to the HTTPS port on the vRealize Log Insight server.

vSphere HA causes Management Console ovfenv Data Corruption

Symptom

If you enable vSphere HA on a cluster and the TKGI Management Console appliance VM is running on a host in that cluster, then when the host reboots, vSphere HA recreates the appliance VM on another host in the cluster. Due to an issue with vSphere HA, the ovfenv data for the newly created appliance VM is corrupted and the new appliance VM does not boot up with the correct network configuration.

Workaround

  1. In the vSphere Client, right-click the appliance VM and select Power > Shut Down Guest OS.
  2. Right-click the appliance again and select Edit Settings.
  3. Select VM Options and click OK.
  4. Verify under Recent Tasks that a Reconfigure virtual machine task has run on the appliance VM.
  5. Power on the appliance VM.

Base64 encoded file arguments are not decoded in Kubernetes profiles

Symptom

Some file arguments in Kubernetes profiles are base64-encoded. When the management console displays a Kubernetes profile, these file arguments are not decoded.

Workaround

To decode a file argument manually, run the following, where $content contains the encoded value:

    echo "$content" | base64 --decode
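
For example, with an illustrative encoded value:

    echo "dGVzdF9kYXRh" | base64 --decode
    # prints: test_data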

Network profiles not immediately selectable

Symptom

If you create network profiles and then try to apply them in the Create Cluster page, the new profiles are not available for selection.

Workaround

Log out of the management console and log back in again.

Real-Time IP information not displayed for network profiles

Symptom

In the cluster summary page, only the default IP pool, pod IP block, and node IP block values are displayed, rather than the real-time values from the associated network profile.

Workaround

None


Please send any feedback you have to pks-feedback@pivotal.io.