About Tanzu Kubernetes Grid Integrated Edition Upgrades


This topic provides conceptual information about Tanzu Kubernetes Grid Integrated Edition upgrades, including upgrading the TKGI control plane and TKGI-provisioned Kubernetes clusters.

For step-by-step instructions on upgrading Tanzu Kubernetes Grid Integrated Edition and TKGI-provisioned Kubernetes clusters, see:

Overview

A Tanzu Kubernetes Grid Integrated Edition upgrade modifies the version of Tanzu Kubernetes Grid Integrated Edition, for example, from v1.8.x to v1.9.0 or from v1.9.0 to v1.9.1.

By default, Tanzu Kubernetes Grid Integrated Edition is set to perform a full upgrade, which upgrades both the TKGI control plane and all TKGI-provisioned Kubernetes clusters.

However, you can choose to upgrade Tanzu Kubernetes Grid Integrated Edition in two phases by upgrading the TKGI control plane first and then upgrading your TKGI-provisioned Kubernetes clusters later.

Both the full upgrade and the TKGI control plane upgrade are performed through the Tanzu Kubernetes Grid Integrated Edition tile only. When upgrading TKGI-provisioned Kubernetes clusters, you can use either the Tanzu Kubernetes Grid Integrated Edition tile or the TKGI CLI. See the table below.

Upgrade type                Upgrade method
                            TKGI Tile    TKGI CLI
Full TKGI upgrade           Yes          No
TKGI control plane only     Yes          No
Kubernetes clusters only    Yes          Yes

Typically, if you choose to upgrade TKGI-provisioned Kubernetes clusters only, you will upgrade them through the TKGI CLI.

Deciding Between Full and Two-Phase Upgrade

When deciding whether to perform the default full upgrade or to upgrade the TKGI control plane and TKGI-provisioned Kubernetes clusters separately, consider your organization's needs.

For example, if your organization runs TKGI-provisioned Kubernetes clusters in both development and production environments and you want to upgrade only one environment first, you can do so by upgrading the TKGI control plane and TKGI-provisioned Kubernetes clusters separately instead of performing a full upgrade.

Examples of other advantages of upgrading Tanzu Kubernetes Grid Integrated Edition in two phases include:

  • Faster Tanzu Kubernetes Grid Integrated Edition tile upgrades. If you have a large number of clusters in your Tanzu Kubernetes Grid Integrated Edition deployment, performing a full upgrade can significantly increase the amount of time required to upgrade the Tanzu Kubernetes Grid Integrated Edition tile.

  • More granular control over cluster upgrades. In addition to enabling you to upgrade subsets of clusters, the TKGI CLI supports upgrading each cluster individually.

  • Easier troubleshooting than with a monolithic upgrade. Upgrading in two phases helps isolate the root cause of an error. For example, when a cluster-related upgrade error occurs during a full upgrade, the entire Tanzu Kubernetes Grid Integrated Edition tile upgrade may fail.

Warning: If you disable the default full upgrade and upgrade only the TKGI control plane, you must upgrade all your TKGI-provisioned Kubernetes clusters before the next Tanzu Kubernetes Grid Integrated Edition tile upgrade. Disabling the default full upgrade and upgrading only the TKGI control plane cause the TKGI version tagged in your Kubernetes clusters to fall behind the Tanzu Kubernetes Grid Integrated Edition tile version. If your TKGI-provisioned Kubernetes clusters fall more than one version behind the tile, Tanzu Kubernetes Grid Integrated Edition cannot upgrade the clusters.
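
The version constraint in the warning above can be expressed as a quick check. The following Python sketch is illustrative only. It assumes that "one version behind" refers to the minor version (for example, v1.8.x is one version behind v1.9.0) and that the cluster and the tile share the same major version. The function names are hypothetical and are not part of the TKGI CLI or API.

```python
def parse_major_minor(version):
    """Return (major, minor) from a version string such as 'v1.9.1' or '1.8.x'."""
    major, minor = version.lstrip("vV").split(".")[:2]
    return int(major), int(minor)


def within_supported_skew(cluster_version, tile_version):
    """Return True if the cluster is at most one minor version behind the tile.

    Assumption: version skew is measured in minor versions, and a cluster
    with a different major version or a newer version than the tile is
    treated as unsupported.
    """
    cluster_major, cluster_minor = parse_major_minor(cluster_version)
    tile_major, tile_minor = parse_major_minor(tile_version)
    if cluster_major != tile_major:
        return False
    return 0 <= tile_minor - cluster_minor <= 1


if __name__ == "__main__":
    for cluster in ("v1.9.0", "v1.8.x", "v1.7.3"):
        print(cluster, within_supported_skew(cluster, "v1.9.1"))
```

Running a check like this against every cluster before you stage the next tile version can help identify clusters that still need to be upgraded.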

What Happens During Full TKGI and TKGI Control Plane Upgrades

You can perform full TKGI upgrades and TKGI control plane upgrades only through the Tanzu Kubernetes Grid Integrated Edition tile.

After you add a new Tanzu Kubernetes Grid Integrated Edition tile version to your staging area on the Ops Manager Installation Dashboard, Ops Manager automatically migrates your configuration settings into the new tile version.

For more information, see:

Full TKGI Upgrades

During a full TKGI upgrade, the Tanzu Kubernetes Grid Integrated Edition tile does the following:

  1. Upgrades the TKGI control plane, which includes the TKGI API and UAA servers and the TKGI database. This control plane upgrade causes temporary outages as described in Control Plane Outages below.

  2. Upgrades TKGI-provisioned Kubernetes clusters.

    • Upgrading TKGI-provisioned Kubernetes clusters is controlled by the Upgrade all clusters errand in the Tanzu Kubernetes Grid Integrated Edition tile.
    • The cluster upgrade process recreates all clusters, which may cause cluster outages. For more information, see What Happens During Cluster Upgrades below.

TKGI Control Plane Upgrades

When upgrading the TKGI control plane only, the Tanzu Kubernetes Grid Integrated Edition tile performs step 1 of the process described in Full TKGI Upgrades above. It does not perform step 2, the upgrade of TKGI-provisioned Kubernetes clusters.

Control Plane Outages

Upgrading the Tanzu Kubernetes Grid Integrated Edition control plane temporarily interrupts the following:

  • Logging in to the TKGI CLI and using all tkgi commands
  • Using the TKGI API to retrieve information about clusters
  • Using the TKGI API to create and delete clusters
  • Using the TKGI API to resize clusters

These outages do not affect the Kubernetes clusters themselves. During a TKGI control plane upgrade, you can still interact with clusters and their workloads using the Kubernetes Command Line Interface, kubectl.

For more information about the TKGI control plane, see TKGI Control Plane Overview in Tanzu Kubernetes Grid Integrated Edition Architecture.

Canary Instances

The Tanzu Kubernetes Grid Integrated Edition tile is a BOSH deployment.

BOSH-deployed products can set a number of canary instances to upgrade first, before the rest of the deployment VMs. BOSH continues the upgrade only if the canary instance upgrade succeeds. If the canary instance encounters an error, the upgrade stops running and other VMs are not affected.

The Tanzu Kubernetes Grid Integrated Edition tile uses one canary instance when deploying or upgrading Tanzu Kubernetes Grid Integrated Edition.
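
The canary behavior described above can be sketched in a few lines of Python. This is a simplified model, not BOSH code: the function name upgrade_with_canaries and the upgrade callable are hypothetical, and the sketch upgrades the remaining instances one at a time, which is an assumption made for brevity.

```python
def upgrade_with_canaries(instances, upgrade, canaries=1):
    """Upgrade canary instances first and continue only if they all succeed.

    instances: ordered list of instance names.
    upgrade:   hypothetical callable that upgrades one instance and
               returns True on success, False on failure.
    Returns the list of instances that were upgraded successfully.
    """
    upgraded = []
    # Canary phase: a failure here stops the upgrade before any other VM is touched.
    for name in instances[:canaries]:
        if not upgrade(name):
            return upgraded
        upgraded.append(name)
    # Remaining instances are upgraded only after the canaries succeed.
    for name in instances[canaries:]:
        if not upgrade(name):
            break
        upgraded.append(name)
    return upgraded


if __name__ == "__main__":
    vms = ["vm-0", "vm-1", "vm-2"]
    print(upgrade_with_canaries(vms, lambda name: True))
    print(upgrade_with_canaries(vms, lambda name: False))
```

With one canary, as the tile uses, a failure on the first instance leaves every other VM in its original state.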

What Happens During Cluster Upgrades

Upgrading TKGI-provisioned Kubernetes clusters updates their Kubernetes version to the version included with the Tanzu Kubernetes Grid Integrated Edition tile. It also updates the TKGI version tagged in your clusters to the Tanzu Kubernetes Grid Integrated Edition tile version.

You can upgrade TKGI-provisioned Kubernetes clusters through either the Tanzu Kubernetes Grid Integrated Edition tile or the TKGI CLI. See the table below.

This method                                                                                       Upgrades
The Upgrade all clusters errand in the Tanzu Kubernetes Grid Integrated Edition tile > Errands    All clusters. Clusters are upgraded serially.
tkgi upgrade-cluster                                                                              One cluster.
tkgi upgrade-clusters                                                                             Multiple clusters. Clusters are upgraded serially or in parallel.
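
The difference between serial and parallel cluster upgrades can be illustrated with a short Python sketch. It is a conceptual model only and does not call the TKGI CLI: upgrade_one is a hypothetical callable that stands in for upgrading a single cluster, and max_in_flight is an illustrative parameter name for the number of clusters upgraded at the same time.

```python
from concurrent.futures import ThreadPoolExecutor


def upgrade_clusters(names, upgrade_one, max_in_flight=1):
    """Upgrade the named clusters, at most max_in_flight at a time.

    names:         list of cluster names to upgrade.
    upgrade_one:   hypothetical callable that upgrades one cluster and
                   returns its result.
    max_in_flight: 1 upgrades clusters serially; larger values upgrade
                   several clusters in parallel.
    Returns a dict that maps each cluster name to its result.
    """
    with ThreadPoolExecutor(max_workers=max_in_flight) as pool:
        # pool.map yields results in the same order as the input names.
        results = list(pool.map(upgrade_one, names))
    return dict(zip(names, results))


if __name__ == "__main__":
    def fake_upgrade(name):
        return "upgraded " + name

    print(upgrade_clusters(["dev-1", "dev-2", "prod-1"], fake_upgrade))
    print(upgrade_clusters(["dev-1", "dev-2", "prod-1"], fake_upgrade, max_in_flight=2))
```

Serial upgrades limit how many clusters are affected at once. Parallel upgrades shorten the total upgrade window at the cost of more clusters being recreated at the same time.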

During an upgrade of TKGI-provisioned clusters, Tanzu Kubernetes Grid Integrated Edition recreates your clusters. This includes the following stages for each cluster you upgrade:

  1. Master nodes are recreated.
  2. Worker nodes are recreated.

Depending on your cluster configuration, these recreations may cause Master Nodes Outage or Worker Nodes Outage as described below.

Master Nodes Outage

While Tanzu Kubernetes Grid Integrated Edition upgrades a single-master cluster, the cluster's only master node is being recreated, so you cannot interact with your cluster, use kubectl, or push new workloads.

To avoid this loss of functionality, VMware recommends using multi-master clusters.

Worker Nodes Outage

When Tanzu Kubernetes Grid Integrated Edition upgrades a worker node, the node stops running containers. If your workloads run on a single node, they will experience downtime.

To avoid downtime for stateless workloads, VMware recommends using at least one worker node per availability zone (AZ). For stateful workloads, VMware recommends using a minimum of two worker nodes per AZ.
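
The recommendations in Master Nodes Outage and Worker Nodes Outage can be captured as a simple pre-upgrade check. The following Python sketch is an illustration under stated assumptions: the ClusterPlan type, its field names, and the downtime_risks function are hypothetical and do not correspond to a TKGI API, and "multi-master" is interpreted as more than one master node.

```python
from dataclasses import dataclass


@dataclass
class ClusterPlan:
    """Hypothetical summary of a cluster's sizing."""
    masters: int          # number of master nodes
    workers_per_az: int   # worker nodes in each availability zone
    stateful: bool        # True if the cluster runs stateful workloads


def downtime_risks(plan):
    """Return a list of upgrade downtime risks for the given plan."""
    risks = []
    if plan.masters < 2:
        risks.append("single master: cluster is unreachable while the master is recreated")
    # Recommendation from the text: at least one worker per AZ for stateless
    # workloads, and at least two workers per AZ for stateful workloads.
    required = 2 if plan.stateful else 1
    if plan.workers_per_az < required:
        risks.append(
            "workers per AZ: have %d, recommended at least %d"
            % (plan.workers_per_az, required)
        )
    return risks


if __name__ == "__main__":
    print(downtime_risks(ClusterPlan(masters=1, workers_per_az=1, stateful=True)))
    print(downtime_risks(ClusterPlan(masters=3, workers_per_az=2, stateful=True)))
```

An empty list means the plan meets the sizing recommendations described above. It does not guarantee that a given workload stays available during an upgrade.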

Note: When the Upgrade all clusters errand is enabled in the Tanzu Kubernetes Grid Integrated Edition tile, updating the tile with a new Linux or Windows stemcell rolls every Linux or Windows VM in each Kubernetes cluster. This automatic rolling ensures that all your VMs are patched. To avoid workload downtime, use the resource configuration recommended in Master Nodes Outage and Worker Nodes Outage above and in Maintaining Workload Uptime.

