Upgrade Preparation Checklist for Enterprise PKS v1.7
This topic serves as a checklist for preparing to upgrade VMware Enterprise PKS v1.6 to VMware Enterprise PKS v1.7.
This topic lists steps that you must follow before beginning your upgrade. Failure to follow these instructions may jeopardize your existing deployment data and cause the upgrade to fail.
Warning: Upgrading Enterprise PKS to v1.7 reconfigures the PKS control plane by creating a new PKS Database VM and populating it with MySQL data from the original PKS API VM. If your upgrade to Enterprise PKS v1.7 fails, contact Support.
VMware recommends backing up your Enterprise PKS deployment before upgrading. To back up Enterprise PKS, see Backing Up and Restoring Enterprise PKS.
If you have not already done so, review About Enterprise PKS Upgrades.
Plan your upgrade based on your workload capacity and uptime requirements.
Review the Release Notes for Enterprise PKS v1.7.
If your PKS installation was originally deployed using PKS v1.2 or earlier, you need to drop two old Telemetry database tables from the pivotal-container-service VM before you upgrade to PKS v1.7.
To do this, follow the Resolution instructions in the KB article "For environments created before VMware Enterprise PKS 1.3, upgrading to PKS 1.7 fails during data migration when running clone-db errand."
Warning: If you do not drop the old Telemetry tables from an environment that once ran PKS v1.2 or earlier, the PKS v1.7 upgrade fails.
Coordinate the Enterprise PKS upgrade with cluster admins and users. Tell them that the upgrade reconfigures the control plane and migrates its database. During the upgrade:
- Their workloads will remain active and accessible.
- They will be unable to perform cluster management functions, including creating, resizing, updating, and deleting clusters.
- They will be unable to log in to PKS or use the PKS CLI and other PKS control plane services.
Note: Cluster admins should not start any cluster management tasks right before an upgrade. Wait for cluster operations to complete before upgrading.
Before you upgrade to Enterprise PKS v1.7, you must upgrade all clusters. Doing this aligns the PKS version that the clusters run internally with the current patch version of the PKS control plane.
The PKS control plane supports clusters running the current version and the previously installed version of Enterprise PKS. It does not support clusters running PKS versions older than the last installed version. For example, you can upgrade to PKS v1.7.0 and continue running PKS v1.6.1 clusters.
To check the version of existing clusters and the availability of an upgrade, run:
pks clusters
To upgrade one or more clusters, see Upgrading Clusters.
It is critical that you confirm that a cluster’s resource usage is within the recommended maximum limits before upgrading the cluster.
VMware Enterprise PKS upgrades a cluster by upgrading master and worker nodes individually. The upgrade processes a master node by redistributing the node’s workload, stopping the node, upgrading it, and restoring its workload. This redistribution of a node’s workloads increases the resource usage on the remaining nodes during the upgrade process.
If a Kubernetes cluster master VM is operating too close to capacity, the upgrade can fail.
Warning: Downtime is required to repair a cluster failure resulting from upgrading an overloaded Kubernetes cluster master VM.
To prevent workload downtime during a cluster upgrade, complete the following before upgrading a cluster:
- Ensure that none of the master VMs being upgraded will become overloaded during the cluster upgrade. See Master Node VM Size for more information.
- Review the cluster’s workload resource usage in Dashboard. For more information, see Accessing Dashboard.
- Scale up the cluster if it is near capacity on its existing infrastructure. To scale up, run pks resize or create a cluster using a larger plan. For more information, see Changing Cluster Configurations.
- Run the cluster’s workloads on at least three worker VMs using multiple replicas of your workloads spread across those VMs. For more information, see Maintaining Workload Uptime.
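The replica recommendation can be spot-checked from the command line. The following is a minimal sketch, not part of the official procedure. It assumes that kubectl is already configured for the cluster; the function name low_replica_deployments is hypothetical. The function reads rows of the form "namespace name replicas" and prints each Deployment that requests fewer than two replicas.

```shell
# Hypothetical helper (not part of the PKS CLI or kubectl).
# Reads "namespace name replicas" rows on stdin and prints
# "namespace/name replicas" for every row whose replica count is below 2.
low_replica_deployments() {
  awk '$3 ~ /^[0-9]+$/ && $3 < 2 { print $1 "/" $2, $3 }'
}

# Usage against a live cluster (requires kubectl access):
#   kubectl get deployments --all-namespaces --no-headers \
#     -o custom-columns=NS:.metadata.namespace,NAME:.metadata.name,REPLICAS:.spec.replicas \
#     | low_replica_deployments
```

A Deployment listed by this check runs as a single pod, so it is likely to be briefly unavailable when its worker node is drained. The check does not confirm that replicas are spread across different worker VMs.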
Verify that your Kubernetes environment is healthy. To verify the health of your Kubernetes environment, see Verifying Deployment Health.
If you are upgrading Enterprise PKS for environments using vSphere with NSX-T, perform the following steps:
- Verify that the vSphere datastores have enough space.
- Verify that the vSphere hosts have enough memory.
- Verify that there are no alarms in vSphere.
- Verify that the vSphere hosts are in a good state.
- Verify that NSX Edge is configured for high availability using Active/Standby mode.
Note: Workloads in your Kubernetes cluster are unavailable while the NSX Edge nodes run the upgrade unless you configure NSX Edge for high availability. For more information, see the Configure NSX Edge for High Availability (HA) section of Preparing NSX-T Before Deploying Enterprise PKS.
Clean up or fix any previous failed attempts to create PKS clusters with the PKS Command Line Interface (PKS CLI) by performing the following steps:
View your deployed clusters by running the following command:
pks clusters
If the Status of any cluster displays as FAILED, continue to the next step. If no cluster displays as FAILED, no action is required. Continue to the next section.
To troubleshoot and fix failed clusters, perform the procedure in Cluster Creation Fails.
To clean up failed BOSH deployments related to failed clusters, perform the procedure in Cannot Re-Create a Cluster that Failed to Deploy.
After fixing and cleaning up any failed clusters, view your deployed clusters again by running pks clusters.
For more information about troubleshooting and fixing failed clusters, see the Knowledge Base.
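When many clusters are deployed, the listing can be filtered rather than read by eye. The following is a minimal sketch under stated assumptions: the function name failed_cluster_rows is hypothetical, and the only assumption about the cluster listing is that a failed cluster's row contains the word "failed" in some letter case. The column layout of the listing is not assumed.

```shell
# Hypothetical helper (not part of the PKS CLI).
# Prints every line on stdin that contains the whole word "failed",
# in any letter case. Prints nothing if no line matches.
failed_cluster_rows() {
  grep -iw 'failed' || true
}

# Usage (requires a logged-in PKS CLI session):
#   pks clusters | failed_cluster_rows
```

Empty output suggests that no cluster is in a failed state; any printed row identifies a cluster to troubleshoot or clean up using the procedures above.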
Verify that existing Kubernetes clusters have unique external hostnames by checking for multiple Kubernetes clusters with the same external hostname. Perform the following steps:
Log in to the PKS CLI. For more information, see Logging in to Enterprise PKS. You must log in with an account that has the UAA scope of pks.clusters.admin. For more information about UAA scopes, see Managing Enterprise PKS Users with UAA.
View your deployed PKS clusters by running the following command:
pks clusters
For each deployed cluster, run pks cluster CLUSTER-NAME to view the details of the cluster. For example:
$ pks cluster my-cluster
Examine the output to verify that the Kubernetes Master Host is unique for each cluster.
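The hostname comparison can be automated once the details of every cluster have been collected. The following is a minimal sketch, assuming that each cluster's details include a line beginning with "Kubernetes Master Host:", as in the sample output elsewhere in this topic. The function name duplicate_master_hosts and the cluster names in the usage comment are hypothetical.

```shell
# Hypothetical helper (not part of the PKS CLI).
# Reads the concatenated output of one or more `pks cluster CLUSTER-NAME`
# commands on stdin and prints each "Kubernetes Master Host" value that
# appears more than once.
duplicate_master_hosts() {
  sed -n 's/^[[:space:]]*Kubernetes Master Host:[[:space:]]*//p' \
    | sed 's/[[:space:]]*$//' \
    | sort \
    | uniq -d
}

# Usage (cluster names are placeholders):
#   for c in my-cluster other-cluster; do pks cluster "$c"; done \
#     | duplicate_master_hosts
```

Empty output means that every collected hostname is unique. Any hostname that is printed is shared by at least two clusters.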
Verify your current PKS proxy configuration by performing the following steps:
Check whether an existing proxy is enabled:
- Log in to Ops Manager.
- Click the VMware Enterprise PKS tile.
- Click Networking.
- If HTTP/HTTPS Proxy is Disabled, no action is required. Continue to the next section. If HTTP/HTTPS Proxy is Enabled, continue to the next step.
If the existing No Proxy field contains any of the following values, or you plan to add any of the following values, contact Support:
- Hostnames containing dashes, such as
Enterprise PKS upgrades can run without ever completing if any Kubernetes app has a PodDisruptionBudget with maxUnavailable set to 0.
To ensure that no apps have a PodDisruptionBudget with maxUnavailable set to 0:
Run the following kubectl command to verify the PodDisruptionBudget as the cluster administrator:
kubectl get poddisruptionbudgets --all-namespaces
Examine the output to verify that no app displays 0 in the MAX UNAVAILABLE column.
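On clusters with many namespaces, the PodDisruptionBudget listing can be filtered instead of inspected manually. The following is a minimal sketch, not an official procedure. It assumes that the value to look for is a maxUnavailable of 0 (or 0%), because a budget that allows no voluntary evictions prevents a node from draining. The function name zero_max_unavailable_pdbs is hypothetical, and the usage comment relies on the standard kubectl custom-columns output format.

```shell
# Hypothetical helper (not part of kubectl).
# Reads "namespace name maxUnavailable" rows on stdin and prints
# "namespace/name" for each budget whose maxUnavailable is 0 or 0%.
# Budgets that do not set maxUnavailable (shown as <none>) are ignored.
zero_max_unavailable_pdbs() {
  awk '$3 == "0" || $3 == "0%" { print $1 "/" $2 }'
}

# Usage against a live cluster (requires cluster administrator access):
#   kubectl get poddisruptionbudgets --all-namespaces --no-headers \
#     -o custom-columns=NS:.metadata.namespace,NAME:.metadata.name,MAXUNAVAILABLE:.spec.maxUnavailable \
#     | zero_max_unavailable_pdbs
```

Empty output means that no budget was found with maxUnavailable set to 0. The sketch does not evaluate budgets that are expressed through minAvailable.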
During the Enterprise PKS upgrade process, worker nodes are cordoned and drained. Workloads can prevent worker nodes from draining and cause the upgrade to fail or hang.
To prevent hanging cluster upgrades, you can configure default node drain behavior in the Enterprise PKS tile or with the PKS CLI.
The new default behavior takes effect during the next upgrade, not immediately after configuring the behavior.
To configure node drain behavior in the Enterprise PKS tile, see Worker Node Hangs Indefinitely in Troubleshooting.
To configure default node drain behavior with the PKS CLI:
View the current node drain behavior by running the following command:
pks cluster CLUSTER-NAME --details
Where CLUSTER-NAME is the name of your cluster. For example:
$ pks cluster my-cluster --details
Name: my-cluster
Plan Name: small
UUID: f55ed6c4-c0a7-451d-b735-56c89fdb2ad7
Last Action: CREATE
Last Action State: succeeded
Last Action Description: Instance provisioning completed
Kubernetes Master Host: my-cluster.pks.local
Kubernetes Master Port: 8443
Worker Nodes: 3
Kubernetes Master IP(s): 10.196.219.88
Network Profile Name:
Kubernetes Settings Details:
  Set by Cluster:
    Kubelet Node Drain timeout (mins) (kubelet-drain-timeout): 10
    Kubelet Node Drain grace-period (mins) (kubelet-drain-grace-period): 10
    Kubelet Node Drain force (kubelet-drain-force): true
  Set by Plan:
    Kubelet Node Drain force-node (kubelet-drain-force-node): true
    Kubelet Node Drain ignore-daemonsets (kubelet-drain-ignore-daemonsets): true
    Kubelet Node Drain delete-local-data (kubelet-drain-delete-local-data): true
Configure the default node drain behavior by running the following command:
pks update-cluster CLUSTER-NAME FLAG
Where:
- CLUSTER-NAME is the name of your cluster.
- FLAG is an action flag for updating the node drain behavior.
For example:
$ pks update-cluster my-cluster --kubelet-drain-timeout 1 --kubelet-drain-grace-period 5
Update summary for cluster my-cluster:
Kubelet Drain Timeout: 1
Kubelet Drain Grace Period: 5
Are you sure you want to continue? (y/n): y
Use 'pks cluster my-cluster' to monitor the state of your cluster
For a list of the available action flags for setting node drain behavior, see pks update-cluster in PKS CLI.
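To compare drain settings before and after an update, the relevant lines can be extracted from the cluster details. The following is a minimal sketch, assuming that the details output labels each setting with its flag name in parentheses, as in the sample output above. The function name drain_settings is hypothetical.

```shell
# Hypothetical helper (not part of the PKS CLI).
# Reads `pks cluster CLUSTER-NAME --details` output on stdin and prints one
# "flag=value" line for each kubelet-drain-* setting it finds.
drain_settings() {
  grep -oE '\(kubelet-drain-[a-z-]+\): [^[:space:]]+' \
    | sed -E 's/^\((.*)\): /\1=/'
}

# Usage (requires a logged-in PKS CLI session; the cluster name is a placeholder):
#   pks cluster my-cluster --details | drain_settings
```

Saving the output before running pks update-cluster and comparing it with the output afterward shows which settings changed. As noted above, the new behavior takes effect during the next upgrade.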