Upgrade Preparation Checklist for Enterprise PKS v1.5


Warning: Pivotal Container Service (PKS) v1.5 is no longer supported because it has reached the End of General Support (EOGS) phase as defined by the Support Lifecycle Policy.
To stay up to date with the latest software and security updates, upgrade to a supported version.

This topic serves as a checklist for preparing to upgrade Enterprise Pivotal Container Service (Enterprise PKS) from v1.4 to v1.5.

This topic contains important preparation steps that you must follow before beginning your upgrade. Failure to follow these instructions may jeopardize your existing deployment data and cause the upgrade to fail.

After completing the steps in this topic, you can continue to Upgrading Enterprise PKS. If you are upgrading Enterprise PKS for environments using vSphere with NSX-T, continue to Upgrading Enterprise PKS with NSX-T.

Note: Cluster users should not start any cluster management tasks right before an upgrade. Wait for cluster operations to complete before upgrading.

Back Up Your Enterprise PKS Deployment

We recommend backing up your Enterprise PKS deployment before upgrading. To back up Enterprise PKS, see Backing Up and Restoring Enterprise PKS.

Note: If you choose not to back up Enterprise PKS, NSX-T, or vCenter, we recommend backing up the NSX-T and NSX-T Container Plugin (NCP) logs.

Review Changes in Enterprise PKS v1.5

Review the Release Notes for Enterprise PKS v1.5.

Configure RBAC for PSPs

Enterprise PKS includes the Pod Security Policies (PSPs) security feature. For more information about PSPs in Enterprise PKS, see the Pod Security Policy topic.

When you install or upgrade Enterprise PKS, PSPs are not enabled by default. If you want to enable PSPs for a new or existing cluster, you must define the necessary RBAC objects (a role, a role binding, and a PSP) before you deploy the cluster. If you do not define these objects, users cannot access the Kubernetes cluster after deployment. See Enabling PSPs for more information.

Enterprise PKS provides a PSP named pks-restricted that you can use. You can also define your own PSPs or use one of the cluster roles already defined in the system. At a minimum, you must define the proper cluster role binding so that users can access a cluster with PSPs enabled. See Configuring PSP for Developers to Use for more information.
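
For example, a minimal sketch of RBAC objects that grant use of the pks-restricted PSP might look like the following. The binding name and the group name developers are placeholders for the users or groups in your environment.

# A minimal sketch, applied by a cluster administrator. The ClusterRole grants
# the "use" verb on the pks-restricted PSP; the group "developers" is a placeholder.
kubectl apply -f - <<EOF
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: psp:pks-restricted
rules:
- apiGroups: ["policy"]
  resources: ["podsecuritypolicies"]
  resourceNames: ["pks-restricted"]
  verbs: ["use"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: psp:pks-restricted:developers
subjects:
- kind: Group
  name: developers
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: psp:pks-restricted
  apiGroup: rbac.authorization.k8s.io
EOF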

Configure OIDC Prefixes

In Enterprise PKS v1.5, you can configure prefixes for OpenID Connect (OIDC) users and groups. You can use these prefixes to avoid name conflicts with existing Kubernetes system users. Pivotal recommends adding prefixes to ensure OIDC users and groups do not gain unintended privileges on clusters. For instructions about configuring OIDC prefixes, see the Configure OpenID Connect section in the Installing topic for your IaaS.

If you add OIDC prefixes, you must manually update any existing roles and role bindings that bind to a user name or group. If you do not update your roles and role bindings, developers cannot access Kubernetes clusters. For instructions about creating a role and role binding, see Managing Cluster Access and Permissions.
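
For illustration, suppose you set the OIDC user name prefix to oidc: (an example value) and a developer previously accessed clusters through a binding to alana@example.com. A minimal sketch of re-creating that binding with the prefixed name follows; the binding name, role, and email address are all placeholders.

# Illustrative only: replace an existing binding so that it references the
# prefixed OIDC user name. All names below are placeholders.
kubectl delete clusterrolebinding alana-cluster-admin
kubectl create clusterrolebinding alana-cluster-admin \
  --clusterrole=cluster-admin \
  --user="oidc:alana@example.com"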

Understand What Happens During Enterprise PKS Upgrades

Review What Happens During Enterprise PKS Upgrades, and evaluate your workload capacity and uptime requirements.

Set User Expectations and Restrict Cluster Access

Coordinate the Enterprise PKS upgrade with cluster admins and users. During the upgrade:

  • Their workloads will remain active and accessible.

  • They will be unable to perform cluster management functions, including creating, resizing, updating, and deleting clusters.

  • They will be unable to log in to PKS or use the PKS CLI and other PKS control plane services.

Note: Cluster admins should not start any cluster management tasks right before an upgrade. Wait for cluster operations to complete before upgrading.

Upgrade All Clusters

Before you upgrade to Enterprise PKS v1.5, you must upgrade all existing clusters so that the PKS version they run internally matches the currently installed patch version of the PKS control plane.

The PKS control plane supports clusters running the current version and previously-installed version of Enterprise PKS. It does not support clusters running PKS versions older than the last installed version. For example, you can upgrade to PKS v1.5.0 and continue running PKS v1.4 clusters.

To check the version of your existing clusters and whether an upgrade is available, run:

pks clusters

To upgrade one or more clusters, see Upgrading Clusters.

Verify Your Clusters Support Upgrading

Before upgrading a cluster, it is critical that you confirm the cluster's resource usage is within the recommended maximum limits.

Enterprise PKS upgrades a cluster by upgrading master and worker nodes individually. The upgrade processes a master node by redistributing the node’s workload, stopping the node, upgrading it and restoring its workload. This redistribution of a node’s workloads increases the resource usage on the remaining nodes during the upgrade process.

If a Kubernetes cluster master VM is operating too close to capacity, the upgrade can fail.

Warning: Downtime is required to repair a cluster failure resulting from upgrading an overloaded Kubernetes cluster master VM.

To prevent workload downtime during a cluster upgrade, complete the following before upgrading a cluster:

  1. Ensure none of the master VMs being upgraded will become overloaded during the cluster upgrade. See Master Node VM Size for more information.

  2. Review the cluster’s workload resource usage in Dashboard. For more information, see Accessing Dashboard.

  3. Scale up the cluster if it is near capacity on its existing infrastructure, either by running pks resize or by creating a cluster that uses a larger plan; see the example after this list. For more information, see Changing Cluster Configurations.

  4. Run the cluster’s workloads on at least three worker VMs using multiple replicas of your workloads spread across those VMs. For more information, see Maintaining Workload Uptime.
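
For example, to add worker nodes to a hypothetical cluster named my-cluster, you might run a command like the following; the node count is an illustration only and must fit within the cluster's plan and your infrastructure capacity:

# Example only: scale the cluster "my-cluster" out to five worker nodes.
pks resize my-cluster --num-nodes 5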

Verify Health of Kubernetes Environment

Verify that your Kubernetes environment is healthy. To verify the health of your Kubernetes environment, see Verifying Deployment Health.

Verify NSX-T Configuration (vSphere with NSX-T Only)

If you are upgrading Enterprise PKS for environments using vSphere with NSX-T, perform the following steps:

  1. Verify that the vSphere datastores have enough space.
  2. Verify that the vSphere hosts have enough memory.
  3. Verify that there are no alarms in vSphere.
  4. Verify that the vSphere hosts are in a good state.
  5. Verify that NSX Edge is configured for high availability using Active/Standby mode.

    Note: Workloads in your Kubernetes cluster are unavailable while the NSX Edge nodes run the upgrade unless you configure NSX Edge for high availability. For more information, see the Configure NSX Edge for High Availability (HA) section of Preparing NSX-T Before Deploying Enterprise PKS.

Clean Up Failed Kubernetes Clusters

Clean up previous failed attempts to delete PKS clusters with the PKS Command Line Interface (PKS CLI) by performing the following steps:

  1. View your deployed clusters by running the following command:

    pks clusters
    

    If the Status of any cluster displays as FAILED, continue to the next step. If no cluster displays as FAILED, no action is required. Continue to the next section.

  2. Perform the procedures in Cannot Re-Create a Cluster that Failed to Deploy to clean up the failed BOSH deployment.

  3. View your deployed clusters again by running pks clusters. If any clusters remain in a FAILED state, contact PKS Support.

Verify Kubernetes Clusters Have Unique External Hostnames

Verify that no two existing Kubernetes clusters share the same external hostname by performing the following steps:

  1. Log in to the PKS CLI. For more information, see Logging in to Enterprise PKS. You must log in with an account that has the UAA scope of pks.clusters.admin. For more information about UAA scopes, see Managing Enterprise PKS Users with UAA.

  2. View your deployed PKS clusters by running the following command:

    pks clusters
    
  3. For each deployed cluster, run pks cluster CLUSTER-NAME to view the details of the cluster. For example:

    $ pks cluster my-cluster
    

    Examine the output to verify that the Kubernetes Master Host is unique for each cluster.
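
If you have many clusters, a rough sketch such as the following can help spot duplicates; it assumes you substitute your own cluster names and that each pks cluster output reports the hostname on a line labeled Kubernetes Master Host:

# Rough sketch: print each cluster's master hostname and show only duplicated lines.
# Replace the cluster names below with the names returned by `pks clusters`.
for name in cluster-1 cluster-2 cluster-3; do
  pks cluster "$name" | grep "Kubernetes Master Host"
done | sort | uniq -d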

Verify PKS Proxy Configuration

Verify your current PKS proxy configuration by performing the following steps:

  1. Check whether an existing proxy is enabled:

    1. Log in to Ops Manager.
    2. Click the Pivotal Container Service tile.
    3. Click Networking.
    4. If HTTP/HTTPS Proxy is Disabled, no action is required. Continue to the next section. If HTTP/HTTPS Proxy is Enabled, continue to the next step.
  2. If the existing No Proxy field contains any of the following values, or you plan to add any of the following values, contact PKS Support:

    • localhost
    • Hostnames containing dashes, such as my-host.mydomain.com

Check PodDisruptionBudget Value

Enterprise PKS upgrades can run without ever completing if any Kubernetes app has a PodDisruptionBudget with maxUnavailable set to 0. To ensure that no apps have a PodDisruptionBudget with maxUnavailable set to 0, perform the following steps:

  1. As the cluster administrator, use the Kubernetes CLI, kubectl, to review the PodDisruptionBudgets in the cluster. Run the following command:

    kubectl get poddisruptionbudgets --all-namespaces
    
  2. Examine the output. Verify that no app displays 0 in the MAX UNAVAILABLE column.
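
As a shortcut, the following command prints the configured maxUnavailable value for every PodDisruptionBudget; the column names are chosen here only for readability, and any entry that shows 0 should be relaxed in coordination with the app owner before you upgrade.

# Print namespace, name, and maxUnavailable for every PDB in the cluster.
# A value of 0 can block node drains and hang the upgrade.
kubectl get poddisruptionbudgets --all-namespaces \
  -o custom-columns='NAMESPACE:.metadata.namespace,NAME:.metadata.name,MAX-UNAVAILABLE:.spec.maxUnavailable'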

Configure Node Drain Behavior

During the Enterprise PKS tile upgrade process, worker nodes are cordoned and drained. Workloads can prevent worker nodes from draining and cause the upgrade to fail or hang.

To prevent hanging cluster upgrades, you can use the PKS CLI to configure the default node drain behavior. The new default behavior takes effect during the next upgrade, not immediately after configuring the behavior.

Configure with the PKS CLI

To configure default node drain behavior, do the following:

  1. View the current node drain behavior by running the following command:

    pks cluster CLUSTER-NAME --details
    

    Where CLUSTER-NAME is the name of your cluster.

    For example:

    $ pks cluster my-cluster --details
    Name:                     my-cluster
    Plan Name:                small
    UUID:                     f55ed6c4-c0a7-451d-b735-56c89fdb2ad7
    Last Action:              CREATE
    Last Action State:        succeeded
    Last Action Description:  Instance provisioning completed
    Kubernetes Master Host:   my-cluster.pks.local
    Kubernetes Master Port:   8443
    Worker Nodes:             3
    Kubernetes Master IP(s):  10.196.219.88
    Network Profile Name:
    Kubernetes Settings Details:
      Set by Cluster:
        Kubelet Node Drain timeout (mins) (kubelet-drain-timeout):               10
        Kubelet Node Drain grace-period (mins) (kubelet-drain-grace-period):     10
        Kubelet Node Drain force (kubelet-drain-force):                          true
      Set by Plan:
        Kubelet Node Drain force-node (kubelet-drain-force-node):                true
        Kubelet Node Drain ignore-daemonsets (kubelet-drain-ignore-daemonsets):  true
        Kubelet Node Drain delete-local-data (kubelet-drain-delete-local-data):  true

  2. Configure the default node drain behavior by running the following command:

    pks update-cluster CLUSTER-NAME FLAG
    

    Where:

    • CLUSTER-NAME is the name of your cluster.
    • FLAG is an action flag for updating the node drain behavior.

    For example:

    $ pks update-cluster my-cluster --kubelet-drain-timeout 1 --kubelet-drain-grace-period 5
    Update summary for cluster my-cluster:
      Kubelet Drain Timeout: 1
      Kubelet Drain Grace Period: 5
    Are you sure you want to continue? (y/n): y
    Use 'pks cluster my-cluster' to monitor the state of your cluster

    For a list of the available action flags for setting node drain behavior, see pks update-cluster in PKS CLI.

Update Your Role for the Worker Node Managed Identity on Azure

If you are running Enterprise PKS on Azure, you must add the "Microsoft.Compute/virtualMachines/read" action to the custom role assigned to the worker node managed identity.

Note: You do not need to modify the worker node managed identity role if you are running Enterprise PKS on AWS, GCP, vSphere, or vSphere with NSX-T. Modifying the role for Azure is a requirement as of Kubernetes v1.14.5.

To add the "Microsoft.Compute/virtualMachines/read" action, do the following:

  1. List your roles using the Azure CLI. For example:

    $ az role definition list --custom-role-only true -o json
    
  2. Retrieve the definition of the "PKS worker" role using the roleName key. For example:

    $ az role definition list --custom-role-only true -o json | jq -r '.[] | select(.roleName=="PKS worker")'
    
  3. Copy the JSON to a file and add "Microsoft.Compute/virtualMachines/read" to the role's list of actions. A scripted alternative to this step is sketched at the end of this procedure.

  4. Save your template as pks_worker_role.json.

  5. Update the role:

    az role definition update --role-definition pks_worker_role.json
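
As an alternative to editing the file by hand, the following sketch scripts steps 2 through 4; it assumes the listed role definition exposes its actions under permissions[0].actions, as recent Azure CLI output does, so review the generated file before running az role definition update.

# A sketch only: extract the "PKS worker" role definition, append the read
# action to its actions list, and save the result as pks_worker_role.json.
az role definition list --custom-role-only true -o json \
  | jq '.[] | select(.roleName=="PKS worker")
        | .permissions[0].actions += ["Microsoft.Compute/virtualMachines/read"]' \
  > pks_worker_role.json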
    

For more information about creating managed identities for Enterprise PKS, see Creating Managed Identities in Azure for Enterprise PKS.


Please send any feedback you have to pks-feedback@pivotal.io.