Enterprise PKS Release Notes

This topic contains release notes for Enterprise Pivotal Container Service (Enterprise PKS) v1.5.0.

v1.5.0

Release Date: August 20, 2019

Features

New features and changes in this release:

  • Cluster administrators can use the pks cluster CLUSTER-NAME --details command to view details about an individual cluster, including Kubernetes nodes and NSX-T network details. For more information, see Viewing Cluster Details.
  • Enterprise PKS v1.5.0 adds new network profiles for Kubernetes clusters provisioned with NSX-T.
  • Cluster administrators can provision Windows worker-based Kubernetes clusters on vSphere with Flannel. Windows worker-based clusters in Enterprise PKS v1.5.0 do not support NSX-T integration. For more information, see Configuring Windows Worker-Based Clusters (Beta) and Deploying and Exposing Windows Workloads (Beta).
  • Operators can set the lifetime for the refresh and access tokens for Kubernetes clusters. You can configure the token lifetimes to meet your organization’s security and compliance needs. For information about configuring the access and refresh token for your Kubernetes clusters, see the UAA section in the Installing topic for your IaaS.
  • Operators can configure prefixes for OpenID Connect (OIDC) users and groups to avoid name conflicts with existing Kubernetes system users. Pivotal recommends adding prefixes to ensure OIDC users and groups do not gain unintended privileges on clusters. For information about configuring OIDC prefixes, see the Configure OpenID Connect section in the Installing topic for your IaaS.
  • Operators can configure an external SAML identity provider for user authentication and authorization. For information about configuring an external SAML identity provider, see the Configure SAML as an Identity Provider section in the Installing topic for your IaaS.
  • Operators can upgrade Kubernetes clusters separately from the Enterprise PKS tile. For information about upgrading Kubernetes clusters, see Upgrading Clusters.
  • Operators can configure the Telegraf agent to send master/etcd node metrics to a third-party monitoring service. For more information, see Monitoring Master/etcd Node VMs.
  • Operators can configure the default node drain behavior. You can use this feature to resolve hanging or failed cluster upgrades. For more information about configuring node drain behavior, see Worker Node Hangs Indefinitely in Troubleshooting and Configure Node Drain Behavior in Upgrade Preparation Checklist for Enterprise PKS v1.5.
  • App developers can create metric sinks for namespaces within a Kubernetes cluster. For more information, see Creating Sink Resources.
  • VMware’s Customer Experience Improvement Program (CEIP) and the Pivotal Telemetry Program (Telemetry) are now enabled in Enterprise PKS by default. This includes both new installations and upgrades. For information about configuring CEIP and Telemetry in the Enterprise PKS tile, see the CEIP and Telemetry section in the Installing topic for your IaaS.
  • Adds a beta release of VMware Enterprise PKS Management Console that provides a graphical interface for deploying and managing Enterprise PKS on vSphere. For more information, see Using the Enterprise PKS Management Console.

Product Snapshot

Version: v1.5.0
Release date: August 20, 2019
Compatible Ops Manager versions: v2.5.12 and later, or v2.6.6 and later *
Xenial stemcell version: v315.81
Windows stemcell version: v2019.7
Kubernetes version: v1.14.5
On-Demand Broker version: v0.29.0
NSX-T versions: v2.4.0.1, v2.4.1, v2.4.2 (see below), v2.5.0 (see below)
NCP version: v2.5.0
Docker version: v18.09.8
CFCR:
Backup and Restore SDK version: v1.17.0

* If you want to use Windows workers in Enterprise PKS v1.5, you must install Ops Manager v2.6.6 or later. Enterprise PKS does not support this feature on Ops Manager v2.5. For more information about Ops Manager v2.6.6 and later, see PCF Ops Manager v2.6 Release Notes.

VMware Enterprise PKS Management Console Product Snapshot

Note: The Management Console BETA provides an opinionated installation of Enterprise PKS. The supported versions may differ from or be more limited than what is generally supported by Enterprise PKS.

Version: v0.9. This feature is a beta component and is intended for evaluation and test purposes only.
Release date: August 22, 2019
Installed Enterprise PKS version: v1.5.0
Installed Ops Manager version: v2.6.5
Installed Kubernetes version: v1.14.5
Supported NSX-T versions: v2.4.1, v2.4.2 (see below)
Installed Harbor Registry version: v1.8.1

vSphere Version Requirements

For Enterprise PKS installations on vSphere or on vSphere with NSX-T Data Center, refer to the VMware Product Interoperability Matrices.

Upgrade Path

The supported upgrade paths to Enterprise PKS v1.5.0 are from Enterprise PKS v1.4.0 and later.

Exception: If you are running Enterprise PKS v1.4.0 with NSX-T v2.3.x, follow the steps below:

  1. Upgrade to PKS v1.4.1.
  2. Upgrade to NSX-T v2.4.1.
  3. Upgrade to PKS v1.5.0.

For more information, see Upgrading Enterprise PKS and Upgrading Enterprise PKS with NSX-T.

Breaking Changes

Enterprise PKS v1.5.0 has the following breaking changes:

Announcing Support for NSX-T v2.5.0 with a Known Issue and KB Article

Enterprise PKS v1.5 supports NSX-T v2.5.0. Before upgrading to NSX-T v2.5.0, review the associated known issue and knowledge base article.

Announcing Support for NSX-T v2.4.2 with a Known Issue and Workaround

Enterprise PKS v1.5 supports NSX-T v2.4.2. However, there is a known issue with NSX-T v2.4.2 that can affect new and upgraded installations of Enterprise PKS v1.5 that use a NAT topology.

For NSX-T v2.4.2, the PKS Management Plane must be deployed on a Tier-1 distributed router (DR). If the PKS Management Plane is deployed on a Tier-1 service router (SR), you must convert the router to a DR. For instructions, see the VMware knowledge base article East-West traffic between workloads behind different T1 is impacted when NAT is configured on T0 (71363).

NSX-T v2.5 addresses this issue, so the Tier-1 router can be either a DR or an SR.

New OIDC Prefixes Break Existing Cluster Role Bindings

In Enterprise PKS v1.5, operators can configure prefixes for OIDC usernames and groups. If you add OIDC prefixes you must manually change any existing role bindings that bind to a username or group. If you do not change your role bindings, developers cannot access Kubernetes clusters. For information about creating a role binding, see Managing Cluster Access and Permissions.
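
For example, if you configure the OIDC username prefix as oidc: (a value used here only for illustration), an existing role binding that granted access to alana@example.com must be updated to reference the prefixed username. The following is a minimal sketch of an updated binding; the binding name and username are hypothetical:

    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRoleBinding
    metadata:
      name: example-cluster-admin-binding   # hypothetical binding name
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: ClusterRole
      name: cluster-admin
    subjects:
    - apiGroup: rbac.authorization.k8s.io
      kind: User
      name: "oidc:alana@example.com"        # was "alana@example.com" before the prefix was added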

New API Group Name for Sink Resources

The apps.pivotal.io API group name for sink resources is no longer supported. The new API group name is pksapi.io.

When creating a sink resource, your sink resource YAML definition must start with apiVersion: pksapi.io/v1beta1. All existing sinks are migrated automatically.
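
For example, a sink resource definition now begins with the following lines; the kind shown is one of the log sink kinds described under Log Sink Changes below:

    # The API group for sink resources is now pksapi.io:
    apiVersion: pksapi.io/v1beta1
    kind: ClusterLogSink   # or LogSink, for a namespaced log sink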

For more information about defining and managing sink resources, see Creating Sink Resources.

Log Sink Changes

Enterprise PKS v1.5.0 adds the following log sink changes:

  • The ClusterSink log sink resource has been renamed to ClusterLogSink and the Sink log sink resource has been renamed to LogSink.

    • When you create a log sink resource with YAML, you must use one of the new names in your sink resource YAML definition. For example, specify kind: ClusterLogSink to define a cluster log sink. All existing sinks are migrated automatically.
    • When managing your log sink resources through kubectl, you must use the new log sink resource names. For example, if you want to delete a cluster log sink, run kubectl delete clusterlogsink instead of kubectl delete clustersink.
  • Log transport now requires a secure connection. When creating a ClusterLogSink or LogSink resource, you must include enable_tls: true in your sink resource YAML definition (see the example after this list). All existing sinks are migrated automatically.
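
For example, a minimal ClusterLogSink definition that uses the new resource name and enables TLS might look like the following sketch. The host and port values are placeholder assumptions; see Creating Sink Resources for the full set of supported spec fields.

    apiVersion: pksapi.io/v1beta1
    kind: ClusterLogSink
    metadata:
      name: example-cluster-log-sink   # hypothetical sink name
    spec:
      host: logs.example.com           # placeholder log destination
      port: 6514                       # placeholder TLS syslog port
      enable_tls: true                 # required: log transport must use a secure connection

You can apply a definition like this with kubectl apply -f MY-SINK.yml, as listed under Deprecation of Sink Commands in the PKS CLI below.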

For more information about defining and managing sink resources, see Creating Sink Resources.

Deprecation of Sink Commands in the PKS CLI

The following Enterprise PKS Command Line Interface (PKS CLI) commands are deprecated and will be removed in a future release:

  • pks create-sink
  • pks sinks
  • pks delete-sink

You can use the following Kubernetes CLI commands instead:

  • kubectl apply -f MY-SINK.yml
  • kubectl get clusterlogsinks
  • kubectl delete clusterlogsink YOUR-SINK

For more information about defining and managing sink resources, see Creating Sink Resources.

Known Issues

Enterprise PKS v1.5.0 has the following known issues:

Duplicate IP Address Conflict Can Occur

When you use a network profile to provision a Kubernetes cluster and perform a DNS lookup of the ingress controller IP address, NCP allocates the IP address from the floating IP pool, but NSX-T does not mark the address as allocated. As a result, NCP can re-allocate the same IP address for another purpose, causing a duplicate IP address conflict.

This known issue does not affect allocation of the IP address for the Kubernetes API server load balancer.

Passwords Not Supported for Ops Manager VM on vSphere

Starting in Ops Manager v2.6, you can SSH into the Ops Manager VM in a vSphere deployment only with a private SSH key. You cannot SSH into the Ops Manager VM with a password.

To avoid upgrade failure and errors when authenticating, add a public key to the Customize Template screen of the OVF template for the Ops Manager VM. Then, use the private key to SSH into the Ops Manager VM.

Warning: You cannot upgrade to Ops Manager v2.6 successfully without adding a public key. If you do not add a key, Ops Manager shuts down automatically because it cannot find a key and may enter a reboot loop.

For more information about adding a public key to the OVF template, see Deploy Ops Manager in Deploying Ops Manager on vSphere.

Azure Default Security Group Is Not Automatically Assigned to Cluster VMs

Symptom

You experience issues when configuring a load balancer for a multi-master Kubernetes cluster or creating a service of type LoadBalancer. Additionally, in the Azure portal, the VM > Networking page does not display any inbound and outbound traffic rules for your cluster VMs.

Explanation

As part of configuring the Enterprise PKS tile for Azure, you enter Default Security Group in the Kubernetes Cloud Provider pane. When you create a Kubernetes cluster, Enterprise PKS automatically assigns this security group to each VM in the cluster. However, on Azure the automatic assignment may not occur.

As a result, your inbound and outbound traffic rules defined in the security group are not applied to the cluster VMs.

Workaround

If you experience this issue, manually assign the default security group to each VM NIC in your cluster.

Cluster Creation Fails When First AZ Runs Out of Resources

Symptom

If the first availability zone (AZ) used by a plan with multiple AZs runs out of resources, cluster creation fails with an error like the following:

L Error: CPI error 'Bosh::Clouds::CloudError' with message 'No valid placement found for requested memory: 4096

Explanation

BOSH creates VMs for your Enterprise PKS deployment using a round-robin algorithm, creating the first VM in the first AZ that your plan uses. If the AZ runs out of resources, cluster creation fails because BOSH cannot create the cluster VM.

For example, if you have three AZs and you create two clusters with four worker VMs each, BOSH deploys VMs in the following AZs:

            AZ1                   AZ2           AZ3
Cluster 1   Worker VMs 1 and 4    Worker VM 2   Worker VM 3
Cluster 2   Worker VMs 1 and 4    Worker VM 2   Worker VM 3

In this scenario, AZ1 has twice as many VMs as AZ2 or AZ3.

Azure Worker Node Communication Fails after Upgrade

Symptom

Outbound communication from a worker node VM fails after upgrading Enterprise PKS.

Explanation

Enterprise PKS uses Azure Availability Sets to improve the uptime of workloads and worker nodes in the event of Azure platform failures. Worker node VMs are distributed evenly across Availability Sets.

Azure Standard SKU Load Balancers are recommended for the Kubernetes control plane and Kubernetes ingress and egress. This load balancer type provides an IP address for outbound communication using SNAT.

During an upgrade, when BOSH rebuilds a given worker instance in an Availability Set, Azure can time out while re-attaching the worker node network interface to the back-end pool of the Standard SKU Load Balancer.

For more information, see Outbound connections in Azure in the Azure documentation.

Workaround

You can manually re-attach the worker instance to the back-end pool of the Azure Standard SKU Load Balancer in your Azure console.

Error During Individual Cluster Upgrades

Symptom

After you submit a large number of cluster upgrade requests using the pks upgrade-cluster command, some of your Kubernetes clusters are marked as failed.

Explanation

BOSH upgrades Kubernetes clusters in parallel, with a default limit of four concurrent cluster upgrades. If you schedule more than four cluster upgrades, Enterprise PKS queues the additional upgrades and submits each one to BOSH as an earlier upgrade finishes.

If you submit too many cluster upgrade requests, some of your clusters may be marked as FAILED because BOSH must start each upgrade within a specified timeout, which is set to 168 hours by default. However, once BOSH has picked up an upgrade task, it does not remove the task from the queue or stop working on the upgrade.

Solution

If you expect that upgrading all of your Kubernetes clusters takes more than 168 hours, do not use a script that submits upgrade requests for all of your clusters at once. For information about upgrading Kubernetes clusters provisioned by Enterprise PKS, see Upgrading Clusters.

Kubectl CLI Commands Do Not Work after Changing an Existing Plan to a Different AZ

Symptom

After you update the AZ of an existing plan, kubectl CLI commands do not work for your clusters associated with the plan.

Explanation

This issue occurs in IaaS environments that do not support attaching a disk across multiple AZs.

When the plan of an existing cluster changes to a different AZ, BOSH migrates the cluster by creating VMs for the cluster in the new AZ and removing your cluster VMs from the original AZ.

On an IaaS that does not support attaching VM disks across AZs, the disks BOSH attaches to the new VMs do not have the original content.

Workaround

If you cannot run kubectl CLI commands after reconfiguring the AZ of an existing cluster, contact Support for assistance.

Internal Server Error When Saving Telemetry Settings

Symptom

When saving the Telemetry configuration pane in the Enterprise PKS tile, you receive an HTTP 500 Internal Server Error.

Explanation

When using Ops Manager v2.5, you may receive an HTTP 500 Internal Server Error if you attempt to save Telemetry preferences without configuring all of the required settings in the pane.

Solution

In your browser, return to the Telemetry pane. Configure all of the required settings and click Save.

One Plan ID Is Longer Than Other Plan IDs

Symptom

One of your plan IDs is one character longer than your other plan IDs.

Explanation

In Enterprise PKS, each plan has a unique plan ID. A plan ID is normally a UUID consisting of 32 alphanumeric characters and 4 hyphens. However, the Plan 4 ID consists of 33 alphanumeric characters and 4 hyphens.

Solution

You can safely configure and use Plan 4. The length of the Plan 4 ID does not affect the functionality of Plan 4 clusters.

If you require all plan IDs to have identical length, do not activate or use Plan 4.

Enterprise PKS Metric Sinks Fail to Use Secure Connections

Symptom

When you attempt to use MetricSink or ClusterMetricSink over a secure connection, the TLS handshake is rejected by Telegraf.

Explanation

This issue is caused by missing CA certificates in the Telegraf container images included in this tile version. A patch is in progress. Until the patch is published, metric sinks cannot send metrics over secure connections.

Kubelet Fails to Start When Creating Clusters on Azure

Symptom

kubelet fails to start when creating new clusters on Azure.

Explanation

This issue occurs on Azure after enabling Availability Sets on a new Enterprise PKS v1.5 installation.

Enterprise PKS Management Console Known Issues

The following known issues are specific to the Enterprise PKS Management Console v0.9.0 appliance and user interface.

Enterprise PKS Management Console Notifications Persist

Symptom

In the Enterprise PKS view of Enterprise PKS Management Console, error notifications sometimes persist in memory on the Clusters and Nodes pages after you clear those notifications.

Explanation

After you click the X button to clear a notification, it is removed. However, when you navigate back to those pages, the notification might appear again.

Workaround

Use shift+refresh to reload the page.

Cannot Delete Enterprise PKS Deployment from Management Console

Symptom

In the Enterprise PKS view of Enterprise PKS Management Console, you cannot use the Delete Enterprise PKS Deployment option even after you have removed all clusters.

Explanation

The option to delete the deployment is only activated in the management console a short period after the clusters are deleted.

Workaround

After removing clusters, wait for a few minutes before attempting to use the Delete Enterprise PKS Deployment option again.

Configuring Enterprise PKS Management Console Integration with VMware vRealize Log Insight

Symptom

The Enterprise PKS Management Console appliance sends logs to VMware vRealize Log Insight over HTTP, not HTTPS.

Explanation

When you deploy the Enterprise PKS Management Console appliance from the OVA, if you require log forwarding to vRealize Log Insight, you must provide the port on the vRealize Log Insight server on which it listens for HTTP traffic. Do not provide the HTTPS port.

Workaround

Set the vRealize Log Insight port to the HTTP port. This is typically 9000.

Deploying Enterprise PKS to an Unprepared NSX-T Data Center Environment Results in Flannel Error

Symptom

When using the management console to deploy Enterprise PKS in NSX-T Data Center (Not prepared for PKS) mode, if an error occurs during the network configuration, the message Unable to set flannel environment is displayed in the deployment progress page.

Explanation

The network configuration has failed, but the error message is incorrect.

Workaround

To see the correct reason for the failure, see the server logs. For instructions about how to obtain the server logs, see Troubleshooting Enterprise PKS Management Console.

Using BOSH CLI from Operations Manager VM

Symptom

The BOSH CLI client bash command that you obtain from the Deployment Metadata view does not work when logged in to the Operations Manager VM.

Explanation

The BOSH CLI client bash command from the Deployment Metadata view is intended to be used from within the Enterprise PKS Management Console appliance.

Workaround

To use the BOSH CLI from within the Operations Manager VM, see Connect to Operations Manager.

From the Ops Manager VM, use the BOSH CLI client bash command from the Deployment Metadata page, with the following modifications:

  • Remove the clause BOSH_ALL_PROXY=xxx
  • Replace the BOSH_CA_CERT section with BOSH_CA_CERT=/var/tempest/workspaces/default/root_ca_certificate

Run pks Commands against the PKS API Server

Explanation

The PKS CLI is available in the Enterprise PKS Management Console appliance.

Workaround

To run pks commands against the PKS API Server, you must first log in to PKS using the following command syntax: pks login -a fqdn_of_pks ….

To do this, you must ensure either of the following:

  • The FQDN configured for the PKS Server is resolvable by the DNS server configured for the Enterprise PKS Management Console appliance, or
  • An entry that maps the floating IP address assigned to the PKS Server to the FQDN exists in /etc/hosts on the appliance. For example: 192.168.160.102 api.pks.local.

Please send any feedback you have to pks-feedback@pivotal.io.