LATEST VERSION: v1.1 - RELEASE NOTES
Pivotal Container Service v1.1

Maintain Workload Uptime

Page last updated:

This topic describes how you can maintain workload uptime for Kubernetes clusters deployed with Pivotal Container Service (PKS).

To maintain workload uptime, configure the following settings in your deployment manifest:

  1. Configure workload replicas to handle traffic during rolling upgrades.
  2. Define an anti-affinity rule to evenly distribute workloads across the cluster.

To increase uptime, you can also refer to the documentation for the services that run on your clusters, and configure your workload based on the recommendations of the software vendor.

About Workload Upgrades

The PKS tile contains an errand that upgrades all Kubernetes clusters. Upgrades run on a single VM at a time. While one worker VM runs an upgrade, the workload on that VM goes down. The additional worker VMs continue to run replicas of your workload, maintaining the uptime of your workload.

Note: Ensure that your pods are bound to a ReplicaSet or Deployment. Naked pods are not rescheduled in the event of a node failure. For more information, see Configuration Best Practices in the Kubernetes documentation.

To prevent workload downtime during a cluster upgrade, Pivotal recommends running your workload on at least three worker VMs and using multiple replicas of your workloads spread across those VMs. You must edit your manifest to define the replica set and configure an anti-affinity rule to ensure that the replicas run on separate worker nodes.

Set Workload Replicas

Set the number of workload replicas to handle traffic during rolling upgrades. To replicate your workload on additional worker VMs, deploy the workload using a replica set.

Edit the spec.replicas value in your deployment manifest:

kind: Deployment
metadata:
  # ...
spec:
  replicas: 3
  template:
    metadata:
      labels:
        app: APP-NAME

See the following table for more information about this section of the manifest:

Key-Value Pair Description
spec:
replicas: 3
Set this value to at least 3 to have at least three instances of your workload running at any time.
app: APP-NAME
Use this app name when you define the anti-affinity rule later in the spec.

Define an Anti-Affinity Rule

To distribute your workload across multiple worker VMs, you must use anti-affinity rules. If you do not define an anti-affinity rule, the replicated pods can be assigned to the same worker node. See the Kubernetes documentation for more information about anti-affinity rules.

To define an anti-affinity rule, add the spec.template.spec.affinity section to your deployment manifest:

kind: Deployment
metadata:
  # ...
spec:
  replicas: 3
  template:
    metadata:
      labels:
        app: APP-NAME
    spec:
      containers:
      - name: MY-APP
        image: MY-IMAGE
        ports:
        - containerPort: 12345
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                  - key: "app"
                    operator: In
                    values:
                    - APP-NAME
              topologyKey: "kubernetes.io/hostname"

See the following table for more information:

Key-Value Pair Description
matchExpressions: 
- key: "app"
This value matches spec.template.metadata.labels.app.
values: 
- APP-NAME
This value matches the APP-NAME you defined earlier in the spec.

Multi-AZ Worker

Kubernetes evenly spreads pods in a replication controller over multiple Availability Zones (AZs). For more granular control over scheduling pods, add an Anti-Affinity Rule to the deployment spec by replacing "kubernetes.io/hostname" with "failure-domain.beta.kubernetes.io/zone".
For more information on scheduling pods, see Advanced Scheduling in Kubernetes on the Kubernetes Blog.

Persistent Volumes

Persistent volumes cannot be attached across AZs. Therefore, when persistent volumes are created, the PersistentVolumeLabel admission controller automatically adds AZ labels to them. The scheduler then ensures that pods that claim a given volume are only placed into the same AZ as that volume.

If an AZ goes down, the persistent volume along with its data also goes down and cannot be automatically re-attached. To preserve your persistent volume data in the event of a fallen AZ, your persistent workload needs to have a failover mechanism in place.

For example, to ensure the uptime of your persistent volumes during a cluster upgrade, Pivotal recommends that you have at least two nodes per AZ. By configuring your workload as suggested, Kubernetes reschedules pods in the other node of the same AZ while BOSH is performing the upgrade.


Please send any feedback you have to pks-feedback@pivotal.io.

Create a pull request or raise an issue on the source for this page in GitHub