Pivotal Web Services Performance During Upgrade

This topic provides sample performance measurements of a Cloud Foundry installation undergoing the workload associated with an upgrade.

To obtain these measurements, Pivotal repaved its production Pivotal Web Services (PWS) deployment. The repave process simulates system load that would be incurred when performing a rolling upgrade of Diego cells.

Use the measurements and configuration values published in this document as guidance when ensuring you have adequate file storage hardware prior to a platform upgrade.

For more information about how an upgrade affects file storage performance, see Upgrade Considerations for Selecting File Storage in Pivotal Cloud Foundry.

Platform Configuration

The following table details the starting parameters and configuration of PWS.

| Configuration | Value | How to Locate |
|---|---|---|
| IaaS | Amazon Web Services | Refer to your Ops Manager Director configuration or BOSH deployment manifest. |
| File Storage | AWS EBS (External with some elastic capacity) | Refer to your Elastic Runtime configuration or BOSH deployment manifest. |
| Version of CF | v252 | Refer to your Ops Manager Director and Elastic Runtime configuration or BOSH deployment manifest. |
| Number of Diego Cells | 218 | To view the number of Diego cell instances currently running in your deployment, see the Resource Config section of your Elastic Runtime tile or consult your Diego deployment manifest. |
| Maximum Number of Started Containers | 250 | See PCF or Cloud Foundry documentation for configuration information. |
| max_in_flight Configuration for Diego Cells | 6 | To retrieve the existing max_in_flight value for the Diego Cell job in Ops Manager Director, use the Ops Manager API. See the Ops Manager API documentation. If you are running open source CF, consult your BOSH deployment manifest. |
| Number of Availability Zones (AZs) | 2 | Consult your Elastic Runtime or BOSH deployment AZ configuration. |
| Number of App Instances | 16,231 | datadog.nozzle.bbs.LRPsRunning |
| Number of Application Security Groups (ASGs) | 43 | As an admin user, run the cf security-groups command. For more information, see Understanding Application Security Groups. |

System Performance Measurements During Cell Repave

This table presents performance measurements taken during the Diego cell repave.

Note: These measurements indicate the peak cumulative values for the entire system (250 Diego cells, ~15,000 application instances, and 2 AZs).

Use these measurements as a baseline for expected system load during Diego cell upgrade.

| Measurement | Value | Metric Used |
|---|---|---|
| Cell CPU Consumption | 36% | bosh.healthmonitor.system.cpu.user |
| Cell Memory Consumption | ~50% | bosh.healthmonitor.system.mem.percent |
| Cell I/O Consumption (Read) During Normal Operations | 43 read I/O operations per second | aws.ebs.volume_read_ops |
| Cell I/O Consumption (Read) During Upgrade | 1,943 read I/O operations per second | aws.ebs.volume_read_ops |
| Cell I/O Consumption (Write) During Normal Operations | 2,166 write I/O operations per second | aws.ebs.volume_write_ops |
| Cell I/O Consumption (Write) During Upgrade | 21,000 write I/O operations per second | aws.ebs.volume_write_ops |
| Cell Network Consumption (Network Out) During Normal Operations | ~1.25 GB per minute | aws.ec2.network_out |
| Cell Network Consumption (Network Out) During Upgrade | ~1.25 GB per minute (no significant change) | aws.ec2.network_out |
| Cell Network Consumption (Network In) During Normal Operations | 2.11 GB per minute | aws.ec2.network_in |
| Cell Network Consumption (Network In) During Upgrade | 16.75 GB per minute | aws.ec2.network_in |
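
To make the difference between normal operations and upgrade load easier to compare, the following minimal Python sketch computes how many times higher each measurement was during the upgrade. The values are copied from the measurements above. The dictionary keys and function names are hypothetical and chosen only for this illustration. Treat the resulting multipliers as a rough guide for sizing file storage, not as guaranteed behavior for other deployments.

```python
# Peak cumulative values from the PWS measurements: (normal operations, during upgrade).
# Key names are illustrative only; they are not metric names.
MEASUREMENTS = {
    "read_iops": (43, 1943),
    "write_iops": (2166, 21000),
    "network_out_gb_per_min": (1.25, 1.25),
    "network_in_gb_per_min": (2.11, 16.75),
}


def upgrade_multiplier(normal, upgrade):
    """Return how many times larger the upgrade value is than the normal value."""
    return upgrade / normal


def summarize(measurements):
    """Map each measurement name to its upgrade multiplier, rounded to one decimal place."""
    return {
        name: round(upgrade_multiplier(normal, upgrade), 1)
        for name, (normal, upgrade) in measurements.items()
    }


if __name__ == "__main__":
    for name, factor in summarize(MEASUREMENTS).items():
        print(f"{name}: {factor}x normal load")
```

With the PWS values, read operations rise by roughly 45 times and write operations by roughly 10 times, while outbound network traffic stays flat and inbound traffic rises by roughly 8 times.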

Sample Performance Graphs

These Datadog graphs show a timeline of read and write operations during the repave event.

Read I/O Operations

The read I/O operations sample was taken across 115 VMs and represents the number of read I/O operations over 300 seconds for a single Diego cell.

[Figure: PWS read I/O operations during upgrade]

Write I/O Operations

The write I/O operations sample was taken across 115 VMs and represents the number of write I/O operations over 300 seconds for a single Diego cell.

[Figure: PWS write I/O operations during upgrade]

Summary

During the repave process, 250 Diego cells were updated. The repave took 6 hours overall, or about 3 hours for each Availability Zone.
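
As a rough cross-check of these figures, the following Python sketch estimates how many update batches the repave involved and how long each batch took on average. It rests on several assumptions that this topic does not confirm: cells are split evenly across AZs, AZs are updated one after another, the max_in_flight value of 6 is an absolute instance count, and every batch takes the same amount of time. The function name and parameters are hypothetical.

```python
import math


def estimate_batch_minutes(cells, availability_zones, max_in_flight, total_hours):
    """Estimate the batch count and the average minutes per batch for a rolling repave.

    Assumes an even split of cells across AZs, AZs updated sequentially,
    and uniform batch duration. These are simplifications for illustration.
    """
    cells_per_az = math.ceil(cells / availability_zones)
    batches_per_az = math.ceil(cells_per_az / max_in_flight)
    total_batches = batches_per_az * availability_zones
    minutes_per_batch = (total_hours * 60) / total_batches
    return total_batches, minutes_per_batch


if __name__ == "__main__":
    # 250 cells, 2 AZs, max_in_flight of 6, and 6 hours overall, as reported in this topic.
    batches, minutes = estimate_batch_minutes(250, 2, 6, 6)
    print(f"{batches} batches, about {minutes:.1f} minutes per batch")
```

Under these assumptions, the PWS repave works out to 42 batches of up to 6 cells each, at an average of a little under 9 minutes per batch. Substituting your own cell count and max_in_flight value gives a first-order estimate of upgrade duration, provided per-batch time in your deployment is similar.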
