Upgrade Considerations for Selecting File Storage in Pivotal Cloud Foundry


This topic describes critical factors to consider when evaluating the type of file storage to use in your Pivotal Cloud Foundry (PCF) deployment. The Elastic Runtime blobstore relies on the file storage system to read and write resources, app packages, and droplets.

During an upgrade of PCF, file storage with insufficient IOPS can degrade the performance and stability of your PCF deployment.

If disk processing takes longer than the evacuation timeout for Diego cells, then Diego cells and app instances may take too long to start up, resulting in a cascading failure.

However, the minimum required IOPS depends upon a number of deployment-specific factors and configuration choices. Use this topic as a guide when deciding on the file storage configuration for your deployment.

To see an example of system performance and IOPS load during an upgrade, refer to Pivotal Web Services Performance During Upgrade.

Selecting Internal or External File Storage

When you deploy PCF, you can select internal file storage or external file storage, either network-accessible or IaaS-provided, as an option in the Elastic Runtime tile.

Selecting internal storage causes PCF to deploy a dedicated virtual machine (VM) that uses either NFS or WebDAV for file storage. Selecting external storage allows you to configure file storage provided in a network-accessible location or by an IaaS, such as Amazon S3, Google Cloud Storage, or Azure Storage.

Whenever possible, Pivotal recommends using external file storage.

Calculating Potential Disk Load Requirements

As a best-effort calculation, estimate the total number of bits that must be moved during a system upgrade to determine how many IOPS your file storage needs to sustain.

Number of Diego Cells

As a first calculation, determine the number of Diego cells that your deployment currently uses.

To view the number of Diego cell instances currently running in your deployment, see the Resource Config section of your Elastic Runtime tile.

If you expect to scale up the number of instances, use the anticipated scaled number.

Note: If your deployment uses more than 20 Diego cells, you should avoid using internal file storage. Instead, you should always select external or IaaS-provided file storage.

Maximum In-Flight Load and Container Starts for Diego Cells

Operators can limit the number of containers and Diego cell instances that Diego starts concurrently. If operators impose no limits, your file storage may experience exceptionally heavy load during an upgrade.

To prevent overload, Cloud Foundry provides two major throttle configurations:

  • The maximum number of containers that Diego can start concurrently in Cloud Foundry: This is a deployment-wide limit. The default value and the ability to override this configuration depend on the version of Cloud Foundry deployed. For information about how to configure this setting, see the Setting a Maximum Number of Starting Containers topic.

  • The max_in_flight setting for the Diego cell job configured in the BOSH manifest: This configuration, expressed as a percentage or an integer, sets the maximum number of job instances that can be upgraded simultaneously. For example, if your deployment is running 10 Diego cell job instances and the configured max_in_flight value is 20%, then only 2 Diego cell job instances can start up at a single time.

    To retrieve or override the existing max_in_flight value in Ops Manager Director, use the Ops Manager API. See the Ops Manager API documentation.
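The max_in_flight arithmetic described above can be sketched in a few lines of Python. This is only an illustration: the function name max_in_flight_instances is hypothetical, and the rounding rule (round down, but never below one instance) is an assumption rather than documented BOSH behavior, so verify it against your BOSH version before relying on it.

```python
import math


def max_in_flight_instances(total_instances, max_in_flight):
    """Estimate how many job instances can be updated at the same time.

    max_in_flight is either an integer (for example 1) or a percentage
    string (for example "20%"), mirroring the two forms described above.
    """
    if total_instances < 1:
        raise ValueError("total_instances must be at least 1")

    if isinstance(max_in_flight, str) and max_in_flight.strip().endswith("%"):
        percent = float(max_in_flight.strip()[:-1])
        # Assumption: fractional results are rounded down.
        limit = math.floor(total_instances * percent / 100)
    else:
        limit = int(max_in_flight)

    # Assumption: at least one instance is always updated, and the limit
    # can never exceed the number of instances that exist.
    return max(1, min(limit, total_instances))


# The example from the text: 10 Diego cells with max_in_flight set to 20%.
print(max_in_flight_instances(10, "20%"))
```

With the 4% value listed for later PCF versions, a 50-cell deployment would update 2 cells at a time under these assumptions.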

The values of the above throttle configurations depend on the version of PCF that you have deployed and whether you have overridden the default values.

Refer to the following table for existing defaults and, if necessary, determine the override values in your deployment.

PCF Version            | Starting Container Count Maximum | Starting Container Count Overridable? | Maximum In Flight Diego Cell Instances | Maximum In Flight Diego Cell Instances Overridable?
-----------------------+----------------------------------+---------------------------------------+----------------------------------------+----------------------------------------------------
PCF 1.7.43 and earlier | No limit set                     | No                                    | 1 instance                             | No
PCF 1.7.44 to 1.7.49   | 200                              | No                                    | 1 instance                             | No
PCF 1.7.50+            | 200                              | No                                    | 1 instance                             | No
PCF 1.8.0 to 1.8.29    | No limit set                     | No                                    | 10% of total instances                 | No
PCF 1.8.30+            | 200                              | Yes                                   | 10% of total instances                 | No
PCF 1.9.0 to 1.9.7     | No limit set                     | No                                    | 4% of total instances                  | Yes
PCF 1.9.8+             | 200                              | Yes                                   | 4% of total instances                  | Yes
PCF 1.10.0 and later   | 200                              | Yes                                   | 4% of total instances                  | Yes
PCF 1.12.0 and later   | 200                              | Yes                                   | 4% of total instances                  | Yes