Upgrade Considerations for Selecting File Storage in Pivotal Cloud Foundry

This topic describes critical factors to consider when evaluating the type of file storage to use in your Pivotal Cloud Foundry (PCF) deployment. The Elastic Runtime blobstore relies on the file storage system to read and write resources, app packages, and droplets.

During a PCF upgrade, file storage with insufficient IOPS (input/output operations per second) can degrade the performance and stability of your deployment.

If disk processing takes longer than the evacuation timeout for Diego cells, then Diego cells and app instances may take too long to start up, resulting in a cascading failure.

However, the minimum required IOPS depends upon a number of deployment-specific factors and configuration choices. Use this topic as a guide when deciding on the file storage configuration for your deployment.

To see an example of system performance and IOPS load during an upgrade, refer to Pivotal Web Services Performance During Upgrade.

Selecting Internal or External File Storage

When you deploy PCF, you can select internal file storage or external file storage, either network-accessible or IaaS-provided, as an option in the Elastic Runtime tile.

Selecting internal storage causes PCF to deploy a dedicated virtual machine (VM) that uses either NFS or WebDAV for file storage. Selecting external storage allows you to configure file storage provided in a network-accessible location or by an IaaS, such as Amazon S3, Google Cloud Storage, or Azure Storage.

Whenever possible, Pivotal recommends using external file storage.

Calculating Potential Disk Load Requirements

As a best-effort calculation, estimate the total amount of data that must move during a system upgrade to determine how much IOPS capacity your file storage needs.

Number of Diego Cells

As a first calculation, determine the number of Diego cells that your deployment currently uses.

To view the number of Diego cell instances currently running in your deployment, see the Resource Config section of your Elastic Runtime tile.

If you expect to scale up the number of instances, use the anticipated scaled number.

Note: If your deployment uses more than 20 Diego cells, you should avoid using internal file storage. Instead, you should always select external or IaaS-provided file storage.

Maximum In-Flight Load and Container Starts for Diego Cells

Operators can limit the number of containers and Diego cell instances that Diego starts concurrently. If operators impose no limits, your file storage may experience exceptionally heavy load during an upgrade.

To prevent overload, Cloud Foundry provides two major throttle configurations:

  • The maximum number of containers that Diego can start at the same time in Cloud Foundry: This is a deployment-wide limit. The default value and the ability to override this configuration depend on the version of Cloud Foundry deployed. For information about how to configure this setting, see the Setting a Maximum Number of Started Containers topic.

  • The max_in_flight setting for the Diego cell job configured in the BOSH manifest: This configuration, expressed as a percentage or an integer, sets the maximum number of job instances that can be upgraded simultaneously. For example, if your deployment is running 10 Diego cell job instances and the configured max_in_flight value is 20%, then only 2 Diego cell job instances can be upgraded at the same time.

    To retrieve or override the existing max_in_flight value in Ops Manager Director, use the Ops Manager API. See the Ops Manager API documentation provided with your Ops Manager installation at https://YOUR-OPSMAN-FQDN/docs/.
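The max_in_flight arithmetic described above can be sketched in a few lines of Python. This is an illustration only: the function name max_in_flight_instances is hypothetical, and rounding a percentage down with a minimum of 1 instance is an assumption of this sketch, not confirmed BOSH behavior.

```python
def max_in_flight_instances(max_in_flight, total_instances):
    """Return how many job instances may be upgraded at the same time.

    max_in_flight is either an integer (for example 2) or a percentage
    string (for example "20%"). Rounding down, with a floor of one
    instance, is an assumption made for this sketch.
    """
    if isinstance(max_in_flight, str) and max_in_flight.endswith("%"):
        percent = int(max_in_flight[:-1])
        count = (percent * total_instances) // 100
    else:
        count = int(max_in_flight)
    # Never report fewer than 1 or more than the total number of instances.
    return max(1, min(count, total_instances))


# The example from the text: 10 Diego cell instances with max_in_flight of 20%.
print(max_in_flight_instances("20%", 10))  # 2
```

With a percentage that works out to less than one instance, such as 4% of 10 cells, this sketch returns 1 so that the upgrade can still proceed one cell at a time.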

The values of the above throttle configurations depend on the version of PCF that you have deployed and whether you have overridden the default values.

Refer to the following table for existing defaults and, if necessary, determine the override values in your deployment.

PCF Version            | Starting Container Count Maximum | Starting Container Count Overridable? | Maximum In Flight Diego Cell Instances | Maximum In Flight Diego Cell Instances Overridable?
PCF 1.7.43 and earlier | No limit set                     | No                                    | 1 instance                             | No
PCF 1.7.44 to 1.7.49   | 200                              | No                                    | 1 instance                             | No
PCF 1.7.50 +           | 200                              | No                                    | 1 instance                             | No
PCF 1.8.0 to 1.8.29    | No limit set                     | No                                    | 10% of total instances                 | No
PCF 1.8.30 +           | 200                              | Yes                                   | 10% of total instances                 | No
PCF 1.9.0 to 1.9.7     | No limit set                     | No                                    | 4% of total instances                  | Yes
PCF 1.9.8 +            | 200                              | Yes                                   | 4% of total instances                  | Yes
PCF 1.10.0 and later   | 200                              | Yes                                   | 4% of total instances                  | Yes
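For scripted checks, the defaults in the table above could be encoded as a small lookup. The following Python sketch is one possible encoding; the names ThrottleDefaults and throttle_defaults are hypothetical, and treating every version at or below 1.7.43 as the first row is an assumption about what "and earlier" covers.

```python
from typing import NamedTuple, Optional


class ThrottleDefaults(NamedTuple):
    starting_container_max: Optional[int]  # None means "No limit set"
    starting_container_overridable: bool
    max_in_flight: str                     # value as listed in the table
    max_in_flight_overridable: bool


def throttle_defaults(version):
    """Look up table defaults for a (major, minor, patch) version tuple."""
    if version <= (1, 7, 43):
        return ThrottleDefaults(None, False, "1 instance", False)
    if version < (1, 8, 0):
        return ThrottleDefaults(200, False, "1 instance", False)
    if version <= (1, 8, 29):
        return ThrottleDefaults(None, False, "10%", False)
    if version < (1, 9, 0):
        return ThrottleDefaults(200, True, "10%", False)
    if version <= (1, 9, 7):
        return ThrottleDefaults(None, False, "4%", True)
    # PCF 1.9.8 and later, including 1.10.0 and later.
    return ThrottleDefaults(200, True, "4%", True)


print(throttle_defaults((1, 8, 30)))
```

These values are defaults only. If an operator has overridden a setting, the value configured in the deployment takes precedence over anything this lookup returns.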

Calculating Upgrade Load Based on Number of App Instances and Droplet Size

Using the above numbers, you can roughly estimate the expected upgrade load by multiplying the total number of app instances expected to start across all cells by the average size of an instance droplet.

For example, if your deployment starts 10 cells that each host 20 app instances, and each app instance droplet is an average of 100 MB in size, then you potentially have 10 × 20 × 100 MB = 20 GB of data hitting the disk at the same time. How long this 20 GB of data takes to reassemble on a new disk depends on the IOPS capacity of your disk.
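The estimate above is a single multiplication, shown here as a minimal Python sketch. The function name upgrade_load_mb is hypothetical, and the conversion assumes decimal units (1 GB = 1000 MB) to match the figures in the example.

```python
def upgrade_load_mb(cells_starting, instances_per_cell, avg_droplet_mb):
    """Estimate the data (in MB) that hits file storage at the same time."""
    return cells_starting * instances_per_cell * avg_droplet_mb


# The example from the text: 10 cells, 20 app instances each, 100 MB droplets.
load_mb = upgrade_load_mb(10, 20, 100)
print(load_mb / 1000, "GB")  # 20.0 GB
```

Substitute the number of cells allowed to start at once under your max_in_flight setting, and your own average droplet size, to adapt the estimate to your deployment.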

Calculate the amount of time needed to process your potential upgrade load, and verify that the number falls under the evacuation timeout (default is 10 minutes) for Diego cells.

If the calculated processing time is longer than the evacuation timeout, you should upgrade your file storage to use a disk with higher IOPS capacity.
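One rough way to compare processing time against the evacuation timeout is sketched below in Python. It assumes that throughput is approximately IOPS multiplied by the size of each I/O operation, which is a simplification; real throughput also depends on access patterns, caching, and network overhead. The function names and the sample values of 3000 IOPS and 16 KB per operation are hypothetical.

```python
EVACUATION_TIMEOUT_SECONDS = 10 * 60  # default of 10 minutes, per this topic


def processing_seconds(load_mb, iops, io_size_kb):
    """Estimate the seconds needed to move load_mb of data.

    Assumes throughput is roughly iops * io_size_kb, in decimal units.
    """
    throughput_mb_per_s = iops * io_size_kb / 1000
    return load_mb / throughput_mb_per_s


def fits_within_timeout(load_mb, iops, io_size_kb,
                        timeout_s=EVACUATION_TIMEOUT_SECONDS):
    """Return True if the estimated time is under the evacuation timeout."""
    return processing_seconds(load_mb, iops, io_size_kb) < timeout_s


# Hypothetical disk: 3000 IOPS at 16 KB per operation, moving 20,000 MB.
print(round(processing_seconds(20000, 3000, 16)))  # about 417 seconds
print(fits_within_timeout(20000, 3000, 16))        # True
```

Under the same assumptions, a disk delivering only 500 IOPS would need about 2500 seconds for the same load, well over the default timeout, which suggests moving to storage with higher IOPS capacity.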

For more information about how Diego cells are upgraded, see the Managing Diego Cell Limits During an Upgrade topic.
