Upgrade Considerations for Selecting File Storage in Pivotal Cloud Foundry
This topic describes critical factors to consider when evaluating the type of file storage to use in your Pivotal Cloud Foundry (PCF) deployment. The Elastic Runtime blobstore relies on the file storage system to read and write resources, app packages, and droplets.
During an upgrade of PCF, file storage with insufficient IOPS can degrade the performance and stability of your PCF deployment.
If disk processing time takes longer than the evacuation timeout for Diego cells, then Diego cells and app instances may take too long to start up, resulting in a cascading failure.
However, the minimum required IOPS depends upon a number of deployment-specific factors and configuration choices. Use this topic as a guide when deciding on the file storage configuration for your deployment.
To see an example of system performance and IOPS load during an upgrade, refer to Pivotal Web Services Performance During Upgrade.
When you deploy PCF, you can select internal file storage or external file storage, either network-accessible or IaaS-provided, as an option in the Elastic Runtime tile.
Selecting internal storage causes PCF to deploy a dedicated virtual machine (VM) that uses either NFS or WebDAV for file storage. Selecting external storage allows you to configure file storage provided in a network-accessible location or by an IaaS, such as Amazon S3, Google Cloud Storage, or Azure Storage.
Whenever possible, Pivotal recommends using external file storage.
As a best-effort calculation, estimate the total amount of data that must move during a system upgrade to determine how many IOPS your file storage needs to sustain.
As a first calculation, determine the number of Diego cells that your deployment currently uses.
To view the number of Diego cell instances currently running in your deployment, see the Resource Config section of your Elastic Runtime tile.
If you expect to scale up the number of instances, use the anticipated scaled number.
Note: If your deployment uses more than 20 Diego cells, you should avoid using internal file storage. Instead, you should always select external or IaaS-provided file storage.
Operators can limit the number of containers and Diego cell instances that Diego starts concurrently. If operators impose no limits, your file storage may experience exceptionally heavy load during an upgrade.
To prevent overload, Cloud Foundry provides two major throttle configurations:
The maximum number of starting containers that Diego can start in Cloud Foundry: This is a deployment-wide limit. The default value and ability to override this configuration depends on the version of Cloud Foundry deployed. For information about how to configure this setting, see the Setting a Maximum Number of Started Containers topic.
max_in_flight setting for the Diego cell job configured in the BOSH manifest: This configuration, expressed as a percentage or an integer, sets the maximum number of job instances that can be upgraded simultaneously. For example, if your deployment is running 10 Diego cell job instances and the configured max_in_flight value is 20%, then only 2 Diego cell job instances can start up at a single time.
To retrieve or override the existing max_in_flight value in Ops Manager Director, use the Ops Manager API. See the Ops Manager API documentation provided with your Ops Manager installation at
The values of the above throttle configurations depend on the version of PCF that you have deployed and whether you have overridden the default values.
Refer to the following table for existing defaults and, if necessary, determine the override values in your deployment.
| PCF Version | Starting Container Count Maximum | Starting Container Count Overridable? | Maximum In Flight Diego Cell Instances | Maximum In Flight Diego Cell Instances Overridable? |
|---|---|---|---|---|
| PCF 1.7.43 and earlier | No limit set | No | 1 instance | No |
| PCF 1.7.44 to 1.7.49 | 200 | No | 1 instance | No |
| PCF 1.7.50 + | 200 | No | 1 instance | No |
| PCF 1.8.0 to 1.8.29 | No limit set | No | 10% of total instances | No |
| PCF 1.8.30 + | 200 | Yes | 10% of total instances | No |
| PCF 1.9.0 to 1.9.7 | No limit set | No | 4% of total instances | Yes |
| PCF 1.9.8 + | 200 | Yes | 4% of total instances | Yes |
| PCF 1.10.0 and later | 200 | Yes | 4% of total instances | Yes |
| PCF 1.12.0 and later | 200 | Yes | 4% of total instances | Yes |
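To make the max_in_flight arithmetic concrete, the following Python sketch converts a percentage or integer setting into a number of Diego cell instances. The function name `instances_in_flight` is hypothetical. The rounding rule (round down, but never below 1 or above the total) is an assumption made for illustration and may not match how BOSH rounds fractional results.

```python
def instances_in_flight(total_instances: int, max_in_flight: str) -> int:
    """Estimate how many job instances may be upgraded at the same time.

    Assumption (not taken from BOSH itself): a percentage is rounded down,
    and the result is clamped to the range 1..total_instances.
    """
    value = max_in_flight.strip()
    if value.endswith("%"):
        # Percentage form, for example "20%" or "4%".
        limit = total_instances * int(value[:-1]) // 100
    else:
        # Integer form, for example "1".
        limit = int(value)
    return max(1, min(limit, total_instances))


# The example from the text: 10 Diego cells with max_in_flight set to 20%.
print(instances_in_flight(10, "20%"))  # 2
```

Under the same assumption, a default of 4% of total instances with 50 Diego cells also yields 2 cells in flight.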
Using the above numbers, you can determine a rough estimate of the expected upgrade load by multiplying the total number of app instances expected to start across all cells by the average size of the instance droplets.
For example, if your deployment starts 10 cells that each host 20 app instances, and each app instance droplet is an average of 100 MB in size, then you potentially have 20 GB of data hitting the disk at the same time. Depending on the IOPS capacity of your disk, this 20 GB of data will take a set amount of time to reassemble on a new disk.
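The multiplication in this example can be written as a short Python sketch. The function name `upgrade_load_mb` and its parameters are hypothetical, and the conversion assumes 1 GB = 1000 MB, which is what the 20 GB figure implies.

```python
def upgrade_load_mb(cells_starting: int, instances_per_cell: int,
                    avg_droplet_mb: float) -> float:
    """Rough upper bound on droplet data, in MB, read during an upgrade step."""
    return cells_starting * instances_per_cell * avg_droplet_mb


# 10 cells, 20 app instances per cell, 100 MB average droplet size.
load_mb = upgrade_load_mb(10, 20, 100)
print(f"{load_mb / 1000:.0f} GB")  # 20 GB
```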
Calculate the amount of time needed to process your potential upgrade load, and verify that the number falls under the evacuation timeout (default is 10 minutes) for Diego cells.
If the calculated processing time is longer than the evacuation timeout, you should upgrade your file storage to use a disk with higher IOPS capacity.
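As a rough illustration of this check, the Python sketch below converts an upgrade load into an estimated processing time and compares it with the evacuation timeout. It is a simplified model and not a Pivotal-provided formula: it assumes every I/O operation transfers a fixed amount of data (`io_size_kb`), that the disk sustains its rated IOPS for the whole transfer, and that network and CPU overhead are negligible. All function names, parameter names, and sample values other than the 20 GB load and the 10 minute default timeout are hypothetical.

```python
DEFAULT_EVACUATION_TIMEOUT_MIN = 10  # default evacuation timeout from the text


def processing_minutes(load_mb: float, iops: float, io_size_kb: float) -> float:
    """Estimate minutes needed to process load_mb at a given IOPS rating.

    Assumes each I/O operation moves io_size_kb of data (1 MB = 1000 KB).
    """
    operations = load_mb * 1000 / io_size_kb
    seconds = operations / iops
    return seconds / 60


def fits_evacuation_timeout(minutes: float,
                            timeout_min: float = DEFAULT_EVACUATION_TIMEOUT_MIN) -> bool:
    """Return True if the estimated processing time is under the timeout."""
    return minutes < timeout_min


# Hypothetical disk: 3000 IOPS with 256 KB per operation, 20 GB (20000 MB) load.
estimate = processing_minutes(20000, iops=3000, io_size_kb=256)
print(f"{estimate:.2f} min, fits timeout: {fits_evacuation_timeout(estimate)}")
```

Because real throughput depends on I/O size, caching, and contention, treat the result as an order-of-magnitude check and leave generous headroom below the timeout.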
For more information about how Diego cells are upgraded, see the Managing Diego Cell Limits During an Upgrade topic.