Upgrade Considerations for Selecting File Storage in Pivotal Cloud Foundry
This topic describes critical factors to consider when evaluating the type of file storage to use in your Pivotal Cloud Foundry (PCF) deployment. The Elastic Runtime blobstore relies on the file storage system to read and write resources, app packages and droplets.
During an upgrade of PCF, file storage with insufficient IOPS numbers can negatively impact the performance and stability of your PCF deployment. However, the minimum required IOPS depends upon a number of deployment-specific factors and configuration choices.
Use this topic as a guide when deciding on the file storage configuration for your deployment.
When you deploy PCF, you can select internal file storage or external (network-accessible or IaaS-provided) file storage in the Elastic Runtime tile.
Selecting internal storage causes PCF to deploy a dedicated VM that uses either NFS or WebDAV for file storage. Selecting external storage allows you to configure file storage provided in a network-accessible location or by an IaaS, such as Amazon S3, Google Cloud Storage, or Azure Storage.
Whenever possible, Pivotal recommends using external file storage.
To make a best-effort estimate of the performance (IOPS) that your file storage needs, start with a rough estimate of how much total data must move during a system upgrade.
As a first calculation, determine the number of Diego cells that your deployment is currently using.
To view the number of Diego cell instances currently running in your deployment, see the Resource Config section of your Elastic Runtime tile.
If you expect to scale up the number of instances, use the anticipated scaled number.
Note: If your deployment uses more than 20 Diego cells, you should avoid the use of internal file storage and always select external or IaaS-provided file storage.
Operators can limit the number of containers and Diego cell instances that Diego starts concurrently. If no limits are imposed, then your file storage may undergo exceptionally heavy load during an upgrade.
To prevent overload, Cloud Foundry provides two major throttle configurations:
The maximum number of starting containers that Diego can start in Cloud Foundry. This is a deployment-wide limit. The default value and the ability to override this configuration depend on the version of Cloud Foundry deployed. For information on how to configure this setting, see the Setting a Maximum Number of Started Containers topic.
The max_in_flight setting for the Diego cell job configured in the BOSH manifest. This configuration, expressed as a percentage or an integer, sets the maximum number of job instances that can be upgraded simultaneously. For example, if your deployment is running 10 Diego cell job instances and the configured max_in_flight value is 20%, then only 2 Diego cell job instances can start up at a single time.
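The arithmetic behind the max_in_flight example can be sketched in Python as follows. The function name cells_in_flight is hypothetical, and the rounding rule (round down, but never below one instance) is an assumption made for illustration. Check the BOSH documentation for how your version resolves percentages.

```python
def cells_in_flight(total_cells, max_in_flight):
    """Return how many Diego cell instances may be updated at once.

    max_in_flight is either an integer or a percentage string such as "20%".
    Assumption: percentages round down, with a floor of one instance.
    """
    if isinstance(max_in_flight, str) and max_in_flight.strip().endswith("%"):
        percent = float(max_in_flight.strip().rstrip("%"))
        limit = int(total_cells * percent / 100)
    else:
        limit = int(max_in_flight)
    # Never report fewer than one instance or more than the deployment has.
    return max(1, min(limit, total_cells))


# The example from the text: 10 Diego cells with max_in_flight set to 20%.
print(cells_in_flight(10, "20%"))
```

An integer value such as cells_in_flight(10, 3) is passed through unchanged, capped at the total number of cells.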
The values of the above configurations depend on the version of PCF that you have deployed and whether you have overridden the default values.
Refer to the following table for the existing defaults and, if necessary, determine the override value in your deployment.
| PCF Version | Starting Container Count Maximum | Starting Container Count Overridable? | Maximum In Flight Diego Cell Instances | Maximum In Flight Diego Cell Instances Overridable? |
|---|---|---|---|---|
| PCF 1.7.43 and earlier | No limit set | No | 1 instance | No |
| PCF 1.7.44 to 1.7.49 | 200 | No | 1 instance | No |
| PCF 1.7.50+ | 200 | No | 1 instance | No |
| PCF 1.8.0 to 1.8.29 | No limit set | No | 10% of total instances | No |
| PCF 1.8.30+ | 200 | Yes | 10% of total instances | No |
| PCF 1.9.0 to 1.9.7 | No limit set | No | 10% of total instances | Yes |
| PCF 1.9.8+ | 200 | Yes | 10% of total instances | Yes |
| PCF 1.10.0 and later | 200 | Yes | 10% of total instances | Yes |
Using the above numbers, you can roughly estimate the expected upgrade load by multiplying the total number of app instances expected across all starting cells by the average size of an instance droplet.
For example, if your deployment starts up 10 cells that each host 20 app instances and each app instance droplet is an average of 100 MB in size, then you potentially have 20 GB of data hitting the disk at the same time.
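The multiplication in this example can be written as a small Python helper. This is a minimal sketch: the function name upgrade_load_gb is hypothetical, and it uses decimal units (1 GB = 1000 MB) to match the figures above.

```python
def upgrade_load_gb(cells_starting, instances_per_cell, avg_droplet_mb):
    """Estimate the data (in GB) moved when cells start at the same time.

    Uses decimal units (1 GB = 1000 MB), matching the example in the text.
    """
    total_mb = cells_starting * instances_per_cell * avg_droplet_mb
    return total_mb / 1000


# The example from the text: 10 cells x 20 app instances x 100 MB droplets.
print(upgrade_load_gb(10, 20, 100))  # 20.0
```

Substitute the number of cells that your max_in_flight setting allows to start together, and your own average droplet size, to get a figure for your deployment.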
Depending on the IOPS capacity of your disk, this 20 GB of data takes a certain amount of time to reassemble on a new disk. If this disk processing time exceeds the evacuation timeout for Diego cells, then Diego cells and app instances may take too long to start up, resulting in a cascading failure.
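As a rough illustration of this comparison, the following Python sketch converts an IOPS figure into a transfer time and checks it against a timeout. All names and numbers here are assumptions: the I/O size per operation, the IOPS value, and the timeout are placeholders rather than PCF defaults, and real storage rarely sustains its rated IOPS under concurrent load.

```python
def transfer_seconds(total_gb, iops, io_size_kb):
    """Estimate the seconds needed to move total_gb at a sustained IOPS rate.

    Assumes every I/O operation moves io_size_kb of data and uses decimal
    units (1 GB = 1,000,000 KB). Network and contention effects are ignored.
    """
    total_kb = total_gb * 1000 * 1000
    throughput_kb_per_s = iops * io_size_kb
    return total_kb / throughput_kb_per_s


def exceeds_timeout(total_gb, iops, io_size_kb, evacuation_timeout_s):
    """Return True if the estimated transfer time is longer than the timeout."""
    return transfer_seconds(total_gb, iops, io_size_kb) > evacuation_timeout_s


# Placeholder values: 20 GB of droplets, 1000 IOPS, 64 KB per operation,
# and a hypothetical 600-second evacuation timeout.
estimate = transfer_seconds(20, 1000, 64)
print(f"Estimated transfer time: {estimate:.1f} s")
print("Exceeds timeout:", exceeds_timeout(20, 1000, 64, 600))
```

Treat the result as an order-of-magnitude check only. If the estimate is close to the timeout, that is a signal to lower the concurrency limits or to move to faster or external file storage.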
For more information on how Diego cells are upgraded, see Managing Diego Cell Limits During an Upgrade.