Upgrade Preparation Checklist for PCF v2.4

This topic serves as a checklist for preparing to upgrade Pivotal Cloud Foundry (PCF) from v2.3 to v2.4.

This topic contains important preparation steps that you must follow before beginning your upgrade. Failure to follow these instructions may jeopardize your existing deployment data and cause the upgrade to fail.

After completing the steps in this topic, you can continue to Upgrading Pivotal Cloud Foundry.

Warning: Pivotal does not recommend that you skip minor versions when upgrading PCF. Skipping minor versions when upgrading PCF may result in breaking changes. To avoid additional breaking changes, upgrade PCF to the minor version that directly follows your current version of PCF.

Back Up Your PCF Deployment

Pivotal recommends backing up your PCF deployment before upgrading so that you can restore it if the upgrade fails. To do this, follow the instructions in the Backing Up Pivotal Cloud Foundry with BBR topic.

Find Your Decryption Passphrase for Ops Manager

To complete the Ops Manager upgrade, you must have your Ops Manager decryption passphrase. You defined this decryption passphrase during the initial installation of Ops Manager.

Review Changes in PCF v2.4

Review each of the following links to understand the changes in the new release, such as new features, known issues, and breaking changes.

Migrate Internal Databases to Percona MySQL

PAS v2.4 uses a Percona server. If your PAS v2.3 uses the MariaDB infrastructure, you must migrate your databases to Percona before you can upgrade to PAS v2.4.

To migrate your databases to Percona, see Migrating to Internal Percona MySQL.

Update Tiles and Add-Ons

The following section describes changes you must make to your product tiles and add-ons before upgrading PCF.

Review Tile Compatibility

Before you upgrade to PCF v2.4, check whether the service tiles that you currently have deployed on PCF v2.3 are compatible with PCF v2.4.

To check PCF versions supported by a service tile, either from Pivotal or a Pivotal partner:

  • Navigate to the tile’s download page on Pivotal Network.
  • Select the tile version in the Releases dropdown.
  • See the Depends On section under Release Details. For more information, refer to the tile’s release notes.

If the currently-deployed version of a tile is not compatible with PCF v2.4, you must upgrade the tile to a compatible version before you upgrade PCF. You do not need to upgrade tiles that are compatible with both PCF v2.3 and v2.4.

Some partner service tiles may be incompatible with PCF v2.4. Pivotal works with partners to ensure their tiles are updated to work with the latest versions of PCF. For more information about partner service release compatibility, review the Depends On section of the partner's tile download page or the partner's service release documentation in Pivotal Documentation, or contact the partner organization that produces the service tile.

The Product Compatibility Matrix provides an overview of which PCF versions support which versions of the most popular service tiles from Pivotal.

Environment Details

Pivotal provides the empty table below as a model to print out or adapt for recording and tracking the tile versions that you have deployed in all of your environments.

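Category | Product | Sandbox | Non-Prod | Prod | Other…
Pivotal Cloud Foundry | Ops Manager | | | |
Pivotal Cloud Foundry | Pivotal Application Service (PAS) | | | |
Pivotal Cloud Foundry Services | MySQL v2 | | | |
Pivotal Cloud Foundry Services | Redis | | | |
Pivotal Cloud Foundry Services | RabbitMQ | | | |
Pivotal Cloud Foundry Services | Single Sign On (SSO) | | | |
Pivotal Cloud Foundry Services | Spring Cloud Services | | | |
Pivotal Cloud Foundry Services | Concourse | | | |
Pivotal Cloud Foundry Partner Services | New Relic | | | |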

Upgrade Services Tiles

Upgrade all service tiles to versions that are compatible with PCF v2.4. Service tiles are add-on products you install alongside your runtime. For example, MySQL for PCF, PCF Healthwatch, and RabbitMQ are service tiles.

Do not upgrade runtime tiles, such as PAS, PAS for Windows (PASW), or Pivotal Container Service (PKS), at this time.

Review the Compatibility Matrix and tile documentation to check version compatibility.

Upgrade MySQL for PCF from v1 to v2

MySQL for PCF v1 is not compatible with PCF v2.4.

If you are running MySQL for PCF v1.10 or earlier, you must do the following before you upgrade to PCF v2.4:

  1. Install MySQL for PCF v2. For more information, see Installing and Configuring MySQL for PCF.
  2. Migrate your service instances from v1 to v2. For more information, see Migrating Data in MySQL for PCF.
  3. Delete the MySQL for PCF v1 tile.

(Optional) Install PAS v2.3.2+, to Avoid Autoscaler Downtime

App Autoscaler consumes a new log-cache API. If you are running PAS v2.3.0 or v2.3.1, upgrading directly to v2.4 renders the App Autoscaler intermittently unusable until the upgrade completes.

Updating to PAS v2.3.2 or later before upgrading to PAS v2.4 prevents App Autoscaler downtime.

(Optional) Install and Configure PAS v2.3.14+, to Avoid Misrouting of TLS or mTLS Apps

Stale routes can persist in PAS v2.3 deployments that are configured to use TLS or mTLS in the Router application identity verification setting of Application Containers.

To fix this issue, do the following:

  1. Log in to Ops Manager.
  2. Update the PAS tile to PAS v2.3.14 or later.
  3. Select the PAS tile.
  4. Select the Application Containers tab.
  5. View the settings in Router application identity verification.
  6. Verify that either Router uses TLS to verify application identity or Router and applications use mutual TLS to verify each other’s identity is selected.
  7. Select the Prune Routes on TTL Expiry for TLS Backends checkbox.
  8. Click Apply Changes.

When you upgrade PAS, make sure you upgrade to PAS v2.4.10 or later to preserve this configuration.

Install BOSH CLI v5.3.1 or Later

Install BOSH CLI v5.3.1 or later to avoid IP conflict errors that cause BOSH health check tasks to fail to acquire locks. For more information, see BOSH health check tasks fails to acquire lock in the Pivotal Knowledge Database.

Configure BOSH Director

With each release of a new PCF version, BOSH Director may require specific updates before upgrading to the new version. See the following for what action to take before upgrading to PCF v2.4:

  1. Check the required machine specifications for Ops Manager v2.4. These specifications are specific to your IaaS. If these specifications do not match your existing Ops Manager, modify the values of your Ops Manager VM instance. For example, if the boot disk of your existing Ops Manager is 50 GB and the new Ops Manager requires 100 GB, then increase the size of your Ops Manager boot disk to 100 GB.

Configure PAS

With each release of a new PCF version, PAS may require specific updates before upgrading to the new version. See the following sections for what action to take before upgrading to PCF v2.4:

(Optional) Disable Unused Errands

To save upgrade time, you can disable unused PAS post-deploy errands. See the Post-Deploy Errands section of the Errands topic for details. Only disable these errands if your environment does not need them.

In some cases, if you have previously disabled lifecycle errands for any installed product to reduce deployment time, you may want to re-enable these errands before upgrading. For more information, see the Adding and Deleting Products topic.

Confirm Cipher Suites

Ensure that the TLS Cipher Suites for Router field contains a cipher suite supported by both Gorouter and CAPI. For details, see Removal of Default Ciphers Causes Downtime when Upgrading from 2.2 to 2.3 and from 2.3 to 2.4.

Configure Diego Cell Garbage Collection

In the Application Containers pane of the PAS tile, if Docker Images Disk-Cleanup Scheduling on Cell VMs is set to Clean up disk-space once threshold is reached and the value of Threshold of Disk-Used (MB) has been changed from the default of 10240, you must pick a new threshold. If the scheduling is set to Never clean up… or Routinely clean up…, or the threshold value is set to the default, no action is necessary, and any threshold migrates to a sensible value. For more information, see Options for Disk Cleanup.
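
The decision rule above can be restated as a short Python sketch. The function name and the short scheduling labels are hypothetical stand-ins for the three Ops Manager options. They are not values that Ops Manager exposes.

```python
DEFAULT_THRESHOLD_MB = 10240  # default Threshold of Disk-Used (MB) named above


def must_pick_new_threshold(scheduling, threshold_mb=DEFAULT_THRESHOLD_MB):
    """Return True when the operator has to choose a new threshold before upgrading.

    scheduling is a hypothetical short label for the selected option:
    "never", "routine", or "threshold".
    """
    if scheduling not in {"never", "routine", "threshold"}:
        raise ValueError(f"unknown scheduling option: {scheduling!r}")
    # Action is needed only for threshold-based cleanup with a non-default value.
    return scheduling == "threshold" and threshold_mb != DEFAULT_THRESHOLD_MB
```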

Configure Gorouter with TLS

Before upgrading to PAS v2.4, you must secure the Gorouter with TLS or mutual TLS for PAS and Isolation Segment tiles.

The following sections describe how to enable routing with TLS or mutual TLS, scale the Diego cell VM CPU and RAM, and scale the Gorouter. If you do not have enough RAM on your Diego cell VM after enabling TLS routing, you are unable to stage tasks and app instances. App instances may also stop running. If you do not have enough memory and CPU on the Gorouters, latency may increase and throughput may decrease.

Note: Gorouter with TLS or mutual TLS is not supported in the PAS for Windows tile.

Step 1: Enable TLS or Mutual TLS Routing

To enable TLS or mutual TLS routing, do the following:

  1. From the Ops Manager Installation Dashboard, go to the PAS tile.

  2. Go to the Application Containers pane.

  3. Under Router application identity verification, select either of the following options:

    • Router uses TLS to verify application identity
    • Router and applications use mutual TLS to verify each other’s identity
  4. Click Save.

Step 2: Determine Number of App Instances

Before you scale your Diego cell VM to handle TLS routing, you must determine the number of app instances running on your deployment.

See the following methods for how you can count your app instances. Choose the method that corresponds to your use case.

  • Deployments without Isolation Segments

    1. Access your platform metrics with your configured monitoring tool or with the cf CLI Firehose nozzle plugin. For more information about the CLI Firehose nozzle plugin, see Installing the Loggregator Firehose Plugin for cf CLI. For more information about configuring a monitoring system, see Selecting and Configuring a Monitoring System.
    2. Find the LRPsDesired metric of the bbs job on the Diego Database VM. See the following example:
      origin:"bbs" eventType:ValueMetric
      timestamp:1541543212057232344 deployment:"cf" job:"diego_database"
      index:"b1f0c6d8-274b-4cfc-bfa1-a6feeb351802" ip:"10.0.4.10"
      tags:< key:"instance_id" value:"b1f0c6d8-274b-4cfc-bfa1-a6feeb351802" >
      tags:< key:"source_id" value:"bbs" > valueMetric:< name:"LRPsDesired"
      value:200 unit:"Metric" >
    3. Record the value of the LRPsDesired for each instance of the bbs job. You need the value for the procedure in the next section.
  • Deployments with Isolation Segments

    1. Access your platform metrics with your configured monitoring tool or with the cf CLI Firehose nozzle plugin. For more information about the CLI Firehose nozzle plugin, see Installing the Loggregator Firehose Plugin for cf CLI. For more information about configuring a monitoring system, see Selecting and Configuring a Monitoring System.
    2. Find all ContainerCount metrics of the rep job of each Diego cell VM. See the following example:
      origin:"rep" eventType:ValueMetric
      timestamp:1541543910092859448 deployment:"cf" job:"diego_cell"
      index:"8007afda-3bff-4856-857f-a47a43cbf994" ip:"10.0.4.18"
      tags:< key:"instance_id" value:"8007afda-3bff-4856-857f-a47a43cbf994" >
      tags:< key:"source_id" value:"rep" > valueMetric:< name:
      "ContainerCount" value:200 unit:"Metric" >
    3. For each ContainerCount metric, record the value of the ContainerCount and the ip of the job.
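
If you capture the Firehose output as text, the values can be extracted with a script rather than by hand. The following Python sketch assumes that envelopes look like the two examples above. The function names are hypothetical, and the patterns may need adjusting for other output formats.

```python
import re

# Patterns follow the envelope layout of the two examples above.
IP_RE = re.compile(r'\bip:"([^"]+)"')
METRIC_RE = re.compile(r'valueMetric:<\s*name:\s*"([^"]+)"\s*value:([-+0-9.eE]+)')


def parse_value_metric(envelope):
    """Return (ip, metric_name, value) for one envelope, or None if no ValueMetric is found."""
    flat = " ".join(envelope.split())  # undo any line wrapping
    ip = IP_RE.search(flat)
    metric = METRIC_RE.search(flat)
    if ip is None or metric is None:
        return None
    return ip.group(1), metric.group(1), float(metric.group(2))


def container_counts_by_ip(envelopes):
    """Map each Diego cell IP to the last ContainerCount value seen for it."""
    counts = {}
    for envelope in envelopes:
        parsed = parse_value_metric(envelope)
        if parsed and parsed[1] == "ContainerCount":
            ip, _, value = parsed
            counts[ip] = value
    return counts
```

For deployments without Isolation Segments, call parse_value_metric on each bbs envelope and keep the LRPsDesired value. For deployments with Isolation Segments, container_counts_by_ip gives the per-cell values keyed by IP.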

Step 3: Scale Diego Cell VM

To support TLS or mutual TLS routing, you must have enough CPU and RAM for your Diego cell VM. TLS routing requires an additional 32 MB of RAM on your Diego cell per app instance.

To calculate and configure the amount of RAM you need for your Diego cell, choose one of the following methods for your use case:

  • Deployments without Isolation Segments

    For your PAS tile, do the following:

    1. Go to the Resource Config pane.
    2. In the Diego Cell row, see your current VM Type with the amount of RAM you currently have.
    3. Multiply the value you recorded in the previous section by 32 to get the additional RAM in MB. Add the result to the amount of RAM you currently have.
    4. Select your new VM Type based on the amount of RAM you need.

      Note: Alternatively, you can scale your Diego Cell VM by increasing the instance count to support the amount of RAM you need.

    5. Click Save.
  • Deployments with Isolation Segments

    For each PAS and Isolation Segment tile in your foundation, do the following:

    1. Go to the Status tab and see the IP of your Diego cell. To determine the value that corresponds to this tile, match the IP to the ip metric you recorded in the previous section.
    2. Go to the Resource Config pane of the tile.
    3. In the Diego Cell row, see your current VM Type with the amount of RAM you currently have.
    4. Multiply the value for this tile by 32 to get the additional RAM in MB. Add the result to the amount of RAM you currently have.
    5. Select your new VM Type based on the amount of RAM you need.

      Note: Alternatively, you can scale your Diego Cell VM by increasing the instance count to support the amount of RAM you need.

    6. Click Save.
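
The RAM calculation in both procedures is the same arithmetic. The following Python sketch illustrates it. The function name is hypothetical, and the 16384 MB starting value in the example is an assumed figure, not a recommendation.

```python
TLS_RAM_MB_PER_APP_INSTANCE = 32  # additional RAM per app instance, from Step 3


def required_cell_ram_mb(current_ram_mb, app_instances):
    """Return current RAM plus the TLS routing overhead, in MB."""
    if current_ram_mb < 0 or app_instances < 0:
        raise ValueError("RAM and instance count must be non-negative")
    return current_ram_mb + app_instances * TLS_RAM_MB_PER_APP_INSTANCE


# Example: 200 app instances (the value in the sample metrics) and an
# assumed 16384 MB of current RAM. 200 * 32 = 6400 MB of extra RAM.
print(required_cell_ram_mb(16384, 200))  # 22784
```

Use the result to choose a VM Type with at least that much RAM, or to decide how many additional Diego cell instances you need.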

Step 4: Scale Gorouters

You may see an increase of memory and CPU usage for your Gorouters after enabling TLS routing. If the memory and CPU usage of the Gorouters in your environment are close to the size limit, scale your Gorouters before enabling TLS routing.

For more information about scaling Gorouters, see Router Performance Scaling Indicator in Key Capacity Scaling Indicators.

Check OS Compatibility of BOSH-Managed Add-Ons and Tiles

Before upgrading to PCF v2.4, operators who have deployed any PCF add-ons such as IPsec for PCF, ClamAV for PCF, or File Integrity Monitoring for PCF and who have deployed or are planning to deploy Pivotal Application Service for Windows must modify the add-on manifest to specify a compatible OS stemcell.

For example, File Integrity Monitoring for PCF (FIM) is not supported on Windows. Therefore, the manifest must use an include directive to specify a target OS stemcell of ubuntu-trusty or ubuntu-xenial.

Note: To upgrade to a Xenial stemcell, see the documentation for each add-on and follow the instructions.

To update an add-on manifest, do the following:

  1. Locate your existing add-on manifest file. For example, for FIM, locate the fim.yml you uploaded to the Ops Manager VM.

  2. Add the following include directive to your manifest:

      include:
        stemcell:
          - os: ubuntu-xenial
    
  3. Upload the modified manifest file to your PCF deployment. For example instructions, see Create the FIM Manifest.

If you use any other BOSH-managed add-ons in your deployment, you should verify OS compatibility for those components as well. For more information about configuring BOSH add-on manifests, see the BOSH documentation.

Check Backup and Restore External Blobstore Add-On

If you have enabled external blobstore backups for an Azure Blobstore using the Blobstore Add-On, you must update your runtime configuration to remove the sdk-preview add-on before upgrading to PCF v2.4. If you do not remove this job, upgrading PAS fails with the error:

Preparing deployment: Preparing deployment (00:00:01)
  L Error: Colocated job 'azure-blobstore-backup-restorer' is already added to the instance group 'backup-restore'.

After removing this job from your runtime configuration, ensure that the Enable backup and restore checkbox is enabled in the PAS tile > File Storage pane. See External Azure Storage for instructions.

Check Certificate Authority Expiration Dates

Depending on the requirements of your deployment, you may need to rotate your Certificate Authority (CA) certificates. The non-configurable certificates in your deployment expire every two years. You must regenerate and rotate them so that critical components do not face a complete outage.

Note: PCF uses SHA-2 certificates and hashes by default. You can convert existing SHA-1 hashes into SHA-2 hashes by rotating your Ops Manager certificates using the procedure described in the Regenerating and Rotating Non-Configurable TLS/SSL Certificates section of Managing TLS Certificates.

To retrieve information about all the RSA and CA certificates for the BOSH Director and other products in your deployment, you can use the GET /api/v0/deployed/certificates?expires_within=TIME request of the Ops Manager API.

In this request, the expires_within parameter is optional. Valid values for the parameter are d for days, w for weeks, m for months, and y for years. For example, to search for certificates expiring within one month, replace TIME with 1m:

$ curl "https://OPS-MAN-FQDN/api/v0/deployed/certificates?expires_within=1m" \
 -X GET \
 -H "Authorization: Bearer UAA_ACCESS_TOKEN"
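
If you script this request, validating the expires_within value first can catch typos before the call is made. The following Python sketch builds the request URL. The function name is hypothetical, and the pattern assumes that the value is a positive integer followed by one of the four unit letters, as in the 1m example above.

```python
import re
from urllib.parse import urlencode

# Assumed format: a positive integer followed by d, w, m, or y (for example, 1m).
EXPIRES_WITHIN_RE = re.compile(r"^[1-9][0-9]*[dwmy]$")


def certificates_url(ops_man_fqdn, expires_within=None):
    """Build the deployed-certificates URL, optionally filtered by expiry window."""
    url = f"https://{ops_man_fqdn}/api/v0/deployed/certificates"
    if expires_within is None:
        return url  # the parameter is optional
    if not EXPIRES_WITHIN_RE.match(expires_within):
        raise ValueError(f"invalid expires_within value: {expires_within!r}")
    return url + "?" + urlencode({"expires_within": expires_within})
```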

For information about regenerating and rotating CA certificates, see Managing TLS Certificates.

Check the Capacity of Your Deployment

The following sections describe steps for ensuring your deployment has adequate capacity to perform the upgrade.

Confirm Adequate Disk Space

Confirm that you have adequate disk space for your upgrades. You need at least 20 GB of free disk space to upgrade PCF Ops Manager and Pivotal Application Service. If you plan to upgrade other products, the amount of disk space required depends on how many tiles you plan to deploy to your upgraded PCF deployment.

To check current persistent disk usage, select the BOSH Director tile from the Installation Dashboard. Select Status and review the value of the PERS. DISK column. If persistent disk usage is higher than 50%, select Settings > Resource Config, and increase your persistent disk space to handle the size of the resources. If you do not know how much disk space to allocate, set the value to at least 100 GB.

Check Diego Cell RAM and Disk

Check that Diego cells have sufficient available RAM and disk capacity to support app containers.

The KPIs that monitor these resources are:

  • rep.CapacityRemainingMemory
  • rep.CapacityRemainingDisk

Adjust Diego Cell Limits

If needed, adjust the maximum number of Diego cells that the platform can upgrade simultaneously, to avoid overloading the other cells. See Limit PCF Component Instances During Restart.

For PCF v1.10 and later, the maximum number of cells that can update at once, max_in_flight, is 4%. This setting is configured in the BOSH manifest’s Diego cell definition. See the Prevent Overload section for details.

Review the Diego Cell Metrics section of the KPI topic for more information about these KPIs.

Review File Storage IOPS and Other Upgrade Limiting Factors

During the PCF upgrade process, a large quantity of data is moved around on disk.

To ensure a successful upgrade of PCF, verify that your underlying PAS file storage is performant enough to handle the upgrade. For more information about the configurations to evaluate, see Upgrade Considerations for Selecting Pivotal Cloud Foundry Storage.

In addition to file storage IOPS, consider additional existing deployment factors that can impact overall upgrade duration and performance:

Factor | Impact
Network latency | Network latency can contribute to how long it takes to move app instance data to new containers.
Number of ASGs | A large number of Application Security Groups in your deployment can contribute to an increase in app instance container startup time.
Number of app instances and application growth | A large increase in the number of app instances and average droplet size since the initial deployment can increase the upgrade impact on your system.

To review example upgrade-related performance measurements of an existing production Cloud Foundry deployment, see the Pivotal Web Services Performance During Upgrade topic.

Run BOSH Clean-Up

Run bosh -e ALIAS clean-up --all to clean up old stemcells, releases, orphaned disks, and other resources before upgrade. This cleanup helps prevent the product and stemcell upload process from exceeding the BOSH Director’s available persistent disk space.

Check the Health of Your Deployment

The following sections describe steps for ensuring your deployment is healthy before you perform the upgrade.

Collect Foundation Health Status

For collecting foundation health status, Pivotal recommends PCF Healthwatch, which monitors and alerts on the current health, performance, and capacity of PCF. For more information, see the PCF Healthwatch documentation.

If you are not using PCF Healthwatch, you can do some or all of the following to collect foundation health status:

  • If your PCF deployment has external metrics monitoring set up, verify that VM CPU, RAM, and disk use levels are within reasonable levels.
  • Run BOSH CLI commands to check system status:
    • bosh -e ALIAS -d DEPLOYMENT_NAME instances --ps

      Running bosh instances with the --ps, --vitals, or --failing flag highlights individual job failures.

    • bosh -e ALIAS vms --vitals. This reveals VMs with high CPU, high memory, high disk utilization, and with state != running.
    • bosh -e ALIAS -d DEPLOYMENT_NAME cck --report
  • In Ops Manager, check the Status page of PAS and each other tile for CPU, RAM, and disk utilization.
  • Validate that Ops Manager persistent disk usage is below 50%. If it is not, follow the procedure in Confirm Adequate Disk Space.

(Optional) Check the logs for errors before proceeding with the upgrade. For more information, see Viewing Logs in the Command Line Interface.

Push and Scale a Test App

Check that a test app can be pushed and scaled horizontally, manually or through automated testing. This check ensures that the platform supports apps as expected before the upgrade.

Validate Installed Buildpacks

PAS includes a stack association feature for buildpacks. You can have multiple versions of the same buildpack for different stacks. For more information, see cflinuxfs3 Stack and Compatible Buildpacks in the PAS v2.3 Release Notes.

To avoid errors during upgrade, do the following:

For more information about associating a stack with an existing buildpack, see Managing Stack Association with the cf CLI.

Validate MySQL Cluster Health

If you are running PAS MySQL as a cluster, run the mysql-diag tool to validate health of the cluster.

See the BOSH CLI v2 instructions in the Running mysql-diag topic.

Review Pending and Recent Changes

  1. Confirm there are no outstanding changes in Ops Manager or any other tile. All tiles should be green. Click Review Pending Changes, then Apply Changes if necessary.

  2. After applying changes, click Recent Install Logs to confirm that the changes completed cleanly:

    Cleanup complete
    {"type": "step_finished", "id": "clean_up_bosh.cleaning_up"}
    Exited with 0.
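
If you save the install log to a file, you can automate this check. The following Python sketch assumes that a clean run ends with the Exited with 0. line shown in the sample above. The function name is hypothetical.

```python
def install_completed_cleanly(log_text):
    """Return True if the last non-empty log line reports exit status 0."""
    lines = [line.strip() for line in log_text.splitlines() if line.strip()]
    return bool(lines) and lines[-1] == "Exited with 0."
```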
    

Export Your Installation

To export your installation, do the following:

  1. In your Ops Manager v2.3 Installation Dashboard, click the account dropdown and select Settings.

  2. On the Settings screen, select Export Installation Settings from the left menu, then click Export Installation Settings.

This exports the current PCF installation with all of its assets.

When you export an installation, the export contains the base VM images, necessary packages, and configuration settings, but does not include releases between upgrades if Ops Manager has already uploaded them to BOSH. When backing up PCF, you must take this into account by backing up the BOSH blobstore that contains the uploaded releases. BOSH Backup and Restore (BBR) backs up the BOSH blobstore. For more information, see Backing Up Pivotal Cloud Foundry with BBR.

  • The export time depends on the size of the exported file.
  • Some browsers do not provide feedback on the status of the export process and might appear to hang.

Note: Some operating systems automatically unzip the exported installation. If this occurs, create a ZIP file of the unzipped export. Do not start compressing at the “installation” folder level. Instead, start compressing at the level containing the config.yml file.
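
One way to re-create the archive at the correct level is with a short script. The following Python sketch assumes that the note means config.yml should sit at the root of the archive. The function name and paths are hypothetical.

```python
import zipfile
from pathlib import Path


def zip_export(export_dir, zip_path):
    """Zip the contents of export_dir so that config.yml sits at the archive root."""
    export_dir = Path(export_dir)
    zip_path = Path(zip_path)
    if not (export_dir / "config.yml").is_file():
        raise FileNotFoundError(f"{export_dir} does not contain config.yml")
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(export_dir.rglob("*")):
            # Skip directories and the archive itself if it lives inside export_dir.
            if path.is_file() and path.resolve() != zip_path.resolve():
                # Store paths relative to export_dir, not to its parent folder.
                archive.write(path, path.relative_to(export_dir).as_posix())
    return zip_path
```

Pass the folder that directly contains config.yml as export_dir, not its parent.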

WARNING: If you fail to perform the remedial steps for this issue, this upgrade process may corrupt your existing usage data.

Next Steps

Now that you have completed the Upgrade Preparation Checklist for PCF v2.4, continue to Upgrading Pivotal Cloud Foundry.

Complete Survey

Please take some time to help us improve this document by completing the Upgrade Checklist Survey.
