Upgrade Checklist for PCF v1.12

Note: in Pivotal Cloud Foundry (PCF) versions v2.0 and later, Elastic Runtime has been renamed Pivotal Application Service.

This topic serves as a checklist for upgrading Pivotal Cloud Foundry (PCF) from v1.11 to v1.12.

Tile Compatibility

Before you upgrade to PCF v1.12, please check whether PCF v1.12 supports the service tile versions that you currently have deployed.

To check PCF version support for any service tile from Pivotal or a Pivotal partner:

  • Navigate to the tile’s download page on Pivotal Network.
  • Select the tile version in the Releases dropdown.
  • See the Depends On section under Release Details. For more information, refer to the tile’s release notes.

If the currently-deployed version of a tile is not compatible with PCF v1.12, you must upgrade the tile before you upgrade PCF. You do not need to upgrade tiles that are compatible with both PCF v1.11 and v1.12.

The Product Compatibility Matrix provides an overview of which PCF versions support which versions of the most popular service tiles from Pivotal.

When you have completed your upgrade, please take some time to complete this survey.

Environment Details

Pivotal provides the empty table below as a model to print out or adapt for recording and tracking the tile versions that you have deployed in all of your environments.

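Product Group | Product | Sandbox | Non-Prod | Prod | Other…
Pivotal Cloud Foundry | Ops Manager | | | |
Pivotal Cloud Foundry | Elastic Runtime | | | |
Pivotal Cloud Foundry Services | MySQL v2 | | | |
Pivotal Cloud Foundry Services | Redis | | | |
Pivotal Cloud Foundry Services | RabbitMQ | | | |
Pivotal Cloud Foundry Services | Single Sign On (SSO) | | | |
Pivotal Cloud Foundry Services | Spring Cloud Services | | | |
Pivotal Cloud Foundry Services | Concourse | | | |
Pivotal Cloud Foundry Partner Services | New Relic | | | |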

See Elastic Runtime v1.12 Known Issues in the Pivotal Knowledge Base.

PCF v1.12 Breaking Changes

PCF v1.12 has the following breaking changes. For more information, see PCF v1.12 Breaking Changes in the PCF Release Notes.

Component or Service Affected Product Details
Isolation Segment

Elastic Runtime

This issue has been addressed in PCF v1.12.18 and later. If you are upgrading from PCF v1.11 to v1.12, upgrade to PCF v1.12.18 or later.

If you upgrade from PCF v1.11 to v1.12 and the PCF Isolation Segment tile is installed on your foundation, any apps deployed to a space associated with the tile may become unreachable until you manually restart them or map an arbitrary route to each of them.

This happens because the isolation segment metadata for your existing apps is not automatically sent to the Gorouter on upgrade. Once you restart a given app or map a route to it, the Gorouter will receive the updated route information, including the isolation segment metadata.

If PCF Isolation Segment is installed on your foundation, do not upgrade to a PCF v1.12 release earlier than v1.12.18.

Cloud Controller Bridge

Elastic Runtime

In PCF v1.12, the Enable secure communication between Diego and Cloud Controller option in the Cloud Controller pane of the Pivotal Elastic Runtime tile allows you to enable direct communications between the Cloud Controller and Diego over secure TLS and deactivate the Cloud Controller Bridge. If you deploy a fresh installation of PCF v1.12, the Enable checkbox is selected by default.

For upgrades, if you want to use this new feature, you must manually select the Enable checkbox after the upgrade is complete and then click Apply Changes. Selecting the checkbox before the upgrade results in API downtime.

Gorouter and HAProxy TLS Configuration

Elastic Runtime

In PCF v1.12, Gorouter and HAProxy now always listen for TLS requests. Therefore, you must configure an SSL certificate for Gorouter and HAProxy in Elastic Runtime. You configure Gorouter and HAProxy using the same field and with the same certificate.

In addition, you must specify TLS cipher suites for both HAProxy and Gorouter. These cipher suites are specified independently in different fields. If you configured a previous installation with TLS cipher suites, these configurations persist through the upgrade. Make sure that you have configured the correct set of TLS cipher suites and minimum TLS version to support your client and load balancer needs.
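One way to confirm that the configured cipher suites and minimum TLS version still suit a given client is to probe the endpoint and record what gets negotiated. The following is a minimal sketch using only the Python standard library. The function name `negotiated_tls` and its parameters are illustrative assumptions, not part of PCF, and the probe reflects only what this particular Python client supports.

```python
import socket
import ssl


def negotiated_tls(host, port=443, timeout=5.0, verify=True):
    """Return (protocol_version, cipher_name) negotiated with host:port.

    Set verify=False only for test endpoints that use self-signed certificates.
    """
    context = ssl.create_default_context()
    if not verify:
        # Order matters: hostname checking must be off before CERT_NONE is set.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    with socket.create_connection((host, port), timeout=timeout) as raw:
        with context.wrap_socket(raw, server_hostname=host) as tls:
            # cipher() returns (name, protocol, secret_bits); keep the name only.
            return tls.version(), tls.cipher()[0]
```

For example, calling `negotiated_tls("api.sys.example.com")` against a hypothetical system domain before and after the upgrade shows whether the negotiated protocol or cipher changed for this client.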

Internal Elastic Runtime Service Credentials

Elastic Runtime

The internal credentials that the Elastic Runtime service uses for inter-component communication are now generated and stored in CredHub instead of Ops Manager. For a list of the credentials migrated to CredHub, see Pivotal Elastic Runtime Release Notes.

If you want to access these credentials, you must use the CredHub CLI or the Ops Manager API instead of the Credentials tab of the Pivotal Elastic Runtime tile.

Postgres

Elastic Runtime

This release removes the legacy Postgres database VMs for the Cloud Controller and UAA. If your deployment was originally installed before PCF v1.6 and still uses Postgres, you must contact your dedicated Support Engineer or Platform Architect for assistance in migrating your Cloud Controller and UAA databases to MySQL. They have access to the PostgreSQL-to-MySQL Migrator tool and instructions on Pivotal Network.

If you do not migrate to MySQL before upgrading to Elastic Runtime v1.12, the upgrade fails. For more information, see Migrate the CC and UAA Databases from Postgres to MySQL.

MySQL for PCF and PCF Runtime for Windows

Elastic Runtime

If your existing PCF v1.11.x installation includes both PCF Runtime for Windows and MySQL for PCF v1.x, you must upgrade to MySQL for PCF v1.10.3 or later before you upgrade to PCF Elastic Runtime v1.12. For instructions on how to upgrade MySQL for PCF, see the MySQL for PCF documentation.

If you do not upgrade MySQL for PCF, the upgrade fails. For more information, see Upgrade MySQL for PCF.

BOSH CLI v2

Ops Manager

Ops Manager v1.12.0 uses the BOSH Command Line Interface (CLI) v2. In v2, the formatting of the CLI output has changed. If your deployment uses scripts that rely on BOSH output, you must refactor them to interpret the command output of the BOSH CLI v2. For more information about the BOSH CLI v2, see Pivotal Operations Manager Release Notes.

Director Certificate Rotation

Elastic Runtime

If your original Elastic Runtime deployment was PCF v1.6 or earlier, you must regenerate the non-configurable Director certificates to deploy CredHub. During a deploy, CredHub attempts to verify the connection to UAA on the BOSH Director with the Ops Manager certificate Subject Alternative Name (SAN). Ops Manager v1.6 and earlier generated non-configurable certificate SANs in a format that CredHub does not understand. For more information, see CredHub Requires Director Certificate Rotation.

PCF Log Search

Ops Manager

PCF Log Search is not compatible with PCF v1.12. If your deployment contains PCF Log Search, you must remove the product tile before upgrading to PCF v1.12. Failure to remove this product prior to the upgrade may cause issues with your deployment.

For more information, see the Upgrading Pivotal Cloud Foundry topic.

Before Upgrade

Step Note/Reason Product/component

Ensure all backups have been carried out.

Pivotal recommends frequently backing up your installation settings before making any changes to your PCF deployment.

See Backing Up and Restoring Pivotal Cloud Foundry.

PCF

Read Upgrading Pivotal Cloud Foundry for upgrading to v1.12 and complete all relevant actions.

Please review all of the upgrade notes.

  • If your deployment contains PCF Log Search, you must remove the product tile before upgrading. Failure to remove this product prior to upgrade may cause issues with your deployment.
  • Ensure you have lifecycle errands enabled for your products.
  • If your existing PCF v1.11.x installation includes both PCF Runtime for Windows and MySQL for PCF v1.x, you must upgrade to MySQL for PCF v1.10.3 or later before you upgrade to Elastic Runtime v1.12.
  • Ensure that your persistent disk usage is below 50%.
  • Ensure that you have your Decryption Passphrase stored and known before upgrading.
PCF

Read and review Pivotal Elastic Runtime v1.12 Release Notes

Review all release notes for any feature changes that may affect you during or after upgrade.

  • You must upgrade first to a version of Pivotal Application Service v1.11.x to successfully upgrade to v1.12.
  • Releases 1.12.10 and 1.12.11 introduce a bug that causes BBR backups to fail due to a missing default domain in the mysql-backup certificate. We recommend skipping these releases and upgrading to 1.11.24 or higher, which resolves this issue. For more information, see the Knowledge Base article Pivotal Application Service Backup and Restore Fails due to Missing Streaming mysql-backup-tool Domain.
  • Review the new features available in Pivotal Elastic Runtime v1.12.
  • Internal credentials, the secret and simple_credentials that Pivotal Application Service uses for inter-component communication, are now generated and stored in CredHub instead of Ops Manager.
  • This release removes the etcd server VMs from the PCF deployment. Operators must ensure they are deploying service tiles that are known to be compatible with PCF Elastic Runtime 1.12.
Elastic Runtime

Read and review PCF Ops Manager v1.12 Release Notes

Review all release notes for any feature changes that may affect you during or after upgrade.

  • BOSH CLI output formatting has changed. If your deployment uses scripts that rely on BOSH output, you must refactor them to interpret BOSH CLI v2 command output.
Ops Manager

Read and review the Known Issues for Elastic Runtime v1.12.

This document covers recommended actions, new issues, and existing issues for Elastic Runtime v1.12.

Some of these issues include:

Elastic Runtime

Read and review the Known Issues for Ops Manager v1.12.

This document covers recommended actions, new issues and existing issues for Ops Manager v1.12.

Some of these issues include:

  • If your deployment contains PCF Log Search, you must remove the product tile before upgrading. Failure to remove this product prior to upgrade may cause issues with your deployment.
  • If you use any service tile that offers both on-demand and not on-demand modes of operation, clicking Apply Changes in Ops Manager fails if you did not define a dedicated service network for the tile.
  • CFOps assumes that the Ops Manager installation settings artifact contains all necessary releases, which is no longer the case in PCF v1.12. Do not use CFOps to back up and restore PCF v1.12.
Ops Manager

Read and review the Breaking Changes for Ops Manager and Elastic Runtime v1.12.

This topic describes the breaking changes you need to be aware of when upgrading to Pivotal Cloud Foundry v1.12.

Some of these breaking changes include:

  • Gorouter and HAProxy now always listen for TLS requests. Therefore, you must configure an SSL certificate for Gorouter and HAProxy in Pivotal Application Service.
  • Ops Manager v1.12.0 uses the BOSH Command Line Interface (CLI) v2. In v2, the formatting of the CLI output has changed. If your deployment uses scripts that rely on BOSH output, you must refactor them to interpret the command output of the BOSH CLI v2.
PCF

Review the ports listed in Diego Network Communications.

This topic describes Diego internal network communication paths with other Elastic Runtime components.

PCF

Run and collect foundation health status, using one of the following methods:

  • If external metrics monitoring has been enabled, verify that CPU, RAM, and disk utilization on all VMs is within normal limits.
  • Run BOSH CLI directly to verify the status:
    • bosh vms --details
    • bosh vms --vitals
    • bosh2 -e ENV_NAME -d DEPLOYMENT_NAME cck
    • bosh2 -e ENV_NAME -d DEPLOYMENT_NAME instances --ps
  • Using the Ops Manager GUI, check the PAS and service tile status pages and verify that CPU, RAM, and disk utilization is within normal limits.
  • Review the PAS KPIs to ensure that the foundation is healthy.

This step ensures that the foundation is healthy before you upgrade.

  • Use bosh vms --vitals to highlight any VMs with high CPU, memory, or disk utilization, and to identify VMs where State != running.
  • Use bosh2 -e ENV_NAME -d DEPLOYMENT_NAME instances with --ps, --vitals, or --failing to highlight individual job failures.
PCF
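
The manual review described above can be partly automated. The following is a minimal Python sketch that flags VMs that are not running or that exceed a utilization threshold. It assumes the BOSH CLI v2 `--json` flag produces a `Tables`/`Rows` layout with column keys such as `process_state` and `persistent_disk_usage`. These key names, the 80% threshold, and the function names are assumptions to verify against the output of your own CLI version.

```python
import json

# Assumed column keys in `bosh2 ... vms --vitals --json` output; verify locally.
USAGE_KEYS = ("cpu_user", "memory_usage", "persistent_disk_usage")


def leading_percent(cell):
    """Return the leading percentage of a cell such as '42% (3.1 GB)', or None."""
    if "%" not in cell:
        return None
    head = cell.split("%", 1)[0].strip()
    try:
        return float(head)
    except ValueError:
        return None


def unhealthy_vms(bosh_json, threshold=80.0):
    """List (instance, reason) pairs for VMs not running or above the threshold."""
    findings = []
    for table in json.loads(bosh_json).get("Tables", []):
        for row in table.get("Rows", []):
            name = row.get("instance", "unknown")
            state = row.get("process_state", "")
            if state != "running":
                findings.append((name, "state=" + state))
            for key in USAGE_KEYS:
                pct = leading_percent(row.get(key, ""))
                if pct is not None and pct > threshold:
                    findings.append((name, "%s=%g%%" % (key, pct)))
    return findings
```

An empty result means that no VM in the supplied output tripped either check. It does not replace reviewing the KPIs.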

Review the KPI Changes from PCF v1.11 to v1.12.

This document highlights new and changed Key Performance Indicators (KPIs) that operators may want to monitor with their PCF deployment to help ensure it is in a good operational state.

PCF

Migrate your Cloud Controller and UAA databases from Postgres to MySQL if needed.

Prior to PCF v1.6, Postgres was the default database for Cloud Controller and UAA. PCF v1.6 introduced MySQL as the default database and PCF v1.12 removes the legacy Postgres database VMs.

If your deployment was originally installed before PCF v1.6 and still uses Postgres, you must contact your dedicated Support Engineer or Platform Architect for assistance migrating your Cloud Controller and UAA databases to MySQL. They will have access to the PostgreSQL-to-MySQL Migrator tool and instructions on Pivotal Network.

If you do not migrate to MySQL before upgrading to PCF v1.12, the upgrade will fail during the deployment of the Pivotal Elastic Runtime tile.

See Prep 1: Migrate the CC and UAA Databases from Postgres to MySQL.

Elastic Runtime

Check the remaining Diego Cell RAM and disk capacity to verify that there are sufficient resources for app containers.

The KPIs that monitor these values are:

rep.CapacityRemainingMemory

rep.CapacityRemainingDisk

See Diego Cell Metrics for more information on these KPIs.

PCF
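
As a rough illustration of what "sufficient" can mean, the sketch below estimates how many more app instances of a given size would fit, using the remaining memory and disk of each cell. It assumes that both metrics are collected per cell in MiB. The function name and the instance sizes are hypothetical.

```python
def placeable_instances(cells, instance_memory_mib, instance_disk_mib):
    """Count how many instances of one size fit across Diego cells.

    `cells` is a list of (remaining_memory_mib, remaining_disk_mib) pairs,
    one per cell. An instance must fit entirely on a single cell.
    """
    if instance_memory_mib <= 0 or instance_disk_mib <= 0:
        raise ValueError("instance sizes must be positive")
    total = 0
    for memory_mib, disk_mib in cells:
        # Each cell is limited by whichever resource runs out first.
        total += min(memory_mib // instance_memory_mib,
                     disk_mib // instance_disk_mib)
    return total
```

Counting per cell rather than summing the remaining capacity avoids overestimating when free memory is fragmented across cells. Two cells with 1536 MiB free each hold only two 1024 MiB instances, not three.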

Verify that Apps Manager is working properly.

Verify that you can log in to Apps Manager and navigate to a Space.

PCF

Check that a test app can be pushed and scaled horizontally, using a manual or automated test.

This step ensures that the platform can push apps as expected before the upgrade.

PCF

If needed, change the Diego cell max inflight value from its default.

For PCF v1.10 and later, the default maximum in-flight Diego cell instances is 4%. The max_in_flight setting for the Diego cell job is configured in the BOSH manifest. See Preventing Overload for details and instructions.

PCF

If needed, disable unused Elastic Runtime post-deploy errands.

  • Volume Services: Go to Advanced Features and make sure that NFSv3 Volume Services is disabled. Then go to Errands and set the NFS Broker Errand to Off.
  • Notification Service
  • Notification UI

See Post-Deploy Errands for more information.

Only disable these errands if they are not needed for your environment.

Elastic Runtime

Update the overlay subnet if you need to avoid address collision with a third-party service on the same subnet.

Starting in Elastic Runtime v1.12, the internal container network overlay subnet changed from 10.254.0.0/22 to 10.255.0.0/16.

You can update the overlay subnet if needed by going to Elastic Runtime > Networking > Overlay Subnet.

PCF
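
To decide whether the overlay subnet needs changing, you can compare it against the subnets that your other services use. The following minimal sketch uses the Python standard library `ipaddress` module. The function name and the example subnets in the usage note are illustrative assumptions.

```python
import ipaddress


def colliding_networks(overlay_cidr, other_cidrs):
    """Return the CIDRs in other_cidrs that overlap the overlay subnet."""
    overlay = ipaddress.ip_network(overlay_cidr)
    return [cidr for cidr in other_cidrs
            if overlay.overlaps(ipaddress.ip_network(cidr))]
```

For example, `colliding_networks("10.255.0.0/16", ["10.255.12.0/24", "192.168.0.0/16"])` reports only the first subnet as a collision.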

If MySQL is deployed as a cluster, run the mysql-diag tool to validate the health of the cluster.

See Run mysql-diag Using BOSH CLI v2.

Validate the health of MySQL before upgrading.

PCF

(Optional) BOSH Add-on: Check IPsec between two VMs. See Troubleshooting the IPsec Add-on for PCF.

This step validates that IPsec is working as expected before the upgrade.

IPsec

(Optional) Network Firewall: If your PCF foundation has a strict network ingress/egress policy for multiple AZ subnets, review the new ports in PCF v1.12.

PCF v1.12 adds a new set of ports (4003 and 4103) for Diego BBS.
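
Once the deployment is running, a basic TCP connect test from a VM in each AZ subnet can show whether a firewall blocks the new ports. The following is a minimal sketch. The function name is an assumption, and the host you probe depends on your own deployment. A connection can also fail simply because nothing is listening on the port, so treat a failure as a prompt to investigate rather than proof of a firewall block.

```python
import socket

BBS_PORTS = (4003, 4103)  # new Diego BBS ports listed in this checklist


def unreachable_ports(host, ports=BBS_PORTS, timeout=3.0):
    """Return the ports on host that do not accept a TCP connection."""
    blocked = []
    for port in ports:
        try:
            # Open and immediately close a connection to test reachability.
            with socket.create_connection((host, port), timeout=timeout):
                pass
        except OSError:
            blocked.append(port)
    return blocked
```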


Run bosh cleanup --all to clean up old stemcells/releases/disks for one version upgrade.

This step avoids filling the BOSH Director persistent disk during the new product stemcell/releases upload.

Tiles

During Upgrade

Step Note/Reason Product/Component

(Optional) Periodically take snapshots of storage metrics.

This step is advised if you have a large foundation and you have experienced storage issues in the past.

PCF

Periodically monitor progress:

  • bosh task TASK_NUMBER
  • App Availability
  • CF CLI Commands
  • Availability of Ops Manager GUI
  • NAS performance (if using NAS)
  • vSphere Performance (if on vSphere)
  • bosh vms --vitals, bosh vms --details, and bosh instances --ps

Monitor the progress of the upgrade, checking the status of the foundation at various locations.

PCF

Check instance count by state of Diego Components using cfdot.

  1. bosh ssh into a Diego job
  2. Run the command cfdot actual-lrp-groups | jq '.instance, .evacuating | values' | jq -s -r 'group_by(.state)[] | .[0].state + ": " + (length | tostring)'

See the cfdot repo for details regarding cfdot and its uses.

Elastic Runtime
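
If jq is not available, the same count by state can be produced with a short script. The following Python sketch mirrors the jq pipeline above. It assumes that cfdot actual-lrp-groups prints one JSON object per line, each with optional `instance` and `evacuating` members that carry a `state` field, which is the shape the jq filter implies. The function name is hypothetical.

```python
import json
from collections import Counter


def count_lrps_by_state(lines):
    """Count actual LRPs by state from cfdot actual-lrp-groups output lines."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        group = json.loads(line)
        # Like `.instance, .evacuating | values`, skip missing or null members.
        for key in ("instance", "evacuating"):
            lrp = group.get(key)
            if lrp is not None:
                counts[lrp.get("state", "UNKNOWN")] += 1
    return dict(sorted(counts.items()))
```

To use it, save the function in a script that passes `sys.stdin` as `lines` and pipe the cfdot output into that script.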

(Optional) In the event there are issues with the upgrade, perform the following tasks.

  • Collect all job logs
  • Collect task debug logs for tasks of VM upgrades
  • Collect installation log from Ops Manager

This information will be helpful in determining the cause of upgrade issues.

PCF

After Upgrade

Step Note/Reason Product/Component

After upgrading to v1.12, complete the steps outlined in the Upgrading Pivotal Cloud Foundry topic.

Follow the steps outlined in the “After You Upgrade” section.

All

Perform a performance test by pushing and scaling a test application.

Push and scale an app, and verify that it works.

PCF

Run and collect health-check stats using the following commands:

  • bosh vms --vitals
  • bosh2 -e ENV_NAME -d DEPLOYMENT_NAME instances --ps

This step ensures that all jobs and processes are running as expected.

  • Use bosh vms --vitals to highlight any VMs with high CPU, memory, or disk utilization, and to identify VMs where State != running.
  • Use bosh2 -e ENV_NAME -d DEPLOYMENT_NAME instances with --ps, --vitals, or --failing to highlight individual job failures.
PCF and all Tiles

If you have added custom VM types or persistent disk types, ensure that these values are still set correctly and were not overwritten.

Verify that the values are retained in the Ops Manager UI.

PCF

If MySQL is deployed as a cluster, run the mysql-diag tool to validate the health of the cluster.

See Run mysql-diag Using BOSH CLI v2.

Validate the health of MySQL after upgrading.

Elastic Runtime

Run bosh2 -e ENV_NAME clean-up --all to clean up old stemcells/releases/disks for one version upgrade.

This step cleans up releases, stemcells, orphaned disks, and other unused resources.

Tiles

Survey

Please take some time to help us improve this document by completing the Upgrade Checklist Survey.
