Upgrade Preparation Checklist for Ops Manager v2.10


This topic serves as a checklist for preparing to upgrade Ops Manager and VMware Tanzu Application Service for VMs (TAS for VMs) from v2.9 to v2.10.

Overview

This topic contains important preparation steps that you must follow before beginning your upgrade. Failure to follow these instructions may jeopardize your existing deployment data and cause the upgrade to fail.

After completing the steps in this topic, you can continue to Upgrading Ops Manager.

If you are running Ops Manager v2.4 through v2.8, VMware recommends upgrading directly to Ops Manager v2.10. For instructions, see Jump Upgrade from Ops Manager v2.4 Through v2.8 in Upgrading Ops Manager.

Warning: Although you can skip minor versions when upgrading Ops Manager, you should not skip minor versions when upgrading TAS for VMs. Skipping minor versions when upgrading TAS for VMs may result in additional breaking changes. To avoid this, upgrade TAS for VMs to the minor version that directly follows your current version of TAS for VMs. For more information, see Upgrade TAS for VMs in Upgrading Ops Manager.

Back Up Your Ops Manager Deployment

VMware recommends backing up your Ops Manager deployment before upgrading so that you can restore it if the upgrade fails. To do this, follow the instructions in Backing Up Deployments with BBR.

Find Your Decryption Passphrase for Ops Manager

To complete the Ops Manager upgrade, you must have your Ops Manager decryption passphrase. You defined this decryption passphrase during the initial installation of Ops Manager.

Review Changes in Ops Manager and TAS for VMs v2.10

Review each of the following links to understand the changes in the new release, such as new features, known issues, and breaking changes.

Check VMware NSX-T Data Center Version

For vSphere deployments, the vSphere BOSH CPI that is included in Ops Manager v2.10 no longer supports NSX-T v2.2 and earlier. If you are using NSX-T v2.2 or earlier, you must upgrade NSX-T to a supported version.

The specific version of NSX-T required for your deployment depends on which runtime you plan to deploy and integrate with NSX-T.

Update Tiles and Add-Ons

These sections describe changes you must make to your product tiles and add-ons before upgrading Ops Manager.

Review Service Tile Compatibility

Before you upgrade, check whether the service tiles that you currently have are compatible with the new version of Ops Manager.

To check all the service tiles in your current Ops Manager deployment, use Upgrade Planner.

For information about how to use Upgrade Planner, see the Upgrade Planner documentation.

Alternatively, you can do the following:

  1. Navigate to the tile’s download page on VMware Tanzu Network.

  2. Select the tile version in the Releases dropdown.

  3. See the Depends On section under Release Details. For more information, see the tile’s release notes.

If the currently-deployed version of a tile is not compatible with Ops Manager v2.10, you must upgrade the tile to a compatible version before you upgrade Ops Manager. You do not need to upgrade tiles that are compatible with both Ops Manager v2.9 and v2.10.

Some partner service tiles may be incompatible with Ops Manager v2.10. For information about partner service tile compatibility, review the Depends On section on your partner tile download page, see the tile release documentation in the VMware Tanzu documentation, or contact the partner organization that produces the service tile.

Environment Details

You can use the empty table below as a template for recording and tracking the tile versions that you have deployed in all of your environments.

                                                                              Sandbox   Non-Prod   Prod   Other…
Ops Manager                     Ops Manager
                                VMware Tanzu Application Service for VMs
Ops Manager Services            VMware Tanzu SQL [MySQL for VMs]
                                Redis
                                RabbitMQ for VMware Tanzu [VMs]
                                Single Sign-On for VMware Tanzu (SSO)
                                Spring Cloud Services for VMware Tanzu
                                Concourse
Ops Manager Partner Services    New Relic

Upgrade Service Tiles

Upgrade all service tiles to versions that are compatible with Ops Manager v2.10. Service tiles are add-on products you install alongside your runtime. For example, VMware Tanzu SQL [MySQL for VMs], Healthwatch, and RabbitMQ for VMware Tanzu [VMs] are service tiles.

Do not upgrade runtime tiles, such as TAS for VMs, TAS for VMs [Windows], or VMware Tanzu Kubernetes Grid Integrated Edition (TKGI) at this time.

To verify version compatibility, see Upgrade Planner and the service tile documentation.

Configure BOSH Director

With each release of a new Ops Manager version, BOSH Director may require specific updates before upgrading to the new version. For actions to take before upgrading to Ops Manager v2.10, see the sections below.

Check Required Machine Specifications

Check the required machine specifications for Ops Manager v2.10. These specifications are specific to your IaaS. If these specifications do not match your existing Ops Manager deployment, modify the values of your Ops Manager VM instance. For example, if the boot disk of your existing Ops Manager deployment is 50 GB and the new Ops Manager deployment requires 100 GB, increase the size of your Ops Manager boot disk to 100 GB.

If you use Azure, review your VM type setting. You can use either Generation 1 or Generation 2 as your default VM type. For more information, see Azure Generation 2 VM Types in Configuring BOSH Director on Azure Manually.

Configure TAS for VMs

With each release of a new Ops Manager version, TAS for VMs may require specific updates before upgrading to the new version. See the following sections for what action to take before upgrading to Ops Manager v2.10:

Disable Hostname Validation for External Databases on GCP and Azure

This pre-upgrade step applies only to existing TAS for VMs v2.9 deployments where all of these conditions are met:

  • In the Databases pane, TAS for VMs v2.9 is configured to use an external GCP or Azure database.

  • In TAS for VMs v2.10, you want to use the same external GCP or Azure database configured in the Databases pane.

  • You enabled TLS communication for the GCP or Azure external database by adding a certificate authority (CA) certificate to the Database CA certificate field in the Databases pane.

If you meet these conditions, you must disable hostname validation before you upgrade to TAS for VMs v2.10. Failure to disable hostname validation can cause the upgrade to fail for deployments that use external databases on GCP or Azure.

To disable hostname validation:

  1. After you stage TAS for VMs v2.10 for upgrade, go to the Databases pane in the TAS for VMs tile.

  2. Deselect the Enable hostname verification checkbox. This checkbox is selected by default.

For more information about database configuration in TAS for VMs v2.10, see Configure Databases in Configuring TAS for VMs in the TAS for VMs documentation.

(Optional) Disable Unused Errands

To save upgrade time, you can disable unused TAS for VMs post-deploy errands. For more information, see Post-Deploy Errands in Errands. Only disable these errands if your environment does not need them.

In some cases, if you have previously disabled lifecycle errands for any installed product to reduce deployment time, you may want to re-enable these errands before upgrading. For more information, see Add and Import Products in Adding and Deleting Products.

Check OS Compatibility of BOSH-Managed Add-Ons and Tiles

Before upgrading to Ops Manager v2.10, if you have deployed any BOSH-managed add-ons, such as IPsec, Anti-Virus, or File Integrity Monitoring, and you have deployed or plan to deploy TAS for VMs [Windows], you must modify each add-on manifest to specify a compatible OS stemcell. For more information, see the TAS for VMs [Windows] documentation.

For example, File Integrity Monitoring (FIM) is not supported on Windows. Therefore, the FIM manifest must use an include directive to specify ubuntu-trusty or ubuntu-xenial as the target OS stemcell.

Note: To upgrade to a Xenial stemcell, see the documentation for each add-on and follow the instructions.

To update an add-on manifest:

  1. Locate your existing add-on manifest file. For example, for FIM, locate the fim.yml you uploaded to the Ops Manager VM.

  2. Add the following include directive to the manifest:

      include:
        stemcell:
          - os: ubuntu-xenial
    
  3. Upload the modified manifest file to your Ops Manager deployment. For example instructions, see Installing File Integrity Monitoring on BOSH Director in the File Integrity Monitoring documentation.
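Before uploading, it can help to confirm that the edited manifest really contains the directive. The following is a minimal sketch using grep. The function name has_stemcell_include is hypothetical, and the check is a plain text match, not a YAML parse, so it only catches a missing or misspelled entry.

```shell
# Sketch only: has_stemcell_include is a hypothetical helper name.
# Succeeds if the manifest has an "include:" key and an "- os: <name>" entry.
# This is a text match; it does not verify YAML nesting or indentation.
has_stemcell_include() {
  local manifest="$1" os_name="$2"
  grep -Eq '^[[:space:]]*include:[[:space:]]*$' "$manifest" &&
    grep -Eq "^[[:space:]]*-[[:space:]]*os:[[:space:]]*${os_name}[[:space:]]*\$" "$manifest"
}
```

For example, has_stemcell_include fim.yml ubuntu-xenial returns a non-zero status if the directive is missing from fim.yml.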

If you use any other BOSH-managed add-ons in your deployment, you should verify OS compatibility for those components as well. For more information about configuring BOSH add-on manifests, see Addons Block in Director Runtime Config in the BOSH documentation.

Check Backup and Restore External Blobstore Add-On

If you have enabled external blobstore backups for an Azure Blobstore using the blobstore add-on, you must update your runtime configuration to remove the sdk-preview add-on before upgrading. If you do not remove this job, upgrading TAS for VMs fails with the error below:

Preparing deployment: Preparing deployment (00:00:01)
  L Error: Colocated job 'azure-blobstore-backup-restorer' is already added to the instance group 'backup-restore'.

After removing this job from your runtime configuration, ensure that the Enable backup and restore checkbox is selected in the File Storage pane of the TAS for VMs tile.

For more information, see Azure Blobstores in Enabling External Blobstore Backups.

Check Certificate Authority Expiration Dates

Depending on the requirements of your deployment, you may need to rotate your certificate authority (CA) certificates. The non-configurable certificates in your deployment expire every two years. You must regenerate and rotate them so that critical components do not face a complete outage.

Note: Ops Manager uses SHA-2 certificates and hashes by default. You can convert existing SHA-1 hashes into SHA-2 hashes by rotating your Ops Manager certificates using the procedure described in Rotating Identity Provider SAML Certificates.

To retrieve information about all the RSA and CA certificates for the BOSH Director and other products in your deployment, you can run GET /api/v0/deployed/certificates?expires_within=TIME using the Ops Manager API.

In this request, the expires_within parameter is optional. Its value is a number followed by a unit: d for days, w for weeks, m for months, or y for years. For example, to search for certificates expiring within one month, run:

curl "https://OPS-MANAGER-FQDN/api/v0/deployed/certificates?expires_within=1m" \
 -X GET \
 -H "Authorization: Bearer UAA-ACCESS-TOKEN"

Where:

  • OPS-MANAGER-FQDN is the fully-qualified domain name (FQDN) of your Ops Manager deployment.
  • UAA-ACCESS-TOKEN is your UAA access token.
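As a minimal sketch, the request above could be wrapped in a small shell helper that validates the expires_within value before calling the API. The function names cert_query_url and check_expiring_certs are hypothetical, and the rule that a value is a whole number followed by one unit letter is an assumption inferred from the 1m example.

```shell
# Sketch only: function names are hypothetical, not part of Ops Manager.

# Build the certificates URL. Assumes the window is a whole number followed
# by one of the unit letters d, w, m, or y, as in the 1m example.
cert_query_url() {
  local fqdn="$1" window="$2"
  if ! printf '%s\n' "$window" | grep -Eq '^[0-9]+[dwmy]$'; then
    echo "invalid expires_within value: $window" >&2
    return 1
  fi
  printf 'https://%s/api/v0/deployed/certificates?expires_within=%s\n' "$fqdn" "$window"
}

# Call the endpoint with the same curl options as the documented example.
# Arguments: FQDN, window (for example 3m), UAA access token.
check_expiring_certs() {
  local url
  url="$(cert_query_url "$1" "$2")" || return 1
  curl "$url" -X GET -H "Authorization: Bearer $3"
}
```

For example, check_expiring_certs OPS-MANAGER-FQDN 3m UAA-ACCESS-TOKEN requests certificates that expire within three months, and a malformed value such as 3x is rejected before any request is sent.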

For information about regenerating and rotating CA certificates, see Overview of Certificate Rotation.

Check the Capacity of Your Deployment

These sections describe steps for ensuring your deployment has adequate capacity to perform the upgrade.

Confirm Adequate Disk Space

Confirm that the BOSH Director VM has adequate disk space for your upgrades. You need at least 20 GB of free disk space to upgrade Ops Manager and TAS for VMs. If you plan to upgrade other products, the amount of disk space required depends on how many tiles you plan to deploy to your upgraded Ops Manager deployment.

To check current persistent disk usage:

  1. Navigate to the Ops Manager Installation Dashboard.

  2. Click the BOSH Director tile.

  3. Select the Status tab.

  4. Check the value of the PERSISTENT DISK TYPE column. If persistent disk usage is higher than 50%:

    1. Select the Settings tab.
    2. Select Resource Config.
    3. Increase your persistent disk space to handle the size of the resources. If you do not know how much disk space to allocate, set the value to at least 100 GB.
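If you have shell access to the BOSH Director VM, the same 50% threshold can also be checked from the command line. This is a rough sketch, not an Ops Manager feature: the function names are hypothetical, and the /var/vcap/store path in the usage note is an assumption about where the persistent disk is mounted in your deployment.

```shell
# Sketch only: helper names are hypothetical.

# Print the used percentage (digits only) of the filesystem holding a path.
disk_used_pct() {
  df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Succeed when usage is higher than the threshold (default 50, as in step 4).
disk_needs_resize() {
  local used="$1" threshold="${2:-50}"
  [ "$used" -gt "$threshold" ]
}
```

For example, disk_needs_resize "$(disk_used_pct /var/vcap/store)" succeeds when that filesystem is more than 50% full, which is the point at which the steps above recommend increasing the persistent disk.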

Check Diego Cell RAM and Disk

Check that Diego Cells have sufficient available RAM and disk capacity to support app containers.

The KPIs that monitor these resources are:

  • rep.CapacityRemainingMemory
  • rep.CapacityRemainingDisk

Adjust Diego Cell Limits

If needed, adjust the maximum number of Diego Cells that the platform can upgrade simultaneously, to avoid overloading the other Diego Cells. For more information, see Limit Component Instance Restarts in Configuring TAS for VMs for Upgrades in the TAS for VMs documentation.

The maximum number of Diego Cells that can update at once, max_in_flight, is 4%. This setting is configured in the BOSH manifest’s Diego Cell definition. For more information, see Prevent Overload in Configuring TAS for VMs for Upgrades in the TAS for VMs documentation.

For more information about the rep.CapacityRemainingMemory and rep.CapacityRemainingDisk KPIs, see Diego Cell Metrics in Key Performance Indicators in the TAS for VMs documentation.

Review File Storage IOPS and Other Upgrade Limiting Factors

During the Ops Manager upgrade process, a large quantity of data is moved around on disk.

To ensure a successful upgrade of your deployment, verify that your underlying TAS for VMs file storage is performant enough to handle the upgrade. For more information about the configurations to evaluate, see Configure File Storage in Configuring TAS for VMs for Upgrades in the TAS for VMs documentation.

In addition to file storage IOPS, consider additional existing deployment factors that can impact overall upgrade duration and performance:

  • Factor: Network latency. Impact: Network latency can contribute to how long it takes to move app instance data to new containers.

  • Factor: Number of ASGs. Impact: A large number of App Security Groups (ASGs) in your deployment can contribute to an increase in app instance container startup time. For more information, see App Security Groups in the TAS for VMs documentation.

  • Factor: Number of app instances and app growth. Impact: A large increase in the number of app instances and average droplet size since the initial deployment can increase the upgrade impact on your system.

For example upgrade-related performance measurements of an existing production deployment, see Upgrade Load Example: Pivotal Web Services During Upgrade in the TAS for VMs documentation.

Run BOSH Clean-Up

To clean up old stemcells, releases, orphaned disks, and other resources before upgrade:

  1. Run:

    bosh -e ALIAS clean-up --all
    

    Where ALIAS is your BOSH environment alias.

This cleanup helps prevent the product and stemcell upload process from exceeding the BOSH Director’s available persistent disk space.

Check the Health of Your Deployment

These sections describe steps for ensuring your deployment is healthy before you perform the upgrade.

Collect Foundation Health Status

For collecting foundation health status, VMware recommends Healthwatch, which monitors and alerts on the current health, performance, and capacity of your Ops Manager deployment. For more information, see the Healthwatch documentation.

If you are not using Healthwatch, you can do some or all of the following to collect foundation health status:

  • If your Ops Manager deployment has external metrics monitoring set up, verify that VM CPU, RAM, and disk usage are within reasonable limits.

  • Check your system status.

    • To check the status of your BOSH instances, run:

      bosh -e ALIAS -d DEPLOYMENT-NAME instances --ps
      

      Where:

      • ALIAS is your BOSH environment alias.
      • DEPLOYMENT-NAME is the name of the BOSH deployment with the instances you want to check.

      Running bosh instances with the flags --ps, --vitals, or --failing reveals individual job failures.
    • To check the status of your BOSH VMs, run:

      bosh -e ALIAS vms --vitals
      

      Where ALIAS is your BOSH environment alias.

      This command reveals VMs with high CPU, memory, or disk utilization, as well as VMs in a state other than running.

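    • To run a BOSH cloud check, which reports inconsistencies between the state that BOSH Director expects and the actual state of VMs and disks in your IaaS, run: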

      bosh -e ALIAS -d DEPLOYMENT-NAME cck --report
      

      Where:

      • ALIAS is your BOSH environment alias.
      • DEPLOYMENT-NAME is the name of the BOSH deployment you want to check.
  • Check the Status tab of each TAS for VMs tile for VM CPU, RAM, and disk use levels.

  • Check that Ops Manager persistent disk usage is below 50%. If not, follow the procedure in Confirm Adequate Disk Space above.

  • (Optional) Check the logs for errors before proceeding with the upgrade. For more information, see Viewing Logs in the Command Line Interface in App Logging in TAS for VMs in the TAS for VMs documentation.
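The three BOSH commands in the list above can be collected into one helper so that the same checks run for every deployment. This is a sketch under stated assumptions: the function names are hypothetical, and the example alias and deployment name in the usage note are placeholders, not values from your foundation.

```shell
# Sketch only: helper names are hypothetical.

# Print the health-check commands from the list above for one deployment.
# Printing first lets you review the commands before running anything.
health_check_commands() {
  local env_alias="$1" deployment="$2"
  printf 'bosh -e %s -d %s instances --ps\n' "$env_alias" "$deployment"
  printf 'bosh -e %s vms --vitals\n' "$env_alias"
  printf 'bosh -e %s -d %s cck --report\n' "$env_alias" "$deployment"
}

# Run each printed command in turn, stopping at the first failure.
run_health_checks() {
  health_check_commands "$1" "$2" | while IFS= read -r cmd; do
    echo "+ $cmd"
    # Redirect stdin so a command cannot consume the remaining command list.
    sh -c "$cmd" </dev/null || exit 1
  done
}
```

For example, health_check_commands my-env cf-example prints the three commands for review, and run_health_checks my-env cf-example runs them in order.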

Push and Scale a Test App

Check that a test app can be pushed and scaled horizontally, either manually or through automated testing. This check ensures that the platform supports apps as expected before the upgrade.
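A manual version of this check could look like the following sketch, which uses the cf CLI push, scale, and app commands. The function name smoke_test_app, the CF variable, and the default of three instances are illustrative assumptions. The sketch also assumes that you have already logged in and targeted an org and space.

```shell
# Sketch only: smoke_test_app and the CF variable are hypothetical names.
# CF can be overridden to point at a different cf binary.
CF="${CF:-cf}"

# Push an app from a local path, scale it out, then show its status.
# Arguments: app name, path to app source, instance count (default 3).
smoke_test_app() {
  local app="$1" path="$2" instances="${3:-3}"
  "$CF" push "$app" -p "$path" &&
    "$CF" scale "$app" -i "$instances" &&
    "$CF" app "$app"
}
```

For example, smoke_test_app upgrade-check ./test-app 3 pushes the app, scales it to three instances, and prints the resulting instance states. When you have confirmed the result, you can remove the app with cf delete upgrade-check -f.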

Validate MySQL Cluster Health

If you are running TAS for VMs MySQL as a cluster, run the mysql-diag tool to validate the health of the cluster.

For BOSH CLI v2 instructions, see Running mysql-diag in the TAS for VMs documentation.

Review Pending and Recent Changes

To review pending and recent changes:

  1. Confirm there are no outstanding changes in Ops Manager or any other tile. All tiles should be green. If any tile is not green, click Review Pending Changes, then click Apply Changes.

  2. After applying changes, click Recent Install Logs to confirm that the changes completed cleanly. You should see the following output:

    Cleanup complete
    {"type": "step_finished", "id": "clean_up_bosh.cleaning_up"}
    Exited with 0.
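If you save the install log to a file, for example by copying it from the Recent Install Logs view, a short check like the following sketch can confirm the final status line. The function name install_log_clean is hypothetical, and the check assumes the last non-empty line of a clean run is exactly Exited with 0., as in the output above.

```shell
# Sketch only: install_log_clean is a hypothetical helper name.
# Succeeds if the last non-empty line of the saved log is "Exited with 0."
install_log_clean() {
  local last
  last="$(grep -v '^[[:space:]]*$' "$1" | tail -n 1 | sed 's/^[[:space:]]*//; s/[[:space:]]*$//')"
  [ "$last" = "Exited with 0." ]
}
```

For example, install_log_clean install.log returns a non-zero status if the log ends with any other line.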
    

Export Your Installation

To export your installation:

  1. In the Ops Manager Installation Dashboard, click the account dropdown and select Settings.


  2. On the Settings page, select Export Installation Settings from the left menu, then click Export Installation Settings.

This exports the current Ops Manager installation with all of its assets.

When you export an installation, the export contains the base VM images, necessary packages, and configuration settings, but does not include releases between upgrades if Ops Manager has already uploaded them to BOSH. When backing up your deployment, you must take this into account by backing up the BOSH blobstore that contains the uploaded releases. BOSH Backup and Restore (BBR) backs up the BOSH blobstore. For more information, see Backing Up Deployments with BBR.

  • The export time depends on the size of the exported file.

  • Some browsers do not provide feedback on the status of the export process and might appear to hang.

Note: Some operating systems automatically unzip the exported installation. If this occurs, create a ZIP file of the unzipped export. Do not start compressing at the “installation” folder level. Instead, start compressing at the level containing the rails_database_dump.postgres file.
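If your operating system unzipped the export, the re-compression described in the note can be sketched as follows. The function names are hypothetical, the sketch assumes the Info-ZIP zip command is installed, and it locates the correct level by searching for rails_database_dump.postgres.

```shell
# Sketch only: helper names are hypothetical.

# Print the directory that directly contains rails_database_dump.postgres.
find_export_root() {
  local dump
  dump="$(find "$1" -type f -name rails_database_dump.postgres | head -n 1)"
  [ -n "$dump" ] || return 1
  dirname "$dump"
}

# Re-create the ZIP so the dump file sits at the top level of the archive.
# Arguments: unzipped export directory, output ZIP path.
rezip_export() {
  local root out
  root="$(find_export_root "$1")" || return 1
  # Resolve the output path first, because the zip runs from inside $root.
  out="$(cd "$(dirname "$2")" && pwd)/$(basename "$2")"
  (cd "$root" && zip -r "$out" .)
}
```

For example, rezip_export ~/Downloads/installation ~/Downloads/installation.zip compresses from the folder that holds the database dump, not from its parent.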

Next Steps

Now that you have completed the Upgrade Preparation Checklist, continue to Upgrading Ops Manager.