Rolling Back ERT Deployment to an Earlier Backup with BBR

This topic describes the procedure for restoring an Elastic Runtime (ERT) deployment to a backup from a previous version with BOSH Backup and Restore (BBR), a command-line tool for backing up and restoring BOSH deployments. To perform the procedures in this topic, you must have backed up Pivotal Cloud Foundry (PCF) by following the steps in the Backing Up Pivotal Cloud Foundry with BBR topic.

To view the BBR release notes, see the BOSH Backup and Restore Release Notes. To restore PCF manually, see the Restoring Pivotal Cloud Foundry Manually from Backup topic.

The procedures described in this topic prepare your environment for PCF, deploy Ops Manager, import your installation settings, and use BBR to restore your PCF components.

Warning: Restoring Pivotal Cloud Foundry (PCF) with BBR is a destructive operation. If the restore fails, the environment may be left in an unusable state and require reprovisioning. Only perform the procedures in this topic for the purpose of recovering from a failed ERT upgrade.

Note: BBR was introduced in PCF v1.11, so you can only use BBR to back up PCF v1.11 and later. To restore earlier versions of PCF, perform the manual procedures.

Step 1: Delete Existing ERT Deployment

  1. Navigate to the Ops Manager Installation Dashboard.

  2. Delete the Pivotal Elastic Runtime tile by clicking the trash icon on the tile.

  3. Click Apply Changes.

Step 2: Redeploy Ops Manager and Import Installation Settings

  1. Destroy your Ops Manager VM.

  2. Perform the procedures for your IaaS to redeploy Ops Manager.

  3. Access your new Ops Manager by navigating to YOUR-OPS-MAN-FQDN in a browser.

  4. On the Welcome to Ops Manager page, click Import Existing Installation.

  5. In the import panel, perform the following tasks:

    • Enter your Decryption Passphrase.
    • Click Choose File and browse to the installation zip file that you exported in the Step 3: Export Installation Settings section of the Backing Up Pivotal Cloud Foundry with BBR topic.

  6. Click Import.

    Note: Some browsers do not provide feedback on the status of the import process, and may appear to hang.

  7. A Successfully imported installation message appears upon completion.

Step 3: Redeploy BOSH Director

Perform the steps in the Applying Changes to Ops Manager Director topic to use the Ops Manager API to only deploy the BOSH Director.

Note: This step does not perform a redeploy, because the BOSH Director still exists. It is required only to recreate the root_ca_certificate file on the restored Ops Manager VM.
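
The following sketch shows what that API call can look like. It is an assumption based on the Using the Ops Manager API topic, with the deploy_products parameter set to none so that only the BOSH Director is deployed; follow the Applying Changes to Ops Manager Director topic for the authoritative request:

$ curl "https://OPS-MAN-FQDN/api/v0/installations" \
  -X POST \
  -H "Authorization: Bearer UAA-ACCESS-TOKEN" \
  -d 'deploy_products=none'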

Step 4: Transfer Artifacts to Jumpbox

In the Step 9: Back Up Your Deployment section of the Backing Up Pivotal Cloud Foundry with BBR topic, you moved the TAR and metadata files of the backup artifact off your jumpbox to your preferred storage space. Now you must transfer those files back to your jumpbox.

For instance, you could SCP the backup artifact to your jumpbox:

$ scp LOCAL_PATH_TO_BACKUP_ARTIFACT JUMPBOX_USER@JUMPBOX_ADDRESS:
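
To confirm that the transfer completed without corruption, you can optionally compare checksums on both ends. This check is not part of the BBR procedure; it assumes shasum is available on your local machine and the jumpbox, and REMOTE_PATH_TO_BACKUP_ARTIFACT is a placeholder for wherever you copied the files:

$ shasum -a 256 LOCAL_PATH_TO_BACKUP_ARTIFACT
$ ssh JUMPBOX_USER@JUMPBOX_ADDRESS "shasum -a 256 REMOTE_PATH_TO_BACKUP_ARTIFACT"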

Step 5: Retrieve BOSH Director Address and Credentials

You can retrieve the IP address of your BOSH Director and the credentials for logging in to it in one of two ways:

  • Option 1: from the Ops Manager Director tile
  • Option 2: by using the Ops Manager API

Option 1: Retrieve the Information from Ops Manager Director

Perform the following steps to retrieve the IP address of your BOSH Director and the credentials for logging in from the Ops Manager Director tile:

  1. Install the BOSH v2 CLI on a machine outside of your PCF deployment.
  2. From the Installation Dashboard in Ops Manager, select Ops Manager Director > Status and record the IP address listed for the Director. You access the BOSH Director using this IP address.

  3. Click Credentials and record the Director credentials.
  4. From the command line, log into the BOSH Director using the IP address and credentials that you recorded:
    $ bosh -e DIRECTOR_IP \
    --ca-cert /var/tempest/workspaces/default/root_ca_certificate log-in
    Email (): director
    Password (): *******************
    Successfully authenticated with UAA
    Succeeded
    

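The commands in the rest of this topic repeat the -e DIRECTOR_IP and --ca-cert flags. As an optional convenience that is not part of the documented procedure, you can create a BOSH environment alias after logging in and pass it to -e in later commands. The alias name pcf below is arbitrary:

$ bosh alias-env pcf -e DIRECTOR_IP \
  --ca-cert /var/tempest/workspaces/default/root_ca_certificate
$ bosh -e pcf deployments
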
Option 2: Retrieve the Information from the Ops Manager API

Perform the following steps to retrieve the IP address of your BOSH Director and the credentials for logging in from the Ops Manager API:

  1. Install the BOSH v2 CLI on a machine outside of your PCF deployment.
  2. Perform the procedures in the Using the Ops Manager API topic to authenticate and access the Ops Manager API.
  3. Use the GET /api/v0/deployed/products endpoint to retrieve a list of deployed products, replacing UAA-ACCESS-TOKEN with the access token recorded in the Using the Ops Manager API topic:
    $ curl "https://OPS-MAN-FQDN/api/v0/deployed/products" \ 
    -X GET \ 
    -H "Authorization: Bearer UAA-ACCESS-TOKEN"
  4. In the response to the above request, locate the product with an installation_name starting with p-bosh- and copy its guid. For one way to script this step, see the jq sketch after this list.
  5. Run the following curl command, replacing PRODUCT-GUID with the value of guid from the previous step:
    $ curl "https://OPS-MAN-FQDN/api/v0/deployed/products/PRODUCT-GUID/static_ips" \ 
    -X GET \
    -H "Authorization: Bearer UAA-ACCESS-TOKEN"
  6. In the response to the above request, locate the BOSH Director IP address under the ips field.
  7. Run the following curl command to retrieve the BOSH Director credentials:
    $ curl "https://OPS-MAN-FQDN/api/v0/deployed/director/credentials/director_credentials" \ 
    -X GET \
    -H "Authorization: Bearer UAA-ACCESS-TOKEN"
  8. From the command line, log into the BOSH Director using the IP address and credentials that you recorded:
    $ bosh -e DIRECTOR_IP --ca-cert /var/tempest/workspaces/default/root_ca_certificate log-in
    Email (): director
    Password (): *******************
    Successfully authenticated with UAA
    Succeeded
    
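If jq is installed on the machine where you run these curl commands, you can extract the guid from step 4 instead of reading the JSON by eye. This is an optional convenience, not part of the documented procedure, and it assumes the endpoint returns a JSON array of products as described above:

$ curl -s "https://OPS-MAN-FQDN/api/v0/deployed/products" \
  -H "Authorization: Bearer UAA-ACCESS-TOKEN" | \
  jq -r '.[] | select(.installation_name | startswith("p-bosh-")) | .guid'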

Step 6: Restore the BOSH Director

  1. Navigate to the Ops Manager Installation Dashboard.
  2. Click the Ops Manager tile.
  3. Click the Credentials tab.
  4. Locate Bbr Ssh Credentials and click Link to Credential next to it.

    You can also retrieve the credentials using the Ops Manager API with a GET request to the following endpoint: /api/v0/deployed/director/credentials/bbr_ssh_credentials. For more information, see the Using the Ops Manager API topic.

  5. Copy the value for private_key_pem, beginning with "-----BEGIN RSA PRIVATE KEY-----".

  6. SSH into your jumpbox.

  7. Run the following command to reformat the key and save it to a file called PRIVATE_KEY in the current directory, pasting in the contents of your private key for YOUR_PRIVATE_KEY:

    $ printf -- "YOUR_PRIVATE_KEY" > PRIVATE_KEY
    

  8. Ensure the BOSH Director backup artifact is in the folder you will run BBR from.

  9. Run the BBR restore command from your jumpbox to restore the BOSH Director:

    $ nohup bbr director \
      --private-key-path PRIVATE_KEY \
      --username bbr \
      --host HOST \
      restore \
        --artifact-path PATH_TO_DIRECTOR_BACKUP
    
    Use the optional --debug flag to enable debug logs. See Logging for more information.

    Replace the placeholder values as follows:

    • PATH_TO_DIRECTOR_BACKUP: This is the path to the Director backup you want to restore.
    • PRIVATE_KEY: This is the path to the private key file you created above.
    • HOST: This is the address of the BOSH Director. If the BOSH Director is public, this will be a URL, such as https://my-bosh.xxx.cf-app.com. Otherwise, it will be the BOSH_DIRECTOR_IP, which you retrieved in Step 5: Retrieve BOSH Director Address and Credentials.

Note: The BBR restore command can take a long time to complete. Pivotal recommends you run it independently of the SSH session, so that the process can continue running even if your connection to the jumpbox fails. The command above uses nohup but you could also run the command in a screen or tmux session.
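
For example, a minimal tmux workflow looks like the following. It assumes tmux is installed on the jumpbox, and the session name bbr-restore is arbitrary:

$ tmux new-session -s bbr-restore
# Inside the session, run the bbr director restore command shown above,
# then detach with Ctrl-b d. Reattach later to check progress:
$ tmux attach -t bbr-restore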

If the command fails, do the following:

  1. Ensure all the parameters in the command are set.
  2. Ensure the BOSH Director credentials are valid.
  3. Ensure the specified deployment exists.
  4. Ensure the source deployment is compatible with the target deployment.
  5. Ensure that the jumpbox can reach the BOSH Director.
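
Before retrying, you can quickly check the last item from the jumpbox. This sketch assumes nc and curl are installed and that the BOSH Director listens on the default ports: 22 for the SSH connection that BBR uses and 25555 for the Director API:

$ nc -vz HOST 22
$ curl -k https://HOST:25555/info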

Step 7: Identify Your Deployment

After logging in to your BOSH Director, run bosh deployments to identify the name of the BOSH deployment that contains PCF:

$ bosh -e DIRECTOR_IP --ca-cert /var/tempest/workspaces/default/root_ca_certificate deployments

Name                     Release(s)
cf-example               push-apps-manager-release/661.1.24
                         cf-backup-and-restore/0.0.1
                         binary-buildpack/1.0.11
                         capi/1.28.0
                         cf-autoscaling/91
                         cf-mysql/35
                         ...

In the above example, the name of the BOSH deployment that contains PCF is cf-example.

Step 8: Remove Stale Cloud IDs for All Deployments

For every deployment in the BOSH Director, run the following command:

$ bosh -e DIRECTOR_IP -d DEPLOYMENT_NAME -n cck \
  --resolution delete_disk_reference \
  --resolution delete_vm_reference

This reconciles the BOSH Director’s internal state with the state in the IaaS. You can use the list of deployments returned in Step 7: Identify Your Deployment.
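
If the Director manages several deployments, a simple shell loop saves retyping the command. The deployment names below are placeholders; replace them with the names returned in Step 7:

$ for deployment in DEPLOYMENT_NAME_1 DEPLOYMENT_NAME_2; do
    bosh -e DIRECTOR_IP -d "$deployment" -n cck \
      --resolution delete_disk_reference \
      --resolution delete_vm_reference
  done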

If the bosh cck command does not successfully delete disk references and you see a message similar to the following, perform the additional procedures in Remove Unused Disks below.

Scanning 19 persistent disks: 19 OK, 0 missing ...

Step 9: Redeploy Elastic Runtime

  1. Perform the following steps to determine which stemcell is used by Elastic Runtime:

    1. Navigate to the Ops Manager Installation Dashboard.
    2. Click the Pivotal Elastic Runtime tile.
    3. Click Stemcell and record the release number included in the displayed filename. For example, a filename that includes 3421.9 indicates stemcell release 3421.9.

      You can also retrieve the stemcell release using the BOSH CLI:

      $ bosh -e DIRECTOR_IP deployments
      Using environment '10.0.0.5' as user 'director' (bosh.*.read, openid, bosh.*.admin, bosh.read, bosh.admin)

      Name                     Release(s)                          Stemcell(s)                                    Team(s)  Cloud Config
      cf-9cb6995b7d746cd77438  push-apps-manager-release/661.1.24  bosh-google-kvm-ubuntu-trusty-go_agent/3421.9  -        latest
      ...

  2. Download the stemcell from Pivotal Network.

  3. Run the following command to upload the stemcell used by Elastic Runtime:

    $ bosh -e BOSH_DIRECTOR_IP \
      -d DEPLOYMENT_NAME \
      --ca-cert PATH_TO_BOSH_SERVER_CERT \
      upload-stemcell \
      --fix PATH_TO_STEMCELL
    

  4. From the Ops Manager Installation Dashboard, navigate to Pivotal Elastic Runtime > Resource Config.

  5. Ensure the number of instances for MySQL Server is set to 1.

    Warning: Restore will fail if there is not exactly one MySQL Server instance deployed.

  6. Return to the Ops Manager Installation Dashboard and click Apply Changes to redeploy.
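
You can optionally confirm that the stemcell uploaded in step 3 above is available to the BOSH Director by listing the stemcells it knows about. This check is not part of the documented procedure:

$ bosh -e BOSH_DIRECTOR_IP \
  --ca-cert PATH_TO_BOSH_SERVER_CERT \
  stemcells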

Step 10: Restore Elastic Runtime

  1. Run the BBR restore command from your jumpbox to restore Elastic Runtime:

    $ BOSH_CLIENT_SECRET=BOSH_PASSWORD \
      bbr deployment \
        --target BOSH_DIRECTOR_IP \
        --username BOSH_CLIENT \
        --deployment DEPLOYMENT_NAME \
        --ca-cert PATH_TO_BOSH_SERVER_CERT \
        restore \
          --artifact-path PATH_TO_ERT_BACKUP
    

    Replace the placeholder values as follows:

    • BOSH_CLIENT, BOSH_CLIENT_SECRET: Use the BOSH UAA user provided in Pivotal Ops Manager > Credentials > Uaa Bbr Client Credentials.

      You can also retrieve the credentials using the Ops Manager API with a GET request to the following endpoint: /api/v0/deployed/director/credentials/uaa_bbr_client_credentials. For more information, see the Using the Ops Manager API topic.

    • BOSH_DIRECTOR_IP: You retrieved this value in Step 5: Retrieve BOSH Director Address and Credentials.
    • DEPLOYMENT_NAME: You retrieved this value in Step 7: Identify Your Deployment.
    • PATH_TO_BOSH_SERVER_CERT: This is the path to the BOSH Director’s Certificate Authority (CA) certificate, if the certificate is not verifiable by the local machine’s certificate chain.
    • PATH_TO_ERT_BACKUP: This is the path to the Elastic Runtime backup you want to restore.
  2. If you have Container-to-Container Networking enabled in Elastic Runtime, perform the following steps after restoring Elastic Runtime:

    1. Retrieve the MySQL admin password by following one of the procedures below:
      • Log in to Ops Manager and navigate to Pivotal Elastic Runtime > Credentials > Mysql Admin Credentials.
      • Retrieve the credentials using the Ops Manager API by performing the following steps:
        1. Perform the procedures in the Using the Ops Manager API topic to authenticate and access the Ops Manager API.
        2. Use the GET /api/v0/deployed/products endpoint to retrieve a list of deployed products, replacing UAA-ACCESS-TOKEN with the access token recorded in the Using the Ops Manager API topic:
          $ curl "https://OPS-MAN-FQDN/api/v0/deployed/products" \
          -X GET \
          -H "Authorization: Bearer UAA-ACCESS-TOKEN"
        3. In the response to the above request, locate the product with an installation_name starting with cf- and copy its guid.
        4. Run the following curl command, replacing PRODUCT-GUID with the value of guid from the previous step:
          $ curl "https://OPS-MAN-FQDN/api/v0/deployed/products/PRODUCT-GUID/credentials/" \
          -X GET \
          -H "Authorization: Bearer UAA-ACCESS-TOKEN"
        5. Retrieve the MySQL admin password from the response to the above request.
    2. List the VMs in your deployment:
      $ bosh -e DIRECTOR_IP --ca-cert /var/tempest/workspaces/default/root_ca_certificate \
      -d DEPLOYMENT_NAME \
      ssh
    3. Select the mysql VM to SSH into.
    4. From the mysql VM, run the following command:
      $ sudo /var/vcap/packages/mariadb/bin/mysql -u root -p
      When prompted, enter the MySQL admin password.

    5. At the MySQL prompt, run the following command:
      mysql> use silk; drop table subnets; drop table gorp_migrations;
    6. Exit MySQL:
      mysql> exit
    7. Exit the mysql VM:
      $ exit
    8. List the VMs in your deployment:
      $ bosh -e DIRECTOR_IP --ca-cert /var/tempest/workspaces/default/root_ca_certificate \
      -d DEPLOYMENT_NAME \
      ssh
    9. SSH into each diego_database VM and run the following command:
      $ sudo monit restart silk-controller

    Restored apps will begin to start. The amount of time it takes for all apps to start depends on the number of app instances, the resources available to the underlying infrastructure, and the value of the Max Inflight Container Starts field in the Elastic Runtime tile.

  3. (Optional) Scale the MySQL Server job back up to its previous number of instances by navigating to the Resource Config section of the Elastic Runtime tile. After scaling the job, return to the Ops Manager Installation Dashboard and click Apply Changes to deploy.

  4. Validate your restored PCF by performing the steps in the Step 11: (Optional) Validate Your Backup section of the Backing Up Pivotal Cloud Foundry with BBR topic.
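
While the restored apps start, you can optionally watch the deployment's process health from your jumpbox before running the validation steps. The --ps flag lists per-process state for each instance:

$ bosh -e BOSH_DIRECTOR_IP \
  --ca-cert PATH_TO_BOSH_SERVER_CERT \
  -d DEPLOYMENT_NAME \
  instances --ps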

Troubleshooting

This section lists common troubleshooting scenarios and their solutions.

Symptom

While running the BBR restore command, restoring the job mysql-restore fails with:

1 error occurred:

* restore script for job mysql-restore failed on mysql/0.
...
Monit start failed: Timed out waiting for monit: 2m0s

Explanation

This happens when mariadb fails to start within the timeout period. It will end up in an “Execution Failed” state and monit will never try to start it again.

Solution

To validate that mariadb is in an “Execution Failed” state, perform the following steps:

  1. List the VMs in your deployment:
    $ bosh -e DIRECTOR_IP --ca-cert /var/tempest/workspaces/default/root_ca_certificate \
    -d DEPLOYMENT_NAME \
    ssh
  2. Select the mysql VM to SSH into.
  3. From the mysql VM, run the following command to check that the mariadb process is running:
    $ ps aux | grep mariadb
    
  4. Run the following command to check that monit reports mariadb_ctrl is not running:
    $ sudo monit summary
  5. After validating that mariadb is in an “Execution Failed” state, run the following command from the mysql VM to disable monitoring:
    $ monit unmonitor
  6. Run the following command to enable monitoring:
    $ monit monitor
  7. After a few minutes, run the following command:
    $ monit summary
    The command should report that all the processes are running.
  8. Re-attempt the restore with BBR.

Remove Unused Disks

If bosh cck does not clean up all disk references, you must manually delete the disks left over from the previous deployment, because they prevent the recreated deployments from working.

Warning: This is a very destructive operation.



To delete the disks, perform one of the following procedures:

  • Use the BOSH CLI to delete the disks by performing the following steps:
    1. Target the redeployed BOSH Director using the BOSH CLI by performing the procedures in Step 5: Retrieve BOSH Director Address and Credentials.
    2. List the deployments by running the following command:
      $ bosh -e DIRECTOR_IP --ca-cert /var/tempest/workspaces/default/root_ca_certificate deployments
      
    3. Delete each deployment with the following command:
      $ bosh -d DEPLOYMENT_NAME delete-deployment
      
  • Log in to your IaaS account and delete the disks manually. Run the following command to retrieve a list of disk IDs:
    $ bosh -e DIRECTOR_IP --ca-cert /var/tempest/workspaces/default/root_ca_certificate instances --details
    

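If you delete the deployments with the BOSH CLI as described above, their persistent disks are orphaned rather than removed from the IaaS. As an alternative to deleting them in your IaaS console, you can list and delete the orphaned disks with the BOSH CLI. This is an optional approach, not part of the documented procedure; verify each disk CID before deleting it:

$ bosh -e DIRECTOR_IP --ca-cert /var/tempest/workspaces/default/root_ca_certificate disks --orphaned
$ bosh -e DIRECTOR_IP --ca-cert /var/tempest/workspaces/default/root_ca_certificate delete-disk DISK_CID
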
Once the disks are deleted, continue with Step 8: Remove Stale Cloud IDs for All Deployments.
