Rolling Back ERT Deployment to an Earlier Backup with BBR

This topic describes the procedure for restoring an Elastic Runtime (ERT) deployment to a backup from a previous version with BOSH Backup and Restore (BBR), a command-line tool for backing up and restoring BOSH deployments. To perform the procedures in this topic, you must have backed up Pivotal Cloud Foundry (PCF) by following the steps in the Backing Up Pivotal Cloud Foundry with BBR topic.

To view the BBR release notes, see the BOSH Backup and Restore Release Notes. To restore PCF manually, see the Restoring Pivotal Cloud Foundry Manually from Backup topic.

The procedures described in this topic prepare your environment for PCF, deploy Ops Manager, import your installation settings, and use BBR to restore your PCF components.

Warning: Only follow the procedure on this page if you are recovering from a catastrophically failed ERT upgrade. These procedures will incur downtime and are destructive.

Note: BBR is a feature in PCF v1.11. You can only use BBR to back up PCF v1.11 and later. To restore earlier versions of PCF, perform the manual procedures.

Step 1: Delete Existing ERT Deployment

  1. Navigate to the Ops Manager Installation Dashboard.

  2. Delete the Pivotal Elastic Runtime tile by clicking the trash icon on the tile.

  3. Click Apply Changes.

Step 2: Redeploy Ops Manager and Import Installation Settings

  1. Log in to your IaaS console and destroy your Ops Manager VM.

  2. Perform the procedures for your IaaS to redeploy Ops Manager.

  3. Access your new Ops Manager by navigating to YOUR-OPS-MAN-FQDN in a browser.

  4. On the Welcome to Ops Manager page, click Import Existing Installation.

  5. In the import panel, perform the following steps:

    • Enter your Decryption Passphrase.
    • Click Choose File and browse to the installation zip file that you exported in the Step 7: Export Installation Settings section of the Backing Up Pivotal Cloud Foundry with BBR topic.

  6. Click Import.

    Note: Some browsers do not provide feedback on the status of the import process, and may appear to hang.

  7. A Successfully imported installation message appears upon completion.

Step 3: Redeploy BOSH Director

Perform the steps in the Applying Changes to Ops Manager Director topic to deploy only the BOSH Director using the Ops Manager API.

Note: This procedure will not perform a redeploy because the BOSH Director still exists. It is only required to recreate the root_ca_certificate file on the Ops Manager VM after it is restored.

Step 4: Transfer Artifacts to Jumpbox

In the Step 9: Back Up Your Elastic Runtime Deployment section of the Backing Up Pivotal Cloud Foundry with BBR topic, you moved the TAR and metadata files of the backup artifact off your jumpbox to your preferred storage space. Now you must transfer those files back to your jumpbox.

For instance, you could SCP the backup artifact to your jumpbox:

$ scp LOCAL_PATH_TO_BACKUP_ARTIFACT JUMPBOX_USER@JUMPBOX_ADDRESS:
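
Before starting a restore, it can help to confirm that the artifact arrived on the jumpbox intact. The following is a minimal sketch of such a check. The function name check_artifact_dir is hypothetical and not part of BBR, and the sketch assumes the artifact directory holds a file named metadata alongside one or more .tar files, which you should verify against your own backup.

```shell
# Hypothetical helper (not part of BBR): check that a backup artifact
# directory contains a metadata file and at least one TAR file.
# Assumption: the metadata file is literally named "metadata".
check_artifact_dir() (
  dir=$1
  [ -d "$dir" ] || { echo "not a directory: $dir" >&2; exit 1; }
  [ -f "$dir/metadata" ] || { echo "missing metadata file in $dir" >&2; exit 1; }
  # Look for at least one *.tar file directly inside the directory.
  for f in "$dir"/*.tar; do
    [ -f "$f" ] && exit 0
  done
  echo "no .tar files found in $dir" >&2
  exit 1
)
```

For example, run check_artifact_dir PATH_TO_DIRECTOR_BACKUP on the jumpbox. A non-zero exit status suggests the transfer was incomplete.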

Step 5: Retrieve BOSH Director Address and Credentials

  1. From the Installation Dashboard in Ops Manager, select Ops Manager Director > Status and record the IP address listed for the Director. You access the BOSH Director using this IP address.

  2. Click Credentials, and record the Director Credentials and the Bbr Ssh Credentials, including the private key beginning with "-----BEGIN RSA PRIVATE KEY-----".

Step 6: Restore the BOSH Director

  1. SSH into your jumpbox.
  2. Run the following command to reformat the private key you retrieved in Step 5: Retrieve BOSH Director Address and Credentials and save it to a file called PRIVATE_KEY in the current directory, pasting in the contents of your private key for YOUR_PRIVATE_KEY:
    $ printf -- "YOUR_PRIVATE_KEY" > PRIVATE_KEY
    
  3. Ensure the BOSH Director backup artifact is in the folder you will run BBR from.
  4. Run the BBR restore command from your jumpbox to restore the BOSH Director:
    $ nohup bbr director \
      --private-key-path PRIVATE_KEY \
      --username bbr \
      --host HOST \
      restore \
        --artifact-path PATH_TO_DIRECTOR_BACKUP
    
    Use the optional --debug flag to enable debug logs. See Logging for more information.

    Replace the placeholder values as follows:
    • PATH_TO_DIRECTOR_BACKUP: This is the path to the Director backup you want to restore.
    • PRIVATE_KEY: This is the path to the private key file you created above.
    • HOST: This is the address of the BOSH Director. If the BOSH Director is public, this will be a URL, such as https://my-bosh.xxx.cf-app.com. Otherwise, it will be the BOSH_DIRECTOR_IP, which you retrieved in Step 5: Retrieve BOSH Director Address and Credentials.

Note: The BBR restore command can take a long time to complete. Pivotal recommends you run it independently of the SSH session, so that the process can continue running even if your connection to the jumpbox fails. The command above uses nohup but you could also run the command in a screen or tmux session.
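
One way to apply the nohup approach is to send the command output to a log file and keep the process ID for later inspection. The following is a minimal sketch. The function name run_detached and the log file name are hypothetical and not part of BBR.

```shell
# Hypothetical wrapper: run a long command under nohup in the background,
# send its output to a log file, and print the background process ID.
run_detached() {
  detached_log=$1
  shift
  # Redirect both stdout and stderr so nothing is tied to the SSH session.
  nohup "$@" > "$detached_log" 2>&1 &
  echo $!
}
```

For example, run_detached bbr-restore.log bbr director --private-key-path PRIVATE_KEY --username bbr --host HOST restore --artifact-path PATH_TO_DIRECTOR_BACKUP starts the restore in the background. You can then follow progress with tail -f bbr-restore.log.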

If the command fails, do the following:

  1. Ensure all the parameters in the command are set.
  2. Ensure the BOSH Director credentials are valid.
  3. Ensure the specified deployment exists.
  4. Ensure the source deployment is compatible with the target deployment.
  5. Ensure that the jumpbox can reach the BOSH Director.
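
Some of these checks can be made on the jumpbox before running the restore. The following is a minimal sketch that covers only local checks: the host value is set, the key file begins with the RSA header described in Step 5, and the artifact path is a directory. The function name preflight_director_restore is hypothetical, and treating the artifact path as a directory is an assumption. The sketch does not verify credentials or network reachability.

```shell
# Hypothetical pre-flight check for the director restore. It covers only
# checks that can be made locally; it does not contact the BOSH Director.
# Assumption: the backup artifact path is a directory.
preflight_director_restore() (
  host=$1 key=$2 artifact=$3
  [ -n "$host" ] || { echo "HOST is not set" >&2; exit 1; }
  [ -f "$key" ] || { echo "private key file not found: $key" >&2; exit 1; }
  # The key retrieved from Ops Manager begins with this header line.
  head -n 1 "$key" | grep -q -e '-----BEGIN RSA PRIVATE KEY-----' \
    || { echo "private key file has unexpected format" >&2; exit 1; }
  [ -d "$artifact" ] \
    || { echo "artifact path is not a directory: $artifact" >&2; exit 1; }
  exit 0
)
```

For example, run preflight_director_restore HOST PRIVATE_KEY PATH_TO_DIRECTOR_BACKUP and proceed only if it exits with status 0.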

Step 7: Identify Your Deployment

After logging in to your BOSH Director with the credentials retrieved in Step 5: Retrieve BOSH Director Address and Credentials, run bosh deployments to identify the name of the BOSH deployment that contains PCF:

$ bosh -e DIRECTOR_IP --ca-cert /var/tempest/workspaces/default/root_ca_certificate deployments

Name                     Release(s)
cf-example               push-apps-manager-release/661.1.24
                         cf-backup-and-restore/0.0.1
                         binary-buildpack/1.0.11
                         capi/1.28.0
                         cf-autoscaling/91
                         cf-mysql/35
                         ...

In the above example, the name of the BOSH deployment that contains PCF is cf-example.
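
If the Director manages many deployments, you can filter the output for candidate names. The following is a minimal sketch. The function name select_cf_deployments is hypothetical, and the sketch assumes that the deployment name starts at the beginning of a line and begins with cf-, as in the example above.

```shell
# Hypothetical filter: read "bosh deployments" output from stdin and
# print the first column of every row whose name starts with "cf-".
# Assumption: deployment names start in the first column, while
# continuation rows that list further releases are indented.
select_cf_deployments() {
  awk '/^cf-/ { print $1 }'
}
```

For example, pipe the bosh deployments command shown above into select_cf_deployments. If more than one name is printed, compare the listed releases to identify the deployment that contains PCF.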

Step 8: Remove Stale Cloud IDs for All Deployments

For every deployment in the BOSH Director, run the following command:

$ bosh -e DIRECTOR_IP -d DEPLOYMENT_NAME -n cck \
  --resolution delete_disk_reference \
  --resolution delete_vm_reference

This reconciles the BOSH Director’s internal state with the state in the IaaS. You can use the list of deployments returned in Step 7: Identify Your Deployment.
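
When the Director manages several deployments, the command above can be run in a loop. The following is a minimal sketch. The function name cck_all_deployments and the DRY_RUN variable are hypothetical and not part of the BOSH CLI. The function reads deployment names from standard input, one per line.

```shell
# Hypothetical loop (not part of the BOSH CLI): run "bosh cck" with the
# resolutions shown above for every deployment name read from stdin,
# one name per line. Set DRY_RUN=1 to print each command instead of
# running it.
cck_all_deployments() {
  director=$1
  while IFS= read -r name; do
    [ -n "$name" ] || continue   # skip blank lines
    set -- bosh -e "$director" -d "$name" -n cck \
      --resolution delete_disk_reference \
      --resolution delete_vm_reference
    if [ -n "$DRY_RUN" ]; then
      printf '%s\n' "$*"
    else
      # Detach stdin so bosh cannot consume the remaining names.
      "$@" < /dev/null
    fi
  done
}
```

For example, printf 'cf-example\n' | DRY_RUN=1 cck_all_deployments DIRECTOR_IP prints the command that would run for cf-example. Review the printed commands before running the loop without DRY_RUN.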

If the bosh cck command does not successfully delete disk references and you see a message similar to the following, perform the additional procedures in Remove Unused Disks below.

Scanning 19 persistent disks: 19 OK, 0 missing ...
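
If you save the cck output to a file, you can search it for this message. The following is a minimal sketch. The function name cck_reports_no_missing_disks is hypothetical, and the pattern is based only on the sample message above, so the exact wording in your BOSH CLI version may differ.

```shell
# Hypothetical check: read cck output from stdin and succeed if it
# contains a line reporting that no persistent disks are missing.
# Assumption: the message has the form shown in the sample above.
cck_reports_no_missing_disks() {
  grep -Eq 'Scanning [0-9]+ persistent disks: [0-9]+ OK, 0 missing'
}
```

For example, cck_reports_no_missing_disks < cck-output.log exits with status 0 when the message is present, where cck-output.log is a hypothetical file holding the saved output.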

Step 9: Redeploy Elastic Runtime

  1. Perform the following steps to determine which stemcell is used by Elastic Runtime:

    1. Navigate to the Ops Manager Installation Dashboard.
    2. Click the Pivotal Elastic Runtime tile.
    3. Click Stemcell and record the release number included in the displayed filename, for example 3421.9.

      You can also retrieve the stemcell release using the BOSH CLI:

      $ bosh -e DIRECTOR_IP deployments
      Using environment '10.0.0.5' as user 'director' (bosh.*.read, openid, bosh.*.admin, bosh.read, bosh.admin)

      Name                     Release(s)                          Stemcell(s)                                    Team(s)  Cloud Config
      cf-9cb6995b7d746cd77438  push-apps-manager-release/661.1.24  bosh-google-kvm-ubuntu-trusty-go_agent/3421.9  -        latest
                               ...

  2. Download the stemcell from Pivotal Network.

  3. Run the following command to upload the stemcell used by Elastic Runtime:

    $ bosh -e BOSH_DIRECTOR_IP \
      -d DEPLOYMENT_NAME \
      --ca-cert PATH_TO_BOSH_SERVER_CERT \
      upload-stemcell \
      --fix PATH_TO_STEMCELL
    

  4. From the Ops Manager Installation Dashboard, navigate to Pivotal Elastic Runtime > Resource Config.

  5. Ensure the number of instances for MySQL Server is set to 1.

    Warning: Restore will fail if there is not exactly one MySQL Server instance deployed.

  6. Return to the Ops Manager Installation Dashboard and click Apply Changes to redeploy.
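
When you read the stemcell from the BOSH CLI output in step 1, the release number is the part after the final slash. The following is a minimal sketch of extracting it. The function name stemcell_version is hypothetical.

```shell
# Hypothetical helper: print the release number from a stemcell
# identifier of the form NAME/VERSION by removing everything up to
# and including the last slash.
stemcell_version() {
  printf '%s\n' "${1##*/}"
}
```

For example, stemcell_version bosh-google-kvm-ubuntu-trusty-go_agent/3421.9 prints 3421.9.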

Step 10: Restore Elastic Runtime

  1. Run the BBR restore command from your jumpbox to restore Elastic Runtime:

    $ BOSH_CLIENT_SECRET=BOSH_PASSWORD \
      bbr deployment \
        --target BOSH_DIRECTOR_IP \
        --username BOSH_CLIENT \
        --deployment DEPLOYMENT_NAME \
        --ca-cert PATH_TO_BOSH_SERVER_CERT \
        restore \
          --artifact-path PATH_TO_ERT_BACKUP
    

    Replace the placeholder values as follows:

    • BOSH_CLIENT, BOSH_PASSWORD: Use the BOSH UAA user provided in Pivotal Ops Manager > Credentials > Uaa Bbr Client Credentials.

      You can also retrieve the credentials using the Ops Manager API with a GET request to the following endpoint: /api/v0/deployed/director/credentials/uaa_bbr_client_credentials. For more information, see the Using the Ops Manager API topic.

    • BOSH_DIRECTOR_IP: You retrieved this value in Step 5: Retrieve BOSH Director Address and Credentials.
    • DEPLOYMENT_NAME: You retrieved this value in Step 7: Identify Your Deployment.
    • PATH_TO_BOSH_SERVER_CERT: This is the path to the BOSH Director’s Certificate Authority (CA) certificate, if the certificate is not verifiable by the local machine’s certificate chain.
    • PATH_TO_ERT_BACKUP: This is the path to the Elastic Runtime backup you want to restore.
  2. Perform the following steps after restoring Elastic Runtime:

    1. Retrieve the MySQL admin password by following one of the procedures below:
      • Log in to Ops Manager and navigate to Pivotal Elastic Runtime > Credentials > Mysql Admin Credentials.
      • Retrieve the credentials using the Ops Manager API by performing the following steps:
        1. Perform the procedures in the Using the Ops Manager API topic to authenticate and access the Ops Manager API.
        2. Use the GET /api/v0/deployed/products endpoint to retrieve a list of deployed products, replacing UAA-ACCESS-TOKEN with the access token recorded in the Using the Ops Manager API topic:
          $ curl "https://OPS-MAN-FQDN/api/v0/deployed/products" \
          -X GET \
          -H "Authorization: Bearer UAA-ACCESS-TOKEN"
        3. In the response to the above request, locate the product with an installation_name starting with cf- and copy its guid.
        4. Run the following curl command, replacing PRODUCT-GUID with the value of guid from the previous step:
          $ curl "https://OPS-MAN-FQDN/api/v0/deployed/products/PRODUCT-GUID/credentials/" \
          -X GET \
          -H "Authorization: Bearer UAA-ACCESS-TOKEN"
        5. Retrieve the MySQL admin password from the response to the above request.
    2. List the VMs in your deployment:
      $ bosh -e DIRECTOR_IP --ca-cert /var/tempest/workspaces/default/root_ca_certificate \
      -d DEPLOYMENT_NAME \
      ssh
    3. Select the mysql VM to SSH into.
    4. From the mysql VM, run the following command:
      $ sudo /var/vcap/packages/mariadb/bin/mysql -u root -p
      When prompted, enter the MySQL admin password.

    5. At the MySQL prompt, run the following command:
      mysql> use silk; drop table subnets; drop table gorp_migrations;
    6. Exit MySQL:
      mysql> exit
    7. Exit the mysql VM:
      $ exit
    8. List the VMs in your deployment:
      $ bosh -e DIRECTOR_IP --ca-cert /var/tempest/workspaces/default/root_ca_certificate \
      -d DEPLOYMENT_NAME \
      ssh
    9. SSH into each diego_database VM and run the following command:
      $ sudo monit restart silk-controller

    Restored apps will begin to start. The amount of time it takes for all apps to start depends on the number of app instances, the resources available to the underlying infrastructure, and the value of the Max Inflight Container Starts field in the Elastic Runtime tile.

  3. (Optional) Scale the MySQL Server job back up to its previous number of instances by navigating to the Resource Config section of the Elastic Runtime tile. After scaling the job, return to the Ops Manager Installation Dashboard and click Apply Changes to deploy.

  4. Validate your restored PCF by performing the steps in the Step 10: (Optional) Validate Your Backup section of the Backing Up Pivotal Cloud Foundry with BBR topic.
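
If your deployment has several diego_database VMs, the silk-controller restart in step 2 might be issued to all of them in one call rather than one SSH session per VM. The following is a minimal sketch. The function name restart_silk_controller and the DRY_RUN variable are hypothetical. The sketch assumes that your BOSH CLI accepts an instance group name and the -c flag for the ssh command, so check bosh ssh --help before relying on it.

```shell
# Hypothetical helper: restart silk-controller on every instance of the
# diego_database instance group with a single "bosh ssh" call.
# Assumption: the BOSH CLI accepts an instance group name and the -c
# flag for "ssh". Set DRY_RUN=1 to print the command instead of
# running it.
restart_silk_controller() {
  director=$1 deployment=$2 ca_cert=$3
  set -- bosh -e "$director" --ca-cert "$ca_cert" -d "$deployment" \
    ssh diego_database -c "sudo monit restart silk-controller"
  if [ -n "$DRY_RUN" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}
```

For example, DRY_RUN=1 restart_silk_controller DIRECTOR_IP DEPLOYMENT_NAME /var/tempest/workspaces/default/root_ca_certificate prints the command without running it. Note that the printed form omits the quotes around the remote command.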

Remove Unused Disks

If bosh cck does not clean up all disk references, you must manually delete the disks left over from the previous deployment. Otherwise, these disks prevent recreated deployments from working.

Warning: This is a very destructive operation.



To delete the disks, perform one of the following procedures:

  • Use the BOSH CLI to delete the disks by performing the following steps:
    1. Target the redeployed BOSH Director using the BOSH CLI by performing the procedures in Step 5: Retrieve BOSH Director Address and Credentials.
    2. List the deployments by running the following command:
      $ bosh -e DIRECTOR_IP --ca-cert /var/tempest/workspaces/default/root_ca_certificate deployments
      
    3. Delete each deployment with the following command:
      $ bosh -e DIRECTOR_IP --ca-cert /var/tempest/workspaces/default/root_ca_certificate -d DEPLOYMENT_NAME delete-deployment
      
  • Log in to your IaaS account and delete the disks manually. Run the following command to retrieve a list of disk IDs:
    $ bosh -e DIRECTOR_IP --ca-cert /var/tempest/workspaces/default/root_ca_certificate instances
    

Once the disks are deleted, continue with Step 8: Remove Stale Cloud IDs for All Deployments.
