Restoring Pivotal Cloud Foundry from Backup with BBR

This topic describes the procedure for restoring your critical backend PCF components with BOSH Backup and Restore (BBR), a command-line tool for backing up and restoring BOSH deployments. To perform the procedures in this topic, you must have backed up Pivotal Cloud Foundry (PCF) by following the steps in the Backing Up Pivotal Cloud Foundry with BBR topic.

To view the BBR release notes, see the BOSH Backup and Restore Release Notes. To restore PCF manually, see the Restoring Pivotal Cloud Foundry Manually from Backup topic.

The procedures described in this topic prepare your environment for PCF, deploy Ops Manager, import your installation settings, and use BBR to restore your PCF components.

Warning: Restoring Pivotal Cloud Foundry (PCF) with BBR is a destructive operation. If the restore fails, the new environment may be left in an unusable state and require reprovisioning. Only perform the procedures in this topic for the purpose of disaster recovery, such as recreating PCF after a storage-area network (SAN) corruption.

Warning: When validating your backup, the VMs and disks from the backed-up BOSH Director must not be visible to the new BOSH Director. For this reason, Pivotal recommends that you deploy the new BOSH Director to a different IaaS network and account than the VMs and disks of the backed-up BOSH Director.

Note: BBR was introduced in PCF v1.11. You can only use BBR to back up PCF v1.11 and later. To restore earlier versions of PCF, perform the manual procedures.

(Optional) Step 1: Prepare Your Environment

In the event of a disaster, you may lose not only your VMs and disks, but also IaaS resources such as networks and load balancers.

If you need to recreate your IaaS resources, prepare your environment for PCF by following the instructions specific to your IaaS in Installing Pivotal Cloud Foundry.

Note: The instructions for installing PCF on Amazon Web Services (AWS) and OpenStack combine the procedures for preparing your environment and deploying Ops Manager into a single topic. The instructions for the other supported IaaSes split these procedures into two separate topics.

If you recreate your IaaS resources, you must also add those resources to Ops Manager by performing the procedures in the (Optional) Step 11: Configure Ops Manager for New Resources section.

Step 2: Deploy Ops Manager and Import Installation Settings

  1. Perform the procedures for your IaaS to deploy Ops Manager, as described in Installing Pivotal Cloud Foundry.

  2. Access your new Ops Manager by navigating to YOUR-OPS-MAN-FQDN in a browser.

  3. On the Welcome to Ops Manager page, click Import Existing Installation.


  4. In the import panel, perform the following tasks:

    • Enter your Decryption Passphrase.
    • Click Choose File and browse to the installation zip file that you exported in the Step 3: Export Installation Settings section of the Backing Up Pivotal Cloud Foundry with BBR topic.


  5. Click Import.

    Note: Some browsers do not provide feedback on the status of the import process, and may appear to hang.

  6. A Successfully imported installation message appears upon completion.


  7. If you are restoring to a new environment, that is, if you had to recreate IaaS resources such as networks and load balancers, perform the procedures in the (Optional) Step 11: Configure Ops Manager for New Resources section.

Step 3: Remove BOSH State File

  1. SSH into your Ops Manager VM. For more information, see the SSH into Ops Manager section of the Advanced Troubleshooting with the BOSH CLI topic.
  2. On the Ops Manager VM, delete the /var/tempest/workspaces/default/deployments/bosh-state.json file:
    $ rm /var/tempest/workspaces/default/deployments/bosh-state.json
    
  3. Navigate to YOUR-OPS-MAN-FQDN in a browser and log into Ops Manager.
  4. For each tile that requires one, upload the required stemcell.
  5. Do not click Apply Changes. Instead, perform the steps in the Applying Changes to Ops Manager Director topic to use the Ops Manager API to only deploy the Ops Manager Director.

Step 4: Transfer Artifacts to Jumpbox

In the Step 9: Back Up Your Deployment section of the Backing Up Pivotal Cloud Foundry with BBR topic, you moved the TAR and metadata files of the backup artifact off your jumpbox to your preferred storage space. Now you must transfer those files back to your jumpbox.

For instance, you could SCP the backup artifact to your jumpbox:

$ scp LOCAL_PATH_TO_BACKUP_ARTIFACT JUMPBOX_USER@JUMPBOX_ADDRESS:
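After the transfer, you may want to confirm the artifact copied intact by comparing checksums. The sketch below illustrates the idea with a hypothetical artifact name, using cp to stand in for the scp step:

```shell
# Hypothetical artifact name; substitute your actual backup TAR.
ARTIFACT=director-backup.tar
printf 'example backup contents' > "$ARTIFACT"

# Record a checksum before the transfer...
BEFORE=$(cksum "$ARTIFACT" | awk '{print $1}')

# ...and compare it after copying (cp simulates the scp step here).
cp "$ARTIFACT" transferred.tar
AFTER=$(cksum transferred.tar | awk '{print $1}')

[ "$BEFORE" = "$AFTER" ] && echo "checksum OK"
```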

Step 5: Retrieve BOSH Director Address and Credentials

You can retrieve the IP address of your BOSH Director and the credentials for logging in either from the Ops Manager Director tile or by using the Ops Manager API.

Retrieve the Information from Ops Manager Director

Perform the following steps to retrieve the IP address of your BOSH Director and the credentials for logging in from the Ops Manager Director tile:

  1. Install the BOSH v2 CLI on a machine outside of your PCF deployment.
  2. From the Installation Dashboard in Ops Manager, select Ops Manager Director > Status and record the IP address listed for the Director. You access the BOSH Director using this IP address.

  3. Click Credentials and record the Director credentials.
  4. From the command line, log into the BOSH Director using the IP address and credentials that you recorded:
    $ bosh -e DIRECTOR_IP \
    --ca-cert /var/tempest/workspaces/default/root_ca_certificate log-in
    Email (): director
    Password (): *******************
    Successfully authenticated with UAA
    Succeeded
    

Retrieve the Information from the Ops Manager API

Perform the following steps to retrieve the IP address of your BOSH Director and the credentials for logging in from the Ops Manager API:

  1. Install the BOSH v2 CLI on a machine outside of your PCF deployment.
  2. Perform the procedures in the Using the Ops Manager API topic to authenticate and access the Ops Manager API.
  3. Use the GET /api/v0/deployed/products endpoint to retrieve a list of deployed products, replacing UAA-ACCESS-TOKEN with the access token recorded in the Using the Ops Manager API topic:
    $ curl "https://OPS-MAN-FQDN/api/v0/deployed/products" \
    -X GET \
    -H "Authorization: Bearer UAA-ACCESS-TOKEN"
  4. In the response to the above request, locate the product with an installation_name starting with p-bosh- and copy its guid.
  5. Run the following curl command, replacing PRODUCT-GUID with the value of guid from the previous step:
    $ curl "https://OPS-MAN-FQDN/api/v0/deployed/products/PRODUCT-GUID/static_ips" \
    -X GET \
    -H "Authorization: Bearer UAA-ACCESS-TOKEN"
  6. In the response to the above request, locate the BOSH Director IP address under the ips field.
  7. Run the following curl command to retrieve the BOSH Director credentials:
    $ curl "https://OPS-MAN-FQDN/api/v0/deployed/director/credentials/director_credentials" \
    -X GET \
    -H "Authorization: Bearer UAA-ACCESS-TOKEN"
  8. From the command line, log into the BOSH Director using the IP address and credentials that you recorded:
    $ bosh -e DIRECTOR_IP --ca-cert /var/tempest/workspaces/default/root_ca_certificate log-in
    Email (): director
    Password (): *******************
    Successfully authenticated with UAA
    Succeeded
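The guid lookup in steps 3 and 4 can be scripted with jq, which this topic already uses elsewhere and which is assumed to be installed. The response below is an invented stand-in for the /api/v0/deployed/products payload:

```shell
# Stand-in for the /api/v0/deployed/products response; names and guids are invented.
cat > products.json <<'EOF'
[
  {"installation_name": "p-bosh-0a1b2c3d", "guid": "p-bosh-0a1b2c3d", "type": "p-bosh"},
  {"installation_name": "cf-4e5f6a7b", "guid": "cf-4e5f6a7b", "type": "cf"}
]
EOF

# Select the product whose installation_name starts with p-bosh- and print its guid.
BOSH_GUID=$(jq -r '.[] | select(.installation_name | startswith("p-bosh-")) | .guid' products.json)
echo "$BOSH_GUID"
```

With a real deployment, you would pipe the curl response from step 3 into the same jq filter instead of using the sample file.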
    

Step 6: Restore the BOSH Director

  1. Navigate to the Ops Manager Installation Dashboard.
  2. Click the Ops Manager tile.
  3. Click the Credentials tab.
  4. Locate Bbr Ssh Credentials and click Link to Credential next to it.

    You can also retrieve the credentials using the Ops Manager API with a GET request to the following endpoint: /api/v0/deployed/director/credentials/bbr_ssh_credentials. For more information, see the Using the Ops Manager API topic.

  5. Copy the value for private_key_pem, beginning with "-----BEGIN RSA PRIVATE KEY-----".

  6. SSH into your jumpbox.

  7. Run the following command to reformat the key and save it to a file called PRIVATE_KEY in the current directory, pasting in the contents of your private key for YOUR_PRIVATE_KEY:

    $ printf -- "YOUR_PRIVATE_KEY" > PRIVATE_KEY
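To see why printf is used here: when copied from the API, the private_key_pem value typically encodes its line breaks as literal \n sequences, which printf expands into real newlines. A sketch with dummy key material:

```shell
# Dummy key material standing in for the real private_key_pem value;
# note the literal \n sequences between the lines.
KEY='-----BEGIN RSA PRIVATE KEY-----\nMIIEdummykeymaterial\n-----END RSA PRIVATE KEY-----\n'

# printf interprets the \n escapes, writing a properly multi-line key file.
printf -- "$KEY" > PRIVATE_KEY

head -n 1 PRIVATE_KEY
```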
    

  8. Ensure the BOSH Director backup artifact is in the folder you will run BBR from.

  9. Run the BBR restore command from your jumpbox to restore the BOSH Director:

    $ nohup bbr director \
      --private-key-path PRIVATE_KEY \
      --username bbr \
      --host HOST \
      restore \
        --artifact-path PATH_TO_DIRECTOR_BACKUP
    
    Use the optional --debug flag to enable debug logs. See the Logging section of the Backing Up Pivotal Cloud Foundry with BBR topic for more information.

    Replace the placeholder values as follows:

    • PATH_TO_DIRECTOR_BACKUP: This is the path to the Director backup you want to restore.
    • PRIVATE_KEY: This is the path to the private key file you created above.
    • HOST: This is the address of the BOSH Director. If the BOSH Director is public, this will be a URL, such as https://my-bosh.xxx.cf-app.com. Otherwise, it will be the BOSH_DIRECTOR_IP, which you retrieved in the Step 5: Retrieve BOSH Director Address and Credentials section.

Note: The BBR restore command can take a long time to complete. Pivotal recommends you run it independently of the SSH session, so that the process can continue running even if your connection to the jumpbox fails. The command above uses nohup, but you could also run the command in a screen or tmux session.

If the command completes successfully, continue to Step 8: Remove Stale Cloud IDs for All Deployments.

If the command fails, do the following:

  1. Ensure all the parameters in the command are set.
  2. Ensure the BOSH Director credentials are valid.
  3. Ensure the specified deployment exists.
  4. Ensure the source deployment is compatible with the target deployment.
  5. Ensure that the jumpbox can reach the BOSH Director.

Step 7: Identify Your Deployment

After logging in to your BOSH Director, run bosh deployments to identify the name of the BOSH deployment that contains PCF:

$ bosh -e DIRECTOR_IP --ca-cert /var/tempest/workspaces/default/root_ca_certificate deployments

Name                     Release(s)
cf-example               push-apps-manager-release/661.1.24
                         cf-backup-and-restore/0.0.1
                         binary-buildpack/1.0.11
                         capi/1.28.0
                         cf-autoscaling/91
                         cf-mysql/35
                         ...

In the above example, the name of the BOSH deployment that contains PCF is cf-example.

Step 8: Remove Stale Cloud IDs for All Deployments

For every deployment in the BOSH Director, run the following command:

$ bosh -e DIRECTOR_IP -d DEPLOYMENT_NAME -n cck \
  --resolution delete_disk_reference \
  --resolution delete_vm_reference

This reconciles the BOSH Director’s internal state with the state in the IaaS. You can use the list of deployments returned in Step 7: Identify Your Deployment.

If the bosh cck command does not successfully delete disk references and you see a message similar to the following, perform the additional procedures in the Remove Unused Disks section below.

Scanning 19 persistent disks: 19 OK, 0 missing ...
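Running cck for every deployment can be scripted. In this sketch, list_deployments is a stub standing in for the real deployment listing from Step 7; the loop echoes each cck command so you can review it before executing anything:

```shell
# Stub standing in for the real deployment listing
# (e.g. the names returned by `bosh deployments` in Step 7).
list_deployments() { printf 'cf-example\nservice-instance_abc123\n'; }

# Echo (rather than execute) one cck invocation per deployment for review.
for DEPLOYMENT in $(list_deployments); do
  echo "bosh -e DIRECTOR_IP -d $DEPLOYMENT -n cck --resolution delete_disk_reference --resolution delete_vm_reference"
done
```

To run for real, replace the stub with the actual bosh deployments query and drop the echo.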

Step 9: Redeploy Elastic Runtime

  1. Perform the following steps to determine which stemcell is used by Elastic Runtime:

    1. Navigate to the Ops Manager Installation Dashboard.
    2. Click the Pivotal Elastic Runtime tile.
    3. Click Stemcell and record the release number included in the displayed filename. For example, a filename that includes 3421.9 indicates stemcell release 3421.9.

      You can also retrieve the stemcell release using the BOSH CLI:

      $ bosh -e DIRECTOR_IP deployments
      Using environment '10.0.0.5' as user 'director' (bosh.*.read, openid, bosh.*.admin, bosh.read, bosh.admin)

      Name                     Release(s)                          Stemcell(s)                                    Team(s)  Cloud Config
      cf-9cb6995b7d746cd77438  push-apps-manager-release/661.1.24  bosh-google-kvm-ubuntu-trusty-go_agent/3421.9  -        latest
      ...

  2. Download the stemcell from Pivotal Network.

  3. Run the following command to upload the stemcell used by Elastic Runtime:

    $ bosh -e BOSH_DIRECTOR_IP \
      -d DEPLOYMENT_NAME \
      --ca-cert PATH_TO_BOSH_SERVER_CERT \
      upload-stemcell \
      --fix PATH_TO_STEMCELL
    

  4. From the Ops Manager Installation Dashboard, navigate to Pivotal Elastic Runtime > Resource Config.

  5. Ensure the number of instances for MySQL Server is set to 1.

    Note: Restore will fail if there is not exactly one MySQL Server instance deployed.

  6. Return to the Ops Manager Installation Dashboard and click Apply Changes to redeploy.

Step 10: Restore Elastic Runtime

  1. Run the BBR restore command from your jumpbox to restore Elastic Runtime:

    $ BOSH_CLIENT_SECRET=BOSH_PASSWORD \
      bbr deployment \
        --target BOSH_DIRECTOR_IP \
        --username BOSH_CLIENT \
        --deployment DEPLOYMENT_NAME \
        --ca-cert PATH_TO_BOSH_SERVER_CERT \
        restore \
          --artifact-path PATH_TO_ERT_BACKUP
    

    Replace the placeholder values as follows:

    • BOSH_CLIENT, BOSH_CLIENT_SECRET: Use the BOSH UAA user provided in Pivotal Ops Manager > Credentials > Uaa Bbr Client Credentials.

      You can also retrieve the credentials using the Ops Manager API with a GET request to the following endpoint: /api/v0/deployed/director/credentials/uaa_bbr_client_credentials. For more information, see the Using the Ops Manager API topic.

    • BOSH_DIRECTOR_IP: You retrieved this value in the Step 5: Retrieve BOSH Director Address and Credentials section.
    • DEPLOYMENT_NAME: You retrieved this value in the Step 7: Identify Your Deployment section.
    • PATH_TO_BOSH_SERVER_CERT: This is the path to the BOSH Director’s Certificate Authority (CA) certificate, if the certificate is not verifiable by the local machine’s certificate chain.
    • PATH_TO_ERT_BACKUP: This is the path to the Elastic Runtime backup you want to restore.
  2. If you have Container-to-Container Networking enabled in Elastic Runtime, perform the following steps after restoring Elastic Runtime:

    1. Retrieve the MySQL admin password by following one of the procedures below:
      • Log in to Ops Manager and navigate to Pivotal Elastic Runtime > Credentials > Mysql Admin Credentials.
      • Retrieve the credentials using the Ops Manager API by performing the following steps:
        1. Perform the procedures in the Using the Ops Manager API topic to authenticate and access the Ops Manager API.
        2. Use the GET /api/v0/deployed/products endpoint to retrieve a list of deployed products, replacing UAA-ACCESS-TOKEN with the access token recorded in the Using the Ops Manager API topic:
          $ curl "https://OPS-MAN-FQDN/api/v0/deployed/products" \
          -X GET \
          -H "Authorization: Bearer UAA-ACCESS-TOKEN"
        3. In the response to the above request, locate the product with an installation_name starting with cf- and copy its guid.
        4. Run the following curl command, replacing PRODUCT-GUID with the value of guid from the previous step:
          $ curl "https://OPS-MAN-FQDN/api/v0/deployed/products/PRODUCT-GUID/credentials/" \
          -X GET \
          -H "Authorization: Bearer UAA-ACCESS-TOKEN"
        5. Retrieve the MySQL admin password from the response to the above request.
    2. List the VMs in your deployment:
      $ bosh -e DIRECTOR_IP --ca-cert /var/tempest/workspaces/default/root_ca_certificate \
      -d DEPLOYMENT_NAME \
      ssh
    3. Select the mysql VM to SSH into.
    4. From the mysql VM, run the following command:
      $ sudo /var/vcap/packages/mariadb/bin/mysql -u root -p
      When prompted, enter the MySQL admin password.

    5. At the MySQL prompt, run the following command:
      mysql> use silk; drop table subnets; drop table gorp_migrations;
    6. Exit MySQL:
      mysql> exit
    7. Exit the mysql VM:
      $ exit
    8. List the VMs in your deployment:
      $ bosh -e DIRECTOR_IP --ca-cert /var/tempest/workspaces/default/root_ca_certificate \
      -d DEPLOYMENT_NAME \
      ssh
    9. SSH onto each diego_database VM and run the following command:
      $ sudo monit restart silk-controller

    Restored apps will begin to start. The amount of time it takes for all apps to start depends on the number of app instances, the resources available to the underlying infrastructure, and the value of the Max Inflight Container Starts field in the Elastic Runtime tile.

  3. If desired, scale the MySQL Server job back up to its previous number of instances by navigating to the Resource Config section of the Elastic Runtime tile. After scaling the job, return to the Ops Manager Installation Dashboard and click Apply Changes to deploy.

  4. Validate your restored PCF by performing the steps in the Step 11: (Optional) Validate Your Backup section of the Backing Up Pivotal Cloud Foundry with BBR topic.
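The interactive MySQL cleanup in step 2 above can also be expressed non-interactively by passing the statements with the mysql client's -e flag. This sketch only builds and prints the SQL string; the actual invocation (shown as a comment) must run on the mysql VM:

```shell
# Same SQL executed in the interactive silk cleanup steps, as a single string.
SQL='USE silk; DROP TABLE subnets; DROP TABLE gorp_migrations;'
echo "$SQL"

# On the mysql VM you would run (not executed in this sketch):
#   sudo /var/vcap/packages/mariadb/bin/mysql -u root -p -e "$SQL"
```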

(Optional) Step 11: Configure Ops Manager for New Resources

If you recreated your IaaS resources by following the steps in the (Optional) Step 1: Prepare Your Environment section above, perform the following steps to update Ops Manager with your new resources:

  1. Navigate to the Ops Manager Installation Dashboard and click the Ops Manager Director tile.
  2. Click Create Networks and update the network names to reflect the network names for the new environment.
  3. Return to the Ops Manager Installation Dashboard and click the Elastic Runtime tile.
  4. Click Resource Config. If necessary for your IaaS, enter the name of your new load balancer in the Load Balancer column.
  5. If necessary, click Networking and update the load balancer SSL certificate and private key under Router SSL Termination Certificate and Private Key.
  6. If your environment has a new DNS address, update the old environment DNS entries to point to the new load balancer addresses. For more information, see the Step 4: Configure Networking section of the Using Your Own Load Balancer topic and follow the link to the instructions for your IaaS.
  7. If you are using Google Cloud Platform (GCP), navigate to the Google Config section of the Ops Manager Director tile and update the Default Deployment Tag to reflect the new environment.

(Optional) Step 12: Restore On-Demand Service Instances

Note: These procedures restore the on-demand service instances but do not restore service instance data.

If you have on-demand service instances provisioned by an on-demand service broker, perform the following steps to restore them after successfully restoring PCF:

  1. Use the Cloud Foundry Command Line Interface (cf CLI) to target your PCF deployment:
    $ cf api api.YOUR-SYSTEM-DOMAIN
    
  2. Log in:
    $ cf login
    
  3. Perform the following steps to make a list of all the service instances provisioned by your on-demand service broker:
    1. List your service offerings:
      $ cf curl /v2/services
      
    2. Record the GUID of the on-demand service offering you want to restore by examining the value for guid under metadata:
      "metadata": {
      "guid": "ab2b01cc-2a22-525a-a333-e6e666a6aa66",
      "url": "/v2/services/ab2b01cc-2a22-525a-a333-e6e666a6aa66",
      "created_at": "2017-02-10T18:19:35Z",
      "updated_at": "2017-02-10T18:19:35Z"
      
    3. List all service plans for the service offering, replacing SERVICE-OFFERING-GUID with the GUID obtained in the previous step:
      $ cf curl /v2/services/SERVICE-OFFERING-GUID/service_plans
      
    4. Record the GUID of each service plan by examining the value for guid under metadata.
    5. For each service plan, list all service instances:
      $ cf curl /v2/service_plans/SERVICE-PLAN-GUID/service_instances
      
    6. Record the GUID of each service instance by examining the value for guid under metadata.
  4. Perform the following steps to obtain the BOSH credentials used by your on-demand service broker:
    1. Navigate to https://YOUR-OPS-MAN-FQDN/api/v0/staged/products in a browser to obtain the product GUID of your tile.
    2. Navigate to https://YOUR-OPS-MAN-FQDN/api/v0/staged/products/PRODUCT-GUID/manifest to obtain your product’s staged manifest.
    3. Copy the manifest into a file on your local machine called manifest.json.
    4. Run the following command to extract the BOSH credentials:
      $ cat manifest.json | jq '(.manifest.instance_groups[] |
      select(.name == "redis-on-demand-broker").jobs[] | 
      select(.name == "broker").properties.bosh.authentication.uaa )'
      
  5. SSH into your Ops Manager VM. For more information, see the SSH into Ops Manager section of the Advanced Troubleshooting with the BOSH CLI topic.
  6. Using the BOSH credentials retrieved above, authenticate with your BOSH Director by running the following commands with the BOSH CLI v2:
    $ export BOSH_CLIENT=YOUR-CLIENT-ID
    $ export BOSH_CLIENT_SECRET=YOUR-CLIENT-SECRET
    $ bosh alias-env opsmanager -e YOUR-OPS-MAN-FQDN \
    --ca-cert /var/tempest/workspaces/default/root_ca_certificate
    
  7. Using the list of service instance GUIDs gathered above, deploy each instance with the following commands:
    $ bosh -e opsmanager manifest \
    -d service-instance_SERVICE-INSTANCE-GUID > /tmp/manifest.yml
    $ bosh -e opsmanager \
    -d service-instance_SERVICE-INSTANCE-GUID deploy /tmp/manifest.yml
    
  8. After deploying all service instances, remove the manifest from /tmp:
    $ rm /tmp/manifest.yml
    

Rolling Back ERT in the Event of a Failed Upgrade

If you have previously backed up PCF using BOSH Backup and Restore, you can roll back Elastic Runtime (ERT) to an earlier deployment if the ERT upgrade fails. For instructions about rolling back the ERT deployment, see Rolling Back ERT Deployment to an Earlier Backup with BBR.

Troubleshooting

This section lists common troubleshooting scenarios and their solutions.

Symptom

While running the BBR restore command, restoring the job mysql-restore fails with:

1 error occurred:

* restore script for job mysql-restore failed on mysql/0.
...
Monit start failed: Timed out waiting for monit: 2m0s

Explanation

This happens when mariadb fails to start within the timeout period. It will end up in an “Execution Failed” state and monit will never try to start it again.

Solution

To validate that mariadb is in an “Execution Failed” state, perform the following steps:

  1. List the VMs in your deployment:
    $ bosh -e DIRECTOR_IP --ca-cert /var/tempest/workspaces/default/root_ca_certificate \
    -d DEPLOYMENT_NAME \
    ssh
  2. Select the mysql VM to SSH into.
  3. From the mysql VM, run the following command to check that the mariadb process is running:
    $ ps aux | grep mariadb
    
  4. Run the following command to check that monit reports mariadb_ctrl is not running:
    $ sudo monit summary
  5. After validating that mariadb is in an “Execution Failed” state, run the following command from the mysql VM to disable monitoring:
    $ monit unmonitor
  6. Run the following command to enable monitoring:
    $ monit monitor
  7. After a few minutes, run the following command:
    $ monit summary
    The command should report that all the processes are running.
  8. Re-attempt the restore with BBR.
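The state check in steps 3 and 4 can be scripted. The summary text below is an invented stand-in for sudo monit summary output on an affected mysql VM:

```shell
# Invented stand-in for `sudo monit summary` output on an affected VM.
SUMMARY='The Monit daemon uptime: 3h 5m
Process mariadb_ctrl                  Execution failed
Process galera-healthcheck            running'

# Flag the failed process so you know the unmonitor/monitor cycle is needed.
if printf '%s\n' "$SUMMARY" | grep -q 'Execution failed'; then
  echo "mariadb_ctrl failed: run monit unmonitor, then monit monitor"
fi
```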

Remove Unused Disks

If bosh cck does not clean up all disk references, you must manually delete the disks left over from the previous deployment; stale disks can prevent recreated deployments from working.

Warning: This is a very destructive operation.



To delete the disks, perform one of the following procedures:

  • Use the BOSH CLI to delete the disks by performing the following steps:
    1. Target the redeployed BOSH Director using the BOSH CLI by performing the procedures in the Step 5: Retrieve BOSH Director Address and Credentials section.
    2. List the deployments by running the following command:
      $ bosh -e DIRECTOR_IP --ca-cert /var/tempest/workspaces/default/root_ca_certificate deployments
      
    3. Delete each deployment with the following command:
      $ bosh -d DEPLOYMENT_NAME delete-deployment
      
  • Log in to your IaaS account and delete the disks manually. Run the following command to retrieve a list of disk IDs:
    $ bosh -e DIRECTOR_IP --ca-cert /var/tempest/workspaces/default/root_ca_certificate instances
    

Once the disks are deleted, continue with Step 8: Remove Stale Cloud IDs for All Deployments.
