Backing Up Pivotal Cloud Foundry with BBR

This topic describes the procedure for backing up your critical backend Pivotal Cloud Foundry (PCF) components with BOSH Backup and Restore (BBR), a command-line tool for backing up and restoring BOSH deployments. To restore your backup, see the Restoring Pivotal Cloud Foundry from Backup with BBR topic.

To view the BBR release notes, see the BOSH Backup and Restore Release Notes. To back up PCF manually, see the Backing Up Pivotal Cloud Foundry Manually topic.

Note: You can only use BBR to back up PCF v1.11 and later. To back up earlier versions of PCF, perform the manual procedures.

Pivotal recommends backing up your installation settings frequently, especially before making any changes to your PCF deployment, such as configuration of any tiles in Ops Manager.

During the backup, BBR stops the Cloud Controller API and the Cloud Controller workers to create a consistent backup. Only API functionality, such as pushing applications or using the Cloud Foundry Command Line Interface (cf CLI), is affected. Deployed applications do not experience downtime.

Supported Components

BBR is a binary that can back up and restore BOSH deployments and BOSH Directors. BBR requires that the backup targets supply scripts that implement the backup and restore functions.

BBR backs up the following PCF components:

  • Elastic Runtime: Elastic Runtime must be configured with an internal MySQL database and a WebDAV/NFS blobstore to be backed up and restored with BBR.
  • BOSH Director: The BOSH Director must have an internal Postgres database to be backed up and restored with BBR. As part of backing up the BOSH Director, BBR backs up the BOSH UAA database and the CredHub database.

Service tiles have different levels of integration. BBR may or may not be able to back up your service tiles depending on their level of integration. Consult the following list:

  • Service brokers: You can back up and restore all brokered services with the procedures documented in this topic and in the Restoring Pivotal Cloud Foundry from Backup with BBR topic. Because a brokered service runs external to PCF, BBR backs up and restores the VMs and the service instances, but not the service data.
  • Managed services: Because managed services are BOSH releases, they must implement the BBR scripts. Otherwise, you cannot use BBR to back up and restore them. If the managed service has implemented BBR scripts, BBR backs up and restores both the VMs and the service data.
  • On-demand services: On-demand instances are redeployed, but the data in the instance is not backed up. A new, empty instance of the on-demand service is restored.

Workflow

Operators download the BBR binary and transfer it to a jumpbox. Then they run BBR from the jumpbox, specifying the name of the BOSH deployment to back up.

BBR examines the jobs in the BOSH deployment, and triggers the scripts in the following stages:

  1. Pre-backup lock: The pre-backup lock script locks the job so backups are consistent across the cluster.
  2. Backup: The backup script backs up the release.
  3. Post-backup unlock: The post-backup unlock script unlocks the job after the backup is complete.

Scripts in the same stage are all triggered together. For instance, BBR triggers all pre-backup lock scripts before any backup scripts. Scripts within a stage may be triggered in any order.
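The stage ordering can be pictured with a short shell sketch. This is only an illustration of the ordering described above, not how BBR is implemented; the job names and both helper functions are hypothetical.

```shell
#!/usr/bin/env bash
# Illustration only: models the order in which BBR triggers scripts.
# The job names below are hypothetical placeholders.
BBR_JOBS=("job-a" "job-b" "job-c")
BBR_STAGES=("pre-backup-lock" "backup" "post-backup-unlock")

# Trigger one stage's script for every job before returning.
run_stage() {
  local stage=$1
  local job
  for job in "${BBR_JOBS[@]}"; do
    # Within a stage the order is not guaranteed; this loop just picks one.
    echo "$stage: $job"
  done
}

# All scripts of one stage are triggered before any script of the next.
run_all_stages() {
  local stage
  for stage in "${BBR_STAGES[@]}"; do
    run_stage "$stage"
  done
}
```

Calling `run_all_stages` prints every `pre-backup-lock` line before the first `backup` line, which mirrors the guarantee that all lock scripts are triggered before any backup script.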

The backup artifacts are drained to the jumpbox, where the operator can transfer them to storage and use them to restore PCF.

The following diagram shows a sample backup flow.

Backup flow

Step 1: Record the Cloud Controller Database Encryption Credentials

You can retrieve the Cloud Controller Database encryption credentials either from the Elastic Runtime tile or by using the Ops Manager API.

Retrieve the Credentials from Elastic Runtime

Perform the following steps to retrieve the Cloud Controller Database encryption credentials from the Elastic Runtime tile:

  1. Navigate to Ops Manager in a browser and log in to the Ops Manager Installation Dashboard.
  2. Select Pivotal Elastic Runtime > Credentials and locate the Cloud Controller section.
  3. Record the Cloud Controller DB Encryption Credentials. You must provide these credentials if you contact Pivotal Support for help restoring your installation.

    Ccdb encrypt creds

Retrieve the Credentials with the Ops Manager API

Perform the following steps to retrieve the Cloud Controller Database encryption credentials with the Ops Manager API:

  1. Perform the procedures in the Using the Ops Manager API topic to authenticate and access the Ops Manager API.
  2. Use the GET /api/v0/deployed/products endpoint to retrieve a list of deployed products, replacing UAA-ACCESS-TOKEN with the access token recorded in the Using the Ops Manager API topic:
    $ curl "https://OPS-MAN-FQDN/api/v0/deployed/products" \
    -X GET \
    -H "Authorization: Bearer UAA-ACCESS-TOKEN"
  3. In the response to the above request, locate the product with an installation_name starting with cf- and copy its guid.
  4. Run the following curl command, replacing PRODUCT-GUID with the value of guid from the previous step:
    $ curl "https://OPS-MAN-FQDN/api/v0/deployed/products/PRODUCT-GUID/credentials/.cloud_controller.db_encryption_credentials" \
    -X GET \
    -H "Authorization: Bearer UAA-ACCESS-TOKEN"
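The two API calls above can be chained in a script. The following is a minimal sketch under several assumptions: `jq` is installed, the response from `/api/v0/deployed/products` is a JSON array of objects with `installation_name` and `guid` fields, and the `OPS_MAN_FQDN` and `UAA_ACCESS_TOKEN` environment variables and all function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: fetch the Cloud Controller DB encryption credentials via the API.
# Assumes OPS_MAN_FQDN and UAA_ACCESS_TOKEN are exported (hypothetical names)
# and that jq is available on the machine running the script.

# Build an Ops Manager API URL from a path such as /api/v0/deployed/products.
opsman_url() {
  echo "https://${OPS_MAN_FQDN}$1"
}

# Send an authenticated GET request to the given API path.
opsman_get() {
  curl --silent --fail "$(opsman_url "$1")" \
    -X GET \
    -H "Authorization: Bearer ${UAA_ACCESS_TOKEN}"
}

# Read the product list on stdin and print the guid of each product whose
# installation_name starts with the given prefix.
# Assumes the response is a top-level JSON array.
product_guid() {
  jq -r --arg prefix "$1" \
    '.[] | select(.installation_name | startswith($prefix)) | .guid'
}

# Print the raw credentials response for the Elastic Runtime product.
ccdb_encryption_credentials() {
  local guid
  guid=$(opsman_get "/api/v0/deployed/products" | product_guid "cf-") || return 1
  opsman_get "/api/v0/deployed/products/${guid}/credentials/.cloud_controller.db_encryption_credentials"
}
```

After exporting both variables, running `ccdb_encryption_credentials` prints the API response. The same `product_guid` helper could be called with the `p-bosh-` prefix when looking up the BOSH Director product in Step 4.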

Step 2: Enable Backup Prepare Node

BBR requires the MySQL backup prepare node to be present in order to back up Elastic Runtime (ERT) successfully. Currently, the only way to ensure the presence of this node is to enable automated MySQL backups. If you already have automated backups enabled, you can skip this section. In the future, there will be an option to deploy the backup prepare node without enabling automated MySQL backups.

Perform the following steps to enable the backup prepare node:

  1. In the Elastic Runtime tile, click Internal MySQL.
  2. Under Automated Backups Configuration, select Enable automated backups from MySQL to an S3 bucket or other S3-compatible file store. Fill in the required fields with any text value, and set the Cron Schedule to a cron-formatted date that doesn’t exist. For example, use February 31: 0 0 31 2 *.

    Note: If you do not enable automated backups, the BBR backup will fail.

  3. Click Resource Config and use the dropdown menu to scale up the Backup Prepare Node to one instance.
  4. Navigate back to the Ops Manager Installation Dashboard and click Apply Changes to redeploy.
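To see why the schedule `0 0 31 2 *` from step 2 never fires, it helps to label the five cron fields. The sketch below only splits the expression into its fields; the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Label the five fields of a cron expression: minute, hour, day of month,
# month, and day of week. describe_cron is a hypothetical helper name.
describe_cron() {
  local minute hour day_of_month month day_of_week
  read -r minute hour day_of_month month day_of_week <<< "$1"
  echo "minute=$minute hour=$hour day_of_month=$day_of_month month=$month day_of_week=$day_of_week"
}

describe_cron "0 0 31 2 *"
```

Day 31 of month 2 (February) does not exist, so the schedule never matches and no automated backup is triggered, while the backup prepare node is still deployed.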

Step 3: Export Installation Settings

Pivotal recommends that you back up your installation settings by exporting frequently. This option is only available after you have deployed at least one time. Always export an installation before following the steps in the Import Installation Settings section of the Restoring Pivotal Cloud Foundry from Backup with BBR topic.

Note: Exporting your installation only backs up your installation settings. It does not back up your virtual machines (VMs) or any external MySQL databases.

From the Installation Dashboard in the Ops Manager interface, click your user name in the top-right navigation and select Settings.

Export Installation Settings exports the current PCF installation settings and assets. When you export an installation, the exported file contains the base VM images, all necessary packages, and references to the installation IP addresses. As a result, an exported installation file can exceed 5 GB in size.

After exporting installation settings, Ops Manager will create a number of files in the /tmp/ops_manager directory on the Ops Manager VM and schedule a cleanup in one hour. Depending on the size of the VM, it may run out of disk space if you export multiple installations before the hourly cleanup runs.

Settings

Step 4: Retrieve BOSH Director Address and Credentials

You can retrieve the IP address of your BOSH Director and the credentials for logging in either from the Ops Manager Director tile or by using the Ops Manager API.

Retrieve the Information from Ops Manager Director

Perform the following steps to retrieve the IP address of your BOSH Director and the credentials for logging in from the Ops Manager Director tile:

  1. Install the BOSH v2 CLI on a machine outside of your PCF deployment.
  2. From the Installation Dashboard in Ops Manager, select Ops Manager Director > Status and record the IP address listed for the Director. You access the BOSH Director using this IP address.

  3. Click Credentials and record the Director credentials.
  4. From the command line, log into the BOSH Director using the IP address and credentials that you recorded:
    $ bosh -e DIRECTOR_IP --ca-cert /var/tempest/workspaces/default/root_ca_certificate log-in
    Email (): director
    Password (): *******************
    Successfully authenticated with UAA
    Succeeded
    

Retrieve the Information from the Ops Manager API

Perform the following steps to retrieve the IP address of your BOSH Director and the credentials for logging in from the Ops Manager API:

  1. Install the BOSH v2 CLI on a machine outside of your PCF deployment.
  2. Perform the procedures in the Using the Ops Manager API topic to authenticate and access the Ops Manager API.
  3. Use the GET /api/v0/deployed/products endpoint to retrieve a list of deployed products, replacing UAA-ACCESS-TOKEN with the access token recorded in the Using the Ops Manager API topic:
    $ curl "https://OPS-MAN-FQDN/api/v0/deployed/products" \
    -X GET \
    -H "Authorization: Bearer UAA-ACCESS-TOKEN"
  4. In the response to the above request, locate the product with an installation_name starting with p-bosh- and copy its guid.
  5. Run the following curl command, replacing PRODUCT-GUID with the value of guid from the previous step:
    $ curl "https://OPS-MAN-FQDN/api/v0/deployed/products/PRODUCT-GUID/static_ips" \
    -X GET \
    -H "Authorization: Bearer UAA-ACCESS-TOKEN"
  6. In the response to the above request, locate the BOSH Director IP address under the ips field.
  7. Run the following curl command to retrieve the BOSH Director credentials:
    $ curl "https://OPS-MAN-FQDN/api/v0/deployed/director/credentials/director_credentials" \
    -X GET \
    -H "Authorization: Bearer UAA-ACCESS-TOKEN"
  8. From the command line, log into the BOSH Director using the IP address and credentials that you recorded:
    $ bosh -e DIRECTOR_IP --ca-cert /var/tempest/workspaces/default/root_ca_certificate log-in
    Email (): director
    Password (): *******************
    Successfully authenticated with UAA
    Succeeded
    

Step 5: Identify Your Deployment

After logging in to your BOSH Director, run the following command to identify the name of the BOSH deployment that contains PCF:

$ bosh -e DIRECTOR_IP --ca-cert /var/tempest/workspaces/default/root_ca_certificate deployments

Name                     Release(s)
cf-example               push-apps-manager-release/661.1.24
                         cf-backup-and-restore/0.0.1
                         binary-buildpack/1.0.11
                         capi/1.28.0
                         cf-autoscaling/91
                         cf-mysql/35
                         ...

In the above example, the name of the BOSH deployment that contains PCF is cf-example.
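If you want to pick the name out of the command output in a script, the following sketch filters the table shown above. It assumes the PCF deployment name starts with `cf-`, as in the example, and that continuation rows listing additional releases begin with whitespace. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Print deployment names that start with "cf-" from the table printed by
# `bosh deployments`. Continuation rows (extra releases) begin with
# whitespace, so only rows with text in the first column can match.
pcf_deployment_names() {
  awk '/^cf-/ { print $1 }'
}
```

For example, pipe the output of the `deployments` command above into `pcf_deployment_names`. With the sample output, it prints `cf-example`.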

Step 6: Set Up Your Jumpbox

Set up your jumpbox with the following settings:

  • The jumpbox must have sufficient space for the backup. A PCF backup will be at least 1.5 GB in size.
  • BBR connects to the VMs at their private IP address, so the jumpbox needs to be in the same network as the deployed VMs. BBR does not support SSH gateways.
  • BBR copies the backed-up data from the VMs to the jumpbox, so ensure you have minimal network latency between them to reduce transfer times.

Consult the following table for more information about the network access permissions required by BBR.

VM                  Default Port  Description
BOSH Director       25555         BBR interacts with the BOSH Director API.
Deployed Instances  22            BBR uses SSH to orchestrate the backup on the instances.
BOSH Director UAA   8443          BBR interacts with the UAA API for authentication, if necessary.
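The space and network requirements above can be checked from the jumpbox before you start. The following is a rough sketch, assuming bash and the `timeout` utility are available; the function names are hypothetical, and the check covers only the default ports from the table.

```shell
#!/usr/bin/env bash
# Sketch of a jumpbox preflight check. Function names are hypothetical.

# Succeed if the filesystem holding DIR has at least MIN_KB kilobytes free.
has_free_space() {
  local dir=$1 min_kb=$2 free_kb
  free_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
  [ "$free_kb" -ge "$min_kb" ]
}

# Succeed if a TCP connection to HOST:PORT opens within 3 seconds.
# Uses bash's /dev/tcp redirection, so it needs bash and `timeout`.
port_open() {
  timeout 3 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# Report problems with free space (1.5 GB = 1572864 KB) and with the
# default ports. The two IP addresses are supplied by the caller.
check_bbr_access() {
  local director_ip=$1 instance_ip=$2
  has_free_space . 1572864 || echo "less than 1.5 GB free in $(pwd)"
  port_open "$director_ip" 25555 || echo "cannot reach BOSH Director API on 25555"
  port_open "$director_ip" 8443  || echo "cannot reach BOSH Director UAA on 8443"
  port_open "$instance_ip" 22    || echo "cannot reach $instance_ip over SSH on 22"
}
```

Run `check_bbr_access BOSH_DIRECTOR_IP INSTANCE_IP` from the directory where you plan to store the backup. No output means every check passed. Because 1.5 GB is only the minimum backup size, treat the free-space check as a lower bound.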

Step 7: Transfer BBR to Your Jumpbox

Perform the following steps to transfer BBR to your jumpbox:

  1. Download the latest BBR release.
  2. Change the permissions of bbr in order to make it executable:
    $ chmod a+x bbr
  3. SCP the binary to your jumpbox:
    $ scp LOCAL_PATH_TO_BBR/bbr JUMPBOX_USER@JUMPBOX_ADDRESS:
    If your jumpbox has access to the internet, you can also SSH into your jumpbox and use wget:
    $ ssh JUMPBOX_USER@JUMPBOX_ADDRESS -i YOUR_CERTIFICATE.pem
    $ wget BBR_RELEASE_URL
    $ chmod a+x bbr
    

Step 8: Check Your Deployment

Perform the following steps to check that your BOSH Director is reachable and has a deployment that can be backed up:

  1. SSH into your jumpbox:
    $ ssh JUMPBOX_USER@JUMPBOX_ADDRESS -i YOUR_CERTIFICATE.pem
    
  2. Run the BBR pre-backup check:

    $ BOSH_CLIENT_SECRET=BOSH_CLIENT_SECRET \
      bbr deployment \
      --target BOSH_DIRECTOR_IP \
      --username BOSH_CLIENT \
      --deployment DEPLOYMENT_NAME \
      --ca-cert PATH_TO_BOSH_SERVER_CERT \
      pre-backup-check
    

    Replace the placeholder values as follows:

    • BOSH_CLIENT, BOSH_CLIENT_SECRET: From the Ops Manager Installation Dashboard, click Ops Manager Director, navigate to the Credentials tab, and click Uaa Bbr Client Credentials to retrieve the BOSH UAA credentials.

      You can also retrieve the credentials using the Ops Manager API with a GET request to the following endpoint: /api/v0/deployed/director/credentials/uaa_bbr_client_credentials. For more information, see the Using the Ops Manager API topic.

    • BOSH_DIRECTOR_IP: You retrieved this value in the Step 4: Retrieve BOSH Director Address and Credentials section.
    • DEPLOYMENT_NAME: You retrieved this value in the Step 5: Identify Your Deployment section.
    • PATH_TO_BOSH_SERVER_CERT: This is the path to the BOSH Director’s Certificate Authority (CA) certificate, if the certificate is not verifiable by the local machine’s certificate chain. If you are using the Ops Manager VM as your jumpbox, locate the certificate at /var/tempest/workspaces/default/root_ca_certificate.
  3. If the pre-backup check succeeds, continue to the next section. If it fails, the deployment you selected may not have the correct backup scripts, or the connection to the BOSH Director may have failed.

    The following error occurs if you have not enabled the Elastic Runtime MySQL Backup Prepare Node:

    1 error occurred:
    
    * The mysql restore script expects a backup script which produces mysql-artifact artifact which is not present in the deployment.
    

    Follow the instructions in the Step 2: Enable Backup Prepare Node section and retry.

Step 9: Back Up Your Elastic Runtime Deployment

Run the BBR backup command from your jumpbox to back up your Elastic Runtime deployment:

$ BOSH_CLIENT_SECRET=BOSH_CLIENT_SECRET \
  nohup bbr deployment \
  --target BOSH_DIRECTOR_IP \
  --username BOSH_CLIENT \
  --deployment DEPLOYMENT_NAME \
  --ca-cert PATH_TO_BOSH_SERVER_CERT \
  backup

Use the optional --debug flag to enable debug logs. See the Logging section for more information.

Note: The BBR backup command can take a long time to complete. Pivotal recommends you run it independently of the SSH session, so that the process can continue running even if your connection to the jumpbox fails. The command above uses nohup but you could also run the command in a screen or tmux session.
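Because the backup can run for a long time, it can be useful to keep its output and exit code in files. The following sketch wraps the command above. The shell variables mirror the placeholders in that command, while `BBR_BIN`, the function name, and the `.exit` file are hypothetical conveniences.

```shell
#!/usr/bin/env bash
# Sketch: run the backup under nohup, keep its output, and record the exit
# code. BBR_BIN and the ".exit" file are hypothetical conveniences; the
# other variables mirror the placeholders in the command above.
run_bbr_backup() {
  local log_file=$1
  local status=0
  BOSH_CLIENT_SECRET="$BOSH_CLIENT_SECRET" \
    nohup "${BBR_BIN:-bbr}" deployment \
    --target "$BOSH_DIRECTOR_IP" \
    --username "$BOSH_CLIENT" \
    --deployment "$DEPLOYMENT_NAME" \
    --ca-cert "$PATH_TO_BOSH_SERVER_CERT" \
    backup > "$log_file" 2>&1 || status=$?
  # Save the exit code next to the log so it can be inspected later.
  echo "$status" > "$log_file.exit"
  return "$status"
}
```

The function blocks until the backup finishes. To keep it running if your SSH connection drops, call it from a script started inside a screen or tmux session, as the note above suggests.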

If the command completes successfully, do the following:

  1. Move the backup artifact off the jumpbox to your preferred storage space. The backup created by BBR consists of a folder with the backup artifacts and metadata files. Pivotal recommends compressing and encrypting these files.
  2. Make redundant copies of your backup and store them in multiple locations in order to minimize the risk of losing your backups in the event of a disaster.
  3. Attempt a test restore on every backup in order to validate it by performing the procedures in the Step 11: Validate Your Backup section below.
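One possible way to compress and encrypt the backup folder is sketched below. The choice of `tar` and `gpg` is an assumption, since this topic does not prescribe specific tools, and the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: compress a BBR backup folder, then encrypt the archive.
# tar and gpg are assumed tool choices; the topic does not prescribe any.

# Create ARCHIVE (a gzip-compressed tarball) from the folder BACKUP_DIR.
compress_backup() {
  local backup_dir=$1 archive=$2
  tar -czf "$archive" -C "$(dirname "$backup_dir")" "$(basename "$backup_dir")"
}

# Encrypt ARCHIVE with a passphrase, writing ARCHIVE.gpg next to it.
# gpg prompts for the passphrase interactively.
encrypt_backup() {
  local archive=$1
  gpg --symmetric --cipher-algo AES256 --output "$archive.gpg" "$archive"
}
```

For example, `compress_backup ./BACKUP_FOLDER backup.tgz` followed by `encrypt_backup backup.tgz` produces `backup.tgz.gpg`, which you can then copy to each of your storage locations. `BACKUP_FOLDER` stands for the folder that BBR created.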

If the command fails, do the following:

  1. Ensure all the parameters in the command are set.
  2. Ensure the BOSH Director credentials are valid.
  3. Ensure the specified deployment exists.
  4. Consult the Exit Codes section below.

Step 10: Back Up Your BOSH Director

Perform the following steps to back up your BOSH Director:

  1. Navigate to the Ops Manager Installation Dashboard.
  2. Click the Ops Manager Director tile.
  3. Click the Credentials tab.
  4. Locate Bbr Ssh Credentials and click Link to Credential next to it.

    You can also retrieve the credentials using the Ops Manager API with a GET request to the following endpoint: /api/v0/deployed/director/credentials/bbr_ssh_credentials. For more information, see the Using the Ops Manager API topic.

  5. Copy the value for private_key_pem, beginning with "-----BEGIN RSA PRIVATE KEY-----".

  6. SSH into your jumpbox.

  7. Run the following command to reformat the key and save it to a file called PRIVATE_KEY in the current directory, pasting in the contents of your private key for YOUR_PRIVATE_KEY:

    $ printf -- "YOUR_PRIVATE_KEY" > PRIVATE_KEY
    

  8. Run the BBR backup command from your jumpbox to back up your BOSH Director:

    $ bbr director \
      --private-key-path PRIVATE_KEY \
      --username bbr \
      --host HOST \
      backup
    

    Use the optional --debug flag to enable debug logs. See the Logging section for more information.

    Replace the placeholder values as follows:

    • PRIVATE_KEY: This is the path to the private key file you created above.
    • HOST: This is the address of the BOSH Director. If the BOSH Director is public, this is a URL, such as https://my-bosh.xxx.cf-app.com. Otherwise, this is the BOSH_DIRECTOR_IP, which you retrieved in the Step 4: Retrieve BOSH Director Address and Credentials section.
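The key-reformatting step can be wrapped in a small helper. This sketch assumes that the copied `private_key_pem` value contains literal `\n` sequences that need to become real newlines; the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: write the copied key to a file, turning literal "\n" sequences
# into real newlines. Assumes the copied value contains such escapes.
write_private_key() {
  local key=$1 path=$2
  # %b expands backslash escapes in the argument without treating the key
  # itself as a printf format string.
  printf '%b' "$key" > "$path"
  # Restricting access is general good practice, not a step from this topic.
  chmod 600 "$path"
}
```

Call it as `write_private_key 'YOUR_PRIVATE_KEY' PRIVATE_KEY`, using single quotes so that the shell passes the key through unchanged.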

Step 11: (Optional) Validate Your Backup

Warning: When validating your backup, the VMs and disks from the backed-up BOSH Director should not be visible to the new BOSH Director. As a result, Pivotal recommends that you deploy the new BOSH Director to a different IaaS network and account than the VMs and disks of the backed-up BOSH Director.

After backing up PCF, you may want to validate your backup by restoring it to a similar environment and checking the applications. Because BBR is designed for disaster recovery, its backups are intended to be restored to an environment deployed with the same configuration.

Perform the following steps to spin up a second environment that matches the original in order to test a restore:

  1. Export your Ops Manager installation by performing the steps in the Step 3: Export Installation Settings section.
  2. Create a new Ops Manager VM in a different network from the original. Ensure that the Ops Manager VM has enough persistent disk to accommodate the files exported in the previous step. Consult the topic specific to your IaaS.
  3. In a browser, navigate to the FQDN of your new Ops Manager. When redirected to the Welcome to Ops Manager page, select Import Existing Installation.

    Welcome

  4. In the import panel, perform the following tasks:

    • Enter your Decryption Passphrase.
    • Click Choose File and browse to the installation zip file that you exported in the Export Installation Settings section of this topic.

    Decryption passphrase

  5. Click Import.

    Note: Some browsers do not provide feedback on the status of the import process, and may appear to hang.

  6. A “Successfully imported installation” message appears upon completion.

    Success
    Importing your installation ensures that your new PCF deployment has the same credentials and configuration as your original deployment.

  7. Click the Ops Manager Director tile.

  8. Click Create Networks and update the networks as appropriate.

  9. SSH into your Ops Manager VM. For more information, see the SSH into Ops Manager section of the Advanced Troubleshooting with the BOSH CLI topic.

  10. On the Ops Manager VM, delete the /var/tempest/workspaces/default/deployments/bosh-state.json file:

    $ rm /var/tempest/workspaces/default/deployments/bosh-state.json

  11. Navigate to the Ops Manager Installation Dashboard and click Apply Changes to deploy a new BOSH Director and a new Elastic Runtime.

  12. Run the BBR restore command from your jumpbox to restore your Elastic Runtime deployment:

    $ BOSH_CLIENT_SECRET=BOSH_CLIENT_SECRET \
      bbr deployment \
      --target BOSH_DIRECTOR_IP \
      --username BOSH_CLIENT \
      --deployment DEPLOYMENT_NAME \
      --ca-cert PATH_TO_BOSH_SERVER_CERT \
      restore

  13. Check the status of your applications by performing the procedures in the next section.

Check Status of Applications

After the restore is completed, perform the following steps to check the status of your applications:

  1. Target the Cloud Controller of your new deployment:
    $ cf api api.YOUR-SYSTEM-DOMAIN
  2. Log in:
    $ cf login
  3. Verify that your restored PCF has the same orgs, spaces, apps, routes and services as your original deployment:

    • To see the list of orgs, run cf orgs.
    • Target each org in turn with cf target -o YOUR-ORG.
    • To see the list of spaces for your targeted org, run cf spaces.
    • Target each space in turn with cf target -s YOUR-SPACE.
    • To see the list of routes and their domains for your targeted space, run cf routes.
    • To see the list of apps for your targeted space, run cf apps. Check that the apps which should be running can start successfully.
    • Ensure that your apps are still bound to the expected services with cf services. Backing up PCF with BBR doesn’t back up service data.

      Under normal circumstances, the existing domain will not be pointed to your restored PCF deployment. Therefore, if you want to make HTTP requests to the restored applications, you must use the IP address of the restored Router. However, the restored PCF deployment remains linked to the original domain. As a result, you must set the original domain in the Host header in order to route HTTP requests to the restored applications.

      You can use curl to set the original domain in the header. In the following example command, the Router IP address is 10.0.1.16 and the app domain is cf-original-app.com:

      $ curl -k -H"Host: cf-original-app.com" https://10.0.1.16
      

      The restored applications will have the same service bindings as the original. If your applications connect to an external data store, your restored applications will also connect, and perform whatever interactions your original applications would do.

Exit Codes and Logging

For information about the exit codes returned by BBR and BBR logging, consult the sections below.

Exit Codes

The exit code returned by BBR indicates the status of the backup. The following table matches exit codes to error messages.

Value  Error
0      Success
1      General failure
4      The pre-backup lock failed.
8      The post-backup unlock failed. Your deployment may be in a bad state and requires attention.
16     The cleanup failed. This is a non-fatal error indicating that the utility has been unable to clean up open BOSH SSH connections to the deployment VMs. Manual cleanup may be required to clear any hanging BOSH users and connections.

If multiple failures occur, your exit code reflects a combination of values. Use bitwise AND to determine which failures occurred.

For example, the exit code 5 indicates that the pre-backup lock failed and a general error occurred.

To check that a bit is set, use bitwise AND, as demonstrated by the following example of exit code 20:

(20 & 1)  == 1    # false
(20 & 4)  == 4    # true; lock failed
(20 & 8)  == 8    # false
(20 & 16) == 16   # true; cleanup failed

Exit code 20 indicates that the pre-backup lock failed and cleanup failed.
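The same check can be scripted with shell arithmetic, where `&` is bitwise AND. The following is a minimal sketch that maps an exit code to the failures in the table above; the function name and message wording are hypothetical.

```shell
#!/usr/bin/env bash
# Decode a BBR exit code into the failures from the exit code table.
# decode_bbr_exit_code is a hypothetical helper name.
decode_bbr_exit_code() {
  local code=$1 failures=""
  if (( code == 0 )); then
    echo "success"
    return 0
  fi
  # Each failure type has its own bit, so test the bits one at a time.
  if (( code & 1 ));  then failures+="general failure; "; fi
  if (( code & 4 ));  then failures+="pre-backup lock failed; "; fi
  if (( code & 8 ));  then failures+="post-backup unlock failed; "; fi
  if (( code & 16 )); then failures+="cleanup failed; "; fi
  if [ -z "$failures" ]; then
    failures="unrecognized exit code; "
  fi
  # Strip the trailing separator before printing.
  echo "${failures%; }"
}
```

For example, `decode_bbr_exit_code 20` prints `pre-backup lock failed; cleanup failed`. After a backup run, you can pass `$?` to the function.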

Logging

BBR outputs logs to stdout. By default, BBR logs:

  • The backup and restore scripts that it finds
  • When it starts or finishes a stage, such as pre-backup scripts or backup scripts
  • When the process is complete
  • When any error occurs

If more logging is needed, use the optional --debug flag to print the following information:

  • Logs about the API requests made to the BOSH server
  • All commands executed on remote instances
  • All commands executed on local environment
  • Standard in and standard out streams for the backup and restore scripts when they are executed

Canceling a Backup

If you need to cancel a backup, perform the following steps:

  1. Terminate the BBR process by pressing Ctrl-C and then typing yes to confirm.
  2. Log in to your BOSH Director with the BOSH CLI by performing the procedures in the Step 4: Retrieve BOSH Director Address and Credentials section above.
  3. Perform the following steps for each cloud_controller VM in your deployment:
    1. List the VMs in your deployment:
      $ bosh -e DIRECTOR_IP --ca-cert /var/tempest/workspaces/default/root_ca_certificate \
      -d DEPLOYMENT_NAME \
      ssh
    2. Select the VM you want to SSH into.
    3. Run the following command from the VM:
      $ sudo /var/vcap/jobs/cloud-controller-backup/bin/bbr/post-backup-unlock
  4. Run the BBR pre-backup check from your jumpbox by following the steps in the Step 8: Check Your Deployment section above. If the command reports that it cannot back up the deployment, use the BOSH CLI to SSH into each VM mentioned in the error and remove the /var/vcap/store/bbr-backup directory if it is present.

Troubleshooting

This section lists common troubleshooting scenarios and their solutions.

Symptom

The Elastic Runtime backup fails with the following error:

The mysql restore script expects a backup script 
which produces mysql-artifact artifact which 
is not present in the deployment.

Explanation

BBR requires the MySQL backup prepare node to be enabled.

Solution

Follow the procedures in Step 2: Enable Backup Prepare Node and re-run the BBR backup command.
