Backing Up Pivotal Cloud Foundry with BBR

This topic describes the procedure for backing up your critical backend Pivotal Cloud Foundry (PCF) components with BOSH Backup and Restore (BBR), a command-line tool for backing up and restoring BOSH deployments. To restore your backup, see the Restoring Pivotal Cloud Foundry from Backup with BBR topic.

To view the BBR release notes, see the BOSH Backup and Restore Release Notes.

During the backup, BBR stops the Cloud Controller API and the Cloud Controller workers to create a consistent backup. Only API functionality, such as pushing applications or using the Cloud Foundry Command Line Interface (cf CLI), is affected. Deployed applications do not experience downtime.

Warning: Backup artifacts can contain secrets. Pivotal strongly recommends that you secure backup artifacts using encryption or other means.

Note: You can only use BBR to back up PCF v1.11 and later. To back up earlier versions of PCF, perform the manual procedure documented for your specific PCF version.

Warning: BBR is designed to restore Pivotal Cloud Foundry after a disaster. The restore is a destructive operation. If the restore fails, the environment may be left in an unusable state and require reprovisioning. The two restore scenarios currently documented are Restoring Pivotal Cloud Foundry from Backup with BBR and Rolling Back ERT Deployment to an Earlier Backup with BBR.

Recommendations

Pivotal recommends the following:

  • Follow the full procedure documented in this topic when creating a backup. This ensures that you always have a consistent backup of Ops Manager and ERT to restore from.
  • Back up frequently, especially before making any changes to your PCF deployment, such as the configuration of any tiles in Ops Manager.

Compatibility of Restore

When using a backup to restore, you must ensure that the restore environment is compatible. For more information, see Compatibility of Restore.

Supported Components

BBR is a binary that can back up and restore BOSH deployments and BOSH Directors. BBR requires that the backup targets supply scripts that implement the backup and restore functions.

BBR backs up the following PCF components:

  • Elastic Runtime: Elastic Runtime must be configured with an internal MySQL database and a WebDAV/NFS blobstore to be backed up and restored with BBR. BBR does not support Elastic Runtime with an external blobstore or an external MySQL database.

Warning: If you use BBR to back up an ERT with an external blobstore or an external MySQL database, the BBR commands may run successfully, but the result of the BBR restore will be incomplete and unusable.

  • BOSH Director: The BOSH Director must have an internal Postgres database to be backed up and restored with BBR. As part of backing up the BOSH Director, BBR backs up the BOSH UAA database and the CredHub database.

Backing Up Services

Warning: BBR does not currently back up any service data.

Keep in mind the following when backing up services:

  • You can back up and restore brokered services with the procedures documented in this topic and in the Restoring Pivotal Cloud Foundry from Backup with BBR topic. BBR backs up and restores the VMs and the service instances, but not the service data.
  • You can redeploy on-demand service instances manually during restore, but the data in the instance is not backed up.
  • BBR does not back up managed services or their data.

Workflow

Operators download the BBR binary and transfer it to a jumpbox. Then they run BBR from the jumpbox, specifying the name of the BOSH deployment to back up.

BBR examines the jobs in the BOSH deployment, and triggers the scripts in the following stages:

  1. Pre-backup lock: The pre-backup lock script locks the job so that backups are consistent across the cluster.
  2. Backup: The backup script backs up the release.
  3. Post-backup unlock: The post-backup unlock script unlocks the job after the backup is complete.

Scripts in the same stage are all triggered together. For instance, BBR triggers all pre-backup lock scripts before any backup scripts. Scripts within a stage may be triggered in any order.
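
The stage ordering described above can be modeled with a short shell sketch. This is an illustration only, not how BBR is implemented: the job names are hypothetical, and the sketch visits jobs in a fixed order, whereas BBR may trigger scripts within a stage in any order.

```shell
# Illustrative model of the stage ordering; not BBR's implementation.
# The job names are hypothetical.
JOB_NAMES="cloud-controller mysql blobstore"

run_stage() {
  current_stage=$1
  for job in $JOB_NAMES; do
    # A real run would invoke the job's script for this stage on its VM.
    echo "$current_stage: $job"
  done
}

run_backup_flow() {
  # Every script in one stage is triggered before the next stage begins.
  for stage in pre-backup-lock backup post-backup-unlock; do
    run_stage "$stage"
  done
}
```

Calling `run_backup_flow` prints all three lock lines, then all three backup lines, then all three unlock lines.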

The backup artifacts are drained to the jumpbox, where the operator can transfer them to storage and use them to restore PCF.

The following diagram shows a sample backup flow.

Backup flow

Before using BBR, follow the instructions in the Setting up your system for BBR section.

Prepare to Create Your Backup

Step 1: Set Up Your Jumpbox

Prepare your jumpbox for BBR by following the steps in the Setting Up Your Jumpbox for BBR topic.

Step 2: Record the Cloud Controller Database Encryption Credentials

Perform the following steps to retrieve the Cloud Controller Database encryption credentials from the Elastic Runtime tile:

  1. Navigate to Ops Manager in a browser and log in to the Ops Manager Installation Dashboard.
  2. Select Pivotal Elastic Runtime > Credentials and locate the Cloud Controller section.
  3. Record the Cloud Controller DB Encryption Credentials. You must provide these credentials if you contact Pivotal Support for help restoring your installation.

    Ccdb encrypt creds

Step 3: Retrieve BOSH Director Address and Credentials

Perform the following steps to retrieve the IP address of your BOSH Director and the credentials for logging in from the Ops Manager Director tile:

  1. Install the BOSH v2 CLI on a machine outside of your PCF deployment. You can use the jumpbox for this task.
  2. From the Installation Dashboard in Ops Manager, select Ops Manager Director > Status and record the IP address listed for the Director. You access the BOSH Director using this IP address.

  3. Click Credentials and record the Director credentials.
  4. From the command line, log into the BOSH Director using the IP address and credentials that you recorded:
    $ bosh -e DIRECTOR_IP \
    --ca-cert PATH-TO-BOSH-SERVER-CERT log-in
    Email (): director
    Password (): *******************
    Successfully authenticated with UAA
    Succeeded
    

Step 4: Check Your BOSH Director

Perform the following steps to check that your BOSH Director can be backed up:

  1. Navigate to the Ops Manager Installation Dashboard.
  2. Click the Ops Manager tile.
  3. Click the Credentials tab.
  4. Locate Bbr Ssh Credentials and click Link to Credential next to it.

    You can also retrieve the credentials using the Ops Manager API with a GET request to the following endpoint: /api/v0/deployed/director/credentials/bbr_ssh_credentials. For more information, see the Using the Ops Manager API topic.

  5. Copy the value for private_key_pem, beginning with "-----BEGIN RSA PRIVATE KEY-----".

  6. SSH into your jumpbox:

    $ ssh JUMPBOX_USER@JUMPBOX_ADDRESS -i YOUR_CERTIFICATE.pem
    

  7. Run the following command to reformat the key and save it to a file called PRIVATE_KEY in the current directory, pasting in the contents of your private key for YOUR_PRIVATE_KEY:

    $ printf -- "YOUR_PRIVATE_KEY" > PRIVATE_KEY
    

  8. Run the BBR pre-backup-check command from your jumpbox:

    $ bbr director \
      --private-key-path PRIVATE_KEY \
      --username bbr \
      --host HOST \
      pre-backup-check
    

    Use the optional --debug flag to enable debug logs. See the Logging section for more information.

    Replace the placeholder values as follows:

    • PRIVATE_KEY: This is the path to the private key file you created above.
    • HOST: This is the address of the BOSH Director. If the BOSH Director is public, this is a URL, such as https://my-bosh.xxx.cf-app.com. Otherwise, this is the BOSH_DIRECTOR_IP, which you retrieved in the Step 3: Retrieve BOSH Director Address and Credentials section.
  9. If the pre-backup check succeeds, continue to the next section. If it fails, the Director may not have the correct backup scripts, or the connection to the BOSH Director may have failed.
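
The key-reformatting step above can be wrapped in a small helper. The following is a minimal sketch that assumes the copied private_key_pem value carries its line breaks as literal `\n` sequences, which is what the reformatting step implies. The function name `write_private_key` is hypothetical and not part of BBR.

```shell
# Sketch: save a private key whose line breaks arrive as literal "\n" text.
# write_private_key is a hypothetical helper, not part of BBR.
write_private_key() {
  key_text=$1   # value of private_key_pem, as copied
  key_file=$2   # destination file, for example PRIVATE_KEY
  # umask 077 makes a newly created file readable only by the current user.
  # %b expands the "\n" sequences into real newlines.
  ( umask 077 && printf '%b\n' "$key_text" > "$key_file" ) || return 1
  # Tighten permissions in case the file already existed.
  chmod 600 "$key_file"
}
```

For example, `write_private_key "YOUR_PRIVATE_KEY" PRIVATE_KEY` produces the file that the `--private-key-path` flag expects. Passing the key as an argument to `%b`, rather than as the printf format string, avoids misreading any `%` character in the input.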

Step 5: Enable Backup Prepare Node

When BBR backs up ERT, it needs a MySQL backup prepare node. This procedure ensures that a MySQL backup prepare node exists even when automated backups are disabled:

  1. In Ops Manager, open the Pivotal Elastic Runtime tile.

  2. If automated backups are disabled under Internal MySQL > Automated Backups Configuration, do the following:

    1. In the Resource Config pane, find the Backup Prepare Node job and select 1 from the Instances dropdown menu.

      Backup prepare node

    2. Click Save.
    3. Navigate back to the Installation Dashboard and click Apply Changes to redeploy.

No action is needed with automated backups enabled, because Elastic Runtime allocates a backup prepare node automatically.

Step 6: Identify Your Deployment

After logging in to your BOSH Director, run the following command to identify the name of the BOSH deployment that contains PCF:

$ bosh -e DIRECTOR_IP --ca-cert /var/tempest/workspaces/default/root_ca_certificate deployments

Name                     Release(s)
cf-example               push-apps-manager-release/661.1.24
                         cf-backup-and-restore/0.0.1
                         binary-buildpack/1.0.11
                         capi/1.28.0
                         cf-autoscaling/91
                         cf-mysql/35
                         ...

In the above example, the name of the BOSH deployment that contains PCF is cf-example.
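
If you want to pick the deployment name out of that table in a script, a filter along the following lines may help. It is a sketch that assumes text laid out exactly like the sample above: a header row starting with Name, rows whose first column is a deployment name, and continuation rows that start with whitespace. Output from your BOSH CLI version may differ, so check it before relying on this. The function name `deployment_names` is hypothetical.

```shell
# Sketch: print the first column of each deployment row.
# Assumes the table layout shown in the sample output; continuation rows
# (which start with whitespace) and the "Name" header row are skipped.
deployment_names() {
  awk '/^[^[:space:]]/ && $1 != "Name" { print $1 }'
}
```

With the sample output saved in a file, `deployment_names < sample.txt` prints `cf-example`.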

Step 7: Check Your Deployment

Perform the following steps to check that your BOSH Director is reachable and has a deployment that can be backed up:

  1. From your jumpbox, run the BBR pre-backup check:

    $ BOSH_CLIENT_SECRET=BOSH_PASSWORD \
      bbr deployment \
      --target BOSH_DIRECTOR_IP \
      --username BOSH_CLIENT \
      --deployment DEPLOYMENT_NAME \
      --ca-cert PATH_TO_BOSH_SERVER_CERT \
      pre-backup-check
    

    Replace the placeholder values as follows:

    • BOSH_CLIENT, BOSH_PASSWORD: From the Ops Manager Installation Dashboard, click Ops Manager Director, navigate to the Credentials tab, and click Uaa Bbr Client Credentials to retrieve the BOSH UAA credentials.

      You can also retrieve the credentials using the Ops Manager API with a GET request to the following endpoint: /api/v0/deployed/director/credentials/uaa_bbr_client_credentials. For more information, see the Using the Ops Manager API topic.

    • BOSH_DIRECTOR_IP: You retrieved this value in the Step 3: Retrieve BOSH Director Address and Credentials section.
    • DEPLOYMENT_NAME: You retrieved this value in the Step 6: Identify Your Deployment section.
    • PATH_TO_BOSH_SERVER_CERT: This is the path to the BOSH Director’s Certificate Authority (CA) certificate, if the certificate is not verifiable by the local machine’s certificate chain. If you are using the Ops Manager VM as your jumpbox, locate the certificate at /var/tempest/workspaces/default/root_ca_certificate.
  2. If the pre-backup check succeeds, continue to the next section. If it fails, the deployment you selected may not have the correct backup scripts, or the connection to the BOSH Director may have failed.

    The following error occurs if you have not enabled the Elastic Runtime MySQL Backup Prepare Node:

    1 error occurred:
    
    * The mysql restore script expects a backup script which produces mysql-artifact artifact which is not present in the deployment.
    

    Follow the instructions in the Step 5: Enable Backup Prepare Node section and retry.
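
A check can also fail simply because a placeholder was never filled in. The sketch below stops before calling bbr if any required value is empty. It assumes you have exported shell variables named after the placeholders in this step; `require_vars` and `deployment_pre_backup_check` are hypothetical helper names, not BBR commands.

```shell
# Sketch: stop early if a value that the bbr command needs has not been set.
# require_vars and deployment_pre_backup_check are hypothetical helpers.
require_vars() {
  missing=0
  for name in "$@"; do
    # Look up the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing value: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

deployment_pre_backup_check() {
  require_vars BOSH_PASSWORD BOSH_DIRECTOR_IP BOSH_CLIENT \
    DEPLOYMENT_NAME PATH_TO_BOSH_SERVER_CERT || return 1
  # Same command as in this step, with variables in place of placeholders.
  BOSH_CLIENT_SECRET="$BOSH_PASSWORD" bbr deployment \
    --target "$BOSH_DIRECTOR_IP" \
    --username "$BOSH_CLIENT" \
    --deployment "$DEPLOYMENT_NAME" \
    --ca-cert "$PATH_TO_BOSH_SERVER_CERT" \
    pre-backup-check
}
```

The helper reports every missing name in one pass, so you can fix them all before retrying.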

Create Your Backup

Step 8: Export Installation Settings

Pivotal recommends that you back up your installation settings by exporting frequently. This option is only available after you have deployed at least one time. Always export an installation before following the steps in the Import Installation Settings section of the Restoring Pivotal Cloud Foundry from Backup with BBR topic.

Note: Exporting your installation only backs up your installation settings. It does not back up your virtual machines (VMs) or any external MySQL databases.

From the Installation Dashboard in the Ops Manager interface, click your user name in the top-right navigation and select Settings.

Export Installation Settings exports the current PCF installation settings and assets.

Note: Ops Manager 1.12 exports installation settings only, so the output is much smaller than in previous Ops Manager versions.

Settings

Step 9: Back Up Your BOSH Director

Run the BBR backup command from your jumpbox to back up your BOSH Director:

$ bbr director \
  --private-key-path PRIVATE_KEY \
  --username bbr \
  --host HOST \
  backup

Use the optional --debug flag to enable debug logs. See the Logging section for more information.

Replace the placeholder values PRIVATE_KEY and HOST as described in Step 4.

Note: The BOSH Director backup takes at least 20 minutes.

Step 10: Back Up Your Elastic Runtime Deployment

  1. If you are using an external blobstore, create a copy of the blobstore with your IaaS-specific tool. Your blobstore backup may be slightly inconsistent with your Elastic Runtime backup, depending on how much time passes between the two backups.

  2. Run the BBR backup command from your jumpbox to back up your Elastic Runtime deployment:

    $ BOSH_CLIENT_SECRET=BOSH_PASSWORD \
    nohup bbr deployment \
    --target BOSH_DIRECTOR_IP \
    --username BOSH_CLIENT \
    --deployment DEPLOYMENT_NAME \
    --ca-cert PATH_TO_BOSH_SERVER_CERT \
    backup

    • Use the optional --debug flag to enable debug logs. See the Logging section for more information.
    • Use the optional --with-manifest flag to ensure that BBR downloads your current deployment manifest when backing up. These manifests are included in the Ops Manager export, but are useful for reference. For example:
      $ BOSH_CLIENT_SECRET=BOSH_PASSWORD \
      nohup bbr deployment \
      --target BOSH_DIRECTOR_IP \
      --username BOSH_CLIENT \
      --deployment DEPLOYMENT_NAME \
      --ca-cert PATH_TO_BOSH_SERVER_CERT \
      backup --with-manifest

    Note: Backing up Elastic Runtime takes at least 10 minutes, and can take considerably longer with larger blobstores or slow network connections. The backup also incurs around 10 minutes of Cloud Controller downtime, during which users will be unable to push, scale, or delete apps. Your apps will not be affected.

    Note: Because the BBR backup command can take a long time to complete, Pivotal recommends you run it independently of the SSH session, so that the process can continue running even if your connection to the jumpbox fails. The command above uses nohup but you could also run the command in a screen or tmux session.

  3. If the command completes successfully, do the following:

    1. Move the backup artifact off the jumpbox to your preferred storage space. The backup created by BBR consists of a folder with the backup artifacts and metadata files. Pivotal recommends compressing and encrypting these files.
    2. Make redundant copies of your backup and store them in multiple locations to minimize the risk of losing your backups in a disaster.
    3. Validate every backup with a test restore by performing the procedures in the Step 11: (Optional) Validate Your Backup section below.
  4. If the command fails, do the following:

    1. Ensure all the parameters in the command are set.
    2. Ensure the BOSH Director credentials are valid.
    3. Ensure the specified deployment exists.
    4. Consult the Exit Codes section below.
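
The recommendation to compress and encrypt the backup folder before moving it off the jumpbox could be scripted roughly as follows. This is a sketch under stated assumptions: `archive_backup` and `encrypt_archive` are hypothetical helpers, and the encryption step assumes GnuPG is installed on the jumpbox. Substitute whichever encryption tool your organization requires.

```shell
# Sketch only: archive_backup and encrypt_archive are hypothetical helpers,
# not BBR commands.

# Pack the folder that bbr created into one compressed archive.
archive_backup() {
  backup_dir=$1   # folder created by bbr on the jumpbox
  archive=$2      # destination file, for example backup.tar.gz
  # -C stores paths relative to the folder's parent directory.
  tar -czf "$archive" -C "$(dirname "$backup_dir")" "$(basename "$backup_dir")"
}

# Encrypt the archive with a passphrase. Assumes GnuPG is installed;
# gpg prompts for the passphrase and writes the result to ARCHIVE.gpg.
encrypt_archive() {
  gpg --symmetric --cipher-algo AES256 --output "$1.gpg" "$1"
}
```

After `archive_backup BACKUP_FOLDER backup.tar.gz` and `encrypt_archive backup.tar.gz` succeed, copy `backup.tar.gz.gpg` to your storage locations and remove the unencrypted files from the jumpbox.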

Step 11: (Optional) Validate Your Backup

If you want to validate your backup, follow the instructions that correspond to your use case:

Validate Your Entire Backup

Warning: When validating your backup, the VMs and disks from the backed-up BOSH Director should not be visible to the new BOSH Director. As a result, Pivotal recommends that you deploy the new BOSH Director to a different IaaS network and account than the VMs and disks of the backed-up BOSH Director.

Warning: If the apps in your backed-up environment refer to any data services outside of PCF, they will try to connect to those data services when you restore to a new environment. This may produce side effects with the data services. For example, consider an app that processes mail queues and connects to an external database. When you validate your backup in a test environment, the app may start processing the queue, and this work may be lost.

After backing up PCF, you may want to validate your backup by restoring it to a similar environment and checking the applications. Because BBR is designed for disaster recovery, its backups are intended to be restored to an environment deployed with the same configuration.

Perform the following steps to spin up a second environment that matches the original in order to test a restore:

  1. Export your Ops Manager installation by performing the steps in the Step 8: Export Installation Settings section.
  2. Create a new Ops Manager VM in a different network from the original. Ensure that the Ops Manager VM has enough persistent disk to accommodate the files exported in the previous step. For instructions, consult the Ops Manager deployment topic for your IaaS.

After you deploy the second environment, follow the instructions in the Restoring Pivotal Cloud Foundry from Backup with BBR topic.

Validate Your ERT Backup Only

For a sandbox or other non-production environment, you can optionally perform an in-place restore of ERT only. In this case, you restore the ERT backup to the same PCF environment that the backup was created from. Follow the procedures in Restoring an ERT Backup In-place.

Exit Codes and Logging

For information about the exit codes returned by BBR and BBR logging, consult the sections below.

Exit Codes

The exit code returned by BBR indicates the status of the backup. The following table matches exit codes to error messages.

Value  Error
0      Success
1      General failure
4      The pre-backup lock failed.
8      The post-backup unlock failed. Your deployment may be in a bad state and requires attention.
16     The cleanup failed. This is a non-fatal error indicating that the utility has been unable to clean up open BOSH SSH connections to the deployment VMs. Manual cleanup may be required to clear any hanging BOSH users and connections.

If multiple failures occur, your exit code reflects a combination of values. Use bitwise AND to determine which failures occurred.

For example, the exit code 5 indicates that the pre-backup lock failed and a general error occurred.

To check that a bit is set, use bitwise AND, as demonstrated by the following example of exit code 20:

(20 & 1)  == 1    # false
(20 & 4)  == 4    # true; lock failed
(20 & 8)  == 8    # false
(20 & 16) == 16   # true; cleanup failed

Exit code 20 indicates that the pre-backup lock failed and cleanup failed.
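
The same bit test can be done in the shell that ran BBR. The following sketch maps an exit code to the failures listed in the table above. The function name `decode_bbr_exit_code` and the short message texts are hypothetical.

```shell
# Sketch: print each failure encoded in a BBR exit code.
# decode_bbr_exit_code is a hypothetical helper built from the table above.
decode_bbr_exit_code() {
  code=$1
  if [ "$code" -eq 0 ]; then
    echo "success"
    return 0
  fi
  # Each failure owns one bit, so test the bits independently.
  if [ $(( $code & 1 )) -ne 0 ]; then echo "general failure"; fi
  if [ $(( $code & 4 )) -ne 0 ]; then echo "pre-backup lock failed"; fi
  if [ $(( $code & 8 )) -ne 0 ]; then echo "post-backup unlock failed"; fi
  if [ $(( $code & 16 )) -ne 0 ]; then echo "cleanup failed"; fi
  return 0
}
```

Run it immediately after the bbr command, for example `decode_bbr_exit_code "$?"`, before any other command overwrites the exit status.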

Logging

BBR outputs logs to stdout. By default, BBR logs:

  • The backup and restore scripts that it finds
  • When it starts or finishes a stage, such as pre-backup scripts or backup scripts
  • When the process is complete
  • When any error occurs

If more logging is needed, use the optional --debug flag to print the following information:

  • Logs about the API requests made to the BOSH server
  • All commands executed on remote instances
  • All commands executed on local environment
  • Standard in and standard out streams for the backup and restore scripts when they are executed
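
Because BBR writes its logs to stdout, you may want to keep a copy in a file while still seeing the exit code of the command itself. The sketch below is one way to do that in a POSIX shell. `run_logged` is a hypothetical helper, not a BBR feature.

```shell
# Sketch: keep a copy of a command's output while preserving its exit code.
# run_logged is a hypothetical helper, not a BBR feature.
run_logged() {
  log_file=$1
  shift
  status_file=$(mktemp) || return 1
  # A pipeline reports the exit status of tee, so the command's own status
  # is written to a temporary file and read back afterwards.
  { rc=0; "$@" 2>&1 || rc=$?; echo "$rc" > "$status_file"; } | tee "$log_file"
  status=$(cat "$status_file")
  rm -f "$status_file"
  return "$status"
}
```

For example, `run_logged bbr-director.log bbr director --private-key-path PRIVATE_KEY --username bbr --host HOST backup` shows the output as usual, saves it to bbr-director.log, and returns the exit code that bbr produced.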

Canceling a Backup

If you need to cancel a backup, perform the following steps:

  1. Terminate the BBR process by pressing Ctrl-C and then typing yes to confirm.
  2. Log in to your BOSH Director with the BOSH CLI by performing the procedures in the Step 3: Retrieve BOSH Director Address and Credentials section above.
  3. Perform the following steps for each cloud_controller VM in your deployment:
    1. List the VMs in your deployment:
      $ bosh -e DIRECTOR_IP --ca-cert /var/tempest/workspaces/default/root_ca_certificate \
      -d DEPLOYMENT_NAME \
      ssh
    2. Select the VM you want to SSH into.
    3. Run the following command from the VM:
      $ sudo /var/vcap/jobs/cloud-controller-backup/bin/bbr/post-backup-unlock
  4. Run the BBR pre-backup check from your jumpbox by following the steps in the Step 7: Check Your Deployment section above. If the command reports that it cannot back up the deployment, SSH into each VM mentioned in the error using the BOSH CLI and remove the /var/vcap/store/bbr-backup directory if present.