Restoring PCF from Backup with BBR

This topic describes the procedure for restoring your critical backend Pivotal Cloud Foundry (PCF) components with BOSH Backup and Restore (BBR), a command-line tool for backing up and restoring BOSH deployments. To perform the procedures in this topic, you must have backed up PCF by following the steps in the Backing Up Pivotal Cloud Foundry with BBR topic.

To view the BBR release notes, see BOSH Backup and Restore Release Notes.

The procedures described in this topic prepare your environment for PCF, deploy Ops Manager, import your installation settings, and use BBR to restore your PCF components.

WARNING: Restoring PCF with BBR is a destructive operation. If the restore fails, the new environment may be left in an unusable state and require reprovisioning. Only perform the procedures in this topic for the purpose of disaster recovery, such as recreating PCF after a storage-area network (SAN) corruption.

WARNING: When validating your backup, the VMs and disks from the backed-up BOSH Director should not be visible to the new BOSH Director. As a result, Pivotal recommends that you deploy the new BOSH Director to a different IaaS network and account than the VMs and disks of the backed-up BOSH Director.

WARNING: For PCF v2.0, BBR only supports backup and restore of environments with zero or one CredHub instances.

Note: If the BOSH Director you are restoring had any deployments that were deployed manually rather than through an Ops Manager Tile, you must restore them manually at the end of the process. For more information, see Step 15: (Optional) Restore Non-Tile Deployments.

Compatibility of Restore

This section describes the restrictions for a backup artifact to be restorable to another environment. This section is for guidance only, and Pivotal highly recommends that operators validate their backups by using the backup artifacts in a restore.

Consult the following restrictions for a backup artifact to be restorable:

  • CIDR ranges: BBR requires the IP address ranges to be the same in the restore environment as in the backup environment.
  • Topology: BBR requires the BOSH topology of a deployment to be the same in the restore environment as it was in the backup environment.
  • Naming of instance groups and jobs: For any deployment that implements the backup and restore scripts, the instance groups and jobs must have the same names.
  • Number of instance groups and jobs: For instance groups and jobs that have backup and restore scripts, there must be the same number of instances.
  • Limited validation: BBR puts the backed up data into the corresponding instance groups and jobs in the restored environment, but can’t validate the restore beyond that. For example, if the MySQL encryption key is different in the restore environment, the BBR restore might succeed although the restored MySQL database is unusable.
  • PCF version: BBR can restore to the same version of PCF that was backed up. BBR does not support restoring to other major, minor, or patch releases.

Note: A change in VM size or underlying hardware should not affect BBR’s ability to restore data, as long as there is adequate storage space to restore the data.

Restore Workflow

Flow chart shows the PCF operator moving through a series of steps, interacting with the BOSH CLI, Ops Manager VM, Director VM, and the PAS tile. Details on these steps are described below.

The diagram above shows the flow of the PCF restore process in a series of steps performed by the PCF operator. The following steps will be covered in more detail throughout this topic.

  1. Launch new Ops Manager: Perform the procedures for your IaaS to deploy Ops Manager. See part one of the Deploy Ops Manager and Import Installation Settings step below for more information.

  2. Import settings: You can import settings either with the Ops Manager UI or API. See part two of the Deploy Ops Manager and Import Installation Settings step below for more information.

  3. Remove bosh-state.json: SSH into your Ops Manager VM and delete the bosh-state.json file. See the Remove BOSH State File step below for more information.

  4. Apply Changes (Director only): Use the Ops Manager API to only deploy the BOSH Director. See the Deploy the BOSH Director step below for more information.

  5. bbr restore <director>: Run the BBR restore command from your jumpbox to restore the BOSH Director. See the Restore the BOSH Director step below for more information.

  6. Use BOSH cck to fix the stale cloud ID references in the BOSH database: For each deployment in the BOSH Director, you will need to run a bosh cloud-check command. See the Remove Stale Cloud IDs for All Deployments step for more information.

  7. Apply Changes: Click Apply Changes from the Ops Manager Installation Dashboard.

  8. bbr restore <PAS>: Run the BBR restore command from your jumpbox to restore PAS. See the Restore PAS step below for more information.

Prepare to Restore

This section provides the steps you need to perform before restoring your PCF backup with BBR.

Step 1: (Optional) Prepare Your Environment

In the event of a disaster, you may lose not only your VMs and disks, but also your IaaS resources, such as networks and load balancers.

If you need to recreate your IaaS resources, prepare your environment for PCF by following the instructions specific to your IaaS in Installing Pivotal Cloud Foundry.

Note: The instructions for installing PCF on Amazon Web Services (AWS) and OpenStack combine the procedures for preparing your environment and deploying Ops Manager into a single topic. The instructions for the other supported IaaSes split these procedures into two separate topics.

If you recreate your IaaS resources, you must also add those resources to Ops Manager by performing the procedures in the Step 3: (Optional) Configure Ops Manager for New Resources section.

Step 2: Deploy Ops Manager and Import Installation Settings

  1. Perform the procedures for your IaaS to deploy Ops Manager:

    1. AWS
      1. If you configured AWS manually, see Step 12 through Step 19 of the Configuring AWS for PCF topic.
      2. If you used the CloudFormation template to install PCF on AWS, see Deploying BOSH and Ops Manager to AWS.
    2. Azure, see Deploying BOSH and Ops Manager to Azure with ARM
    3. GCP, see Deploying BOSH and Ops Manager to GCP
    4. OpenStack, see Step 3 through Step 7 of the Provisioning the OpenStack Infrastructure topic.
    5. vSphere, see Deploying Operations Manager to vSphere
  2. Import your installation settings. This can be done in two ways:

    1. Using the Ops Manager UI:

      1. Access your new Ops Manager by navigating to YOUR-OPS-MAN-FQDN in a browser.
      2. On the Welcome to Ops Manager page, click Import Existing Installation.

      3. In the import panel, perform the following tasks:

        • Enter the Decryption Passphrase in use when you exported the installation settings from Ops Manager.
        • Click Choose File and browse to the installation zip file that you exported in the Step 7: Export Installation Settings section of the Backing Up Pivotal Cloud Foundry with BBR topic.

      4. Click Import.

        Note: Some browsers do not provide feedback on the status of the import process, and may appear to hang. The import process takes at least 10 minutes, and longer if many tiles were present on the backed-up Ops Manager.

      5. A Successfully imported installation message appears upon completion.

    2. Using the Ops Manager API:

      curl OPSMAN-URL/api/v1/installation_asset_collection \
        -X POST \
        -H "Authorization: Bearer UAA-ACCESS-TOKEN" \
        -F 'installation[file]=@installation.zip' \
        -F 'passphrase=DECRYPTION-PASSPHRASE'
      

      Where:

      • UAA-ACCESS-TOKEN is the UAA access token. For more information about how to retrieve this token, see Using the Ops Manager API.
      • DECRYPTION-PASSPHRASE is the decryption passphrase in use when you exported the installation settings from Ops Manager.
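
The API call above can be wrapped in a small function so that a missing file is caught before anything is sent. This is a minimal sketch, not an official client: the names OPSMAN_URL, UAA_ACCESS_TOKEN, and DECRYPTION_PASSPHRASE are hypothetical environment variables standing in for the placeholders above, and the curl --fail flag is an addition that makes curl exit non-zero on an HTTP error response.

```shell
# Sketch: import exported installation settings through the Ops Manager API.
# OPSMAN_URL, UAA_ACCESS_TOKEN, and DECRYPTION_PASSPHRASE are hypothetical
# environment variable names for the placeholders in the command above.
import_installation() {
  zip_file="$1"
  # Stop early if the exported installation file is not where we expect it.
  if [ ! -f "$zip_file" ]; then
    echo "installation file not found: $zip_file" >&2
    return 1
  fi
  # Same request as the documented curl command; --fail is an added flag.
  curl --fail "$OPSMAN_URL/api/v1/installation_asset_collection" \
    -X POST \
    -H "Authorization: Bearer $UAA_ACCESS_TOKEN" \
    -F "installation[file]=@$zip_file" \
    -F "passphrase=$DECRYPTION_PASSPHRASE"
}

# Usage, after exporting the three variables:
#   import_installation installation.zip
```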

WARNING: Do not click Apply Changes in Ops Manager until you are instructed to do so in Step 11: Redeploy PAS.

Step 3: (Optional) Configure Ops Manager for New Resources

If you recreated IaaS resources such as networks and load balancers by following the steps in the Step 1: (Optional) Prepare Your Environment section above, perform the following steps to update Ops Manager with your new resources:

  1. Enable Ops Manager advanced mode. For more information, see How to Enable Advanced Mode in the Ops Manager in the Pivotal Knowledge Base.

    Note: In advanced mode, Ops Manager allows you to make changes that are normally disabled. You may see warning messages when you save changes.

  2. Navigate to the Ops Manager Installation Dashboard and click the BOSH Director tile.

  3. If you are using Google Cloud Platform (GCP), click Google Config and update:

    1. Project ID to reflect the GCP project ID.
    2. Default Deployment Tag to reflect the environment name.
    3. AuthJSON to reflect the service account.
  4. Click Create Networks and update the network names to reflect the network names for the new environment.

  5. If your BOSH Director had an external hostname, change it in Director Config > Director Hostname to ensure it does not conflict with the hostname of the backed-up Director.

  6. Return to the Ops Manager Installation Dashboard and click the Pivotal Application Service (PAS) tile.

  7. Click Resource Config. If necessary for your IaaS, enter the name of your new load balancers in the Load Balancers column.

  8. If necessary, click Networking and update the load balancer SSL certificate and private key under Certificates and Private Keys for HAProxy and Router.

  9. If your environment has a new DNS address, update the old environment DNS entries to point to the new load balancer addresses. For more information, see the Step 4: Configure Networking section of the Using Your Own Load Balancer topic and follow the link to the instructions for your IaaS.

  10. If your PAS uses an external blobstore, ensure that the PAS tile is configured to use a different blobstore. Otherwise, it attempts to connect to the blobstore that the backed-up PAS is using.

  11. Ensure your System Domain and Apps Domain under PAS Domains are updated to refer to the new environment’s domains.

  12. Ensure that there are no outstanding warning messages in the BOSH Director tile, then disable Ops Manager advanced mode. For more information, see How to Enable Advanced Mode in the Ops Manager in the Pivotal Knowledge Base.

Step 4: Remove BOSH State File

  1. SSH into your Ops Manager VM. For more information, see the SSH into Ops Manager section of the Advanced Troubleshooting with the BOSH CLI topic.

  2. On the Ops Manager VM, delete the /var/tempest/workspaces/default/deployments/bosh-state.json file:

    $ sudo rm /var/tempest/workspaces/default/deployments/bosh-state.json
    

  3. Navigate to YOUR-OPS-MAN-FQDN in a browser and log into Ops Manager.

  4. Upload the required stemcell for each tile that requires one. Each tile that requires an upload displays an orange indicator at the bottom. When you click the tile, an orange circle appears beside the Stemcell panel.
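
For step 2 above, the following is a minimal sketch that checks whether the state file exists and keeps a dated copy before deleting it. The copy step and the backup location are assumptions added for illustration and are not part of the documented procedure. The function name remove_bosh_state is hypothetical.

```shell
# Sketch: remove the BOSH state file, keeping a dated copy first.
# The copy step and backup directory are assumptions, not documented steps.
remove_bosh_state() {
  state_file="${1:-/var/tempest/workspaces/default/deployments/bosh-state.json}"
  backup_dir="${2:-$HOME}"
  # Nothing to do if the file has already been removed.
  if [ ! -f "$state_file" ]; then
    echo "no state file at $state_file, nothing to remove"
    return 0
  fi
  # Keep a timestamped copy, then delete the original.
  cp "$state_file" "$backup_dir/bosh-state.json.$(date +%Y%m%d%H%M%S)" || return 1
  rm "$state_file"
}
```

On the Ops Manager VM the file likely requires root privileges, as the sudo in step 2 suggests, so you would run the function from a root shell.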

Step 5: Deploy the BOSH Director

Perform the steps in the Applying Changes to BOSH Director topic to use the Ops Manager API to only deploy the BOSH Director.

Step 6: Transfer Artifacts to Jumpbox

When you backed up PCF, you moved the TAR and metadata files of the backup artifacts off your jumpbox to your preferred storage space. See After Taking the Backups in the Step 9: Back Up Your PAS Deployment section of the Backing Up Pivotal Cloud Foundry with BBR topic. Now you must transfer those files back to your jumpbox.

Note: Pivotal recommends that you regularly update the BBR binary on your jumpbox to the latest version. See Transfer BBR Binary to Your Jumpbox in Setting Up Your Jumpbox for BBR for more information.

Step 7: Retrieve BOSH Director Address and Credentials

Perform the following steps to retrieve the IP address of your BOSH Director and the credentials for logging in from the Ops Manager Director tile:

  1. If you are not using the Ops Manager VM as your jumpbox, install the latest BOSH CLI on your jumpbox.
  2. From the Installation Dashboard in Ops Manager, select BOSH Director > Status and record the IP address listed for the Director. You access the BOSH Director using this IP address.

  3. Click Credentials and record the Director credentials.
  4. From the command line, log into the BOSH Director using the IP address and credentials that you recorded:
    $ bosh -e BOSH-DIRECTOR-IP \
    --ca-cert PATH-TO-BOSH-SERVER-CERTIFICATE log-in
    Email (): director
    Password (): *******************
    Successfully authenticated with UAA
    Succeeded
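
Most commands in the following steps repeat the -e and --ca-cert flags. As a convenience, the BOSH CLI also reads the BOSH_ENVIRONMENT and BOSH_CA_CERT environment variables. The sketch below sets them once per shell session. The function name use_director is hypothetical, and the commands in this topic work unchanged whether or not you use it.

```shell
# Sketch: export the Director address and CA certificate once per shell
# session so later bosh commands can omit -e and --ca-cert.
use_director() {
  export BOSH_ENVIRONMENT="$1"   # value of BOSH-DIRECTOR-IP
  export BOSH_CA_CERT="$2"       # value of PATH-TO-BOSH-SERVER-CERTIFICATE
}

# Usage:
#   use_director BOSH-DIRECTOR-IP PATH-TO-BOSH-SERVER-CERTIFICATE
#   bosh log-in
#   bosh deployments
```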
    

Restore Your Backup

This section provides the steps you need to perform to restore your PCF backup with BBR.

Step 8: Restore the BOSH Director

Notes:
  • The BBR BOSH Director restore command can take at least 15 minutes to complete.
  • Pivotal recommends that you run it independently of the SSH session, so that the process can continue running even if your connection to the jumpbox fails. For example, you can run the command with nohup, or in a screen or tmux session.

  1. Navigate to the Ops Manager Installation Dashboard.

  2. Click the Ops Manager tile.

  3. Click the Credentials tab.

  4. Locate Bbr Ssh Credentials and click Link to Credential next to it.

    You can also retrieve the credentials using the Ops Manager API with a GET request to the following endpoint: /api/v0/deployed/director/credentials/bbr_ssh_credentials. For more information, see the Using the Ops Manager API topic.

  5. Copy the value for private_key_pem.

  6. SSH into your jumpbox.

  7. Run the following command to reformat the key and save it to a file named PRIVATE-KEY in the current directory, copying in the contents of your private key for YOUR-PRIVATE-KEY:

    $ printf -- "YOUR-PRIVATE-KEY" > PRIVATE-KEY
    

  8. Ensure the BOSH Director backup artifact is in the folder from which you will run BBR.

  9. Run the BBR restore command from your jumpbox to restore the BOSH Director:

    $ bbr director \
      --private-key-path PRIVATE-KEY \
      --username bbr \
      --host HOST \
      restore \
        --artifact-path PATH-TO-DIRECTOR-BACKUP
    
    Replace the placeholder values as follows:

    • PATH-TO-DIRECTOR-BACKUP: This is the path to the Director backup you want to restore.
    • PRIVATE-KEY: This is the path to the private key file you created above.
    • HOST: This is the address of the BOSH Director. If the BOSH Director is public, this is a URL, such as https://my-bosh.xxx.cf-app.com. Otherwise, it is the BOSH-DIRECTOR-IP, which you retrieved in the Step 7: Retrieve BOSH Director Address and Credentials section.

      Use the optional --debug flag to enable debug logs. See the Logging section of the Backing Up Pivotal Cloud Foundry with BBR topic for more information.
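
Step 7 above reformats the key because the copied private_key_pem value is typically a single line containing literal \n sequences, which printf expands into real line breaks. The sketch below is a variation on that command, under that assumption. It uses printf '%b' so that the key is passed as an argument rather than as the format string, and it adds a chmod 600 as a precaution that is not part of the original step. The function name write_private_key is hypothetical.

```shell
# Sketch: write the copied private key to a file, expanding literal \n
# sequences into line breaks.
write_private_key() {
  key_text="$1"   # the private_key_pem value as copied from Ops Manager
  key_file="$2"   # output file, PRIVATE-KEY in the steps above
  # %b interprets backslash escapes in the argument, such as \n.
  printf '%b' "$key_text" > "$key_file"
  # Added precaution: restrict the key file to the current user.
  chmod 600 "$key_file"
}

# Usage:
#   write_private_key "YOUR-PRIVATE-KEY" PRIVATE-KEY
```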

If the command completes successfully, continue to Step 9: Identify Your Deployment.

If the command fails, run the BBR restore-cleanup command:

$ bbr director \
  --private-key-path PRIVATE-KEY \
  --username bbr \
  --host HOST \
  restore-cleanup

Then, check the following before running the BBR restore command again:

  • Ensure all the parameters in the command are set.
  • Ensure the BOSH Director credentials are valid.
  • Ensure the specified deployment exists.
  • Ensure the source deployment is compatible with the target deployment.
  • Ensure that the jumpbox can reach the BOSH Director.
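
The restore and cleanup-on-failure flow described above can be sketched as a single function. This is an illustration only: PRIVATE_KEY, HOST, and ARTIFACT_PATH are hypothetical shell variable names for the placeholders used in the commands above, and the function does not retry the restore for you.

```shell
# Sketch: run the Director restore and, if it fails, run restore-cleanup.
# PRIVATE_KEY, HOST, and ARTIFACT_PATH are hypothetical variable names.
restore_director() {
  if bbr director \
      --private-key-path "$PRIVATE_KEY" \
      --username bbr \
      --host "$HOST" \
      restore \
      --artifact-path "$ARTIFACT_PATH"; then
    echo "director restore succeeded"
  else
    echo "director restore failed, running restore-cleanup" >&2
    bbr director \
      --private-key-path "$PRIVATE_KEY" \
      --username bbr \
      --host "$HOST" \
      restore-cleanup
    # Report failure so the caller reviews the checklist before retrying.
    return 1
  fi
}
```

A non-zero return tells you to work through the checklist above before running the restore again.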

Step 9: Identify Your Deployment

After logging in to your BOSH Director, run bosh deployments to identify the name of the BOSH deployment that contains PCF:

$ bosh -e BOSH-DIRECTOR-IP --ca-cert PATH-TO-BOSH-SERVER-CERTIFICATE deployments

Name                     Release(s)
cf-example               push-apps-manager-release/661.1.24
                         cf-backup-and-restore/0.0.1
                         binary-buildpack/1.0.11
                         capi/1.28.0
                         cf-autoscaling/91
                         cf-mysql/35
                         ...

In the above example, the name of the BOSH deployment that contains PCF is cf-example. PATH-TO-BOSH-SERVER-CERTIFICATE is the path to the Certificate Authority (CA) certificate for the BOSH Director. For more information, see Ensure BOSH Director Certificate Availability.

Step 10: Remove Stale Cloud IDs for All Deployments

For every deployment in the BOSH Director, run the following command:

$ bosh -e BOSH-DIRECTOR-IP \
  --ca-cert PATH-TO-BOSH-SERVER-CERTIFICATE \
  -d DEPLOYMENT-NAME -n cck \
  --resolution delete_disk_reference \
  --resolution delete_vm_reference

This reconciles the BOSH Director’s internal state with the state in the IaaS. You can use the list of deployments returned in Step 9: Identify Your Deployment.
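
Because the command must be run once per deployment, a loop can help. The following is a minimal sketch that runs the same cloud-check command for each deployment name passed as an argument and stops at the first failure. BOSH_DIRECTOR_IP and BOSH_CA_CERT_PATH are hypothetical variable names for the placeholders above, and cck_all is a hypothetical function name.

```shell
# Sketch: run the cloud-check command above for each deployment name given.
# BOSH_DIRECTOR_IP and BOSH_CA_CERT_PATH are hypothetical variable names.
cck_all() {
  for deployment in "$@"; do
    echo "cloud-check: $deployment"
    bosh -e "$BOSH_DIRECTOR_IP" \
      --ca-cert "$BOSH_CA_CERT_PATH" \
      -d "$deployment" -n cck \
      --resolution delete_disk_reference \
      --resolution delete_vm_reference || return 1
  done
}

# Usage, with the names listed in Step 9:
#   cck_all cf-example ANOTHER-DEPLOYMENT-NAME
```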

If the bosh cloud-check command does not successfully delete disk references and you see a message similar to the following, perform the additional procedures in the Remove Unused Disks section below.

Scanning 19 persistent disks: 19 OK, 0 missing ...

Step 11: Redeploy PAS

  1. Perform the following steps to determine which stemcell is used by PAS:

    1. Navigate to the Ops Manager Installation Dashboard.
    2. Click the PAS tile.
    3. Click Stemcell and record the release number included in the displayed filename. In this example, the stemcell release number is 3421.9.

      You can also retrieve the stemcell release using the BOSH CLI:

      $ bosh -e BOSH-DIRECTOR-IP deployments
      Using environment '10.0.0.5' as user 'director' (bosh.*.read, openid, bosh.*.admin, bosh.read, bosh.admin)

      Name                     Release(s)                          Stemcell(s)                                     Team(s)  Cloud Config
      cf-9cb6995b7d746cd77438  push-apps-manager-release/661.1.24  bosh-google-kvm-ubuntu-trusty-go_agent/3421.9   -        latest
                               ...

  2. Download the stemcell from Pivotal Network.

  3. Run the following command to upload the stemcell used by PAS:

    $ bosh -e BOSH-DIRECTOR-IP \
      -d DEPLOYMENT-NAME \
      --ca-cert PATH-TO-BOSH-SERVER-CERTIFICATE \
      upload-stemcell \
      --fix PATH-TO-STEMCELL
    

  4. If you have any other tiles installed, ensure you upload their stemcells if they are different from the PAS stemcell. Upload stemcells to the BOSH Director with bosh upload-stemcell --fix PATH-TO-STEMCELL, as in the command above.

  5. From the Ops Manager Installation Dashboard, navigate to PAS Resource Config.

  6. Ensure the number of instances for MySQL Server is set to 1.

    WARNING: Restore will fail if there is not exactly one MySQL Server instance deployed.

  7. Ensure that all errands needed by your system are set to run.

  8. Return to the Ops Manager Installation Dashboard and click Apply Changes to redeploy.

Step 12: (Optional) Restore Service Data

WARNING: BBR does not back up or restore any service data.

For this step, restore data to pre-provisioned service tiles.

The procedures for restoring service data vary. Consult the documentation for your service tile for more information.

For example, if you are using Redis for PCF v1.14, see Using BOSH Backup and Restore (BBR).

Step 13: Restore PAS

Notes:
  • The BBR PAS restore command can take at least 15 minutes to complete.
  • Pivotal recommends that you run it independently of the SSH session, so that the process can continue running even if your connection to the jumpbox fails. For example, you can run the command with nohup, or in a screen or tmux session.

  1. If you use an external blobstore and copied it during the backup, restore the external blobstore with your IaaS-specific tools before running the PAS restore.

  2. Run the BBR restore command from your jumpbox to restore PAS:

    $ BOSH_CLIENT_SECRET=BOSH-PASSWORD \
      bbr deployment \
        --target BOSH-DIRECTOR-IP \
        --username BOSH-CLIENT \
        --deployment DEPLOYMENT-NAME \
        --ca-cert PATH-TO-BOSH-SERVER-CERTIFICATE \
        restore \
          --artifact-path PATH-TO-PAS-BACKUP
    

    Replace the placeholder values as follows:

    • BOSH-CLIENT, BOSH-PASSWORD: Use the BOSH UAA user provided in Pivotal Ops Manager > Credentials > Uaa Bbr Client Credentials.

      You can also retrieve the credentials using the Ops Manager API with a GET request to the following endpoint: /api/v0/deployed/director/credentials/uaa_bbr_client_credentials. For more information, see the Using the Ops Manager API topic.

    • BOSH-DIRECTOR-IP: You retrieved this value in the Step 7: Retrieve BOSH Director Address and Credentials section.
    • DEPLOYMENT-NAME: You retrieved this value in the Step 9: Identify Your Deployment section.
    • PATH-TO-BOSH-SERVER-CERTIFICATE: This is the path to the BOSH Director’s Certificate Authority (CA) certificate, if the certificate is not verifiable by the local machine’s certificate chain.
    • PATH-TO-PAS-BACKUP: This is the path to the PAS backup you want to restore.
  3. If desired, scale the MySQL Server job back up to its previous number of instances by navigating to the Resource Config section of the PAS tile. After scaling the job, return to the Ops Manager Installation Dashboard and click Apply Changes to deploy.
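
The notes at the start of this step recommend running the restore independently of the SSH session. The following is a minimal sketch of one way to do that with nohup. The function name run_detached and the log file name pas-restore.log are hypothetical.

```shell
# Sketch: start a long-running command with nohup so it survives a dropped
# SSH session. Output goes to the given log file; the process ID is printed.
run_detached() {
  log_file="$1"
  shift
  # Run the remaining arguments as a command in the background.
  nohup "$@" > "$log_file" 2>&1 &
  echo "$!"
}

# Usage with the restore command from step 2:
#   export BOSH_CLIENT_SECRET=BOSH-PASSWORD
#   run_detached pas-restore.log bbr deployment \
#     --target BOSH-DIRECTOR-IP \
#     --username BOSH-CLIENT \
#     --deployment DEPLOYMENT-NAME \
#     --ca-cert PATH-TO-BOSH-SERVER-CERTIFICATE \
#     restore --artifact-path PATH-TO-PAS-BACKUP
#   tail -f pas-restore.log
```

Following the log with tail -f lets you reconnect after a dropped session and still see the progress of the restore.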

Step 14: (Optional) Restore On-Demand Service Instances

If you have on-demand service instances provisioned by an on-demand service broker, perform the following steps to restore them after successfully restoring PAS:

  1. Navigate to an on-demand service tile in the Installation Dashboard.

  2. Select the Errands tab.

  3. Ensure the Upgrade All Service Instances errand is On.

  4. Repeat for all on-demand service tiles.

  5. Return to the Installation Dashboard and click Apply Changes. This runs the Upgrade All Service Instances errand for each on-demand service, which redeploys the on-demand service instances.

  6. (Optional) Restore service data to every on-demand service instance.

  7. Any app on PAS bound to an on-demand service instance may need to be restarted to start consuming the restored on-demand service instances.

Step 15: (Optional) Restore Non-Tile Deployments

If you have any deployments that were deployed manually with the BOSH Director rather than through an Ops Manager Tile, perform the following steps to restore the VMs.

  1. Identify the names of the deployments that you need to restore. Do not include the deployments from Ops Manager Tiles. Run the following command to obtain a list of all deployments on your BOSH Director:

    $ bosh -e BOSH-DIRECTOR-IP \
      --ca-cert PATH-TO-BOSH-SERVER-CERTIFICATE \
      deployments
    

  2. Run the following command for each deployment you need to restore:

    $ bosh -n -e BOSH-DIRECTOR-IP \
      --ca-cert PATH-TO-BOSH-SERVER-CERTIFICATE \
      -d DEPLOYMENT-NAME \
      cck --resolution=recreate_vm
    

  3. Run the following command to verify the status of the VMs in each deployment:

    $ bosh -e BOSH-DIRECTOR-IP \
      --ca-cert PATH-TO-BOSH-SERVER-CERTIFICATE \
      -d DEPLOYMENT-NAME \
      vms
    
    The process state for all VMs should show as running.

After Restoring Your Backup

This section provides the steps you need to perform after restoring your PCF backup with BBR.

Step 16: Remove Unused Disks

If bosh cloud-check does not clean up all disk references, you must manually delete the disks left over from a previous deployment, because they prevent recreated deployments from working.

WARNING: This is a very destructive operation.

To delete the disks, perform one of the following procedures:

  • Use the BOSH CLI to delete the disks by performing the following steps:

    1. Target the redeployed BOSH Director using the BOSH CLI by performing the procedures in Step 7: Retrieve BOSH Director Address and Credentials.
    2. List the deployments by running the following command:
      $ bosh -e BOSH-DIRECTOR-IP \
      --ca-cert PATH-TO-BOSH-SERVER-CERTIFICATE deployments
      
    3. Delete each deployment with the following command:
      $ bosh -d DEPLOYMENT-NAME delete-deployment
      
  • Log in to your IaaS account and delete the disks manually. Run the following command to retrieve a list of disk IDs:

    $ bosh -e BOSH-DIRECTOR-IP \
    --ca-cert PATH-TO-BOSH-SERVER-CERTIFICATE instances -i
    

Once the disks are deleted, continue with Step 10: Remove Stale Cloud IDs for All Deployments.
