Pivotal Container Service v1.1

Restore the PKS Control Plane


This topic describes how to use BOSH Backup and Restore (BBR) to restore a PKS deployment.

To back up a PKS deployment with BBR, see the Back Up the PKS Control Plane topic.

Compatibility of Restore

This section describes the restrictions that determine whether a backup artifact can be restored to another environment. It is for guidance only. Pivotal strongly recommends that operators validate their backups by restoring from the backup artifacts.

The restrictions on restoring a backup artifact are as follows:

  • Topology: BBR requires the BOSH topology of a deployment to be the same in the restore environment as it was in the backup environment.
  • Naming of instance groups and jobs: For any deployment that implements the backup and restore scripts, the instance groups and jobs must have the same names.
  • Number of instance groups and jobs: For instance groups and jobs that have backup and restore scripts, there must be the same number of instances.
  • Limited validation: BBR puts the backed-up data into the corresponding instance groups and jobs in the restored environment, but cannot validate the restore beyond that. For example, if the MySQL encryption key is different in the restore environment, the BBR restore can succeed even though the restored MySQL database is unusable.

Note: A change in VM size or underlying hardware should not affect BBR’s ability to restore data, as long as adequate storage space to restore the data exists.
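The topology and instance-count restrictions above can be checked roughly before you attempt a restore. The following POSIX sh sketch is hypothetical: it assumes you have saved the instance list of each environment to a text file with one instance per line, in the `group/GUID` form shown in the Instance column of `bosh -d DEPLOYMENT_NAME instances`. The function names are illustrative. It compares instance-group names and counts only; it does not inspect jobs, and it cannot detect the data or key mismatches described under Limited validation.

```shell
# Hypothetical helpers (POSIX sh) for a rough topology check.
# Each input file lists one instance per line, as "group/GUID" or "group".

# Print "GROUP COUNT" for every instance group in a file, sorted by name.
group_counts() {
  cut -d/ -f1 "$1" | sed '/^[[:space:]]*$/d' | sort | uniq -c |
    awk '{ print $2, $1 }'
}

# Succeed if both environments have the same instance groups with the same
# number of instances; otherwise print both summaries and fail.
compare_instance_groups() {
  local backup_counts restore_counts
  backup_counts=$(group_counts "$1")
  restore_counts=$(group_counts "$2")
  if [ "$backup_counts" = "$restore_counts" ]; then
    echo "Instance groups and counts match."
    return 0
  fi
  echo "Topology differs; the BBR restore requirements are not met." >&2
  printf 'Backup environment:\n%s\nRestore environment:\n%s\n' \
    "$backup_counts" "$restore_counts" >&2
  return 1
}
```

For example, `compare_instance_groups backup-env.txt restore-env.txt` returns a non-zero status and prints both `GROUP COUNT` summaries when the environments differ.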

Step 1: Recreate VMs

Before restoring a PKS deployment, you must create the VMs that constitute the deployment.

In a disaster recovery scenario, you can re-create the deployment with your PKS deployment manifest. If you used the --with-manifest flag when running the BBR backup command, your backup artifact includes a copy of your manifest.
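If the backup was taken with --with-manifest, one way to re-create the VMs is to deploy that stored manifest with the BOSH CLI. The POSIX sh sketch below is an illustration under assumptions, not the documented procedure: it assumes the manifest is saved as manifest.yml at the top level of the artifact directory, and that the BOSH CLI environment variables (BOSH_ENVIRONMENT, BOSH_CLIENT, BOSH_CLIENT_SECRET, BOSH_CA_CERT) are already exported in your shell. The function name `redeploy_from_artifact` is hypothetical.

```shell
# Hypothetical helper: re-create a deployment's VMs from the manifest that
# `bbr deployment backup --with-manifest` stored in the artifact.
# Assumption: the file is ARTIFACT_DIR/manifest.yml.
redeploy_from_artifact() {
  local artifact_dir="$1" deployment="$2"
  local manifest="$artifact_dir/manifest.yml"
  if [ ! -f "$manifest" ]; then
    echo "No manifest.yml in $artifact_dir; was the backup taken with --with-manifest?" >&2
    return 1
  fi
  # -n runs non-interactively; credentials come from the exported
  # BOSH_ENVIRONMENT, BOSH_CLIENT, BOSH_CLIENT_SECRET and BOSH_CA_CERT.
  bosh -n -d "$deployment" deploy "$manifest"
}
```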

Step 2: Transfer Artifacts to Jumpbox

Move your BBR backup artifact from your safe storage location to the jumpbox.

For example, you can use scp to copy the backup artifact, which is a directory, to your jumpbox:

$ scp -r LOCAL_PATH_TO_BACKUP_ARTIFACT JUMPBOX_USER@JUMPBOX_ADDRESS:

If the backup artifact is encrypted, decrypt it.
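To confirm that the artifact arrived intact, you can record checksums where the artifact is stored and verify them on the jumpbox after the transfer and any decryption. This checksum step is not part of the BBR procedure. The POSIX sh sketch below is a hypothetical addition that assumes GNU coreutils (`sha256sum`, `sort -z`) and uses illustrative function names.

```shell
# Hypothetical helpers: detect corruption or truncation of a backup
# artifact directory in transit. Requires GNU sha256sum and sort -z.

# Print a SHA-256 checksum for every file under the artifact directory,
# using paths relative to that directory so the list is portable.
artifact_checksums() {
  (cd "$1" && find . -type f -print0 | sort -z | xargs -0 sha256sum)
}

# Verify an artifact directory against a list from artifact_checksums.
# Prints only mismatches and returns non-zero if any file differs.
verify_artifact() {
  local artifact_dir="$1" checksum_file="$2"
  (cd "$artifact_dir" && sha256sum --check --quiet -) < "$checksum_file"
}
```

For example, run `artifact_checksums ./ARTIFACT_DIR > artifact.sha256` next to the stored artifact, copy artifact.sha256 along with it, and then run `verify_artifact ./ARTIFACT_DIR artifact.sha256` on the jumpbox. Keep the checksum file outside the artifact directory so that it is not hashed itself.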

Step 3: Restore

Note: The BBR restore command can take a long time to complete. Run it independently of your SSH session so that the process continues even if your connection to the jumpbox fails. The restore command in this topic uses nohup, but you can also run it in a screen or tmux session.

Use the optional --debug flag to enable debug logs. See the Exit Codes and Logging topic for more information.

Perform the following steps to restore a PKS deployment:

  1. Ensure the PKS deployment backup artifact is in the folder you will run BBR from.

  2. Download the root CA certificate for your PKS deployment:

    1. From the Ops Manager Installation Dashboard, click your username in the top right corner.
    2. Navigate to Settings > Advanced.
    3. Click Download Root CA Cert.
  3. Locate your PKS BOSH deployment name:

    1. From the Ops Manager Installation Dashboard, click the Director tile.
    2. Click the Credentials tab.
    3. Navigate to Bosh Commandline Credentials and click Link to Credential.
    4. Copy the credential value.
    5. From the command line, run the following command to retrieve your PKS BOSH deployment name, replacing BOSH-CLI-CREDENTIALS with the credential value you copied in the previous step:
      $ BOSH-CLI-CREDENTIALS deployments | grep pivotal-container-service
      Your PKS BOSH deployment name begins with pivotal-container-service and includes a unique identifier.
  4. Run the BBR restore:

    $ BOSH_CLIENT_SECRET=BOSH_CLIENT_SECRET \
      nohup bbr deployment \
      --target BOSH_TARGET \
      --username BOSH_CLIENT \
      --deployment DEPLOYMENT_NAME \
      --ca-cert PATH_TO_BOSH_CA_CERT \
      restore \
      --artifact-path PATH_TO_DEPLOYMENT_BACKUP

    Replace the placeholder values as follows:
      • BOSH_CLIENT_SECRET: In the BOSH Director tile, navigate to Credentials > Bosh Commandline Credentials. Record the value for BOSH_CLIENT_SECRET.
      • BOSH_TARGET: In the BOSH Director tile, navigate to Credentials > Bosh Commandline Credentials. Record the value for BOSH_ENVIRONMENT. You must be able to reach the target address from the workstation where you run bbr commands.
      • BOSH_CLIENT: In the BOSH Director tile, navigate to Credentials > Bosh Commandline Credentials. Record the value for BOSH_CLIENT.
      • PATH_TO_BOSH_CA_CERT: Use the path to the root CA certificate you downloaded in a previous step.
      • DEPLOYMENT_NAME: Use the PKS BOSH deployment name you located in a previous step.
      • PATH_TO_DEPLOYMENT_BACKUP: Use the path to the backup artifact you transferred to the jumpbox.
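Steps 3 and 4 can be scripted on the jumpbox. The POSIX sh sketch below is hypothetical: `pks_deployment_name` assumes the tabular output of `bosh deployments`, where the deployment name is the first column, and `start_pks_restore` reads the placeholders from step 4 as shell variables of the same names, runs the bbr restore command from step 4 under nohup in the background, and writes its output to an illustrative bbr-restore.log file.

```shell
# Hypothetical helpers for steps 3 and 4 (POSIX sh).

# Read `bosh deployments` output on stdin and print the PKS deployment
# name: the first column of the row that starts with
# pivotal-container-service. Fails unless exactly one row matches.
pks_deployment_name() {
  local names count
  names=$(awk '$1 ~ /^pivotal-container-service/ { print $1 }')
  count=$(printf '%s' "$names" | grep -c .)
  if [ "$count" -ne 1 ]; then
    echo "Expected one pivotal-container-service deployment, found $count" >&2
    return 1
  fi
  printf '%s\n' "$names"
}

# Start the restore command in the background under nohup so it keeps
# running if the SSH session drops. Placeholders are read from shell
# variables of the same names (a convention of this sketch).
start_pks_restore() {
  local var value
  for var in BOSH_CLIENT_SECRET BOSH_TARGET BOSH_CLIENT DEPLOYMENT_NAME \
             PATH_TO_BOSH_CA_CERT PATH_TO_DEPLOYMENT_BACKUP; do
    eval "value=\${$var:-}"
    if [ -z "$value" ]; then
      echo "Missing required variable: $var" >&2
      return 1
    fi
  done
  BOSH_CLIENT_SECRET="$BOSH_CLIENT_SECRET" nohup bbr deployment \
    --target "$BOSH_TARGET" \
    --username "$BOSH_CLIENT" \
    --deployment "$DEPLOYMENT_NAME" \
    --ca-cert "$PATH_TO_BOSH_CA_CERT" \
    restore \
    --artifact-path "$PATH_TO_DEPLOYMENT_BACKUP" \
    > bbr-restore.log 2>&1 &
  echo "bbr restore started as PID $!; follow it with: tail -f bbr-restore.log"
}
```

For example, `BOSH-CLI-CREDENTIALS deployments | pks_deployment_name` prints the deployment name. Set it as DEPLOYMENT_NAME together with the other placeholders, then run `start_pks_restore` from the directory that contains the backup artifact.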

If the command fails, try the steps in Recovering from a Failing Command.

Recovering from a Failing Command

  1. Ensure all the parameters in the command are set.
  2. Ensure the BOSH Director credentials are valid.
  3. Ensure the specified BOSH deployment exists.
  4. Ensure that the jumpbox can reach the BOSH Director.
  5. Ensure the source BOSH deployment is compatible with the target BOSH deployment.
  6. If you see the error message Directory /var/vcap/store/bbr-backup already exists on instance, run the relevant commands from the Clean Up After Failed Restore section of this topic.
  7. See the Exit Codes and Logging topic.
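Some of the checks above can be run from the jumpbox. The POSIX sh sketch below is hypothetical: `director_reachable` assumes the BOSH Director API listens on its default port 25555 and answers the unauthenticated /info endpoint, and that BOSH_TARGET is a bare IP address or hostname. `needs_restore_cleanup` searches a captured restore log for the error message quoted in item 6; the exact wording in the log may differ between BBR versions.

```shell
# Hypothetical diagnosis helpers (POSIX sh) for a failing bbr restore.

# Item 4: succeed if the BOSH Director API answers from this jumpbox.
# Assumes the default Director port 25555 and the /info endpoint.
director_reachable() {
  local target="$1" ca_cert="$2"
  curl --silent --show-error --fail --max-time 10 \
    --cacert "$ca_cert" "https://$target:25555/info" > /dev/null
}

# Item 6: succeed if a restore log contains the leftover-directory error,
# meaning the Clean Up After Failed Restore commands should be run first.
needs_restore_cleanup() {
  grep -q 'Directory /var/vcap/store/bbr-backup already exists' "$1"
}
```

For example, `director_reachable BOSH_TARGET PATH_TO_BOSH_CA_CERT` returns a non-zero status if the Director cannot be reached. When the restore command runs under nohup in a terminal, its output goes to nohup.out, so `needs_restore_cleanup nohup.out` tells you whether the cleanup is needed.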

Cancel a Restore

If you need to cancel a restore, perform the following steps:

  1. Terminate the BBR process by pressing Ctrl-C and typing yes to confirm.
  2. Stopping a restore can leave the system in an unusable state and prevent future restores. Perform the procedures in the Clean Up After Failed Restore section to enable future restores.

Clean Up After Failed Restore

If your restore process fails, it may leave the BBR restore folder on the instance, which can cause subsequent restore attempts to fail. In addition, BBR may not have run the post-restore scripts, which can leave the instance in a locked state.

To resolve these issues, run the BBR restore-cleanup command:

$ BOSH_CLIENT_SECRET=BOSH_CLIENT_SECRET \
    bbr deployment \
    --target BOSH_TARGET \
    --username BOSH_CLIENT \
    --deployment DEPLOYMENT_NAME \
    --ca-cert PATH_TO_BOSH_CA_CERT \
    restore-cleanup

If the cleanup command fails, use the following table to match its exit code to an error message.

Exit Code Error
0 Success
1 General failure
8 The post-restore unlock failed. Your deployment may be in a bad state and require attention.
16 The cleanup failed. This is a non-fatal error indicating that the utility has been unable to clean up open BOSH SSH connections to the deployment VMs. Manual cleanup may be required to clear any hanging BOSH users and connections.

For more information about how to interpret the exit code, see the Exit Codes section of the Exit Codes and Logging topic.
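To interpret an exit code in a script, you can map it onto the table above. This POSIX sh sketch is hypothetical and assumes that the code is a bit mask of the listed values, so a value such as 24 reports both 8 and 16; confirm this against the Exit Codes section of the Exit Codes and Logging topic.

```shell
# Hypothetical helper (POSIX sh): explain a restore-cleanup exit code using
# the table above. Assumes the code is a bit mask of the listed values.
describe_cleanup_exit() {
  local code="$1"
  if [ "$code" -eq 0 ]; then
    echo "Success"
    return 0
  fi
  [ $(( $code & 1 )) -ne 0 ] && echo "General failure"
  [ $(( $code & 8 )) -ne 0 ] &&
    echo "Post-restore unlock failed: the deployment may need attention"
  [ $(( $code & 16 )) -ne 0 ] &&
    echo "Cleanup failed: hanging BOSH users and SSH connections may need manual cleanup"
  # Any bit other than 1, 8, or 16 is not described in the table.
  [ $(( $code & ~25 )) -ne 0 ] &&
    echo "Unrecognized bits in exit code $code"
  return 0
}
```

For example, run `describe_cleanup_exit "$?"` immediately after the restore-cleanup command, before another command overwrites `$?`.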

