LATEST VERSION: v1.3 - RELEASE NOTES
Pivotal Container Service v1.3

Restoring the Single Master Cluster

Page last updated:

This topic describes how to use BOSH Backup and Restore (BBR) to perform an in-place restore of your single master cluster. By performing this restore, you can restore the single master cluster to its previous state.

To perform a back up of a single master cluster with BBR, see the instructions in Backing up the Single Master Cluster.

Warning: When you restore the single master cluster, etcd is stopped in the API server. During this process, only currently deployed clusters function, and you cannot create new workloads.

Warning: BBR only backs up and restores the cluster etcd data. This includes the cluster-deployed workloads. Persistent volumes and other IaaS resources, such as load balancers of workloads, are not backed up and restored. Backup and restore for clusters deployed on vSphere with NSX-T is not yet supported.

Compatibility of Restore

This section describes the restrictions on a backup artifact for it to be restorable to another environment. This section is for guidance only. Pivotal highly recommends that operators validate their backups by using the backup artifacts in a restore.

The restrictions for a backup artifact to be restorable are the following:

  • Topology: BBR requires the BOSH topology of a deployment to be the same in the restore environment as it was in the backup environment.
  • Naming of instance groups and jobs: For any deployment that implements the backup and restore scripts, the instance groups and jobs must have the same names.
  • Number of instance groups and jobs: For instance groups and jobs that have backup and restore scripts, the same number of instances must exist.
  • Limited validation: BBR stores the backed up data into the corresponding instance groups and jobs in the restored environment, but cannot validate the restore beyond in other ways. For example, if the MySQL encryption key is different in the restore environment, the BBR restore might succeed but the restored MySQL database would be unusable.

Note: A change in VM size or underlying hardware should not affect the ability of BBR to restore data, as long as adequate storage space to restore the data exists.

Prerequisites

Before using the result of the backup to restore to a destination environment, verify that the current environment and the destination environment are compatible. For more information, see Compatibility of Restore in Restoring the Single Master Cluster.

Before you begin restoring your PKS cluster deployment, perform the following steps:

Download the Root CA Certificate

Download the root certificate authority (CA) certificate for your PKS deployment by following the steps below.

  1. From the Ops Manager Installation Dashboard, click your username in the top right corner.

  2. Navigate to Settings > Advanced.

  3. Click Download Root CA Cert.

Record Your PKS Cluster BOSH Deployment Name

Locate and record your PKS Cluster BOSH deployment name by following the steps below.

  1. On the command line, run the following command to log in:

    pks login -a PKS-API -u USERNAME -k
    
    See Log in to the PKS CLI for more information about the pks login command.

  2. To find the cluster ID associated with the cluster you want to back up, run the following command:

    pks cluster CLUSTER-NAME
    

    Where CLUSTER-NAME is the name of your cluster. From the output of this command, record the value of UUID.

  3. From the Ops Manager Installation Dashboard, click the Director tile.

  4. Select the Credentials tab.

  5. Navigate to Bosh Commandline Credentials and click Link to Credential.

  6. Copy the credential value.

  7. SSH into your jumpbox. For more information about the jumpbox, see Installing BOSH Backup and Restore.

  8. To retrieve your PKS Cluster BOSH deployment name, run the following command:

    BOSH-CLI-CREDENTIALS deployments | grep UUID
    

    Where:

    • BOSH-CLI-CREDENTIALS is the credential value that you copied in the previous step.
    • UUID is the cluster UUID that you recorded in Record Your PKS Cluster BOSH Deployment Name.

      Your PKS Cluster BOSH deployment name begins with service-instance and includes a unique identifier.

Step 1: Recreate VMs

Before restoring the PKS Cluster, you must create the virtual machines (VMs) that constitute the deployment.

In a disaster recovery scenario, you can re-create the deployment with your PKS cluster manifest. If you used the --with-manifest flag when you ran the BBR backup command, your backup artifact includes a copy of your manifest.

Step 2: Transfer Artifacts to Jumpbox

Copy your BBR backup artifact from your safe storage location to the jumpbox. You can use any copy process that suits your needs. If your backup artifact is encrypted, you must decrypt it.

For example, run the following command to use the Secure copy protocol (SCP) to copy the backup artifact to your jumpbox:

scp LOCAL-PATH-TO-BACKUP-ARTIFACT JUMPBOX-USER/JUMPBOX-ADDRESS

Where:

  • LOCAL-PATH-TO-BACKUP-ARTIFACT is the local path to your backup artifact.
  • JUMPBOX-USER is your jumpbox username.
  • JUMPBOX-ADDRESS is the IP address of your jumpbox.

Step 3: Restore the Cluster

Perform the steps below to restore a PKS Cluster.

Note: The BBR restore command can take a long time to complete. You can run it independently of an SSH session so that the process can continue running even if your connection to the jumpbox fails. The example command below uses nohup, but you can instead run the command in a screen or tmux session.

  1. Move the PKS Cluster backup artifact to a folder from which you will run the BBR restore process.

  2. SSH into your jumpbox. For more information about the jumpbox, see Install BOSH Backup and Restore.

  3. Run the following command:

    BOSH_CLIENT_SECRET=BOSH-CLIENT-SECRET \
    nohup bbr deployment \
    --target BOSH-TARGET \
    --username BOSH-CLIENT \
    --deployment DEPLOYMENT-NAME \
    --ca-cert PATH-TO-BOSH-SERVER-CERT \
    restore \
    --artifact-path PATH-TO-DEPLOYMENT-BACKUP
    

    Replace the placeholder text as follows:

    • BOSH-CLIENT-SECRET: In the BOSH Director tile, navigate to Credentials > Bosh Commandline Credentials. Record the value for BOSH_CLIENT_SECRET.
    • BOSH-TARGET: In the BOSH Director tile, navigate to Credentials > Bosh Commandline Credentials. Record the value for BOSH_ENVIRONMENT. You must be able to reach the target address from the workstation where you run bbr commands.
    • BOSH-CLIENT: In the BOSH Director tile, navigate to Credentials > Bosh Commandline Credentials. Record the value for BOSH_CLIENT.
    • DEPLOYMENT-NAME: Use the PKS Cluster BOSH deployment name that you recorded in the Prerequisites section.
    • PATH-TO-BOSH-CA-CERT: Use the path to the root CA certificate that you downloaded in the Prerequisites section.
    • PATH-TO-DEPLOYMENT-BACKUP: Use the path to to your deployment backup.

    For example:

    $ BOSH_CLIENT_SECRET=p455w0rd \
    nohup bbr deployment \
    --target bosh.example.com \
    --username admin \
    --deployment cf-acceptance-0 \
    --ca-cert bosh.ca.cert \
    restore \
    --artifact-path deployment-backup
    

  4. If the restore command fails, do one or more of the following:

Recover from a Failing Command

If your backup fails, follow these steps below:

  1. Ensure that all the parameters in the command are set.

  2. Ensure that the BOSH Director credentials are valid.

  3. Ensure the specified BOSH deployment exists.

  4. Ensure that the jumpbox can reach the BOSH Director.

  5. Consult Exit Codes and Logging.

  6. If you see the error message Directory /var/vcap/store/bbr-backup already exists on instance, run the appropriate cleanup command. See Clean Up after a Failed Backup below.

Cancel a Restore

If you must cancel a backup, follow the steps below:

  1. Terminate the BBR process by entering Ctrl-C, then entering yes to confirm.

  2. Because stopping a backup can leave the system in an unusable state and prevent future backups, follow the procedures in Clean Up after a Failed Backup below.

Clean Up after a Failed Restore

If you stop the restore process or the restore process fails, then the process may leave the BBR restore folder on the instance. As a result, any subsequent attempts to restore may also fail. In addition, if the interruption prevents BBR from running the post-restore scripts, the instance may be left in a locked state.

If you stop the backup process or the process fails, run the following command:

BOSH_CLIENT_SECRET=BOSH-CLIENT-SECRET \
bbr deployment \
--target BOSH-TARGET \
--username BOSH-CLIENT \
--deployment DEPLOYMENT-NAME \ 
--ca-cert PATH-TO-BOSH-CA-CERT \ 
restore-cleanup

Replace the placeholder text as follows:

  • BOSH-CLIENT-SECRET: In the BOSH Director tile, navigate to Credentials > Bosh Commandline Credentials. Record the value for BOSH_CLIENT_SECRET.
  • BOSH-TARGET: In the BOSH Director tile, navigate to Credentials > Bosh Commandline Credentials. Record the value for BOSH_ENVIRONMENT. You must be able to reach the target address from the workstation where you run bbr commands.
  • BOSH-CLIENT: In the BOSH Director tile, navigate to Credentials > Bosh Commandline Credentials. Record the value for BOSH_CLIENT.
  • DEPLOYMENT-NAME: Use the PKS Cluster BOSH deployment name that you recorded in the Prerequisites section.
  • PATH-TO-BOSH-CA-CERT: Use the path to the root CA certificate that you downloaded in the Prerequisites section.

For example:

$ BOSH_CLIENT_SECRET=p455w0rd \
bbr deployment \
--target bosh.example.com \
--username admin \
--deployment cf-acceptance-0 \
--ca-cert bosh.ca.crt \
restore-cleanup

If the cleanup script fails, consult the following table to match the exit codes to an error message.

Value Error
0 Success
1 General failure
8 The post-restore unlock failed. Your deployment may be in a bad state and require attention.
16 The cleanup failed. This is a non-fatal error indicating that the utility has been unable to clean up open BOSH SSH connections to the deployment VMs. Manual cleanup may be required to clear any hanging BOSH users and connections.

For more information about interpreting the exit codes, see Exit Codes in BBR Exit Codes and Logging.


Please send any feedback you have to pks-feedback@pivotal.io.

Create a pull request or raise an issue on the source for this page in GitHub