LATEST VERSION: v1.4 - RELEASE NOTES
Pivotal Container Service v1.3

Backing Up the Single Master Cluster

Page last updated:

This topic describes how to use BOSH Backup and Restore (BBR) to back up your single master cluster.

To perform a restore of a single master cluster with BBR, see the instructions in Restoring the Single Master Cluster.

Limitations

BBR has the following limitations:

  • BBR can only back up a single-master cluster.
  • BBR does not back up the contents of disks that are attached to nodes.
  • BBR only backs up and restores the cluster etcd data. This includes the cluster-deployed workloads. Persistent volumes and other IaaS resources, such as load balancers of workloads, are not backed up.
  • Backup and restore for clusters deployed on vSphere with NSX-T is not yet supported.
  • For a list of components that cannot be backed up with BBR, see Unsupported Components.

Prerequisites

Before you back up clusters in PKS with BBR, you must have the following:

Retrieve BOSH Director Address

Your BOSH Director may have a custom hostname. To retrieve the hostname, perform the following steps:

  1. From the Installation Dashboard in Ops Manager, click the BOSH Director tile.
  2. Select Director Config and record the value of the Director Hostname field.

If your BOSH Director does not have a hostname, perform the following steps to retrieve the IP address:

  1. From the Installation Dashboard in Ops Manager, select BOSH Director > Status.
  2. Record the IP address listed for the Director. You access the BOSH Director using this IP address.

Obtain and Record UAA Client Credentials for PKS

You must use BOSH credentials that limit the scope of BBR activity to your cluster deployments.

To obtain these credentials, perform the following steps:

  1. From the Ops Manager Installation Dashboard, click the Pivotal Container Service tile.

  2. Select the Credentials tab.

  3. Navigate to Credentials > UAA Client Credentials.

  4. Record the value for uaa_client_secret.

  5. Record the value for uaa_client_name.

Download or Locate the Root CA Certificate

If you are running BBR from a jumpbox, download the root certificate authority (CA) certificate for your PKS deployment by following the steps below.

  1. From the Ops Manager Installation Dashboard, click your username in the top right corner.

  2. Navigate to Settings > Advanced.

  3. Click Download Root CA Cert.

If you are running BBR from the Ops Manager VM, the root CA certificate is located in the following location:

/var/tempest/workspaces/default/root_ca_certificate

Verify Your Cluster Deployments

To verify that you can reach your BOSH Director and that the BOSH Director has PKS cluster deployments that can be backed up, follow the steps below.

  1. SSH into your jumpbox. For more information about the jumpbox, see Installing BOSH Backup and Restore.

  2. To perform the BBR pre-backup check, run the following command from your jumpbox:

    BOSH_CLIENT_SECRET=PKS-UAA-CLIENT-SECRET \
    bbr deployment \
    --all-deployments \
    --target BOSH-TARGET \
    --username PKS-UAA-CLIENT-NAME \
    --ca-cert PATH-TO-BOSH-SERVER-CERT \
    pre-backup-check
    

    Replace the placeholder text as follows:

    For example:

    $ BOSH_CLIENT_SECRET=p455w0rd \
    bbr deployment \
    --all-deployments \
    --target bosh.example.com \
    --username pivotal-container-service-12345abcdefghijklmn \
    --ca-cert /var/tempest/workspaces/default/root_ca_certificate \
    pre-backup-check
    

  3. If the pre-backup-check command is successful, the command returns a list of cluster deployments that can be backed up.

    For example:

    [21:51:23] Pending: service-instance_abcdeg-1234-5678-hijk-90101112131415
    [21:51:23] -------------------------
    [21:51:31] Deployment 'service-instance_abcdeg-1234-5678-hijk-90101112131415' can be backed up.
    [21:51:31] -------------------------
    [21:51:31] Successfully can be backed up: service-instance_abcdeg-1234-5678-hijk-90101112131415
    

    In the output above, service-instance_abcdeg-1234-5678-hijk-90101112131415 is the BOSH deployment name of a PKS cluster.

  4. If the pre-backup-check command fails, do one or more of the following:

    • Make sure you are using the correct Pivotal Container Service credentials.
    • Run the command again, adding the --debug flag to enable debug logs. For more information, see Exit Codes and Logging.
    • Make the changes suggested in the output and run the pre-backup check again. For example, the deployment that you selected might not have the correct backup scripts, or the connection to the BOSH Director failed.

Back up Clusters

To back up one or more cluster deployments, follow the procedures below.

Back up All Cluster Deployments

The following procedure backs up all cluster deployments.

Make sure you use the PKS UAA credentials that you recorded in Obtain and Record UAA Client Credentials for PKS. These credentials limit the scope of the backup to cluster deployments only.

  1. To back up all cluster deployments, run the following command from your jumpbox:

    BOSH_CLIENT_SECRET=PKS-UAA-CLIENT-SECRET \
    nohup bbr deployment \
    --all-deployments \
    --target BOSH-DIRECTOR-IP \
    --username PKS-UAA-CLIENT-NAME \
    --ca-cert PATH-TO-BOSH-SERVER-CERT \
    backup [--with-manifest] [--artifact-path]
    

    Replace the placeholder text as follows:

    Specify optional flags for the backup subcommand:

    • --with-manifest: This includes the manifest in the backup artifact. If you use this flag, the backup artifact then contains credentials that you should keep secret.
    • --artifact-path: This allows you to specify a path for the backup artifact.

    For example:

    $ BOSH_CLIENT_SECRET=p455w0rd \
    nohup bbr deployment \
    --all-deployments \
    --target bosh.example.com \
    --username pivotal-container-service-12345abcdefghijklmn \
    --ca-cert /var/tempest/workspaces/default/root_ca_certificate \
    backup
    

    Note: The BBR backup command can take a long time to complete. You can run it independently of the SSH session so that the process can continue running even if your connection to the jumpbox fails. The command above uses nohup, but you could also run the command in a screen or tmux session.

  2. If the backup command completes successfully, follow the steps in Manage Your Backup Artifact below.

  3. If the backup command fails, the backup operation exits. BBR does not attempt to continue backing up any non-backed up clusters. To troubleshoot a failing backup, do one or more of the following:

Back up One Cluster Deployment

  1. To backup a single, specific cluster deployment, run the following command from your jumpbox:

    BOSH_CLIENT_SECRET=PKS-UAA-CLIENT-SECRET \
    nohup bbr deployment \
    --deployment CLUSTER-DEPLOYMENT-NAME \
    --target BOSH-DIRECTOR-IP \
    --username PKS-UAA-CLIENT-NAME \
    --ca-cert PATH-TO-BOSH-SERVER-CERT \
    backup [--with-manifest] [--artifact-path]
    

    Replace the placeholder text as follows:

    Specify optional flags for the backup subcommand:

    • --with-manifest: This includes the manifest in the backup artifact. If you use this flag, the backup artifact then contains credentials that you should keep secret.
    • --artifact-path: This allows you to specify a path for the backup artifact.

    For example:

    $ BOSH_CLIENT_SECRET=p455w0rd \
    nohup bbr deployment \
    --deployment service-instance_abcdeg-1234-5678-hijk-9010111213141 \
    --target bosh.example.com \
    --username pivotal-container-service-12345abcdefghijklmn \
    --ca-cert /var/tempest/workspaces/default/root_ca_certificate \
    backup
    

  2. If the backup command completes successfully, follow the steps in Manage Your Backup Artifact below.

  3. If the backup command fails, do one or more of the following:

Recover from a Failing Command

If your backup fails, follow these steps below:

  1. Ensure that all the parameters in the command are set.

  2. Ensure that the PKS UAA credentials and root CA certificate path are valid.

  3. If you are backing up a deployment, ensure the deployment that you specify in the backup command exists.

  4. Ensure that the jumpbox can reach the BOSH Director.

  5. Consult Exit Codes and Logging.

  6. If you see the error message Directory /var/vcap/store/bbr-backup already exists on instance, run the appropriate cleanup command. See Clean Up after a Failed Backup below.

  7. If the backup artifact is corrupted, discard the failing artifacts and run the backup again.

Cancel a Backup

If you must cancel a backup, follow the steps below:

  1. Terminate the BBR process by entering Ctrl-C, then entering yes to confirm.

  2. Because stopping a backup can leave the system in an unusable state and prevent additional backups, follow the procedures in Clean Up after a Failed Backup below.

Clean Up after a Failed Backup

If you stop the backup process or the process fails, the BBR backup folder can remain on the instance, causing any subsequent attempts to backup to fail. In addition, if the interruption prevents BBR from running the post-backup scripts, the instance may be left in a locked state.

If you stop the backup process or the process fails, run the following command:

BOSH_CLIENT_SECRET=PKS-UAA-CLIENT-SECRET \
bbr deployment \
--all-deployments | --deployment CLUSTER-DEPLOYMENT-NAME \
--target BOSH-TARGET \
--username PKS-UAA-CLIENT-NAME \
--ca-cert PATH-TO-BOSH-CA-CERT \
backup-cleanup

Specify --all-deployments or --deployment depending on whether the backup you attempted previously was for all cluster deployments or a specific cluster deployment.

Replace the placeholder text as follows:

For example:

$ BOSH_CLIENT_SECRET=p455w0rd \
bbr deployment \
--all-deployments \
--target bosh.example.com \
--username pivotal-container-service-12345abcdefghijklmn \
--ca-cert /var/tempest/workspaces/default/root_ca_certificate \
backup-cleanup

If the cleanup script fails, consult the following table to match the exit codes to an error message.

Value Error
0 Success
1 General failure
8 The post-backup unlock failed. One of your deployments might be in a bad state and require attention.
16 The cleanup failed. This is a non-fatal error indicating that the utility has been unable to clean up open BOSH SSH connections to a deployment’s VMs. Manual cleanup might be required to clear any hanging BOSH users and connections.

For more information about interpreting the exit codes, see Exit Codes in BBR Exit Codes and Logging.

Manage Your Backup Artifacts

Keep your backup artifacts safe in the following ways:

  • Move the backup artifact off the jumpbox to a secure storage space. BBR stores each backup in a subdirectory named DEPLOYMENT-TIMESTAMP within the current working directory. The backup created by BBR consists of a folder that contains the backup artifacts and metadata files.

  • Compress and encrypt the backup artifacts when storing them.

  • Make redundant copies of your backup and store them in multiple locations. This can help minimize the risk of losing your backups in the event of a disaster.

  • Each time you redeploy PKS Cluster, test your backup artifact by following the procedures in Restore the Single Master Cluster.


Please send any feedback you have to pks-feedback@pivotal.io.

Create a pull request or raise an issue on the source for this page in GitHub