Backing Up and Restoring Service Instances

Both disaster recovery and system validation depend upon backups. A PCC service instance backup consists of the configuration for a PCC service instance together with all region data that has been persisted to disk. A backup may be used to restore the service onto a new, unconfigured PCC service instance.

Before You Begin

WARNING: Consider each of these important items when preparing to back up and restore a PCC service instance. Since recovery depends on a correct implementation, address each item during planning.

  • The backup consists of data from regions that have been persisted to disk. All non-persistent region data will be lost. A persistent region is one that writes its region entries to disk.
  • The backup and restoration process does not restore WAN replication configuration. A PCC service instance that replicates data via WAN may be backed up. A restore of that PCC service instance will contain all the persistent data. That PCC service instance will be able to communicate with other WAN-connected service instances after further configuration.
  • The disk size on the jumpbox VM must be large enough to hold the backup artifacts. The quantity of disk space needs to exceed the sum total of all disk space used within the PCC service instance to be backed up. An approximation of that quantity of disk space may be calculated by looking at the fully populated PCC service instance. Sum the disk space (in /var/vcap/store) used by each locator and server.
  • The PCC service instance needs to be quiescent when the backup is taken. An active cluster could have disk writes in progress, so the backup may or may not contain the most recent region data.
  • The fresh PCC service instance that is to be used for a restore must have the same number of locators and servers as the PCC service instance had when its backup was taken. The fresh PCC service instance must have sufficient resources to hold the data that is in the backup.
  • After creating the fresh PCC service instance that is to be used for a restore, do not configure it. It must be empty. Do not create any regions or start any gateway senders. The restore will create all regions, both persistent and non-persistent. The non-persistent regions will be empty.
  • Service keys are not part of the backup. Service keys will need to be created anew on a restored service instance.
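
The disk space estimate described above can be sketched as a small shell helper. This is only an illustration under stated assumptions: it assumes that the output of du -sk /var/vcap/store has been collected from each locator and server into a file (the file name store_sizes.txt is hypothetical), and the function names sum_store_kb, free_kb, and has_room are invented for this sketch.

```shell
#!/usr/bin/env bash

# Sum the first column (size in KB) of `du -sk` output lines read from stdin.
# Each input line is expected to look like: "SIZE_KB<TAB>/var/vcap/store".
sum_store_kb() {
  awk '{ total += $1 } END { printf "%.0f\n", total }'
}

# Print the free space, in KB, of the filesystem that holds the given directory.
free_kb() {
  df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Succeed only if the available space strictly exceeds the required space.
# Usage: has_room REQUIRED_KB AVAILABLE_KB
has_room() {
  [ "$2" -gt "$1" ]
}

# Hypothetical usage on the jumpbox (paths are placeholders):
#   required=$(sum_store_kb < store_sizes.txt)
#   available=$(free_kb /path/to/backup/directory)
#   has_room "$required" "$available" || echo "not enough disk space" >&2
```

The comparison is strict because the jumpbox disk needs to exceed, not merely equal, the total used within the service instance. The result is an approximation only.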

Backing Up a PCC Service Instance

  1. Optional: Back up the Pivotal Cloud Foundry environment. For instructions, see Backing Up Pivotal Cloud Foundry with BBR.

  2. Optional: Compact the disk stores of the cluster to be backed up. Use the gfsh compact disk-store command after connecting to the cluster with a role that is authorized with the CLUSTER:MANAGE operation permission. See the documentation for gfsh compact disk-store.

  3. Optional: Acquire a region statistic that may be used for a minimal validation upon using this backup for subsequent restoration. Use the gfsh describe region command on each persistent region, after connecting to the cluster with a role that is authorized with the CLUSTER:READ operation permission.

    describe region --name=REGION-NAME
    

    For each persistent region (the data-policy will contain the string PERSISTENT), record the size of the region. This size is the number of entries in the region. Provided that the region is quiescent when the backup is taken, comparing this number with the number of entries in the restored region forms a minimal validation of the restore.

  4. Use SSH to connect to the jumpbox. Assuming that the Ops Manager VM is being used as the jumpbox, follow directions at Log in to the Ops Manager VM with SSH.

  5. Determine all needed credentials and parameters for the command that makes the backup:

    • BOSH-DIRECTOR-IP: Obtain the BOSH Director IP address by following the instructions at Step 4: Retrieve BOSH Director Address.
    • SERVICE-INSTANCE-DEPLOYMENT-NAME: Obtain the deployment name by following the instructions at Acquire the Deployment Name.
    • BOSH-CLIENT, BOSH-CLIENT-PASSWORD, and PATH-TO-BOSH-SERVER-CERTIFICATE: To learn all three of these values, open the Credentials tab of the BOSH Director tile in Ops Manager. Find and open Bosh Commandline Credentials. The path is the path to the root Certificate Authority (CA) certificate, and its value will be /var/tempest/workspaces/default/root_ca_certificate if Ops Manager and the jumpbox are on the same VM.

  6. Make the backup. From the jumpbox, issue this command, substituting values acquired in the previous step:

    BOSH_CLIENT_SECRET=BOSH-CLIENT-PASSWORD \
    bbr deployment \
    --target BOSH-DIRECTOR-IP \
    --username BOSH-CLIENT \
    --ca-cert PATH-TO-BOSH-SERVER-CERTIFICATE \
    --deployment SERVICE-INSTANCE-DEPLOYMENT-NAME \
    backup
    

    BOSH_CLIENT_SECRET is an environment variable that is set only for the duration of the command.

    The backup will be within a directory on the jumpbox named by the deployment name and a timestamp.

  7. Copy the backup to a permanent home for archiving. Pivotal recommends compressing and encrypting the files. Disaster recovery plans often recommend archiving multiple copies of each backup.
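
The archiving in the final step might be sketched as follows. This is a minimal illustration, not a prescribed procedure: the function name archive_backup and all paths are hypothetical, and the choice of tar with gzip compression is an assumption. Encryption is not performed by the function; the comment at the end names gpg as one possible tool, which is likewise an assumption.

```shell
#!/usr/bin/env bash

# Compress a backup directory into a gzip'd tar archive and confirm
# that the resulting archive can be read back.
# Usage: archive_backup BACKUP_DIR OUTPUT_TGZ
archive_backup() {
  local backup_dir=$1 out=$2

  if [ ! -d "$backup_dir" ]; then
    echo "no such directory: $backup_dir" >&2
    return 1
  fi

  # -C changes to the parent directory so that the paths stored in the
  # archive are relative to it rather than absolute.
  tar -czf "$out" -C "$(dirname "$backup_dir")" "$(basename "$backup_dir")" || return 1

  # Listing the archive contents checks that it is readable.
  tar -tzf "$out" > /dev/null
}

# Hypothetical usage (directory and archive names are placeholders):
#   archive_backup ./DEPLOYMENT-NAME_TIMESTAMP /path/to/archive/pcc-backup.tgz
# One possible way to encrypt the result, assuming gpg is installed:
#   gpg --symmetric --cipher-algo AES256 /path/to/archive/pcc-backup.tgz
```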

Restoring a PCC Service Instance

  1. Use SSH to connect to the jumpbox. Assuming that the Ops Manager VM is being used as the jumpbox, follow directions at Log in to the Ops Manager VM with SSH.

  2. Transfer the archived backup to the jumpbox. Expand if compressed, and decrypt if encrypted.

  3. Create a new, empty service instance. See directions in Creating a Pivotal Cloud Cache Service Instance. If the service instance to be restored was connected to a WAN, be sure to create the new instance with the same distributed_system_id as the old one.

  4. Determine all needed credentials and parameters for the command that does the restore by following the instructions in step 5 of Backing Up a PCC Service Instance.

  5. Do the restore. From the jumpbox, issue this command, substituting values acquired in the previous step:

    BOSH_CLIENT_SECRET=BOSH-CLIENT-PASSWORD \
    bbr deployment \
    --target BOSH-DIRECTOR-IP \
    --username BOSH-CLIENT \
    --ca-cert PATH-TO-BOSH-SERVER-CERTIFICATE \
    --deployment SERVICE-INSTANCE-DEPLOYMENT-NAME \
    restore --artifact-path PATH-TO-SERVICE-INSTANCE-ARTIFACT
    

    PATH-TO-SERVICE-INSTANCE-ARTIFACT is the path to the artifact for the instance that you are currently restoring.

  6. Create a new service key, as the service key was not an artifact that the backup created. For a stand-alone service instance (one that is not part of a WAN), follow the directions at Create a Service Key. For a WAN-connected service instance, see Restoring a WAN Connection.

  7. If the region size was captured prior to making the backup, do a minimal validation of the restored cluster. Use the gfsh describe region command on each persistent region, after connecting to the cluster with a role that is authorized with the CLUSTER:READ operation permission.

    describe region --name=REGION-NAME
    

    For each persistent region, the size of the restored region should be the same as the value captured when the backup was made.

  8. Rebind or restage all apps.
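
The minimal validation in step 7 amounts to comparing two sets of region sizes. One way to sketch that comparison is shown below. It assumes the sizes were copied by hand from the describe region output into two plain-text files, one line per region in the form "REGION-NAME SIZE". That file format, the file names, and the function name compare_region_sizes are hypothetical and not part of PCC or gfsh.

```shell
#!/usr/bin/env bash

# Compare region sizes recorded before the backup with sizes recorded
# after the restore. Prints one line per mismatch and returns non-zero
# if any region is missing or has a different size.
# Usage: compare_region_sizes BEFORE_FILE AFTER_FILE
compare_region_sizes() {
  awk '
    NF < 2 { next }                                  # skip blank or malformed lines
    FILENAME == ARGV[1] { before[$1] = $2; next }    # sizes recorded before the backup
    { after[$1] = $2 }                               # sizes recorded after the restore
    END {
      bad = 0
      for (r in before) {
        if (!(r in after)) {
          print r ": missing after restore"
          bad = 1
        } else if (before[r] != after[r]) {
          print r ": expected " before[r] ", found " after[r]
          bad = 1
        }
      }
      exit bad
    }
  ' "$1" "$2"
}

# Hypothetical usage:
#   compare_region_sizes sizes_before.txt sizes_after.txt && echo "region sizes match"
```

Matching entry counts are only a minimal check. They do not show that the contents of the entries are identical.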