Troubleshooting BBR

This topic lists common troubleshooting scenarios and their solutions when using BOSH Backup and Restore (BBR) to back up and restore Pivotal Cloud Foundry (PCF).

Troubleshooting During a Restore

This section provides solutions for errors that occur during a restore.

Restore Fails with a MySQL Monit Start Timeout

Symptom

When you run the BBR restore command, the restore of the mysql-restore job fails with the following error:

1 error occurred:

* restore script for job mysql-restore failed on mysql/0.
...
Monit start failed: Timed out waiting for monit: 2m0s

Explanation

This error occurs when the MySQL job fails to start within the timeout period (two minutes in the error above). The job then enters an Execution Failed state, and monit does not try to start it again.

Solution

Ensure that your MySQL Server cluster has only one instance. If there is more than one instance of MySQL Server, the restore fails with a monit start timeout. Scale down to one instance and retry the restore.
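
As a quick check before you retry, you can read the instance count from the output of bosh vms. The following is a minimal sketch and not part of BBR. count_instances is a hypothetical helper, and it assumes that each instance row starts with GROUP/ID, which is how bosh vms prints instances when its output is piped. The instance group name mysql is taken from the error above; adjust it if your deployment uses a different name.

```shell
# Hypothetical helper: count the instances of one instance group.
# Reads `bosh vms` output on stdin.
# Assumption: each instance row begins with GROUP/ID (for example mysql/0ce9...).
count_instances() {
  group="$1"
  # grep -c prints the number of matching lines and prints 0 when none match.
  # "|| true" keeps the function's exit status at 0 in that case.
  grep -c "^${group}/" || true
}
```

For example, piping the output of bosh -e BOSH-DIRECTOR-IP -d DEPLOYMENT-NAME vms into count_instances mysql should print 1 before you retry the restore.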

If your MySQL Server cluster is already scaled down to one node, it may have taken longer than normal to restart the cluster. Follow the procedure below to manually verify and retry.

  1. List the VMs in your deployment:

    bosh -e BOSH-DIRECTOR-IP --ca-cert /var/tempest/workspaces/default/root_ca_certificate \
    -d DEPLOYMENT-NAME \
    ssh
    

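    Where BOSH-DIRECTOR-IP is the IP address of your BOSH Director and DEPLOYMENT-NAME is the name of your deployment.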

  2. Select the mysql VM to SSH into.

  3. From the mysql VM, run the following to check that the MySQL job process is running:

    ps aux | grep MYSQL-JOB
    


    Where MYSQL-JOB is the name of your MySQL job process. Use one of the following values:

    • If you selected Internal Databases - MySQL - Percona XtraDB Cluster when you configured the PAS tile, replace MYSQL-JOB with galera-init.
    • If you selected Internal Databases - MySQL - MariaDB Galera Cluster when you configured the PAS tile, replace MYSQL-JOB with mariadb_ctrl.

    For example:

    $ ps aux | grep galera-init
    
    For more information, see the Deploying PAS topic for your IaaS. For example, if you run PAS on AWS, see Deploying PAS on AWS.

  4. Run the following command to check whether monit reports that the MySQL job process is in an Execution Failed state:

    sudo monit summary
    
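  5. If so, run the following command from the mysql VM to disable monitoring of the MySQL job process, where MYSQL-JOB is the process name from step 3:

    sudo monit unmonitor MYSQL-JOB
    
  6. Run the following command to enable monitoring again:

    sudo monit monitor MYSQL-JOB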
    
  7. After a few minutes, run the following command:

    sudo monit summary

    The command should report that all the processes are running.

  8. Re-attempt the restore with BBR.
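
Steps 4 through 6 above can be sketched as a short script to run on the mysql VM. This is an illustration under stated assumptions, not an official procedure. failed_processes and remonitor_failed are hypothetical helper names, and the parsing assumes that monit summary prints one line per process in the form Process 'NAME' followed by its status.

```shell
# Hypothetical helper: print the names of processes that monit reports as failed.
# Reads `monit summary` output on stdin.
# Assumption: process lines look like
#   Process 'galera-init'    Execution failed
failed_processes() {
  # Split on single quotes so that the process name is field 2.
  # The status match is case-insensitive.
  awk -F"'" '/^Process / && tolower($0) ~ /execution failed/ { print $2 }'
}

# Hypothetical helper: disable and then re-enable monitoring for each failed process.
# Requires root on the mysql VM. Not called automatically.
remonitor_failed() {
  sudo monit summary | failed_processes | while read -r name; do
    sudo monit unmonitor "$name"
    sudo monit monitor "$name"
  done
}
```

After you run remonitor_failed, wait a few minutes and check monit summary again before you retry the restore, as described in steps 7 and 8.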

Deployment Does Not Match the Structure of the Backup

Symptom

BBR displays the following error:

Deployment 'deployment-name' does not match the structure of the provided backup

Explanation

The instance groups with the restore scripts in the destination environment don’t match the backup metadata. For example, they may have the wrong number of instances of a particular instance group, or the metadata names an instance group that doesn’t exist in the destination environment.

Solution

BBR only supports restoring to an environment that matches your original environment. Pivotal recommends altering the destination environment to match the structure of the backup.
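
One way to spot a difference in instance counts is to summarize the instance groups of each environment and compare the results. The following is a minimal sketch. group_counts is a hypothetical helper, and it assumes input rows that start with GROUP/ID, as in piped bosh vms output. It compares instance counts only and does not read the BBR backup metadata itself.

```shell
# Hypothetical helper: summarize instances as "GROUP COUNT" lines, sorted by group.
# Reads instance rows on stdin.
# Assumption: the first field of each instance row is GROUP/ID.
group_counts() {
  awk '{ n = split($1, part, "/"); if (n == 2) count[part[1]]++ }
       END { for (g in count) print g, count[g] }' | sort
}
```

For example, you might save the output of bosh -d DEPLOYMENT-NAME vms piped through group_counts when you take the backup, produce the same summary for the destination environment, and compare the two files with diff. Any line that differs points to an instance group with a different number of instances or to a group that exists in only one environment.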

General Troubleshooting

This section provides solutions for general errors.

Connection Error

Symptom

BBR displays an error message containing “SSH Dial Error” or another connection error.

Explanation

The connection between the jumpbox and the VMs in the deployment is failing.

Solution

Perform the following steps:

  1. Run bosh vms to check that your deployment is healthy.
  2. Run bbr deployment backup-cleanup to remove the data that the failed backup left on the instances. Otherwise, further BBR commands fail.
  3. Repeat the BBR operation.
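
The health check in step 1 can be sketched as a filter over the output of bosh vms. This is an illustration only. unhealthy_instances is a hypothetical helper, and it assumes tab-separated rows with the instance in the first column and the process state in the second, which is the layout bosh vms uses when its output is piped.

```shell
# Hypothetical helper: print instances whose process state is not "running".
# Reads `bosh vms` output on stdin.
# Assumption: rows are tab-separated, with the instance (GROUP/ID) in column 1
# and the process state in column 2.
unhealthy_instances() {
  awk -F'\t' '$1 ~ /\// {
    name = $1
    state = $2
    gsub(/^[ ]+|[ ]+$/, "", name)    # trim padding around the instance name
    gsub(/^[ ]+|[ ]+$/, "", state)   # trim padding around the state
    if (state != "running") print name
  }'
}
```

If piping bosh -d DEPLOYMENT-NAME vms into unhealthy_instances prints nothing, every instance reports a running state and you can continue with step 2. Otherwise, investigate the listed instances first.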

Error Running Metadata Script

Symptom

BBR backup or restore fails with a metadata error:

1 error occurred:
error 1:
An error occurred while running metadata script for job redis-server on redis/0ce9f81f-1756-480b-8e3e-a4609b14b6a6: error from metadata

Explanation

The metadata script for the job returned an error, which indicates a problem with your PCF installation.

Solution

Contact Pivotal Support.
