Checking Pivotal Platform State after a Power Failure on vSphere
- Overview
- Checklist
- Phase 1: Ensure vSphere is Running
- Phase 2: Ensure Ops Manager is Running
- Phase 3: Ensure BOSH Director is Running
- Phase 4: Ensure BOSH Resurrector Finished Recovering
- Phase 5: Ensure PAS VMs are Running
- Phase 6: Ensure Apps Hosted on PAS are Running
- Phase 7: Check the Healthwatch Dashboard
Warning: Pivotal Application Service (PAS) v2.8 is no longer supported because it has reached the End of General Support (EOGS) phase as defined by the Support Lifecycle Policy. To stay up to date with the latest software and security updates, upgrade to a supported version.
This topic describes how to check Pivotal Platform state after a power failure in an on-premises vSphere installation.
If your company has a procedure for handling power failure scenarios and you would like to add steps for checking that Pivotal Platform is in a good state, you can use this procedure as a template.
Overview
This section describes the process used by Pivotal Platform to recover from power failures and exceptions to that process.
Automatic Recovery Process
When power returns after a failure, vSphere and Pivotal Platform automatically do the following to recover your environment:
- vSphere High Availability (HA) recovers VMs.
- BOSH ensures the processes on those VMs are healthy, with the exception of the Pivotal Operations Manager VM and the BOSH VM itself. Pivotal Platform uses BOSH to deploy and manage its VMs. For more information, see BOSH.
- The Diego runtime of Pivotal Application Service (PAS) recovers apps that were running on the VMs. For more information, see Diego.
Scenarios that Require Manual Intervention
There are two scenarios that can require manual intervention when recovering your environment after a power failure:
- If PAS is configured to use a MySQL cluster instead of a single node, the cluster does not recover automatically.
- If you run Ops Manager v2.5.3 or earlier, you might encounter the following known issue in the BOSH Director: Monit inaccurately reports the health of UAA.
The procedure in this topic includes more detail about addressing these scenarios.
Checklist
Use the checklist in this section to ensure Pivotal Platform is in a good state after a power failure. It includes links to sections that contain more detail about each phase.
This checklist assumes your Pivotal Platform on vSphere installation is set up for vSphere HA and you have the BOSH Resurrector enabled.
Phase | Component | Action |
---|---|---|
1 | vSphere | Ensure vSphere is Running |
2 | Ops Manager | Ensure Ops Manager is Running |
3 | BOSH Director | Ensure BOSH Director is Running |
4 | BOSH Director | Ensure BOSH Resurrector Finished Recovering |
5 | PAS | Ensure PAS VMs are Running (This may include manually recovering the MySQL cluster) |
6 | PAS | Ensure Apps Hosted on PAS are Running |
7 | Pivotal Platform Healthwatch | Check the Healthwatch Dashboard |
Phase 1: Ensure vSphere is Running
Ensure that vSphere is running and has fully recovered from the power failure. Check your internal vSphere monitoring dashboard.
Phase 2: Ensure Ops Manager is Running
To ensure Ops Manager is running, do the following:
Open vCenter and navigate to the resource pool that hosts your Pivotal Platform deployment.
Select Related Objects, and then Virtual Machines.
Locate the VM with the name OpsMan-VERSION, such as OpsMan-2.6.
Review the State and Status columns for the Ops Manager VM. If Ops Manager is running, they display Powered On and Normal. If this is not the case, restart the VM.
Phase 3: Ensure BOSH Director is Running
To ensure BOSH Director is running, do the following:
In a browser, navigate to the Ops Manager UI and select the BOSH Director for vSphere tile.
Note: If you do not know the URL of the Ops Manager VM, you can use the IP address from vCenter.
Select Status.
In the BOSH Director row, record the CID. The CID is the cloud ID and corresponds to the VM name in vSphere.
Navigate to the vCenter resource pool or cluster that hosts your Pivotal Platform deployment.
Select Related Objects, and then Virtual Machines.
Locate the VM with the name that corresponds to the CID value you recorded.
Review the State and Status columns for the VM. If the State is not Powered On, restart the VM.
If the VM is Powered On but Status does not display Normal, it may be due to the following known issue: Monit inaccurately reports the health of UAA. To resolve this issue, do the following:
- SSH into the BOSH Director VM using the instructions in SSH into the BOSH Director VM.
- Run the following command to see whether all processes are running:
  monit summary
- If the uaa process is not running, run the following command:
  monit restart uaa
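The UAA check can also be scripted. The following is a minimal sketch, not an official procedure. It assumes you run it on the BOSH Director VM as a user permitted to call monit, and that `monit summary` prints one line per process in the form `Process 'NAME' STATUS`. The helper names `uaa_status` and `restart_uaa_if_needed` are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical, and the parsing assumes
# `monit summary` prints lines of the form: Process 'NAME' STATUS

# Print the status text that monit reports for the uaa process.
uaa_status() {
  monit summary | awk -v name="'uaa'" '
    $1 == "Process" && $2 == name {
      status = $3
      for (i = 4; i <= NF; i++) status = status " " $i
      print status
    }'
}

# Restart uaa only when monit reports a status other than "running".
restart_uaa_if_needed() {
  local status
  status="$(uaa_status)"
  if [ "$status" = "running" ]; then
    echo "uaa is running"
  else
    echo "uaa status is '${status:-unknown}'; restarting"
    monit restart uaa
  fi
}
```

After sourcing the script, call `restart_uaa_if_needed`, then run `monit summary` again to confirm the result. If your monit version formats its summary differently, adjust the awk pattern.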
Phase 4: Ensure BOSH Resurrector Finished Recovering
If enabled, the BOSH Resurrector re-creates any VMs that are still in a problematic state after vSphere HA recovers them.
To ensure BOSH Resurrector finished recovering, do the following:
Log in to the Ops Manager VM with SSH using the instructions in Log in to the Ops Manager VM with SSH.
Authenticate with the BOSH Director VM using the instructions in Authenticate with the BOSH Director VM.
Run the following command to see if there is any currently running or queued Resurrector activity:
bosh tasks --all -d ''
Look for scan and fix in the task description. If there are no tasks running, it is likely that BOSH Director has finished recovering. To view finished tasks, you can also run the following command:
bosh tasks --recent --all -d ''
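If you prefer to wait for the Resurrector from a script instead of rerunning the command by hand, the check can be sketched as follows. This is an illustration under assumptions: it presumes you are already authenticated with the BOSH Director, the function names `resurrector_tasks` and `wait_for_resurrector` are hypothetical, and the 30-second polling interval is arbitrary.

```shell
#!/usr/bin/env bash
# Sketch only: function names and the default 30-second interval are
# illustrative and not part of the BOSH CLI.

# Print current tasks whose description mentions "scan and fix".
# Returns 0 if any are found, 1 if none, 2 if the bosh command fails.
resurrector_tasks() {
  local tasks
  tasks="$(bosh tasks --all -d '')" || return 2
  printf '%s\n' "$tasks" | grep -i 'scan and fix'
}

# Poll until no scan and fix task is running or queued.
# Usage: wait_for_resurrector [INTERVAL_SECONDS]
wait_for_resurrector() {
  local interval="${1:-30}"
  while true; do
    resurrector_tasks > /dev/null
    case $? in
      0)
        echo "Resurrector tasks still active; checking again in ${interval}s"
        sleep "$interval"
        ;;
      1)
        echo "No running or queued scan and fix tasks found"
        return 0
        ;;
      *)
        echo "bosh tasks failed; check your BOSH Director login" >&2
        return 1
        ;;
    esac
  done
}
```

An empty result only means that nothing is running or queued at that moment, so it is still worth reviewing the finished tasks afterward.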
Phase 5: Ensure PAS VMs are Running
Note: You can also apply the steps in this section to any Pivotal Platform services. To further ensure the health of Pivotal Platform services, use the Pivotal Platform Healthwatch dashboard and the documentation for each service.
To ensure PAS VMs are running, do the following:
Run the following command to confirm that VMs are running:
bosh vms
BOSH lists VMs by deployment. The deployment with the cf- prefix is the PAS deployment.
If the mysql VM is not running, it is likely because it is a cluster and not a single node. Clusters require manual intervention after an outage. See Manually Recover PAS MySQL (Clusters Only) to confirm and recover the cluster.
If any other VMs are not running, run the following command:
bosh cck -d DEPLOYMENT
This command scans for problems and provides options for recovering VMs. For more information, see IaaS Reconciliation in the BOSH documentation.
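In a large deployment it can help to list only the VMs that need attention. The following sketch filters the VM list for one deployment. It assumes that piped `bosh vms` output is tab-separated, with the instance name in the first column and the process state in the second; verify this against your CLI version. The function name `vms_not_running` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: the function name is hypothetical. The parsing assumes that
# piped `bosh vms` output is tab-separated, with the instance name in
# column 1 and the process state in column 2.

# Print "INSTANCE STATE" for every VM whose process state is not "running".
# Usage: vms_not_running DEPLOYMENT
vms_not_running() {
  local deployment="$1"
  bosh -d "$deployment" vms | awk -F '\t' '
    NF >= 2 {
      instance = $1
      state = $2
      # Trim padding spaces around both fields before comparing.
      gsub(/^[ ]+|[ ]+$/, "", instance)
      gsub(/^[ ]+|[ ]+$/, "", state)
      if (state != "running") print instance, state
    }'
}
```

If the function prints nothing for the PAS deployment, every listed VM reports a running state. Otherwise, use the printed instance names to decide whether the MySQL procedure or `bosh cck -d DEPLOYMENT` applies.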
If you cannot get all VMs running, contact Pivotal Support for assistance. Provide the following information:
- A statement that you started this checklist to recover from a power failure on vSphere
- A list of failing VMs
- Your Pivotal Platform version
Manually Recover PAS MySQL (Clusters Only)
To manually recover PAS MySQL, do the following:
In a browser, navigate to the Ops Manager UI and select the Pivotal Application Service tile.
Select the Resource Config pane.
Review the INSTANCES column of the MySQL Server job. If the number of instances is greater than 1, manually recover MySQL by following the procedure in Recovering From MySQL Cluster Downtime.
Phase 6: Ensure Apps Hosted on PAS are Running
To ensure apps hosted on PAS are running, do the following:
Check the status of an app your company runs on Pivotal Platform. Run any health checks that the app has, or visit the URL of the app to confirm that it is working.
Push an app to Pivotal Platform.
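The URL check for an existing app can be sketched with curl. This is a minimal example under assumptions: the function name `check_app_url` is hypothetical, the 10-second timeout is an arbitrary choice, and a response with an HTTP status below 400 is treated as "working", which may be too lenient for apps that have a dedicated health endpoint.

```shell
#!/usr/bin/env bash
# Sketch only: the function name is hypothetical and the 10-second timeout
# is an arbitrary choice.

# Succeed when the app URL answers with an HTTP status below 400.
# Usage: check_app_url https://APP-URL
check_app_url() {
  local url="$1"
  if curl --fail --silent --show-error --max-time 10 --output /dev/null "$url"; then
    echo "OK: $url"
  else
    echo "FAILED: $url" >&2
    return 1
  fi
}
```

To cover the push step, run `cf push APP-NAME` from the directory of a small test app, then pass the route of that app to `check_app_url`.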
Phase 7: Check the Healthwatch Dashboard
You can use Pivotal Platform Healthwatch to further assess the state of Pivotal Platform. For more information, see Using Pivotal Platform Healthwatch.