PCF Healthwatch v1.1

Troubleshooting PCF Healthwatch

This topic describes how to resolve common issues with Pivotal Cloud Foundry (PCF) Healthwatch.

Insufficient Memory Resources

Insufficient capacity of the Diego cells can cause issues when you install or upgrade PCF Healthwatch.

Error

The push-apps errand can fail if Diego cells do not have sufficient free memory to place the PCF Healthwatch applications. If this occurs, you will see an error message like the following:

$ /var/vcap/packages/cf-cli/bin/cf start healthwatch-blue  
Starting app healthwatch-blue in org system / space healthwatch as admin...  
FAILED  
InsufficientResources

Cause

Diego cells do not have enough available resources to place the PCF Healthwatch applications.

Solution

To resolve this issue, navigate to the Resource Config pane of the PAS or SRT tile and increase the number of Diego Cell instances. Alternatively, if you do not need high availability, scale down the number of instances in Healthwatch Component Config in the PCF Healthwatch tile.

Memory Limit Errors

Insufficient memory allocation can cause issues when you install or upgrade PCF Healthwatch.

Error

If a PCF environment exceeds the total memory limit set for the healthwatch space in the system org, the PCF Healthwatch push-apps errand can fail. When this occurs, the error message looks similar to the following:

$ /var/vcap/packages/cf-cli/bin/cf start cf-health-check
Starting app cf-health-check in org system / space healthwatch as admin...
FAILED
Server error, status code: 400, error code: 100005, message: You have exceeded your organization's memory limit: app requested more memory than available

Cause

Your PCF environment has an insufficient total memory quota set for the healthwatch space in the system org.

This issue should not occur if the Apps Manager errand has run in your environment. Because service tiles use the system org to execute smoke tests, the Apps Manager errand sets the default system org quota to runaway. If the Apps Manager errand has not run or has failed, the default system quota may not have been reset properly.

Solution

To resolve this issue, you can set the default memory quota for the healthwatch space in the system org to at least 24 GB and re-run the push-apps errand manually.
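The quota change can be sketched with the cf CLI, assuming cf CLI v6 and an admin login. The helper names `run` and `raise_quota_memory` are hypothetical, and the quota plan name you pass in is a placeholder: look up the plan actually assigned to the system org with `cf org system` first. Setting `DRY_RUN=1` prints the commands instead of executing them, which is a reasonable first step on a production foundation.

```shell
# Sketch only. The quota plan name passed in is a placeholder: use the value
# shown in the quota field of `cf org system`.

# Print the command when DRY_RUN is set; otherwise execute it.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Show a quota plan, then raise its total memory (default: 24G).
raise_quota_memory() {
  local quota_name="$1"
  local memory="${2:-24G}"
  run cf quota "$quota_name"
  run cf update-quota "$quota_name" -m "$memory"
}
```

For example, `DRY_RUN=1 raise_quota_memory default` previews the two commands for a plan named `default`. After the real change is applied, re-run the push-apps errand.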

Ops Manager Health Check Errors

The Ops Manager Health Check must be able to reach Ops Manager on the underlying network.

Error

This error appears as constantly failing Ops Manager Health Checks on the Dashboard and the Ops Manager Health Check History page, even though Ops Manager is running.

Cause

The opsmanager-health-check application attempts to connect to Ops Manager to verify that it is running. This application needs the correct network settings to reach the Ops Manager VM. If firewall rules block this network access, the check fails continually.

Solution

To resolve this issue, confirm that the opsmanager-health-check application is attempting to reach the Ops Manager VM on a URL that is accessible from that instance. To verify this, run cf ssh opsmanager-health-check to SSH into the running instance. Then run curl -v $OPSMANAGER_URL to check the network access.
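When the curl check fails, its exit status usually hints at where the connection breaks. The following is a minimal sketch to paste into the container after `cf ssh opsmanager-health-check`. The function names are hypothetical, the 10-second timeout is an arbitrary choice, and the hints are general curl exit-code meanings rather than PCF Healthwatch diagnostics.

```shell
# Map a curl exit status to a likely cause (standard curl exit codes).
explain_curl_exit() {
  case "$1" in
    0)  echo "reachable" ;;
    6)  echo "DNS: could not resolve host" ;;
    7)  echo "connection failed (refused or blocked)" ;;
    28) echo "timed out (possible firewall drop)" ;;
    35) echo "TLS handshake failed" ;;
    60) echo "TLS certificate not trusted (network path works)" ;;
    *)  echo "other curl error: $1" ;;
  esac
}

# Probe the URL that the health check uses and explain the result.
# Assumes OPSMANAGER_URL is set in the container, as in the curl step above.
check_opsman() {
  curl -sS -o /dev/null --max-time 10 "$OPSMANAGER_URL"
  explain_curl_exit "$?"
}
```

Running `check_opsman` prints a one-line hint. A timeout or a failed connection points toward firewall rules or routing, while a certificate error means the network path itself is open.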

If you cannot modify network access to allow the opsmanager-health-check application to reach Ops Manager, this test cannot be executed properly and you should disable it. See Disable Ops Manager Continuous Validation Testing.

BOSH Health Check Failing After Upgrade

Error

The following error messages appear:

  • ERROR: Bosh health check failed to delete deployment “bosh-health-check”: Deployment not found

  • bosh-health-check deployment does exist.

Cause

For PCF Healthwatch v1.1.8 and later, BOSH Health Check uses the service broker UAA credentials. This can cause a permissions issue if the BOSH Health Check deployment already exists on the BOSH Director.

Solution

To resolve this issue, manually delete the existing BOSH Health Check deployment.
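One possible way to do the deletion is with BOSH CLI v2, sketched below. The environment alias is a placeholder for however your BOSH Director is targeted, the helper names are hypothetical, and the deployment name `bosh-health-check` is taken from the error message above. Setting `DRY_RUN=1` prints the commands without running them.

```shell
# Sketch only. The environment alias is a placeholder for your BOSH Director.

# Print the command when DRY_RUN is set; otherwise execute it.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# List deployments to confirm the name, then delete the stale deployment.
delete_bosh_health_check() {
  local bosh_env="$1"
  run bosh -e "$bosh_env" deployments
  run bosh -e "$bosh_env" -d bosh-health-check delete-deployment
}
```

For example, `DRY_RUN=1 delete_bosh_health_check my-env` previews both commands. Without `DRY_RUN`, the BOSH CLI asks for confirmation before deleting.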

Smoke Tests Failing on BOSH Metric Ingestion

Error

ERROR: Smoke Tests errand fails with:

[Fail] Bosh metric ingestion [It] Ingests metrics from the director into mysql
/var/vcap/packages/healthwatch-data/src/github.com/pivotal-cf/healthwatch-data/data-ingestion/smoketests/bosh_metrics_test.go:50

Cause

The PCF Healthwatch Smoke Tests errand validates that BOSH health metrics are being stored in the PCF Healthwatch database. The same failure also appears as a complete lack of data in the Job Health and Job Vitals panels on the PCF Healthwatch dashboard. When this smoke test fails, the most common cause is a failure in the BOSH System Metrics Forwarder component. A bug in Ops Manager versions earlier than v2.0.13 and v2.1.4 can cause this BOSH Director-based process to error out.

Solution

To resolve this issue, update Ops Manager to v2.0.13, v2.1.4, or later. Also validate that BOSH health metrics are present in the Firehose by running cf nozzle -n | grep system. You should see metrics such as system.healthy and system.cpu.user roughly every 30 seconds.
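Because `cf nozzle` streams until interrupted, it can help to sample the Firehose for a fixed window and count matches rather than read the output by eye. The following is a minimal sketch: the helper name `count_metric` is hypothetical, and the usage line assumes the GNU `timeout` utility is available on your workstation.

```shell
# Count lines on stdin that mention the given metric name (literal match).
count_metric() {
  grep -c -F "$1"
}

# Usage (sample the Firehose for 90 seconds, then print the match count):
#   timeout 90 cf nozzle -n | count_metric 'system.healthy'
```

A count of zero over a window several times longer than the 30-second interval suggests that BOSH health metrics are not reaching the Firehose.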
