Pivotal Healthwatch v1.7 Release Notes

v1.7.0

Release Date: September 20, 2019

Features

New features and changes in this release:

  • Remove the default critical and warning thresholds for alerts that have proven to be highly dependent on customer environments.

    • For customers doing a fresh install:
      • They will not receive alerts for metrics with highly variable thresholds, which are listed in the Environment Specific Alert table. Customers who want to receive alerts for these metrics must configure the alert thresholds explicitly through HAPI.
    • For customers upgrading:
      • If they have custom alert thresholds configured through HAPI for the affected metrics, alert behavior is not affected by this change. If customers choose to forgo their custom thresholds and no longer monitor these metrics, instructions are provided here.
      • If they do not have custom alert thresholds configured, they will no longer receive alerts for the affected metrics. Current in-flight red/yellow alerts will be cleared by green alerts regardless of the current metric value.
  • Remove the metric, graph, and alert associated with Route Registration Messages Delta. This metric was removed in PAS 2.4, so the related graphs and alerts should no longer display. Any current associated alert will be resolved automatically.

  • [Bug Fix] Correctly handle rotation of root Certificate Authorities.

  • [Bug Fix] Reduce noisiness of system.healthy alerts when a BOSH VM is created or deleted.

  • [Bug Fix] If healthwatch-ingestor does not receive data for 15 seconds, it automatically resets its Spring Application Context to re-establish a Firehose connection. After 20 resets of the Spring Application Context, the app instance purposely crashes so that Diego reschedules it, providing a fresh container and JVM instance.

  • [Bug Fix] Fix healthwatch-ingestor crash in cases where GoRouter receives an HTTP request with a non-standard HTTP method, resulting in an HttpStartStop metric with a null HTTP method value.

  • [Bug Fix] Setting Redis Worker Count in the Healthwatch Component Config page of Ops Manager now changes the number of instances. Previously, changes to this field were not reflected in the Healthwatch deployment.

  • [Bug Fix] Delete orphaned cf-health-check smoke-test-app instances regularly. Previously, cf-health-check occasionally failed to delete a smoke test app and never cleaned it up.

  • [Bug Fix] Fix occasional inaccurate spikes in Log Transport Throughput graph.

  • Maintenance update of the following dependencies:

    • Spring Boot now 2.1.8
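
The healthwatch-ingestor recovery behavior described in the bug fix above (reset after 15 seconds without data, crash after 20 resets) can be pictured as a small watchdog. The following Python sketch is illustrative only and is not the ingestor's actual implementation, which is a Spring application. The class and method names are hypothetical, and because these notes do not say whether the reset count is ever cleared, the sketch never clears it.

```python
import time

SILENCE_LIMIT_SECONDS = 15  # from the release note: reset after 15 s without data
MAX_RESETS = 20             # from the release note: crash after 20 resets


class IngestorWatchdog:
    """Hypothetical model of the reset-then-crash policy."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._last_data_at = clock()
        self.resets = 0

    def on_data(self):
        """Record that data arrived from the Firehose."""
        self._last_data_at = self._clock()

    def check(self):
        """Return 'ok', 'reset', or 'crash' for the current moment."""
        if self._clock() - self._last_data_at < SILENCE_LIMIT_SECONDS:
            return "ok"
        if self.resets >= MAX_RESETS:
            # Assumption: the crash happens on the first silence after
            # the 20th reset; the platform then reschedules the instance.
            return "crash"
        self.resets += 1
        self._last_data_at = self._clock()  # restart the silence timer
        return "reset"
```

A caller would invoke `on_data()` for each received envelope and poll `check()` periodically, acting on a `'reset'` or `'crash'` result.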

Known Issues

This release has the following known issues.

Disk Slowly Fills When Using vSAN with Healthwatch

The vSAN object count increases on vSphere versions earlier than v6.5 update 2.

Healthwatch deploys the app bosh-health-check, which deploys and deletes a VM every 10 minutes. vSphere versions earlier than v6.5 update 2 leave a namespace or folder and subfolders behind when the VM is deleted. The orphaned folders cause the vSAN object count to increase. This is a known issue for vSAN. For more information about the vSAN known issue, see Deleted VMs leave components behind on GitHub.

To address the issue, update vSphere to v6.5 update 2 or later. Alternatively, stop the bosh-health-check app to slow the increase in vSAN object count.
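
To get a rough sense of how quickly orphaned folders can accumulate, the 10-minute cycle above can be turned into a count of VM create/delete cycles. The Python sketch below is a back-of-the-envelope estimate only. It assumes bosh-health-check runs without interruption, and it counts cycles, not vSAN objects, because these notes do not state how many objects each orphaned folder contributes. The function name is hypothetical.

```python
MINUTES_PER_DAY = 24 * 60
CYCLE_MINUTES = 10  # from the note: one VM deployed and deleted every 10 minutes


def vm_cycles(days, cycle_minutes=CYCLE_MINUTES):
    """Number of deploy/delete cycles in the given number of days."""
    return days * MINUTES_PER_DAY // cycle_minutes


print(vm_cycles(1))   # 144
print(vm_cycles(30))  # 4320
```

On an affected vSphere version, each of those cycles can leave a folder behind, so the leftover count grows steadily until vSphere is updated or bosh-health-check is stopped.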

Indicator Protocol Beta Dashboard Displays Error Due to Log Cache

Occasionally, the Indicator Protocol Beta Dashboard charts fail to load with the error "Error fetching graph data."

These charts are populated using Log Cache, which is part of Loggregator. They fail periodically when Log Cache times out while attempting to process the data.

No corrective action is required. The error typically resolves on its own.

Multiple healthwatch_space_developer CF Users on Healthwatch Re-install

In Healthwatch v1.7.0, when the Healthwatch tile is re-installed, the push-apps errand creates a duplicate healthwatch_space_developer user because the existing user was not deleted when the previous tile was deleted.

This causes the cf-health-check to fail due to an invalid password for the healthwatch_space_developer user.

Infinite Login Redirect When Using Private Domain Suffixes

In Healthwatch v1.7.0, certain private domain suffixes, such as .local or .a, result in an infinite redirect loop when you try to access the Healthwatch UI.

A workaround is to set the SKIP_CERT_VERIFY environment variable to true on the Healthwatch app.

For the canonical list of public suffixes, see the Public Suffix List.
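
To check in advance whether a domain might be affected, you can test whether it ends in a public suffix. The Python sketch below is a simplified illustration, not the check Healthwatch performs. The suffix set is a tiny hand-picked sample, not the real Public Suffix List, which is far longer and also has wildcard and exception rules that the sketch ignores. The function name and hostnames are hypothetical. These notes say only that certain private suffixes trigger the loop, so a `False` result indicates a domain worth testing, not a guaranteed failure.

```python
# Tiny illustrative sample only; load the real Public Suffix List in practice.
SAMPLE_PUBLIC_SUFFIXES = {"com", "org", "net", "io", "co.uk"}


def has_public_suffix(hostname, suffixes=SAMPLE_PUBLIC_SUFFIXES):
    """Return True if any trailing run of labels is a listed suffix."""
    labels = hostname.lower().rstrip(".").split(".")
    # Try each trailing run of labels, e.g. "example.co.uk", "co.uk", "uk".
    return any(".".join(labels[i:]) in suffixes for i in range(len(labels)))


print(has_public_suffix("healthwatch.sys.example.com"))    # True
print(has_public_suffix("healthwatch.sys.example.local"))  # False
print(has_public_suffix("healthwatch.sys.pcf.a"))          # False
```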