
PCF Healthwatch v1.4 Release Notes



Release Date: February 19, 2019

  • [Feature] Healthwatch specifies the correct buildpack for the CF smoke test to reduce download times and avoid timeouts during installation.
  • [Feature] When you navigate to a detail page from an alerting spotlight on the Healthwatch dashboard, the page displays the minimum timescale that shows all relevant alerts.
  • [Feature] Healthwatch proactively raises the cap on the number of alerts a foundation can have. This prevents large, active foundations that create many alerts from reaching capacity.
  • [Bug Fix] The CLI Command Health graphs consistently show data for all customers. Previously, a memory consumption issue in the cf-health-check app led to instability and inconsistent data reporting.
  • [Bug Fix] Graph line colors match legend colors. Previously, all graph lines were the same color.


Release Date: November 14, 2018

New Features in v1.4

  • Simplified category panels on dashboard, for quicker view of foundation health.

    • Smaller, uniform size panels.
    • Color shows current category status, derived from configurable alerts.
      • Red: Critical
      • Orange: Warning
      • Green: Healthy
      • Gray: Data unavailable
    • Non-status information moved to category detail pages that the user clicks into.
  • Expanded alerts functionality:

    • Added live Alert Stream along right side of dashboard.
    • Added searchable Alert History page, covering the last 24 hours.
    • Graphs of metrics show alert events, where metrics cross alert thresholds.

For more information about the new dashboard in PCF Healthwatch v1.4, including the Alert Stream, see Using PCF Healthwatch.

  • Added new metrics:

  • Syslog encryption: Healthwatch VM log output to syslog can be encrypted with TLS.

  • Resource use: Reduced number and scale of VM resources required to run PCF Healthwatch.

  • Internal database: Switched internal database from MariaDB to Percona (see Known Issues).

  • Stack: PCF Healthwatch uses cflinuxfs3 as the default stack when available.

    • When Healthwatch is installed in an environment that does not support cflinuxfs3, such as a PCF v2.2 environment, Healthwatch components run on the cflinuxfs2 stack.
  • Maintenance update of the following dependencies:

    • CF CLI now v6.40.1
    • Golang now v1.10.5
    • Flyway Command-line and Library now v5.2.1
    • OpenJDK now v1.8.0_192-b03
    • Spring now Brussels-SR14

Known Issues

  • While the PCF Healthwatch internal database migrates from MariaDB to Percona, upgrading to Healthwatch v1.4 from v1.3 may stop supermetrics collection for about 15 minutes. For more information, see Troubleshooting PCF Healthwatch.
  • When using PCF Healthwatch v1.4 with Pivotal Application Service (PAS) v2.4, you must enable Use “cf” as deployment name in emitted metrics instead of unique name in the Advanced Features pane of the PAS tile. This setting enables PCF Healthwatch to distinguish between the core PCF deployment and Isolation Segments. The corresponding manifest property is
    • This checkbox is enabled by default when upgrading from PAS v2.3. It is disabled by default on clean installs of PAS v2.4.
  • PAS v2.4 has an optional setting, Disable Zero Downtime App Deployments, that changes the value of the Active Locks KPI from 4 to 5. Healthwatch v1.4 has an alert on this KPI, locket.ActiveLocks, that is configured for a value of 4. If the zero-downtime feature is enabled, use the Healthwatch API to update the alert configuration to a value of 5. You can find Disable Zero Downtime App Deployments in the Advanced Features pane of the PAS tile. The corresponding manifest property is
    • You can send the following JSON payload to the PCF Healthwatch API to update active locks: "{\"query\":\"origin == 'locket' and name == 'ActiveLocks'\",\"threshold\":{\"critical\":5,\"type\":\"EQUALITY\"}}"
  • The healthwatch.Diego.AvailableFreeChunksDisk alert is visible in the PCF Healthwatch UI, but does not get sent to email, Slack, or webhook subscribers through PCF Event Alerts. This will be fixed in the next patch release of PCF Healthwatch.
  • If you installed an Isolation Segment with Isolated Routing and have not pushed apps to the Isolation Segment, the gorouter.ms_since_last_registry_update metric is emitted with a value of 9223372036.85s. This causes the Isolation Segment Routing stoplight to enter a critical (red) state. The routing stoplight recovers when an app is pushed to the Isolation Segment.
  • In rare circumstances, an Isolation Segment Capacity stoplight on the PCF Healthwatch Dashboard may be in a warning (yellow) state when it should be in a critical (red) state.
  • If you apply changes to the BOSH Director tile while the PCF Healthwatch bosh-health-check is executing, BOSH may lose track of IP addresses that are in use at the IaaS-level.
    • This manifests as the error message Detected IP conflicts with other VMs on the same networks.
    • To resolve this issue, delete orphaned VMs at the IaaS level.
    • This will be fixed in upcoming Operations Manager patch releases (2.4.x and 2.3.x).
  • Some customers have reported intermittent crashes in the healthwatch-ingestor and cf-health-check apps.
    • This issue is being investigated. If an application instance crashes, the Diego architecture in Pivotal Cloud Foundry will replace the instance with minimal downtime.
  • Windows-based Diego cells created by Pivotal Application Service for Windows emit platform metrics with a hard-coded deployment value of cf. This can result in the following impacts to capacity values shown by PCF Healthwatch or other consumers of monitoring metrics:
    • If Isolation Segments are used in combination with isolated Windows-based Diego cells: Windows-based cells that are isolated to a given Isolation Segment are reported as part of the core cf system deployment. As a result, the capacity values for the Isolation Segment are under-reported (they include only Linux-based cells and exclude Windows-based cells), and the capacity of the core cf system deployment is over-reported (it includes the Windows-based cells from Isolation Segments).
    • If Isolation Segments are not used: The core cf system deployment correctly shows total capacity. However, Windows-based cells and Linux-based cells are grouped together in PCF Healthwatch capacity assessments such as capacity remaining and number of free chunks of memory.
  • The following three PAS MySQL KPI charts are hidden. These charts will be available in a future patch version of PCF Healthwatch:
    • Query Rate
    • MySQL CPU Busy Time
    • Percentage of Max Connections Used
  • Healthwatch has a fixed limit on alerts. Foundations that are busy or long-running may reach the limit on alerts. This results in the following error message:

    INSERT INTO alert_status Duplicate entry '8388607' for key 'PRIMARY'

    To resolve this issue, upgrade to Healthwatch v1.4.5 or later.
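
For the locket.ActiveLocks issue above, the JSON payload is easier to read and less error-prone when built programmatically than when typed with escaped quotes. The following is a minimal Python sketch that only constructs the request body quoted in these notes. The function name is hypothetical, and the API endpoint, HTTP method, and authentication are not stated here, so they are deliberately left out.

```python
import json


def active_locks_alert_payload(critical: int = 5) -> str:
    """Build the JSON body quoted in these notes for the ActiveLocks alert.

    Hypothetical helper: it only serializes the payload. How the payload is
    sent to the PCF Healthwatch API is not covered by this sketch.
    """
    body = {
        "query": "origin == 'locket' and name == 'ActiveLocks'",
        "threshold": {"critical": critical, "type": "EQUALITY"},
    }
    # Compact separators reproduce the whitespace-free form used in the notes.
    return json.dumps(body, separators=(",", ":"))


if __name__ == "__main__":
    print(active_locks_alert_payload())
```

Building the body from a dictionary avoids hand-escaping the inner double quotes, and the threshold value can be changed in one place.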

Breaking Changes for Automated Pipelines

  • The available values for the .properties.syslog_selector property have changed:
    • In 1.3.x, the values were inactive or active.
    • In 1.4.x, the values are No, Yes without encryption, or Yes with TLS encryption.
    • If automated pipelines are not updated when upgrading from PCF Healthwatch v1.3.x to v1.4.x to specify one of the new options, the pipelines fail.
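
A pipeline can guard against this breaking change by checking the configured value before applying it. The following Python sketch classifies a value for .properties.syslog_selector against the two value sets listed above. The function and constant names are hypothetical. These notes do not say how each 1.3.x value maps to a 1.4.x value, so the sketch flags outdated values rather than converting them.

```python
# Values accepted in 1.3.x, as listed in these release notes.
OLD_SYSLOG_SELECTOR_VALUES = {"inactive", "active"}

# Values accepted in 1.4.x, as listed in these release notes.
NEW_SYSLOG_SELECTOR_VALUES = {
    "No",
    "Yes without encryption",
    "Yes with TLS encryption",
}


def check_syslog_selector(value: str) -> str:
    """Classify a configured syslog_selector value.

    Returns "ok" for a 1.4.x value, "outdated" for a 1.3.x value that must be
    replaced before upgrading, and "unknown" for anything else.
    """
    if value in NEW_SYSLOG_SELECTOR_VALUES:
        return "ok"
    if value in OLD_SYSLOG_SELECTOR_VALUES:
        return "outdated"
    return "unknown"


if __name__ == "__main__":
    for candidate in ("active", "Yes with TLS encryption", "maybe"):
        print(candidate, "->", check_syslog_selector(candidate))
```

Running a check like this before the upgrade step lets the pipeline stop with a clear message instead of failing during configuration.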