Healthwatch v2.1 Release Notes

This topic contains release notes for Healthwatch for VMware Tanzu v2.0.6 and v2.1.

Note: Healthwatch v2.0.6 is a beta version that is no longer available for download. VMware does not recommend using Healthwatch v2.0.6 in production environments.

The architecture of Healthwatch v2.1 is entirely different from the architecture of Pivotal Healthwatch v1. Healthwatch v2.1 uses the open-source components Prometheus, Grafana, and Alertmanager to scrape, store, and view metrics, and to configure alerts. For more information about the differences between Pivotal Healthwatch v1 and Healthwatch v2.1 and how to upgrade to Healthwatch v2.1, see Upgrading Healthwatch.

For more information about Healthwatch v2.1, see Healthwatch for VMware Tanzu 2.1 Offers Breakthrough Platform Monitoring on the VMware Tanzu blog and New Features below.

For information about the risks and limitations of Healthwatch v2.1, see Assumed Risks of Using Healthwatch v2.1 and Healthwatch v2.1 Limitations in Healthwatch.

Releases

v2.1.9

Release Date: 02/18/2022

Healthwatch v2.1.9 uses the following open-source component versions:

Component Packaged Version
Prometheus 2.33.1
Grafana 7.5.11
Alertmanager 0.23.0

v2.1.8

Release Date: 12/16/2021

Healthwatch v2.1.8 uses the following open-source component versions:

Component Packaged Version
Prometheus 2.25.0
Grafana 7.5.11
Alertmanager 0.21.0

v2.1.7

Release Date: 12/10/2021

Healthwatch v2.1.7 uses the following open-source component versions:

Component Packaged Version
Prometheus 2.25.0
Grafana 7.5.11
Alertmanager 0.21.0

v2.1.6

Release Date: 11/11/2021

  • [Bug Fix] The user account that BBR uses to back up the MySQL instance in Healthwatch has the correct permissions.

  • [Bug Fix] The Cloud Foundry Command-Line Interface (cf CLI) version that Healthwatch uses is compatible with VMware Tanzu Application Service for VMs (TAS for VMs) v2.7.

  • [Known Issue] The TKGI SLI exporter VM does not clean up the service accounts it creates while running the TKGI SLI test suite. For more information about this known issue, see Healthwatch Exporter for TKGI Does Not Clean Up TKGI Service Accounts below.

  • [Known Issue] The backup scripts for Prometheus VMs do not clean up the intermediary snapshots created by BBR. For more information about this known issue, see BBR Backup Snapshots Fill Disk Space on Prometheus VMs below.

  • [Known Issue] If you have TKGI v1.12 or later installed, the Kubernetes Nodes dashboard in the Grafana UI shows no data for Kubernetes clusters that use the containerd runtime. For more information about this known issue, see No Data for containerd Clusters on Kubernetes Nodes Dashboard for TKGI v1.12 and Later below.

Healthwatch v2.1.6 uses the following open-source component versions:

Component Packaged Version
Prometheus 2.25.0
Grafana 7.5.11
Alertmanager 0.21.0

v2.1.5

Release Date: 10/19/2021

Healthwatch v2.1.5 uses the following open-source component versions:

Component Packaged Version
Prometheus 2.25.0
Grafana 7.5.11
Alertmanager 0.21.0

v2.1.4

Release Date: 09/21/2021

  • [Feature] Healthwatch supports TKGI v1.12 and earlier.

  • [Feature] You can filter the Diego/Capacity dashboard in the Grafana UI by isolation segment.

    Note: Isolation segments that do not have a placement_tag label are included when you select cf from the Placement Tag dropdown.

  • [Feature Improvement] In the Diego/Capacity dashboard in the Grafana UI, the Memory: # of 6GB Chunks Used graph is renamed Memory: # of 4GiB Chunks Remaining.

  • [Feature Improvement] In the Diego/Capacity dashboard in the Grafana UI, the Disk: # of 6GB Chunks Used graph is renamed Disk: # of 4GiB Chunks Remaining.

  • [Feature Improvement] In the Diego/Capacity dashboard in the Grafana UI, the Containers graph is renamed Containers Remaining.

  • [Feature Improvement] In the Diego/Capacity dashboard in the Grafana UI, memory and disk capacity metrics are calculated in GiB.

  • [Known Issue Fix] The SVM Forwarder VM correctly forwards SVMs to the Loggregator Firehose after the first time it deploys. For more information about this known issue, see SVM Forwarder VM Does Not Initially Forward SVMs below.

  • [Known Issue Fix] When you configure alert receivers in the Alertmanager configuration pane, the Basic authentication credentials fields include a field to configure a username. For more information about this known issue, see Missing Username Field in Alert Receiver Basic Authentication Configuration below.

  • [Bug Fix] In email alerts from Alertmanager, links to the Grafana UI correctly generate with HTTP or HTTPS.

  • [Known Issue] If you are using Ops Manager v2.10.19 or later, the Healthwatch tile fails to install. For more information about this known issue, see Healthwatch Fails to Install on Ops Manager v2.10.19 and Later below.

  • [Known Issue] If you have TKGI v1.12 or later installed, the Kubernetes Nodes dashboard in the Grafana UI shows no data for Kubernetes clusters that use the containerd runtime. For more information about this known issue, see No Data for containerd Clusters on Kubernetes Nodes Dashboard for TKGI v1.12 and Later below.

  • [Known Issue] If you are using TKGI to monitor Windows clusters, the Kubernetes Nodes dashboard in the Grafana UI shows no data. For more information about this known issue, see No Data on Kubernetes Nodes Dashboard for Windows Clusters below.

  • [Known Issue] The TKGI SLI exporter VM does not clean up the service accounts it creates while running the TKGI SLI test suite. For more information about this known issue, see Healthwatch Exporter for TKGI Does Not Clean Up TKGI Service Accounts below.

  • [Known Issue] The backup scripts for Prometheus VMs do not clean up the intermediary snapshots created by BBR. For more information about this known issue, see BBR Backup Snapshots Fill Disk Space on Prometheus VMs below.

Healthwatch v2.1.4 uses the following open-source component versions:

Component Packaged Version
Prometheus 2.25.0
Grafana 7.5.4
Alertmanager 0.21.0

v2.1.3

Release Date: 07/15/2021

  • [Feature] Healthwatch supports TKGI v1.11 and earlier.

  • [Feature] Healthwatch automatically creates a scrape job for the Prometheus endpoints on the BOSH Director VM.

  • [Feature] You can filter the TAS Router dashboard in the Grafana UI by isolation segment.

  • [Breaking Change] The ssl_certificate_expiry_seconds metric has additional tags to ensure that metrics are unique. For more information, see Certificate Expiration Metrics Have Additional Tags below.

  • [Bug Fix] The Grafana instance creates dashboards for the correct runtime version when dynamically discovering runtimes on your Ops Manager foundation.

  • [Bug Fix] Healthwatch deletes the temporary files it creates during TKGI cluster discovery after they are no longer needed.

  • [Bug Fix] The no_follow_redirects: true flag is removed from the Blackbox Exporter. This fixes Ops Manager dashboards in the Grafana UI.

    Note: The bug that the no_follow_redirects: true flag addresses is fixed in Ops Manager v2.10.9. If you are running Ops Manager v2.10.8 or earlier, you may still see the /home/tempest-web/uaa/tomcat log file grow very large over time.

  • [Known Issue] If you are using Ops Manager v2.10.19 or later, the Healthwatch tile fails to install. For more information about this known issue, see Healthwatch Fails to Install on Ops Manager v2.10.19 and Later below.

  • [Known Issue] The SVM Forwarder VM does not forward SVMs to the Loggregator Firehose after the first time it deploys. For more information, see SVM Forwarder VM Does Not Initially Forward SVMs below.

  • [Known Issue] In the Alertmanager pane, you cannot configure basic authentication for alert receivers because the field to configure a username is missing. For more information about this known issue, see Missing Username Field in Alert Receiver Basic Authentication Configuration below.

  • [Known Issue] If you are using TKGI to monitor Windows clusters, the Kubernetes Nodes dashboard in the Grafana UI shows no data. For more information about this known issue, see No Data on Kubernetes Nodes Dashboard for Windows Clusters below.

  • [Known Issue] The TKGI SLI exporter VM does not clean up the service accounts it creates while running the TKGI SLI test suite. For more information about this known issue, see Healthwatch Exporter for TKGI Does Not Clean Up TKGI Service Accounts below.

  • [Known Issue] The backup scripts for Prometheus VMs do not clean up the intermediary snapshots created by BBR. For more information about this known issue, see BBR Backup Snapshots Fill Disk Space on Prometheus VMs below.

Healthwatch v2.1.3 uses the following open-source component versions:

Component Packaged Version
Prometheus 2.25.0
Grafana 7.5.4
Alertmanager 0.21.0

v2.1.1

Release Date: 05/14/2021

  • [Feature] Healthwatch supports TAS for VMs v2.11 and earlier.

  • [Known Issue Fix] When uninstalling the Healthwatch Exporter for TAS for VMs tile, the bosh-health deployment is always deleted. For more information about this known issue, see BOSH Health Metric Exporter VM Causes 401 Error below.

  • [Known Issue Fix] Healthwatch components deploy across availability zones (AZs) when you configure them to do so. For more information about this known issue, see Scaled VMs Not Distributed Across AZs by Default below.

  • [Bug Fix] The uid|title is used more than once error does not appear in Grafana VM logs.

  • [Bug Fix] The Failed to read plugin provisioning files from directory error does not appear in Grafana VM logs.

  • [Bug Fix] The Uptime SLO Target filter dropdown appears in the Healthwatch SLO dashboard in the Grafana UI.

  • [Bug Fix] The Exporter Availability graphs in the Healthwatch SLO dashboard in the Grafana UI show six digits of precision.

  • [Bug Fix] You can scale the Prometheus instance down to one VM.

  • [Bug Fix] The BBR script does not return a Snapshot failed: Client sent an HTTP request to an HTTPS server error.

  • [Bug Fix] Healthwatch Exporter for TKGI requires the TKGI tile to be installed.

  • [Bug Fix] Healthwatch Exporter for TAS for VMs requires the TAS for VMs tile to be installed.

  • [Bug Fix] The Ops Manager UAA instance logs canary URL test redirects to the Ops Manager Installation Dashboard correctly.

  • [Known Issue] If you are using Ops Manager v2.10.19 or later, the Healthwatch tile fails to install. For more information about this known issue, see Healthwatch Fails to Install on Ops Manager v2.10.19 and Later below.

  • [Known Issue] In the Alertmanager pane, you cannot configure basic authentication for alert receivers because the field to configure a username is missing. For more information about this known issue, see Missing Username Field in Alert Receiver Basic Authentication Configuration below.

  • [Known Issue] If you are using TKGI to monitor Windows clusters, the Kubernetes Nodes dashboard in the Grafana UI shows no data. For more information about this known issue, see No Data on Kubernetes Nodes Dashboard for Windows Clusters below.

  • [Known Issue] The TKGI SLI exporter VM does not clean up the service accounts it creates while running the TKGI SLI test suite. For more information about this known issue, see Healthwatch Exporter for TKGI Does Not Clean Up TKGI Service Accounts below.

  • [Known Issue] The backup scripts for Prometheus VMs do not clean up the intermediary snapshots created by BBR. For more information about this known issue, see BBR Backup Snapshots Fill Disk Space on Prometheus VMs below.

Healthwatch v2.1.1 uses the following open-source component versions:

Component Packaged Version
Prometheus 2.25.0
Grafana 7.5.4
Alertmanager 0.21.0

v2.1.0

Release Date: 03/18/2021

Healthwatch v2.1.0 uses the following open-source component versions:

Component Packaged Version
Prometheus 2.25.0
Grafana 7.4.2
Alertmanager 0.21.0

v2.0.6

Release Date: 02/11/2021

Healthwatch v2.0.6 uses the following open-source component versions:

Component Packaged Version
Prometheus 2.25.0
Grafana 7.4.2
Alertmanager 0.21.0

How to Upgrade

To upgrade from Pivotal Healthwatch v1 to Healthwatch v2.1, see Upgrading Healthwatch.

New Features

Healthwatch v2.0.6 and v2.1 include the following major features:

Healthwatch Supports TAS for VMs v2.12 and Earlier

You can use Healthwatch to monitor TAS for VMs v2.12 and earlier.

Your TAS for VMs dashboards in the Grafana UI update automatically to display TAS for VMs v2.12 metrics unless you manually set the dashboard version when you configure the Grafana VM. For more information about setting the TAS for VMs version for your dashboards, see Configure Grafana in Configuring Healthwatch.

Healthwatch Supports TKGI v1.12 and Earlier

You can use Healthwatch to monitor TKGI v1.12 and earlier.

Your TKGI dashboard in the Grafana UI updates automatically to display TKGI v1.12 metrics unless you manually set the dashboard version when you configure the Grafana VM. For more information about setting the TKGI version for your dashboards, see Configure Grafana in Configuring Healthwatch.

Assign Static IP Addresses to Prometheus VMs

You can assign static IP addresses to your Prometheus VMs.

If you configure email alerts through Alertmanager, you may need to add the IP addresses of your Prometheus VMs to your Ops Manager allowlist so that your SMTP server does not block them. To find the IP addresses of your Prometheus VMs, use the BOSH CLI.

For more information about assigning IP addresses to your Prometheus VMs, see (Optional) Configure Prometheus in Configuring Healthwatch.
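As a sketch, you can look up the IP addresses of your Prometheus VMs with the BOSH CLI; the environment alias and deployment name below are placeholders for your own values:

```shell
# List BOSH deployments to find the name of your Healthwatch deployment
# ("my-env" is a placeholder environment alias)
bosh -e my-env deployments

# List the VMs in that deployment, including their IP addresses
# ("healthwatch-abc123" is a placeholder deployment name)
bosh -e my-env -d healthwatch-abc123 vms
```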

Healthwatch Replaces Event Alerts With Alertmanager

Healthwatch v2.1 uses Alertmanager, an open-source Prometheus component, to manage and send alerts according to the alerting rules you configure. Pivotal Healthwatch v1 used Pivotal Event Alerts for managing alerts, which Healthwatch v2.1 does not support.

For more information about configuring alerts, see Configuring Alerting.

Healthwatch Removes Monitoring Indicator Protocol

Healthwatch no longer supports Monitoring Indicator Protocol, and the Indicator Protocol dashboard is removed from the Grafana UI.

This change does not affect VMware Tanzu RabbitMQ for VMs (Tanzu RabbitMQ) metrics and dashboards.

Breaking Changes

Healthwatch v2.0.6 and v2.1 include the following breaking changes:

Certificate Expiration Metrics Have Additional Tags

The ssl_certificate_expiry_seconds metric has additional tags to ensure that each metric is unique. The addition of these tags does not change any metric names.

Self-Signed Certificates Break Dashboards

While Pivotal Healthwatch v1 does not require TLS verification for Ops Manager certificates, Healthwatch v2.1 checks for TLS certificate verification by default. If your Ops Manager deployment uses self-signed certificates, you must configure the Healthwatch tile to skip TLS certificate verification.

If the Ops Manager Health dashboard in the Grafana UI displays a “Not Running” error, activate the Skip TLS certificate verification checkbox in the Canary URLs pane of the Healthwatch tile. For more information about configuring this checkbox, see (Optional) Configure Canary URLs in Configuring Healthwatch.

If your Certificate Expiration dashboard displays “N/A” or you see errors in your certificate expiration metric logs, activate the Skip TLS certificate verification checkbox in the TAS for VMs Metric Exporter VMs pane of the Healthwatch Exporter for TAS for VMs tile or the TKGI Metric Exporter VMs pane of the Healthwatch Exporter for TKGI tile. For more information about configuring this checkbox, see (Optional) Configure TAS for VMs Metric Exporter VMs in Configuring Healthwatch Exporter for TAS for VMs or (Optional) Configure TKGI and Certificate Expiration Metric Exporter VMs in Configuring Healthwatch Exporter for TKGI.

Known Issues

Healthwatch v2.0.6 and v2.1 include the following known issues:

Healthwatch Fails to Install on Ops Manager v2.10.19 and Later

This known issue is fixed in Healthwatch v2.1.5.

If you are using Ops Manager v2.10.19 or later, Healthwatch fails to install. Attempting to install Healthwatch causes the following errors:

Preparing deployment: Rendering templates
Error: Unable to render instance groups for deployment. Errors are:
- Unable to render jobs for instance group 'tsdb'. Errors are:
- Unable to render templates for job 'prometheus'. Errors are:
- Error filling in template 'alerting.rules.yml.erb' (line 33: undefined method `escape' for URI:Module)

To fix this issue, upgrade to Healthwatch v2.1.5.

SVM Forwarder VM Does Not Initially Forward SVMs

This known issue is fixed in Healthwatch v2.1.4 and later.

The SVM Forwarder VM does not forward SVMs to the Loggregator Firehose after the first time it deploys.

To fix this issue, restart the SVM Forwarder VM by running:

monit restart svm-forwarder
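If you are not already on the SVM Forwarder VM, you can reach it through the BOSH CLI first; the environment alias, deployment name, and instance name below are placeholders for your own values:

```shell
# SSH into the SVM Forwarder VM (placeholder names; find yours with
# `bosh deployments` and `bosh -e my-env -d <deployment> vms`)
bosh -e my-env -d healthwatch-pas-exporter ssh svm-forwarder/0

# Then, on the VM, restart the forwarder process as root
sudo monit restart svm-forwarder
```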

Scaled VMs Not Distributed Across AZs by Default

This known issue is fixed only for new installations of Healthwatch v2.1.1 and later. If you upgrade to Healthwatch v2.1.1 from Healthwatch v2.1.0 or earlier, this known issue still exists.

If you scale Healthwatch component VMs up, they are not automatically distributed across AZs. To distribute them evenly across AZs, you must scale the VMs down, then scale them back up.

For more information about scaling Healthwatch component VMs, see Healthwatch Components and Resource Requirements.

“Unable to Render Templates” Error When Installing or Upgrading

This known issue is fixed in Ops Manager v2.8 and later.

When installing or upgrading to Healthwatch v2.1, you might see the following error:

- Unable to render templates for job 'opsman-cert-expiration-exporter'. Errors are:
  - Error filling in template 'bpm.yml.erb' (line 9: Can't find property '["opsman_access_credentials.uaa_client_secret"]')

This error occurs if you upgraded from Ops Manager v2.3 or earlier to Ops Manager v2.4 through v2.7.

For more information about how to fix this issue without upgrading to Ops Manager v2.8, see “Unable to Render Templates” Error When Installing or Upgrading in Troubleshooting Healthwatch.

Missing Username Field in Alert Receiver Basic Authentication Configuration

This known issue is fixed in Healthwatch v2.1.4 and later.

In Healthwatch v2.1.3 and earlier, you cannot configure basic authentication credentials for PagerDuty, Slack, or webhook alert receivers because the field to configure a username is missing.

To fix this issue, upgrade to Healthwatch v2.1.4 or later.

BOSH Health Metric Exporter VM Causes 401 Error

This known issue is fixed in Healthwatch v2.1.1 and later.

If you re-install the Healthwatch Exporter for TAS for VMs tile, the BOSH health metric exporter VM does not always delete the BOSH deployment it creates, bosh-health. This causes the following error:

Director responded with non-successful status code '401' response
'{"code":600000,"description":"Require one of the scopes: bosh.admin,
bosh.750587e9-eae5-494f-99c4-5ca429b13959.admin,
bosh.teams.p-healthwatch2-pas-exporter-b3a337d7ec4cca94f166.admin"}'

To fix this issue, upgrade to Healthwatch v2.1.1 or later or manually delete the bosh-health deployment using the BOSH CLI. For more information about upgrading to Healthwatch v2.1.1, see Upgrading Healthwatch. For more information about deleting a BOSH deployment using the BOSH CLI, see the BOSH documentation.
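As a sketch, deleting the leftover deployment with the BOSH CLI looks like the following; the environment alias is a placeholder for your own:

```shell
# Delete the stale bosh-health deployment left behind by the
# BOSH health metric exporter VM ("my-env" is a placeholder alias)
bosh -e my-env delete-deployment -d bosh-health
```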

No Data for containerd Clusters on Kubernetes Nodes Dashboard for TKGI v1.12 and Later

This known issue is fixed in Healthwatch v2.2.2 and later.

If you have TKGI v1.12 or later installed, the Kubernetes Nodes dashboard in the Grafana UI might not show data for Kubernetes clusters that use the containerd runtime.

In TKGI v1.11 and earlier, the name label in Kubernetes cluster metrics starts with k8s_. However, in TKGI v1.12 and later, new Kubernetes clusters run on containerd instead of Docker. As a result, in TKGI v1.12 and later, the name label in Kubernetes cluster metrics starts with a hex value instead of k8s_, which the Grafana instance does not recognize.

To fix this issue, upgrade to Healthwatch v2.2.2 or later. For more information about upgrading to Healthwatch v2.2, see Upgrading Healthwatch.

No Data on Kubernetes Nodes Dashboard for TKGI v1.10

If you are using TKGI v1.10.0 or v1.10.1, the Kubernetes Nodes dashboard in the Grafana UI might not show data for individual pods. This is due to a known issue in Kubernetes v1.19.6 and earlier and Kubernetes v1.20.1 and earlier.

To fix this issue, upgrade to TKGI v1.10.2 or later. For more information about upgrading to TKGI v1.10.2 or later, see the TKGI documentation.

No Data on Kubernetes Nodes Dashboard for Windows Clusters

If you are using TKGI to monitor Windows clusters, the Kubernetes Nodes dashboard in the Grafana UI might not show data. Healthwatch does not currently visualize node metrics for Windows clusters.

MySQL Proxy Disk Space Fills Up Quickly

This known issue is fixed in Healthwatch v2.1 and later.

In Healthwatch v2.0.6, the MySQL PXC instance stores too many binlogs, which fills up the persistent disk on the MySQL Proxy VMs more quickly than expected.

To fix this issue, upgrade to Healthwatch v2.1. For more information about upgrading to Healthwatch v2.1, see Upgrading Healthwatch.

Healthwatch Exporter for TKGI Does Not Clean Up TKGI Service Accounts

This known issue is fixed in Healthwatch v2.1.9 and later.

If you run SLI tests for TKGI through Healthwatch Exporter for TKGI, and you do not have an OpenID Connect (OIDC) provider for your Kubernetes clusters configured for TKGI, the TKGI SLI exporter VM does not automatically clean up the service accounts that it creates while running the TKGI SLI test suite.

To fix this issue, either upgrade to Healthwatch v2.1.9 or configure an OIDC provider as the identity provider for your Kubernetes clusters in the TKGI tile. This cleans up the service accounts that the TKGI SLI exporter VM creates in future TKGI SLI tests, but does not clean up existing service accounts from previous TKGI SLI tests. For more information about configuring an OIDC provider in TKGI, see the TKGI documentation.

You may need to manually delete existing service accounts from previous TKGI SLI tests. For more information, see Healthwatch Exporter for TKGI Does Not Clean Up TKGI Service Accounts in Troubleshooting Healthwatch.

BBR Backup Snapshots Fill Disk Space on Prometheus VMs

This known issue is fixed in Healthwatch v2.1.9 and later.

In Healthwatch v2.1.8 and earlier, the backup scripts for Prometheus VMs do not clean up the intermediary snapshots created by BBR. This results in the disk space on Prometheus VMs filling up.

To fix this issue, either upgrade to Healthwatch v2.1.9 or manually clean up the snapshots. To manually clean up the snapshots, see BBR Backup Snapshots Fill Disk Space on Prometheus VMs in Troubleshooting Healthwatch.