Healthwatch v2.2 Release Notes

This topic contains release notes for Healthwatch for VMware Tanzu v2.2.

For information about the risks and limitations of Healthwatch v2.2, see Assumed Risks of Using Healthwatch v2.2 and Healthwatch v2.2 Limitations in Healthwatch for VMware Tanzu.

Releases

v2.2.3

Release Date: 8/18/2022

  • [Feature Improvement] When you configure an alert receiver for Slack, you are not required to configure the Alert receiver configuration parameters field. The only fields you must configure are Alert receiver name and Slack API URL. For more information, see Configure a Slack Alert Receiver in Configuring Alerting.

  • [Feature Improvement] Email alerts that the Grafana instance sends include a link to the Grafana UI.

  • [Feature Improvement] In the Diego/Capacity dashboard in the Grafana UI, you can view metrics for Windows and Linux cells separately.

  • [Feature Improvement] In the System at a Glance dashboard in the Grafana UI, the VM Health panel no longer includes metrics for the bosh-health-check VM. For more information about the bosh-health-check VM, see BOSH Health Metric Exporter VM in Healthwatch Metrics.

  • [Feature Improvement] The BOSH Director Health dashboard in the Grafana UI includes the BOSH Health Check and BOSH Health Check Status History panels.

  • [Bug Fix] In the BOSH Director Health dashboard in the Grafana UI, the BOSH Director Status and BOSH Director Status History panels include the system_healthy metric.

  • [Bug Fix] In the Kubernetes Nodes dashboard in the Grafana UI, the panel legend is more readable.

  • [Known Issue Fix] After you re-deploy a highly-available (HA) Healthwatch installation, the Grafana instance can load metrics data in the Grafana UI while multiple Prometheus instances update. This known issue fix improves upon the known issue fix in Healthwatch v2.2.2. For more information about this known issue, see No Data While Re-Deploying Highly-Available Healthwatch Installations below.

Healthwatch v2.2.3 uses the following open-source component versions:

Component       Packaged Version
Prometheus      2.36.2
Grafana         9.0.6
Alertmanager    0.24.0
PXC             0.43.0

v2.2.2

Release Date: 5/18/2022

  • [Feature] Healthwatch supports VMware Tanzu Application Service for VMs (TAS for VMs) v2.13 and earlier.

  • [Feature Improvement] The default alert templates are updated to better display grouped alerts.

  • [Feature Improvement] The Logging and Metrics Pipeline dashboard in the Grafana UI includes Syslog Agent metrics.

  • [Feature Improvement] The System at a Glance dashboard in the Grafana UI does not show metrics for compiler VMs.

  • [Bug Fix] The System at a Glance dashboard in the Grafana UI does not show duplicate Canary URL panels.

  • [Known Issue Fix] The RabbitMQ dashboards in the Grafana UI show data for RabbitMQ on-demand instances that are configured to communicate over TLS. For more information about this known issue, see No Data on RabbitMQ Dashboards for RabbitMQ On-Demand Instances Using TLS below.

  • [Known Issue Fix] After you re-deploy an HA Healthwatch installation, the Grafana instance can load metrics data in the Grafana UI while multiple Prometheus instances update. For more information about this known issue, see No Data While Re-Deploying Highly-Available Healthwatch Installations below.

  • [Known Issue Fix] The Kubernetes Nodes dashboard in the Grafana UI shows data for Kubernetes clusters that use the containerd runtime. For more information about this known issue, see No Data for containerd Clusters on Kubernetes Nodes Dashboard for TKGI v1.12 and Later below.

  • [Known Issue Fix] The smoke test for Prometheus VMs no longer runs before the Prometheus VM is ready. For more information about this known issue, see Prometheus Smoke Test Fails as Healthwatch Re-Deploys below.

  • [Known Issue Fix] The Prometheus instance no longer fails to clean up the chunks_head directory. For more information about this known issue, see Prometheus Clean-Up Failure Leads to Full Disk below.

Healthwatch v2.2.2 uses the following open-source component versions:

Component       Packaged Version
Prometheus      2.35.0
Grafana         8.5.2
Alertmanager    0.24.0
PXC             0.42.0

v2.2.1

Release Date: 2/28/2022

Healthwatch v2.2.1 uses the following open-source component versions:

Component       Packaged Version
Prometheus      2.33.1
Grafana         8.3.3
Alertmanager    0.23.0
PXC             0.40.0

v2.2.0

Release Date: 1/25/2022

Note: Healthwatch v2.2.0 does not support TKGI v1.13. If you have TKGI v1.13 installed on your Ops Manager foundation, upgrade to Healthwatch v2.2.1.

Healthwatch v2.2.0 uses the following open-source component versions:

Component       Packaged Version
Prometheus      2.32.1
Grafana         8.3.3
Alertmanager    0.23.0
PXC             0.40.0

How to Upgrade

To upgrade from Healthwatch v2.1 to Healthwatch v2.2, see Upgrading Healthwatch.

New Features

Healthwatch v2.2 includes the following major features:

Default Routing Rules Are Pre-Configured for Alertmanager

In new installations of Healthwatch, the Routing rules field in the Alertmanager pane of the Healthwatch tile is pre-configured with a default set of routing rules. You can edit these routing rules according to the needs of your deployment.

For more information about configuring routing rules for Alertmanager, see Configure Alerting in Configuring Alerting.

Automatic Grafana UI Route Configuration

If your Ops Manager foundation has TAS for VMs installed, you can configure Healthwatch to automatically create a route for the Grafana UI in the Grafana pane of the Healthwatch tile.

For more information about configuring a route for the Grafana UI, see (Optional) Configure Grafana in Configuring Healthwatch.

Automatic UAA Authentication Configuration

When you select UAA as your Grafana UI authentication method in the Grafana Authentication pane of the Healthwatch tile, Healthwatch automatically configures authentication with the UAA instances in TAS for VMs and TKGI for the Grafana UI. If you want to configure authentication with a UAA instance on a different Ops Manager foundation, you must select Generic OAuth and configure it manually through the Grafana Authentication pane.

For more information about configuring UAA as your Grafana UI authentication method, see (Optional) Configure Grafana Authentication in Configuring Healthwatch.

System at a Glance Dashboard in the Grafana UI

The Grafana UI includes the System at a Glance dashboard. This dashboard displays an overview of metrics related to the health of your Ops Manager foundation and the runtimes you have installed on that foundation.

For more information about the System at a Glance dashboard, see Default Dashboards in the Grafana UI in Using Healthwatch Dashboards in the Grafana UI.

Grafana UI Logout URL

If you configured a generic OAuth provider to authenticate users who log in to the Grafana UI, you can configure a logout URL in the Grafana Authentication pane of the Healthwatch tile.

For more information about configuring a logout URL for the Grafana UI, see Configure Generic OAuth Authentication in Configuring Grafana Authentication.

Two Canary Test Metrics Emitted in the Loggregator Firehose

If you deploy the SVM Forwarder VM in the Healthwatch Exporter for TAS for VMs tile, the SVM Forwarder VM emits the probe_success and probe_duration_seconds canary test metrics into the Loggregator Firehose.
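
As a rough illustration of how these two metrics might be read together, the following Python sketch parses a few sample lines in the Prometheus text exposition format and lists the canary targets whose latest probe failed. The sample URLs, the instance label, and the helper names are hypothetical. The sketch also assumes that probe_success is 1 for a passing probe and 0 for a failing one, which these release notes do not state.

```python
import re

# Hypothetical sample data; real label sets and values will differ.
SAMPLE = """\
probe_success{instance="https://app-a.example.com"} 1
probe_duration_seconds{instance="https://app-a.example.com"} 0.142
probe_success{instance="https://app-b.example.com"} 0
probe_duration_seconds{instance="https://app-b.example.com"} 5.0
"""

# Matches lines of the form: metric{instance="..."} value
LINE = re.compile(
    r'^(?P<metric>\w+)\{instance="(?P<instance>[^"]*)"\}\s+(?P<value>\S+)$'
)


def summarize(text):
    """Group metric values by the (assumed) instance label."""
    results = {}
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if not match:
            continue  # skip comments and lines this simple parser cannot read
        entry = results.setdefault(match["instance"], {})
        entry[match["metric"]] = float(match["value"])
    return results


def failing(results):
    """Return targets whose probe_success is not 1 (assumed to mean failure)."""
    return sorted(
        instance
        for instance, metrics in results.items()
        if metrics.get("probe_success") != 1.0
    )


print(failing(summarize(SAMPLE)))
```

With the sample data above, only the second target is reported as failing. A production check would query the metrics store rather than parse text by hand.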

For more information about canary test metrics, see Prometheus VM in Healthwatch Metrics. For more information about the SVM Forwarder VM, see (Optional) Configure Resources in Configuring Healthwatch Exporter for TAS for VMs.

Prometheus Scrapes Metrics Directly from the BOSH Director VM

For Ops Manager v2.10.10 and later, the Prometheus instance scrapes BOSH Director metrics directly from the BOSH Director VM instead of the Loggregator Firehose. This allows the Prometheus instance to gather more types of metrics related to the health of the BOSH Director. These metrics appear in the BOSH Director Health dashboard in the Grafana UI.

For more information about the BOSH Director metrics that the Prometheus instance scrapes, see BOSH SLIs in Healthwatch Metrics.

Healthwatch Requires New Open Source License for Grafana v8

Healthwatch uses Grafana v8, which requires the Affero General Public License (AGPL).

For more information about the AGPL, see GNU Affero General Public License on the GNU site. For more information about Grafana v8, see the Grafana documentation.

Healthwatch Automatically Runs Canary Tests for the Ops Manager Installation Dashboard

Healthwatch automatically runs canary tests for the Ops Manager Installation Dashboard.

Automatic TKGI Cluster Discovery Configuration

Healthwatch automatically configures TKGI cluster discovery by default on Ops Manager foundations that have TKGI installed. If you do not want Healthwatch to configure TKGI cluster discovery, you can disallow it through the TKGI Cluster Discovery pane in the Healthwatch tile.

For more information about TKGI cluster discovery, see Configuring TKGI Cluster Discovery. For more information about allowing or disallowing TKGI cluster discovery, see Configure TKGI Cluster Discovery in Healthwatch in Configuring TKGI Cluster Discovery.

Remove Grafana

If you do not want to use any Grafana instances in your Healthwatch deployment, you can set the number of Grafana, MySQL, and MySQL Proxy instances for your Healthwatch deployment to 0 in the Resource Config pane of the Healthwatch tile.

For more information about removing Grafana from your Healthwatch deployment, see Removing Grafana in Healthwatch Components and Resource Requirements.

Grafana UI Dashboards Only Include Metrics for Current Canary Apps

Dashboards in the Grafana UI only show metrics for canary apps that are currently configured. Metrics for canary apps that are no longer used in your Healthwatch deployment are removed from your dashboards, in order to avoid mixing outdated data with current data.

For more information about canary test metrics, see Prometheus VM in Healthwatch Metrics.

TAS for VMs SLI Test Timeouts Are Increased

The timeouts for the TAS for VMs SLI test suite are increased to five minutes. This reduces the number of false positives you may see in your metrics data.

For more information about TAS for VMs SLI test metrics, see TAS for VMs SLI Exporter VM in Healthwatch Metrics.

Breaking Changes

Healthwatch v2.2 includes the following breaking changes:

Update Automation Scripts

Many configuration options have been added, changed, or removed for Healthwatch v2.2. If you use automated scripts to install and configure Healthwatch, you must update your scripts to reflect the new configuration requirements.

For more information about installing and configuring Healthwatch through platform automation, see Installing, Configuring, and Deploying a Tile Through an Automated Pipeline.

Authenticating with a UAA Instance on a Different Ops Manager Foundation

If you are upgrading from Healthwatch v2.1 and configured UAA as your authentication method for logging in to the Grafana UI, Healthwatch v2.2 keeps UAA as your configured authentication method by default. If you configured a UAA instance on a different Ops Manager foundation as the authentication method for logging in to the Grafana UI in Healthwatch v2.1, you must select Generic OAuth and configure the settings for the external UAA instance in the Grafana Authentication pane.

For more information about configuring a UAA instance on a different Ops Manager foundation as the authentication method for logging in to the Grafana UI, see Configuring Authentication with a UAA Instance on a Different Ops Manager Foundation.

Timer Metric Exporter VM Is Removed

The timer metric exporter VM, pas-exporter-timer, is removed from Healthwatch Exporter for TAS for VMs. This removes unnecessary data and uses fewer IaaS resources.

For more information about the metrics for TAS for VMs that Healthwatch Exporter for TAS for VMs collects, see Healthwatch Exporter for TAS for VMs Metric Exporter VMs in Healthwatch Metrics.

Healthwatch v2.2.1 Requires Additional Configuration in TKGI v1.13

After you install Healthwatch v2.2.1, you must configure TKGI v1.13 to send metrics for Kubernetes Controller Manager to Healthwatch.

For more information about configuring TKGI v1.13 to send metrics for Kubernetes Controller Manager to Healthwatch, see Configure TKGI in Configuring TKGI Cluster Discovery.

Known Issues

Healthwatch v2.2 includes the following known issues:

SVM Forwarder Creates Recursive Metric Labels

This known issue is fixed in Healthwatch v2.2.1 and later.

When the SVM Forwarder VM is deployed in Healthwatch Exporter for TAS for VMs, a change in the Prometheus server causes metrics with the job and exported_job labels to become recursive. For example, exported_job becomes exported_exported_exported_exported_job.
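
The exact mechanism inside the Prometheus server is not described here. The following Python sketch is only a simplified, hypothetical model of how repeated relabeling could produce such names: each time a scraped job label clashes with the label that the scraper wants to attach, the existing label is moved to an exported_-prefixed name, and any label already using that name is moved aside first. The function names and label values are illustrative assumptions, not Prometheus code.

```python
def move_aside(labels, name):
    """Rename labels[name] to exported_<name>.

    If exported_<name> is already taken, move that label aside first,
    which is what makes the prefix stack up.
    """
    new_name = "exported_" + name
    if new_name in labels:
        move_aside(labels, new_name)
    labels[new_name] = labels.pop(name)


def apply_target_labels(scraped, target):
    """Attach target labels, moving clashing scraped labels out of the way."""
    result = dict(scraped)
    for name, value in target.items():
        if name in result:
            move_aside(result, name)
        result[name] = value
    return result


# Hypothetical label values; four passes through the same relabeling step.
labels = {"job": "my-app"}
for _ in range(4):
    labels = apply_target_labels(labels, {"job": "svm-forwarder"})

for name in sorted(labels, key=len, reverse=True):
    print(name, "=", labels[name])
```

In this model, after four passes the original value sits under exported_exported_exported_exported_job, and every further pass adds another prefix.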

To work around this issue, set the number of SVM Forwarder VM instances for your Healthwatch deployment to 0 in the Resource Config pane of the Healthwatch Exporter for TAS for VMs tile. For more information about scaling Healthwatch resources, see (Optional) Configure Resources in Configuring Healthwatch Exporter for TAS for VMs.

No Data for containerd Clusters on Kubernetes Nodes Dashboard for TKGI v1.12 and Later

This known issue is fixed in Healthwatch v2.2.2 and later.

If you have TKGI v1.12 or later installed, the Kubernetes Nodes dashboard in the Grafana UI might not show data for Kubernetes clusters that use the containerd runtime.

In TKGI v1.11 and earlier, the name label in Kubernetes cluster metrics starts with k8s_. However, in TKGI v1.12 and later, new Kubernetes clusters run on containerd instead of Docker. As a result, in TKGI v1.12 and later, the name label in Kubernetes cluster metrics starts with a hex value instead of k8s_, which the Grafana instance does not recognize.
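
To make the difference concrete, the following Python sketch classifies name label values by the two patterns described above. The sample values and the function name are hypothetical, and the sketch assumes that a containerd name label consists entirely of lowercase hex characters. A filter that only accepts the k8s_ prefix would silently drop the second kind of series, which is consistent with the empty dashboard.

```python
import re

# Assumption for this sketch: containerd name labels are plain lowercase hex IDs.
HEX_ID = re.compile(r"[0-9a-f]+")


def classify_name_label(name):
    """Guess the container runtime from the shape of the name label."""
    if name.startswith("k8s_"):
        return "docker"
    if HEX_ID.fullmatch(name):
        return "containerd"
    return "unknown"


# Hypothetical label values for illustration only.
samples = [
    "k8s_nginx_web-0_default_1a2b3c_0",
    "3f2a9c0de1b47788",
]

for value in samples:
    print(value, "->", classify_name_label(value))

# A filter that assumes the Docker-style prefix keeps only the first sample.
docker_only = [v for v in samples if v.startswith("k8s_")]
print(docker_only)
```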

To fix this issue, upgrade to Healthwatch v2.2.2 or later.

No Data for Individual Pods on Kubernetes Nodes Dashboard for TKGI v1.10

If you are using TKGI v1.10.0 or v1.10.1, the Kubernetes Nodes dashboard in the Grafana UI might not show data for individual pods. This is due to a known issue in Kubernetes v1.19.6 and earlier and Kubernetes v1.20.1 and earlier.

To fix this issue, upgrade to TKGI v1.10.2 or later. For more information about upgrading to TKGI v1.10.2 or later, see the TKGI documentation.

No Data on Kubernetes Nodes Dashboard for Windows Clusters

If you are using TKGI to monitor Windows clusters, the Kubernetes Nodes dashboard in the Grafana UI might not show data. Healthwatch does not currently visualize node metrics for Windows clusters.

Healthwatch Exporter for TKGI Does Not Clean Up TKGI Service Accounts

This known issue is fixed in Healthwatch v2.2.1 and later.

If you run SLI tests for TKGI through Healthwatch Exporter for TKGI, and you do not have an OpenID Connect (OIDC) provider for your Kubernetes clusters configured for TKGI, the TKGI SLI exporter VM does not automatically clean up the service accounts that it creates while running the TKGI SLI test suite.

To fix this issue, either upgrade to Healthwatch v2.2.1 or configure an OIDC provider as the identity provider for your Kubernetes clusters in the TKGI tile. This cleans up the service accounts that the TKGI SLI exporter VM creates in future TKGI SLI tests, but does not clean up existing service accounts from previous TKGI SLI tests. For more information about configuring an OIDC provider in TKGI, see the TKGI documentation.

You may need to manually delete existing service accounts from previous TKGI SLI tests. For more information about manually deleting existing service accounts, see Healthwatch Exporter for TKGI Does Not Clean Up TKGI Service Accounts in Troubleshooting Healthwatch.

BBR Backup Snapshots Fill Disk Space on Prometheus VMs

This known issue is fixed in Healthwatch v2.2.1 and later.

In Healthwatch v2.2.0, the backup scripts for Prometheus VMs do not clean up the intermediary snapshots created by BBR. This results in the disk space on Prometheus VMs filling up.

To fix this issue, either upgrade to Healthwatch v2.2.1 or manually clean up the snapshots. For more information about manually cleaning up the snapshots, see BBR Backup Snapshots Fill Disk Space on Prometheus VMs in Troubleshooting Healthwatch.

No Data on RabbitMQ Dashboards for RabbitMQ On-Demand Instances Using TLS

This known issue is fixed in Healthwatch v2.2.2 and later.

In Healthwatch v2.2.1 and earlier, the Prometheus instance does not scrape metrics from RabbitMQ on-demand instances that are configured to communicate over TLS. As a result, the RabbitMQ dashboards in the Grafana UI show no data for RabbitMQ on-demand instances that are configured to use TLS.

To fix this issue, upgrade to Healthwatch v2.2.2 or later and RabbitMQ v2.0.13 or later.

No Data While Re-Deploying Highly-Available Healthwatch Installations

This known issue is fixed in Healthwatch v2.2.3 and later.

In Healthwatch v2.2.2 and earlier, the Grafana instance cannot load metrics data in the Grafana UI after you re-deploy an HA Healthwatch installation with multiple Prometheus instances. An HA Healthwatch installation is meant to allow the Grafana instance to continue loading data during re-deployment by ensuring that the second Prometheus instance does not start updating until after the first Prometheus instance has updated and restarted. In Healthwatch v2.2.2 and earlier, a bug causes the second Prometheus instance to start updating before the first Prometheus instance restarts.
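
The effect of the update ordering can be illustrated with a small Python sketch. It models each Prometheus instance's update as a time window during which that instance is unavailable, and checks whether there is any moment when every instance is down at once. The time values are arbitrary and the function name is hypothetical; this is a model of the behavior described above, not of the deployment tooling itself.

```python
def all_down_at_once(downtimes):
    """Return True if every (start, end) downtime window overlaps at some moment.

    Windows are half-open: an instance is down from start up to,
    but not including, end.
    """
    latest_start = max(start for start, _ in downtimes)
    earliest_end = min(end for _, end in downtimes)
    return latest_start < earliest_end


# Intended behavior: the second instance waits until the first has restarted.
intended = [(0, 5), (5, 10)]

# Behavior described for v2.2.2 and earlier: the second instance
# starts updating while the first is still down.
buggy = [(0, 5), (3, 8)]

print("intended ordering leaves a gap in data:", all_down_at_once(intended))
print("overlapping updates leave a gap in data:", all_down_at_once(buggy))
```

In the intended ordering, at least one instance is always available to answer queries. In the overlapping case, there is a window in which no instance can serve data to the Grafana instance.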

To fix this issue, upgrade to Healthwatch v2.2.3 or later.

Prometheus Smoke Test Fails as Healthwatch Re-Deploys

This known issue is fixed in Healthwatch v2.2.2 and later.

In Healthwatch v2.2.1 and earlier, a potential race condition sometimes causes the smoke test for Prometheus VMs to run before the Prometheus VM is ready. This leads to the smoke test failing when you re-deploy Healthwatch, even though it succeeds when you run the smoke test manually.

To fix this issue, upgrade to Healthwatch v2.2.2 or later.

Prometheus Clean-Up Failure Leads to Full Disk

This known issue is fixed in Healthwatch v2.2.2 and later.

In Healthwatch v2.2.1, under rare circumstances, the Prometheus instance fails to clean up the chunks_head directory. This leads to a full disk and subsequent failures when the Prometheus instance attempts to process new metrics.

To fix this issue, upgrade to Healthwatch v2.2.2 or later.