PCF Healthwatch v1.6 Release Notes
Warning: PCF Healthwatch v1.6 is no longer supported or available for download. PCF Healthwatch v1.6 has reached the End of General Support (EOGS) phase as defined by the Support Lifecycle Policy. To stay up to date with the latest software and security updates, upgrade to a supported version.
v1.6.4
Release Date: February 28, 2020
Features
New features and changes in this release:
- Remove alert associated with the removed metric Reverse Log Proxy Loss Rate. See Reverse Log Proxy Loss Rate.
- [Bug Fix] Fix infinite redirect at login.
- [Bug Fix] Remove the healthwatch_space_developer user when the PCF Healthwatch tile is uninstalled.
- [Bug Fix] Fix the issue where Super Value Metrics were lost when large volumes of router latency metrics overloaded the ingestor instances.
- [Bug Fix] Fix the installation failure caused by a duplicated entry in free_chunks_configuration, which caused the Flyway migration to fail during Apply Changes.
Maintenance update of the following dependencies:
- Spring Boot now 2.1.9
- Indicator Protocol now 0.7.17
- Syslog release now 11.6.0
Known Issues
This release has the following known issues.
Disk Slowly Fills When Using vSAN with Healthwatch Leads
The vSAN object count increases on vSphere versions earlier than v6.5 update 2.
Healthwatch deploys the app bosh-health-check, which deploys and deletes a VM every 10 minutes. vSphere versions earlier than v6.5 update 2 leave a namespace or folder and subfolders when a VM is deleted. The orphaned folders cause the vSAN object count to increase. This is a known issue for vSAN. For more information about the vSAN known issue, see Deleted VMs leave components behind in GitHub.
To address the issue, update vSphere to v6.5 update 2 or later. Alternatively, stop the bosh-health-check app to slow the increase in vSAN object count.
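To get a feel for how quickly the orphaned folders can accumulate, the 10-minute cycle can be turned into a rough daily count. The following is a minimal sketch, not a measurement: it assumes each deleted VM leaves exactly one orphaned folder, which these notes do not state, and the function name `orphaned_folders` is hypothetical.

```python
# Rough estimate only. Assumption: every deleted VM leaves exactly one
# orphaned folder behind; the release notes do not give a per-VM count.
MINUTES_PER_DAY = 24 * 60
CHECK_INTERVAL_MINUTES = 10  # from the release notes: one VM every 10 minutes


def orphaned_folders(days: int, interval_minutes: int = CHECK_INTERVAL_MINUTES) -> int:
    """Return the estimated number of orphaned folders after `days` days."""
    cycles_per_day = MINUTES_PER_DAY // interval_minutes
    return cycles_per_day * days


if __name__ == "__main__":
    for days in (1, 7, 30):
        print(f"{days:>2} days: ~{orphaned_folders(days)} orphaned folders")
```

Under that assumption, the check produces 144 deploy-and-delete cycles per day, so the count grows steadily even on an otherwise idle foundation.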
Indicator Protocol Beta Dashboard Displays Error Due to Log Cache
Occasionally, the Indicator Protocol Beta Dashboard charts fail to load with the following error: "Error fetching graph data."
The Indicator Protocol Beta Dashboard charts are populated using data from Log Cache, which is a component of Loggregator. The charts may fail to load if Log Cache times out while processing the data.
No corrective action is required. The issue will self-resolve if possible.
Infinite Login Redirect When Using Private Domain Suffixes
Certain private domain suffixes, such as .local or .a, result in an infinite redirect loop when trying to access the Healthwatch UI.
A workaround is to set the SKIP_CERT_VERIFY environment variable to true on the Healthwatch app.
For the canonical list of public suffixes, see the Public Suffix List.
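To judge whether a foundation might be affected, one option is to check whether the Healthwatch hostname ends in a suffix from the Public Suffix List. The sketch below is illustrative only: it uses a tiny hand-picked sample instead of the real list, ignores the list's wildcard and exception rules, and the hostnames and function names are hypothetical. These notes do not describe how Healthwatch itself evaluates suffixes.

```python
def load_suffixes(lines):
    """Parse Public Suffix List text, skipping comments and blank lines."""
    suffixes = set()
    for line in lines:
        rule = line.strip()
        if rule and not rule.startswith("//"):
            suffixes.add(rule.lower())
    return suffixes


def has_public_suffix(hostname, suffixes):
    """Return True if any trailing run of labels in hostname is a listed suffix.

    Simplification: wildcard (*.) and exception (!) rules are not interpreted.
    """
    labels = hostname.lower().rstrip(".").split(".")
    return any(".".join(labels[i:]) in suffixes for i in range(len(labels)))


# Tiny hand-picked sample for illustration; this is not the real list.
SAMPLE = ["// sample rules", "com", "io", "co.uk", ""]

if __name__ == "__main__":
    suffixes = load_suffixes(SAMPLE)
    # Hypothetical hostnames, used only to show the two outcomes.
    for host in ("healthwatch.sys.example.com", "healthwatch.sys.pcf.local"):
        print(host, has_public_suffix(host, suffixes))
```

For a real check, pass the lines of the downloaded Public Suffix List file to `load_suffixes` instead of `SAMPLE`.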
v1.6.3
Release Date: September 17, 2019
Features
New features and changes in this release:
- Remove metric, graph, and alert associated with Route Registration Messages Delta. This metric was removed in PAS 2.4, so related graphs and alerts should not display. The current associated alert will be resolved automatically.
- [Bug Fix] Correctly handle rotation of root Certificate Authorities.
- [Bug Fix] Correct the threshold for Syslog Adapter Capacity.
- [Bug Fix] Reduce noisiness of system.healthy alerts when a BOSH VM is created or deleted.
- [Bug Fix] If healthwatch-ingestor fails to receive data after 15 seconds, it automatically resets its Spring Application Context to re-establish a Firehose connection. After 20 resets of the Spring Application Context, the app instance purposely crashes and lets Diego reschedule it, providing a fresh container and JVM instance.
- [Bug Fix] Fix healthwatch-ingestor crash in cases where GoRouter receives an HTTP request with a non-standard HTTP method, resulting in an HttpStartStop metric with a null HTTP method value.
- [Bug Fix] Setting Redis Worker Count in the Healthwatch Component Config page of Ops Manager successfully changes the number of instances. Previously, changes to this field were not reflected in the Healthwatch deployment.
- [Bug Fix] Delete orphaned cf-health-check smoke-test-app instances regularly. Previously, cf-health-check would occasionally fail to delete a smoke test and never cleaned it up.
- [Bug Fix] Fix occasional inaccurate spikes in Log Transport Throughput graph.
Maintenance update of the following dependencies:
- Golang now 1.12.9
- Java now 1.8.0_222-b10
- Indicator Protocol now 0.7.16
- Spring Boot now 2.1.7
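The healthwatch-ingestor recovery behavior listed in the bug fixes above (reset after 15 seconds without data, crash after 20 resets) can be pictured as a small state machine. This is a minimal sketch of that decision logic only, not the ingestor's actual code: the class name `IngestorWatchdog`, the return values, and the choice never to clear the reset counter are assumptions made for illustration.

```python
from dataclasses import dataclass

RECEIVE_TIMEOUT_SECONDS = 15  # from the release notes
MAX_RESETS = 20               # from the release notes


@dataclass
class IngestorWatchdog:
    """Tracks time since the last data arrival and decides what to do next."""

    last_data_at: float = 0.0
    resets: int = 0  # assumption: never cleared once incremented

    def on_data(self, now: float) -> None:
        """Record that data arrived at time `now` (seconds)."""
        self.last_data_at = now

    def check(self, now: float) -> str:
        """Return 'ok', 'reset', or 'crash' for the current time."""
        if now - self.last_data_at <= RECEIVE_TIMEOUT_SECONDS:
            return "ok"
        if self.resets >= MAX_RESETS:
            return "crash"  # let the platform reschedule the instance
        self.resets += 1
        self.last_data_at = now  # restart the timeout window after a reset
        return "reset"


if __name__ == "__main__":
    wd = IngestorWatchdog()
    wd.on_data(0.0)
    # Simulate a connection that never delivers data again.
    for step in range(1, 22):
        print(step, wd.check(16.0 * step))
```

In the simulated run, the first 20 timeouts each produce a reset and the next one produces a crash, mirroring the escalation described in the bug fix.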
Known Issues
This release has the following known issues.
Disk Slowly Fills When Using vSAN with Healthwatch Leads
The vSAN object count increases on vSphere versions earlier than v6.5 update 2.
Healthwatch deploys the app bosh-health-check, which deploys and deletes a VM every 10 minutes. vSphere versions earlier than v6.5 update 2 leave a namespace or folder and subfolders when a VM is deleted. The orphaned folders cause the vSAN object count to increase. This is a known issue for vSAN. For more information about the vSAN known issue, see Deleted VMs leave components behind in GitHub.
To address the issue, update vSphere to v6.5 update 2 or later. Alternatively, stop the bosh-health-check app to slow the increase in vSAN object count.
Indicator Protocol Beta Dashboard Displays Error Due to Log Cache
Occasionally, the Indicator Protocol Beta Dashboard charts fail to load with the following error: "Error fetching graph data."
The Indicator Protocol Beta Dashboard charts are populated using data from Log Cache, which is a component of Loggregator. The charts may fail to load if Log Cache times out while processing the data.
No corrective action is required. The issue will self-resolve if possible.
Multiple healthwatch_space_developer Users Created During Healthwatch Re-installation
When the PCF Healthwatch tile is re-installed, the push-apps errand creates a duplicate healthwatch_space_developer user because the pre-existing user is not deleted during the previous tile’s deletion.
This causes the cf-health-check to fail due to an invalid password for the healthwatch_space_developer user.
Infinite Login Redirect When Using Private Domain Suffixes
Certain private domain suffixes, such as .local or .a, result in an infinite redirect loop when trying to access the Healthwatch UI.
A workaround is to set the SKIP_CERT_VERIFY environment variable to true on the Healthwatch app.
For the canonical list of public suffixes, see the Public Suffix List.
v1.6.2 – Withdrawn
This release has been removed from Pivotal Network.
Release Date: September 11, 2019
Features
See the release notes for v1.6.3.
Known Issues
Flyway migration fails during upgrade
PCF Healthwatch v1.6.2 contains a bad Flyway migration, which causes issues during upgrades from PCF Healthwatch v1.5. Due to this issue, PCF Healthwatch v1.6.2 is no longer available on Pivotal Network.
Install or upgrade to PCF Healthwatch v1.6.3 instead.
Reverse Log Proxy Loss Rate Alert Fires
Occasionally, the alert fires even though the metric has been removed from Healthwatch v1.5 and later.
Infinite login redirect when using private domain suffixes
Certain private domain suffixes (e.g., .local or .a) result in an infinite redirect loop when trying to access the Healthwatch UI.
A workaround is to set the SKIP_CERT_VERIFY environment variable to true on the Healthwatch app. A bug fix is included in v1.6.4.
For the canonical list of public suffixes, see https://publicsuffix.org/list/public_suffix_list.dat.
v1.6.1
Release Date: July 8, 2019
Features
New features and changes in this release:
- [Bug Fix] Fixes Reverse Log Proxy Egress Dropped Messages Graph Not Displaying.
- [Bug Fix] The Log Transport Dropped Egress Messages graph correctly displays when all metrics are 0.
Known Issues
This release has the following known issues.
Incorrect Upgrade Requirements
Tile metadata in PCF Healthwatch states that a user can upgrade directly from Healthwatch v1.3 to v1.6, but the statement is incorrect. To upgrade successfully to v1.6, you need PCF Healthwatch v1.5 or later.
The metadata statement is corrected in Healthwatch v1.6.2 and later.
Cell Health Check graph is not showing correctly
On the Compute Performance page, the Cell Health Check graph shows no data. Upgrading to Healthwatch v1.6.3 or later fixes this issue.
Alerting on the underlying metric, rep.UnhealthyCell, was unaffected.
Occasional inaccurate spikes in Log Transport Throughput graph
The graph might spike inaccurately when doing a deployment upgrade or when the loggregator system is overloaded.
Ineffectual “Redis Worker Count” property in tile configuration
Setting Redis Worker Count in the PCF Healthwatch tile does not change the number of instances of the healthwatch-worker app.
This issue is fixed in Healthwatch v1.6.3 and later.
Disk Slowly Fills When Using vSAN with Healthwatch Leads
The vSAN object count increases on vSphere versions earlier than v6.5 update 2.
Healthwatch deploys the app bosh-health-check, which deploys and deletes a VM every 10 minutes. vSphere versions earlier than v6.5 update 2 leave a namespace or folder and subfolders when a VM is deleted. The orphaned folders cause the vSAN object count to increase. This is a known issue for vSAN. For more information about the vSAN known issue, see Deleted VMs leave components behind in GitHub.
To address the issue, update vSphere to v6.5 update 2 or later. Alternatively, stop the bosh-health-check app to slow the increase in vSAN object count.
Indicator Protocol Beta Dashboard Displays Error Due to Log Cache
Occasionally, the Indicator Protocol Beta Dashboard charts fail to load with the following error: "Error fetching graph data."
The Indicator Protocol Beta Dashboard charts are populated using data from Log Cache, which is a component of Loggregator. The charts may fail to load if Log Cache times out while processing the data.
No corrective action is required. The issue will self-resolve if possible.
Multiple healthwatch_space_developer CF Users on Healthwatch Re-install
A user is created during the push-apps errand (which runs during tile installation) to be used during the cf-health-check test. This user is not deleted on tile deletion, so if the tile is re-installed, the cf-health-check fails because it uses an invalid password for the pre-existing healthwatch_space_developer user.
Reverse Log Proxy Loss Rate Alert Fires
Occasionally, the alert fires even though the metric has been removed from Healthwatch v1.5 and later.
v1.6.0
Release Date: June 19, 2019
Features
New features and changes in this release:
- Updates Healthwatch Charts with new features:
- Drag a selection to zoom in.
- Double click to zoom out.
- More visibility around missing data.
- Legend with filters.
- Renames Log Transport Dropped Messages chart to Log Transport Dropped Ingress Messages. The Log Transport Dropped Ingress Messages chart graphs the doppler.dropped, direction: ingress metric.
- Adds Log Transport Dropped Egress Messages chart on the Logging Performance page. This graphs the doppler.dropped, direction: egress metric. For more information about the Doppler Egress Dropped Messages KPI, see Doppler Egress Dropped Messages.
- PCF Healthwatch apps connect to the internal MySQL database using TLS.
- Increases logging around failed Cloud Foundry Command Line Interface (cf CLI) tests in the cf-health-check app.
- Binary log retention for internal MySQL database changed from 7 days to 2 days. This reduces the amount of persistent storage used by the VM.
- [Bug Fix] Fixes Log Transport Loss Rate alert markers rendering on multiple charts.
- [Bug Fix] Fixes Healthwatch Has Missing or Incorrect Data by more robustly determining the deployment tag.
- [Bug Fix] Diego Cell Capacity page graphs do not show false drops in capacity due to occasional late metrics. Previously, if Diego emitted a metric outside the standard one-minute window, Diego Cell Capacity graphs showed a false drop.
- [Bug Fix] Fixes Healthwatch Cannot Start if ‘0’ aliased certificate is present in indicator keystore.
- [Bug Fix] Fixes regression where opsman-health-check doesn’t work for self-signed Ops Manager certificate.
- [Bug Fix] BOSH Director stoplight correctly turns red when bosh-health-check fails.
- [Bug Fix] Correctly account for half-hour timezones in the Healthwatch UI.
- [Bug Fix] If healthwatch-ingestor fails to receive data after 15 seconds, it will automatically reset its Spring Application Context to re-establish a Firehose connection.
Maintenance update of the following dependencies:
- pxc-release now v0.15.0
- Golang now v1.12.5
- Indicator Protocol now v0.7.14
- Spring Boot now 2.1.5
- Flyway Command-line and Library now v5.2.4
- Redis now v3.2.13
- CF CLI now v6.45.0
- Libraries Updated:
- com.google.protobuf:protobuf-java now v3.7.1
- io.projectreactor.ipc:reactor-netty now v0.7.15.RELEASE
- react-markdown now v4.0.8
Known Issues
This release has the following known issues.
Incorrect Upgrade Requirements
Tile metadata in PCF Healthwatch states that a user can upgrade directly from Healthwatch v1.3 to v1.6, but the statement is incorrect. To upgrade successfully to v1.6, you need PCF Healthwatch v1.5 or later.
The metadata statement is corrected in Healthwatch v1.6.2 and later.
Occasional inaccurate spikes in Log Transport Throughput graph
The graph might spike inaccurately when doing a deployment upgrade or when the loggregator system is overloaded.
Ineffectual Redis Worker Count Property in Tile Configuration
Setting Redis Worker Count in the PCF Healthwatch tile does not change the number of instances of the healthwatch-worker app.
This issue is fixed in Healthwatch v1.6.3.
Log Transport Dropped Egress Messages Graph Not Displaying
If all values for loggregator.doppler.dropped.egress are 0, the Log Transport Dropped Egress Messages graph will not display.
This issue is fixed in Healthwatch v1.6.1.
Reverse Log Proxy Egress Dropped Messages Graph Not Displaying
If there are no cf-syslog-drain metrics emitted, the Reverse Log Proxy Egress Dropped Messages graph will not display.
Disk Slowly Fills When Using vSAN with Healthwatch Leads
The vSAN object count increases on vSphere versions earlier than v6.5 update 2.
Healthwatch deploys the app bosh-health-check, which deploys and deletes a VM every 10 minutes. vSphere versions earlier than v6.5 update 2 leave a namespace or folder and subfolders when the VM is deleted. The orphaned folders cause the vSAN object count to increase. This is a known issue for vSAN. For more information about the vSAN known issue, see Deleted VMs leave components behind in GitHub.
To address the issue, update vSphere to v6.5 update 2 or later. Alternatively, stop the bosh-health-check app to slow the increase in vSAN object count.
Indicator Protocol Beta Dashboard Displays Error Due to Log Cache
Occasionally, the Indicator Protocol Beta Dashboard charts fail to load with the error: "Error fetching graph data."
These charts are populated using data from Log Cache, which is part of Loggregator. The charts fail periodically when Log Cache times out while attempting to process the data.
No corrective action is required. The issue will self-resolve if possible.
Multiple healthwatch_space_developer CF Users on Healthwatch Re-install
A user is created during the push-apps errand (which runs during tile installation) to be used during the cf-health-check test. This user is not deleted on tile deletion, so if the tile is re-installed, the cf-health-check fails because it uses an invalid password for the pre-existing healthwatch_space_developer user.
Reverse Log Proxy Loss Rate Alert Fires
Occasionally, the alert fires even though the metric has been removed from Healthwatch v1.5 and later.