LATEST VERSION: 1.1 - CHANGELOG
PCF Healthwatch v1.1

Release Notes for PCF Healthwatch

v1.1.4

Release Date: February 8, 2018

Release Notes

  • [Bug Fix] Switches the servlet container from Tomcat to Jetty. This resolves reported issues with Healthwatch installation failing on a non-RFC 1918 network.
  • [Bug Fix] Switches the underlying push-apps errand script from a Bash script to Kotlin. This is expected to resolve issues on some Azure installations where the CLI timed out before the Healthwatch push-apps errand could complete successfully.
  • [Bug Fix] Fixes an issue identified in v1.1.3 where the Logging Throughput and Loss Rate calculations were potentially underreporting.
  • [Bug Fix] Fixes an issue identified in v1.1.3 and earlier where the CF CLI credentials were visible in the push_apps script logs.
  • [Bug Fix] The CLI Command Health Check app did not declare the amount of memory it needed and therefore relied on the system default. In some environments this could leave too little memory available to start the app successfully. The other packaged test apps already declared their memory requirements. To resolve this, the CLI Command Health Check app now explicitly declares a 1GB memory allocation on push.
  • [Feature] The existing manifest capability to Disable Ops Manager Continuous Validation Testing has been exposed as a configuration property within the Ops Manager UI for the Healthwatch tile configuration. This enable/disable choice is available on the Health Check configuration screen within the Healthwatch tile settings screen. The default value is Enable.
    • Note: If you had previously turned this test off within the manifest prior to v1.1.4, please validate your setting is Disable before applying changes to upgrade to this release version.
  • [Feature] Healthwatch now creates and publishes 3 additional metrics regarding Capacity Available. These are useful for downstream consumers wanting to monitor against a given available capacity value, instead of, or in complement to, the percentage-based available capacity metrics already published.
  • [Feature] The main dashboard now displays the Foundation name. The displayed name is the configured foundation name, or the system domain if the default value was not changed.
  • [Security Feature] Operators can define the name of the foundation, which is then published into the firehose as a tag on the metrics that Healthwatch emits. A sanitization method has been added to Metron Forwarder so that disallowed characters, which could be problematic for downstream firehose consumers, cannot be published. Any disallowed characters are stripped from the passed foundation label value.
  • [Feature] UI Improvements:
    • The copy to clipboard user interaction was improved throughout the UI. Now Copied will display briefly when the copy icon is clicked.
    • Minor design update made to the layout of the Jobs detail page in order to improve overall readability of the information presented.
    • On the Job Instances detail page, the y-axis is now fixed at 0-100% for the BOSH metric line graphs. This makes the page consistent with the behavior of other product pages and better emphasizes low versus high percentages when scanning across multiple charts.
    • On the test result detail pages for CLI Command Health, Canary App Health, BOSH Director Health, and Ops Manager Health, the end time for a particular test run is now displayed in the details table in UTC. By displaying this information-only timestamp in UTC, it is easier to leverage the information when searching through relevant logs. The primary UI interactions on these pages remain as-is, displaying in the user’s local time.
    • On the test result detail pages for CLI Command Health, Canary App Health, BOSH Director Health, and Ops Manager Health, the detailed test result table has been visually adjusted so that the information no longer needs to be truncated, and is easier to read. Test Results are now represented by a Pass/Fail/Didn’t Run/No Data icon, with a hover interaction available to confirm icon meaning.
    • Improvement made to queries updating the Capacity panel on the main dashboard. This panel could sometimes show an unexpected line drop in the final minute, although the details page already had the most recent value correctly displayed. This update reduces the likelihood of that behavior.
    • We have removed the previously stated limitation that the Google Chrome browser must be used for accessing the UI. The latest Mozilla Firefox browser also works well. Microsoft Edge 16 has one issue with broken tab switch navigation on the Capacity and Diego detail pages that looks to be resolved in the upcoming Edge 17.
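
The foundation-label sanitization described in the release notes above can be pictured with a short sketch. These notes do not list the disallowed characters or show the Metron Forwarder implementation, so the allowlist below (letters, digits, dot, hyphen, underscore) and the function name `sanitize_foundation_label` are assumptions for illustration only.

```python
import re

# Hypothetical allowlist: the real Metron Forwarder rules are not documented
# in these release notes. Anything outside this set is treated as disallowed.
_DISALLOWED = re.compile(r"[^A-Za-z0-9._-]")


def sanitize_foundation_label(label: str) -> str:
    """Strip every character outside the assumed allowlist from the label."""
    return _DISALLOWED.sub("", label)


if __name__ == "__main__":
    print(sanitize_foundation_label('production-1"; drop<tag>'))
```

Stripping, rather than rejecting, mirrors the behavior stated in the notes: the label is still published, minus any characters that could trouble downstream consumers.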

Known Issues

PCF Healthwatch v1.1.4:

  • Does not cover monitoring of Isolation Segments.
  • Does not include the recently published UAA KPI recommendations for PCF 2.0.
  • Hides three of the PAS MySQL KPI charts. These charts will be available in a future patch version:
    • Query Rate
    • MySQL CPU Busy Time
    • Percentage of Max Connections Used

v1.1.3

Release Date: January 11, 2018

Release Notes

  • [Feature] PCF Healthwatch is now also compatible with the PCF Small Footprint PAS tile.
  • [Feature] Operators can now choose to change the default Foundation name that PCF Healthwatch passes into the firehose as part of the publication of the PCF Healthwatch Metrics.
    • Operators can optionally configure this name within the tile. Doing so will replace the default foundation name value of system domain. This updated foundation name is passed into the Firehose as a key-value tag. For example:
      origin:"healthwatch" eventType:ValueMetric timestamp:1515598485276671703 deployment:"cf" job:"healthwatch-forwarder" index:"07a5b686-ef82-4dd0-6413-466b" ip:"10.0.16.6" tags:<key:"foundation" value:"production-1" > valueMetric:<name:"SyslogDrain.Adapter.LossRate.1M" value:0 unit:"m" >
      origin:"healthwatch" eventType:ValueMetric timestamp:1515598485279815606 deployment:"cf" job:"healthwatch-forwarder" index:"07a5b686-ef82-4dd0-6413-466b" ip:"10.0.16.6" tags:<key:"foundation" value:"production-1" > valueMetric:<name:"SyslogDrain.RLP.LossRate.1M" value:0 unit:"m" >
      
  • [Feature] The error page is now more descriptive when the login error is the result of an invalid scope.
    • Users that receive the error message Error: User missing required scopes. when attempting to access the PCF Healthwatch UI will need to have the correct healthwatch.read scope added to their UAA user account.
  • [Feature] When using the copy+paste interaction on an unhealthy job, the job name will also now be copied to clipboard. Having job/vm-id in the clipboard provides more useful pasting into the bosh2 cli.
  • [Bug Fix] Fixes an issue identified in v1.1.1 where the full root CA certificate was visible in the log output.
  • Product stemcell was updated to v3468.
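
For downstream consumers that want to read the foundation tag, the sample lines above can be parsed with a small sketch. It works on the text rendering shown in these notes only; a real consumer would more likely decode the Firehose envelope directly. The function name `parse_metric_line` and the returned field names are hypothetical.

```python
import re
from typing import Dict, Optional

# Patterns for the text rendering shown in these notes (an assumption: real
# consumers would normally decode the Firehose envelope, not parse text).
_TAG = re.compile(r'tags:<key:"(?P<key>[^"]*)" value:"(?P<value>[^"]*)" >')
_METRIC = re.compile(
    r'valueMetric:<name:"(?P<name>[^"]*)" value:(?P<value>\S+) unit:"(?P<unit>[^"]*)" >'
)
_ORIGIN = re.compile(r'origin:"(?P<origin>[^"]*)"')


def parse_metric_line(line: str) -> Optional[Dict[str, object]]:
    """Return origin, foundation tag, and metric fields, or None if no metric."""
    metric = _METRIC.search(line)
    origin = _ORIGIN.search(line)
    if metric is None or origin is None:
        return None
    tags = {m.group("key"): m.group("value") for m in _TAG.finditer(line)}
    return {
        "origin": origin.group("origin"),
        "foundation": tags.get("foundation"),  # None when the tag is absent
        "name": metric.group("name"),
        "value": float(metric.group("value")),
        "unit": metric.group("unit"),
    }


# First sample line from the release note above, split only for readability.
SAMPLE = (
    'origin:"healthwatch" eventType:ValueMetric timestamp:1515598485276671703 '
    'deployment:"cf" job:"healthwatch-forwarder" index:"07a5b686-ef82-4dd0-6413-466b" '
    'ip:"10.0.16.6" tags:<key:"foundation" value:"production-1" > '
    'valueMetric:<name:"SyslogDrain.Adapter.LossRate.1M" value:0 unit:"m" >'
)

if __name__ == "__main__":
    print(parse_metric_line(SAMPLE))
```

Keying on the `foundation` tag rather than on `deployment` lets an aggregator separate metrics from several foundations that share the same deployment name.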

Known Issues

PCF Healthwatch v1.1.3:

  • [Bug Identified] The CF CLI credentials are visible in the push_apps script logs. This is fixed in v1.1.4.
  • [Bug Identified] An update to origin-based queries introduced in v1.1.3 caused a potential calculation issue to occur for the Logging Throughput and Logging Loss Rate calculations, as these can compute from multiple metrics with differing origins. This is resolved in v1.1.4.
  • Does not cover monitoring of Isolation Segments.
  • Supports only the Google Chrome browser when accessing the PCF Healthwatch UI.
  • Hides three of the PAS MySQL KPI charts. These charts will be available in a future patch version:
    • Query Rate
    • MySQL CPU Busy Time
    • Percentage of Max Connections Used

v1.1.1

Release Date: December 22, 2017

Release Notes

  • [Feature] To support monitoring of Pivotal Cloud Foundry (PCF) v2.0, the following functionality has been added to PCF Healthwatch:
  • [Feature] Metrics published by PCF Healthwatch on a given PCF foundation are now identifiable to that foundation.
    • If PCF Healthwatch is installed on multiple foundations within a multi-foundation environment, the metrics PCF Healthwatch publishes are identifiable to their source PCF foundation. This enables operators who are aggregating data streams from multiple foundations to more easily recognize which foundation the PCF Healthwatch metrics of concern originated from.
    • By default, the value provided for a PCF foundation is the system domain of that foundation. The foundation value is passed into the Firehose as a key-value tag. For example:
      origin:"healthwatch" eventType:ValueMetric timestamp:1511211281010702574 deployment:"cf" job:"healthwatch-forwarder" index:"dbf89280-1b6b-46c7-4255-aaad" ip:"10.0.16.29" tags:<key:"foundation" value:"pcf.downey.cfapps.com" > valueMetric:<name:"health.check.OpsMan.probe.count" value:1 unit:"count" >
      origin:"healthwatch" eventType:ValueMetric timestamp:1511211286171879726 deployment:"cf" job:"healthwatch-forwarder" index:"05a557a0-0e38-4298-6adb-278d" ip:"10.0.16.29" tags:<key:"foundation" value:"pcf.downey.cfapps.com" > valueMetric:<name:"health.check.bosh.director.probe.available" value:1 unit:"Metric" >
      
  • [Feature] PCF Healthwatch now publishes operational metrics about itself so that its functionality and performance can also be monitored.

    For more information, see Monitoring PCF Healthwatch.

  • [Feature] Operators installing or upgrading PCF Healthwatch can now configure the desired number of Health Checkers in the Healthwatch Component Config section of the PCF Healthwatch tile.

  • [Feature] Operators who do not use Ops Manager for deployments can now turn off the default Ops Manager test suite. For more information, see Installing and Configuring PCF Healthwatch.

  • [Feature] UI Improvements:

    • The PCF Healthwatch dashboard has a new six-column default layout. If the width of your display is 1835 pixels or fewer, the dashboard shows three columns; you can resize them manually in the browser.
    • When an unhealthy job is flagged and becomes visible on the PCF Healthwatch dashboard, you can now click on that job name to go directly to the Job Instances Detail page for that specific job.
    • Tooltip interactions and handling of long deployment names were improved.
    • Breadcrumb navigation was added.
    • Panel titles now link to detail view pages.

Known Issues

PCF Healthwatch v1.1.1:

  • [Bug Identified] The CF CLI credentials are visible in the push_apps script logs. This is fixed in v1.1.4.
  • [Bug Identified] The full root CA certificate is visible in the log output. This is fixed in v1.1.3.
  • Is not compatible with the PCF Small Footprint PAS tile.
  • Does not cover monitoring of Isolation Segments.
  • Supports only the Google Chrome browser when accessing the PCF Healthwatch UI.
  • Hides three of the PAS MySQL KPI charts at launch. These charts will be available in a future patch version:
    • Query Rate
    • MySQL CPU Busy Time
    • Percentage of Max Connections Used

What’s New in PCF Healthwatch v1.1

PCF Healthwatch v1.0 was available as a limited, closed-BETA release. The section below summarizes key differences between PCF Healthwatch v1.0 and v1.1. For more information about new features in v1.1, see v1.1.0 release notes.

Key Differences

  • [Feature] Manual plugin configurations are no longer required to ingest BOSH metrics into PCF Healthwatch. Use of the prior plugins should be eliminated upon switch to v1.1.
    • Smoke tests now fail on lack of BOSH metrics.
  • Naming convention changes:
    • The healthwatch.health.check.AppsMan.available metric is now healthwatch.health.check.CanaryApp.available.
    • The healthwatch.health.check.AppsMan.responseTime metric is now healthwatch.health.check.CanaryApp.responseTime.
    • The data loader app deployed at installation was renamed from mysql-logqueue to loader.
  • Data convention change: In PCF Healthwatch v1.0, the default deployment value for all Healthwatch-created metrics was p-healthwatch. In PCF Healthwatch v1.1, the deployment value is the actual deployment value the metrics were created from. This is a necessary data structure change to prepare for the future capability of monitoring isolation segments. The origin of all Healthwatch-created metrics remains healthwatch.
  • Default installation configuration change:
    • Ingestor instance count now defaults to 4.
    • MySQL Loader instance count now defaults to 4.
  • [Feature] New Syslog Drain Binding Capacity metric represents the average number of drain bindings across Adapter instances. The Logging Performance page now displays this capacity ratio as an indicator for scaling Syslog Drain Adapters. This chart replaces the informational Count of Bindings chart used in v1.0.
  • [Feature] The following Router graphs are now multi-line so that the performance of the individual instances can be better represented:
    • Router Throughput
    • 502 Bad Gateways
    • All 5XX Errors
    • Number of Routes Registered
  • [Feature] New count of the Available Free Chunks metric is now available within the PCF Healthwatch datastore and is being forwarded into the Firehose for external consumption.
  • [Feature] PCF Healthwatch v1.1 uses the new Ops Manager feature for supporting colocated errands.
  • [Feature] PCF Healthwatch v1.1 is updated to reflect the Key Performance Indicators changes for PCF v2.0.
  • [Bug Fix] In v1.0, the Running App Instances stoplight would continue to show a data value during a complete disconnection from the firehose data stream, if valid data had previously been received. This has been corrected. In a scenario where the product suffers a complete loss of new data for more than 5 minutes, the stoplight will now display 0.
  • Product stemcell was updated to v3445.
  • MySQL version was updated to v36.10.0.
  • In PCF 2.0, Elastic Runtime (ERT) was renamed to Pivotal Application Service (PAS). All help text references to ERT have been updated to PAS.
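
Consumers with alerts or dashboards keyed on the v1.0 metric names may want to translate them during the switch to v1.1. The sketch below encodes only the two metric renames listed under the naming convention changes above; the names `RENAMED_METRICS` and `to_v1_1_metric_name` are hypothetical and not part of PCF Healthwatch.

```python
# Renames listed in the Key Differences above (v1.0 name -> v1.1 name).
RENAMED_METRICS = {
    "healthwatch.health.check.AppsMan.available":
        "healthwatch.health.check.CanaryApp.available",
    "healthwatch.health.check.AppsMan.responseTime":
        "healthwatch.health.check.CanaryApp.responseTime",
}


def to_v1_1_metric_name(name: str) -> str:
    """Map a v1.0 metric name to its v1.1 name; other names pass through."""
    return RENAMED_METRICS.get(name, name)


if __name__ == "__main__":
    print(to_v1_1_metric_name("healthwatch.health.check.AppsMan.available"))
```

Because the deployment value also changed in v1.1 while the origin stayed `healthwatch`, filters built on the origin are likely to survive the upgrade better than filters built on the old `p-healthwatch` deployment value.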