Healthwatch Release Notes

Warning: The Healthwatch tiles are currently in beta and are intended for evaluation and test purposes only. Do not use this product in a production environment.

v2.0.3 (beta)

Release Date: July 10, 2020

Breaking Changes:
  • The server name of all exporters has changed to hwexporter. If you have set up a scrape config for these exporters, you must update it.
  • The cert expiration exporter has moved from the Healthwatch tile to the TAS for VMs and TKGi Exporter tiles. If you have set up a scrape config for this exporter, you must update it.
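As a rough illustration of the first breaking change, the sketch below updates the server name in a scrape config that has already been parsed into Python dictionaries. It assumes the server name is the tls_config.server_name field of a Prometheus scrape config. The function name and the job names passed to it are hypothetical, not part of Healthwatch.

```python
import copy

# New server name announced in the breaking change above.
NEW_SERVER_NAME = "hwexporter"


def update_server_name(scrape_configs, job_names):
    """Return a copy of scrape_configs with the server name updated.

    Assumption: each entry is a parsed Prometheus scrape config and the
    server name lives under tls_config.server_name.
    """
    updated = copy.deepcopy(scrape_configs)
    for job in updated:
        # Only touch the jobs that scrape Healthwatch exporters.
        if job.get("job_name") in job_names:
            job.setdefault("tls_config", {})["server_name"] = NEW_SERVER_NAME
    return updated
```

Jobs that are not listed in job_names are left unchanged, and the input list is not modified.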

Features

New features and changes in this release:

  • Grafana 7.0

  • Downsize the exporter VMs to make better use of resources.

Security Fixes

This release includes the following security fix:

  • Remove extraneous logging in certain components that could log sensitive information under certain configurations.

Resolved Issues

This release includes the following fixes:

  • Healthwatch Exporter for PKS now includes the metrics needed to populate the BOSH Director Health dashboard properly.

  • Fix a query on the Tanzu Application Service SLO dashboard.

  • Healthwatch SLO now populates the correct exporter metrics, depending on whether the TAS for VMs or TKGi dashboard is selected.

  • Various dashboard fixes.

Known Issues

  • When Healthwatch is installed with TAS for VMs v2.7.17 and earlier patches, v2.8.11 and earlier patches, or v2.9.5 and earlier patches, the Usage Service Dashboard is not populated correctly and triggers alerts tagged with usage_service_app_usage_event_cc_lag_seconds. One workaround is to pause these alerts: in Grafana, go to Alerting -> Alert rules and pause the corresponding alerts.
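The version ranges in this known issue can be checked mechanically. The following is a minimal sketch, assuming version strings in plain MAJOR.MINOR.PATCH form with an optional leading "v". The function names are hypothetical, and minor lines that the known issue does not mention are treated as unaffected.

```python
import re

# Highest affected patch for each minor line named in the known issue.
LAST_AFFECTED_PATCH = {(2, 7): 17, (2, 8): 11, (2, 9): 5}


def parse_version(version):
    """Turn 'v2.8.11' or '2.8.11' into the tuple (2, 8, 11)."""
    match = re.fullmatch(r"v?(\d+)\.(\d+)\.(\d+)", version.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def is_affected(version):
    """Return True if this TAS for VMs version falls in an affected range."""
    major, minor, patch = parse_version(version)
    last_bad = LAST_AFFECTED_PATCH.get((major, minor))
    # Assumption: minor lines not listed in the known issue are unaffected.
    return last_bad is not None and patch <= last_bad
```

For example, is_affected("v2.8.11") is true, while is_affected("v2.8.12") is false.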

  • When PKS cluster discovery is used with PKS deployed on vSphere NSX-T, the load balancer created by NSX-T does not open all the ports that PKS cluster discovery requires to scrape on-demand Kubernetes clusters correctly. One workaround is to manually modify the load balancer through the NSX-T Manager API to open these ports: 10200, 10251, 10252, and 8443.

    To open the ports:

    1. Fetch the list of virtual servers:

      curl -u 'NSX-T-USERNAME:NSX-T-PASSWORD' \
        "https://NSX-MGR/api/v1/loadbalancer/virtual-servers" | jq .
      

      Where NSX-T-USERNAME:NSX-T-PASSWORD is the username and password for logging into the NSX-T console.

    2. In the output, find the JSON array item whose display_name starts with lb-pks and ends with virtual-server, and copy its id field.

    3. Fetch the current configuration for the load balancer:

      curl -u 'NSX-T-USERNAME:NSX-T-PASSWORD' \
        https://NSX-MGR/api/v1/loadbalancer/virtual-servers/VIRTUAL-SERVER-UUID
      

      Where:

      • NSX-MGR is the IP/FQDN of your NSX-T console.
      • VIRTUAL-SERVER-UUID is the unique ID that identifies the load balancer.

    4. Save the JSON that is returned by this command to a file.

    5. Modify that JSON to include the additional ports:

      {
          "...": "...",
          "ports": [
              "8443",
              "10200",
              "10251",
              "10252"
          ],
          "...": "..."
      }
      
    6. Do a PUT to the API to update the virtual server:

      curl -X PUT -u 'NSX-T-USERNAME:NSX-T-PASSWORD' \
        https://NSX-MGR/api/v1/loadbalancer/virtual-servers/VIRTUAL-SERVER-UUID \
        -H 'X-Allow-Overwrite: true' -H 'Content-Type: application/json' \
        -d 'MODIFIED-JSON-DATA'
      

      Where MODIFIED-JSON-DATA contains the additional ports you want to add.
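Steps 2 and 5 above can be sketched in Python, working on JSON that has already been fetched and parsed. This is a minimal illustration, not a supported tool. It assumes the list response keeps its items under a "results" key and that the virtual server configuration holds its ports as a list of strings, as in the step 5 example. Both function names are hypothetical.

```python
# Ports that PKS cluster discovery needs, as listed in this known issue.
REQUIRED_PORTS = ["8443", "10200", "10251", "10252"]


def find_pks_virtual_server_ids(listing):
    """Return the ids of virtual servers named lb-pks...virtual-server.

    Assumption: the list response keeps its items under a "results" key.
    """
    return [
        item["id"]
        for item in listing.get("results", [])
        if item.get("display_name", "").startswith("lb-pks")
        and item.get("display_name", "").endswith("virtual-server")
    ]


def add_required_ports(config, required=REQUIRED_PORTS):
    """Return a copy of the virtual server config with the ports added.

    Existing ports are kept in their original order. Missing ports are
    appended, so running this twice does not create duplicates.
    """
    updated = dict(config)
    ports = list(config.get("ports", []))
    for port in required:
        if port not in ports:
            ports.append(port)
    updated["ports"] = ports
    return updated
```

The dictionary returned by add_required_ports can be serialized with json.dumps and used as MODIFIED-JSON-DATA in step 6. All other fields of the configuration are carried over unchanged.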

  • When PKS cluster discovery is used with PKS deployed on vSphere NSX-T, the load balancer created by NSX-T obscures the fact that there are multiple Kubernetes masters. As a result, the reported number of masters is lower than the actual count, depending on how many load balancers there are.

  • When upgrading the Healthwatch v2.0 tile on a foundation that has been upgraded from Ops Manager v2.3 or earlier, you may see the following error:

    - Unable to render templates for job 'opsman-cert-expiration-exporter'. Errors are:
      - Error filling in template 'bpm.yml.erb' (line 9: Can't find property '["opsman_access_credentials.uaa_client_secret"]')
    

    This issue is resolved in Ops Manager v2.8. To resolve this issue in Ops Manager v2.7 or earlier:

    1. SSH into your Ops Manager VM.
    2. Change the user to root.
    3. Open the Rails console by running:

      > cd /home/tempest-web/tempest/web; RAILS_ENV='production' TEMPEST_INFRASTRUCTURE='INFRASTRUCTURE' TEMPEST_WEB_DIR='/home/tempest-web' SECRET_KEY_BASE='1234' DATA_ROOT='/var/tempest' LOG_DIR='/var/log/opsmanager' su tempest-web --command 'bundle exec rails console'
      

      Where INFRASTRUCTURE is either google, aws, azure, vsphere, or openstack.

    4. Set the decryption passphrase by running:

      irb(main):001:0> EncryptionKey.instance.passphrase = 'DECRYPTION-PASSPHRASE'
      

      Where DECRYPTION-PASSPHRASE is the correct decryption passphrase.

    5. Update the UAA restricted view access client secret by running:

      irb(main):001:0> Uaa::UaaConfig.instance.update_attributes(restricted_view_api_access_client_secret: SecureRandom.hex)
      
    6. Exit the Rails console and restart the “tempest-web” service by running:

      irb(main):001:0> exit
      > service tempest-web restart
      

v2.0.2 (beta)

Release Date: June 1, 2020

Features

New features and changes in this release:

  • The All Jobs and Job Details dashboards ignore BOSH smoke test deployments.

  • Add new PKS dashboards. To see the dashboards, select a PKS Version to Monitor in the Grafana Configuration.

  • Add an additional functional test for the PKS API. The change can be found in the PKS Control Plane dashboard.

  • Update the query for the cf push remaining error budget.

  • Allow configuring additional cipher suites in the Grafana Configuration.

  • PKS cluster discovery now supports OIDC. The PKS UAA Admin Password property is required when Configure created clusters to use UAA as the OIDC provider is enabled in the Enterprise PKS tile.

Security Fixes

This release includes the following security fixes:

  • Support only strong cipher suites in Grafana. The list of the out-of-the-box (OOTB) cipher suites can be found here.

  • Remove extraneous logging in certain components that could log sensitive information under certain configurations.

Resolved Issues

This release includes the following fixes:

  • The PKS API Skip SSL Validation option in PKS cluster discovery now properly skips SSL validation.

  • Fix the line chart in CLI Health so it stays at 0 instead of oscillating when the underlying metric keeps failing.

  • Fix the cf cli functional test so it fails properly when unable to get logs for the deployed smoke test app.

  • The All Jobs and Job Details pages use the correct query tags when deployed to a PKS foundation.

  • Indicator Protocol integration properly renders the chart panel name.

Known Issues

  • When upgrading the Healthwatch v2.0 tile on a foundation that has been upgraded from Ops Manager v2.3 or earlier, you may see the following error:

    - Unable to render templates for job 'opsman-cert-expiration-exporter'. Errors are:
      - Error filling in template 'bpm.yml.erb' (line 9: Can't find property '["opsman_access_credentials.uaa_client_secret"]')
    

    This issue is resolved in Ops Manager v2.8. To resolve this issue in Ops Manager v2.7 or earlier:

    1. SSH into your Ops Manager VM.
    2. Change the user to root.
    3. Open the Rails console by running:

      > cd /home/tempest-web/tempest/web; RAILS_ENV='production' TEMPEST_INFRASTRUCTURE='INFRASTRUCTURE' TEMPEST_WEB_DIR='/home/tempest-web' SECRET_KEY_BASE='1234' DATA_ROOT='/var/tempest' LOG_DIR='/var/log/opsmanager' su tempest-web --command 'bundle exec rails console'
      

      Where INFRASTRUCTURE is either google, aws, azure, vsphere, or openstack.

    4. Set the decryption passphrase by running:

      irb(main):001:0> EncryptionKey.instance.passphrase = 'DECRYPTION-PASSPHRASE'
      

      Where DECRYPTION-PASSPHRASE is the correct decryption passphrase.

    5. Update the UAA restricted view access client secret by running:

      irb(main):001:0> Uaa::UaaConfig.instance.update_attributes(restricted_view_api_access_client_secret: SecureRandom.hex)
      
    6. Exit the Rails console and restart the “tempest-web” service by running:

      irb(main):001:0> exit
      > service tempest-web restart
      

v2.0.1 (beta)

Release Date: April 29, 2020

Features

New features and changes in this release:

  • Exposes composite metrics, or “Super Value Metrics”, from Healthwatch v1.x to the metrics pipeline. This enables you to still see the same metrics from Healthwatch v1.x in your downstream log consumers if you upgrade from Healthwatch v1.x to Healthwatch v2.x.

  • Includes a new Cert Expiration dashboard under the Foundation tab that displays the time left until expiration for all discoverable certificates from the Ops Manager API. In Ops Manager v2.7 and later, the Ops Manager API includes the Ops Manager root CA and product certificates stored in CredHub.

  • Diego Capacity metrics are separated by isolation segment in the Diego/Capacity dashboard.

  • Enables a proxy setting for Grafana. This allows you to configure alerting in air-gapped environments.

  • Exposes Ops Manager syslog configuration.

Breaking Change: Grafana is configured to use the default HTTP and HTTPS ports, so you do not have to specify a port when visiting your dashboards. If a load balancer is placed in front of Grafana instances, the back end configuration of the load balancer must be updated.

Security Fixes

This release includes the following security fixes:

  • N/A

Resolved Issues

This release includes the following fix:

  • N/A

Known Issues

  • When upgrading the Healthwatch v2.0 tile on a foundation that has been upgraded from Ops Manager v2.3 or earlier, you may see the following error:

    - Unable to render templates for job 'opsman-cert-expiration-exporter'. Errors are:
      - Error filling in template 'bpm.yml.erb' (line 9: Can't find property '["opsman_access_credentials.uaa_client_secret"]')
    

    This issue is resolved in Ops Manager v2.8. To resolve this issue in Ops Manager v2.7 or earlier:

    1. SSH into your Ops Manager VM.
    2. Change the user to root.
    3. Open the Rails console by running:

      > cd /home/tempest-web/tempest/web; RAILS_ENV='production' TEMPEST_INFRASTRUCTURE='INFRASTRUCTURE' TEMPEST_WEB_DIR='/home/tempest-web' SECRET_KEY_BASE='1234' DATA_ROOT='/var/tempest' LOG_DIR='/var/log/opsmanager' su tempest-web --command 'bundle exec rails console'
      

      Where INFRASTRUCTURE is either google, aws, azure, vsphere, or openstack.

    4. Set the decryption passphrase by running:

      irb(main):001:0> EncryptionKey.instance.passphrase = 'DECRYPTION-PASSPHRASE'
      

      Where DECRYPTION-PASSPHRASE is the correct decryption passphrase.

    5. Update the UAA restricted view access client secret by running:

      irb(main):001:0> Uaa::UaaConfig.instance.update_attributes(restricted_view_api_access_client_secret: SecureRandom.hex)
      
    6. Exit the Rails console and restart the “tempest-web” service by running:

      irb(main):001:0> exit
      > service tempest-web restart