PCF Healthwatch v1.1

PCF Healthwatch Architecture

This topic describes the architecture of Pivotal Cloud Foundry (PCF) Healthwatch.

PCF Healthwatch Components

The diagram below shows the architecture of PCF Healthwatch, including the PCF components that PCF Healthwatch interacts with.

[Figure: Healthwatch architecture]

PCF Healthwatch deploys several apps as part of its installation process. These apps are responsible for creating the service UI and supporting functional health checks.

How Data Flows Through PCF Healthwatch

Data flows through PCF Healthwatch as follows:

  1. In PCF, all platform metrics are forwarded to the Loggregator Firehose by default.

  2. The PCF Healthwatch Ingestor app consumes the platform metrics from the Firehose.

  3. The Ingestor forwards the platform metrics to Redis, which acts as a buffer.

  4. The Worker app consumes raw data from Redis, aggregates it, and writes transformed data to the MySQL datastore.

  5. The transformed data remains available in the MySQL datastore until it is purged.
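The steps above can be sketched as a small in-memory pipeline. This is an illustration only, not PCF Healthwatch code: a deque stands in for Redis, a dict stands in for the MySQL datastore, and the function names, envelope fields, and metric names are all hypothetical.

```python
from collections import deque

# Hypothetical stand-ins: a deque for the Redis buffer, a dict for MySQL.
redis_buffer = deque()
mysql_datastore = {}

def ingest(firehose_envelopes):
    """Ingestor: consume platform metrics from the Firehose and buffer them."""
    for envelope in firehose_envelopes:
        redis_buffer.append(envelope)

def work():
    """Worker: drain the buffer, aggregate per metric name, write the results."""
    grouped = {}
    while redis_buffer:
        envelope = redis_buffer.popleft()
        grouped.setdefault(envelope["name"], []).append(envelope["value"])
    for name, values in grouped.items():
        # Placeholder aggregation; the real rules depend on the metric type.
        mysql_datastore[name] = {"count": len(values), "max": max(values)}

# Made-up sample envelopes for illustration.
firehose = [
    {"name": "cpu.user", "value": 12.0},
    {"name": "cpu.user", "value": 18.5},
    {"name": "memory.used", "value": 2048.0},
]
ingest(firehose)
work()
print(mysql_datastore)
```

The point of the buffer in the middle is that the Ingestor and the Worker do not need to run at the same pace: the Ingestor only appends, and the Worker drains whatever has accumulated.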

How Product-Created Metrics Flow Through PCF Healthwatch

PCF Healthwatch also creates additional metrics of operational value and stores them in the super_value_metric table in the datastore. For more information, see PCF Healthwatch Metrics. These product-created platform metrics take two paths through the system: Contextual Assessments and Functional Apps.

Contextual Assessments

Contextual Assessments are derived from platform-emitted data, for example, Syslog Drain Binding Capacity. PCF Healthwatch v1.2 handles this data as follows:

  1. The Aggregator app makes additional transformations to the data.
  2. The Aggregator app forwards the data to the Metron Forwarder.
  3. The Metron Forwarder writes the data to the MySQL datastore and also forwards it back into the Firehose for external consumers to use.

    Note: PCF Healthwatch forwards back into the Firehose only these additional metrics. The service does not forward platform metrics that are already available to the Firehose consumers.

  4. The transformed data remains available in the MySQL datastore until it is purged.
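As a rough sketch of the steps above, the Metron Forwarder can be thought of as a dual write: each product-created metric goes both to the datastore and back into the Firehose. The class, the transformation, and the metric name below are hypothetical; this page does not describe how Syslog Drain Binding Capacity or any other assessment is actually calculated.

```python
from dataclasses import dataclass, field

@dataclass
class ForwarderSketch:
    """Hypothetical model of the Metron Forwarder's dual write."""
    datastore: list = field(default_factory=list)  # stands in for MySQL
    firehose: list = field(default_factory=list)   # stands in for the Firehose

    def forward(self, metric):
        # Every product-created metric is both stored and re-emitted.
        self.datastore.append(metric)
        self.firehose.append(metric)

def aggregator_transform(used, capacity):
    """Illustrative transformation only: derive a ratio from two platform values."""
    return {"name": "example.capacity_ratio", "value": used / capacity}

forwarder = ForwarderSketch()
forwarder.forward(aggregator_transform(used=50, capacity=200))
print(forwarder.datastore == forwarder.firehose)
```

Platform metrics that arrive from the Firehose never pass through this path, which matches the note above that only the additional metrics are forwarded back.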

All ingested and service-created data points are stored in the datastore for 25 hours and then pruned.

Functional Apps

Functional Apps execute Health and Uptime tests, for example, CLI Command Health. PCF Healthwatch v1.2 handles this data as follows:

  1. Functional Apps forward their data to the Metron Forwarder.
  2. The Metron Forwarder writes that data to the MySQL datastore and also forwards it back into the Firehose for external consumers to use. This data is then available until it is purged.

    Note: PCF Healthwatch forwards back into the Firehose only these additional metrics. The service does not forward platform metrics that are already available to the Firehose consumers.

All ingested and service-created data points are stored in the datastore for 25 hours and then pruned.
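The 25-hour retention rule can be illustrated with a minimal pruning function. The row layout and the `prune` helper are assumptions made for this sketch; the page does not describe how the datastore actually schedules or performs the purge.

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(hours=25)  # retention period stated in this topic

def prune(rows, now):
    """Keep only rows whose timestamp falls within the retention period."""
    cutoff = now - RETENTION
    return [row for row in rows if row["timestamp"] >= cutoff]

# Fixed example time so the result is reproducible.
now = datetime(2018, 1, 2, 12, 0, tzinfo=timezone.utc)
rows = [
    {"name": "example.metric", "timestamp": now - timedelta(hours=26)},
    {"name": "example.metric", "timestamp": now - timedelta(hours=24)},
]
kept = prune(rows, now)
print(len(kept))  # 1: the 26-hour-old row is dropped
```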

How PCF Healthwatch Adjusts Data Flow for Higher Availability

When a singleton component becomes temporarily unavailable, for example when a VM restarts while a new stemcell is applied, PCF Healthwatch adjusts its data flow to provide higher availability. The adjusted data flow is described below.

  • If Redis is temporarily unavailable, Firehose-based data buffers in the Ingestor until Redis becomes available.

  • If MySQL is temporarily unavailable, Firehose-based data queues in Redis, and data generated by PCF Healthwatch queues in the Metron Forwarder until MySQL becomes available.
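The buffering behavior described above follows a common pattern: hold items locally while the downstream component is unavailable, then deliver them in order once it returns. The sketch below shows that pattern with hypothetical class names and a fake Redis. It is unbounded for simplicity; this page does not state the actual buffer limits of the Ingestor, Redis, or the Metron Forwarder.

```python
from collections import deque

class BufferingSender:
    """Queue items locally and deliver them in order when downstream is up."""

    def __init__(self, send):
        self._send = send          # callable that raises ConnectionError when down
        self._pending = deque()

    def submit(self, item):
        self._pending.append(item)
        self.flush()

    def flush(self):
        while self._pending:
            try:
                self._send(self._pending[0])
            except ConnectionError:
                return             # keep everything queued; retry on next flush
            self._pending.popleft()

    @property
    def pending(self):
        return len(self._pending)

class FakeRedis:
    """Hypothetical downstream that can be switched off."""

    def __init__(self):
        self.available = True
        self.items = []

    def push(self, item):
        if not self.available:
            raise ConnectionError("Redis unavailable")
        self.items.append(item)

redis = FakeRedis()
ingestor = BufferingSender(redis.push)

redis.available = False
ingestor.submit("m1")
ingestor.submit("m2")
buffered_during_outage = ingestor.pending

redis.available = True
ingestor.submit("m3")              # triggers delivery of the backlog, in order
print(buffered_during_outage, redis.items, ingestor.pending)
```

The same shape would apply one hop later, with Redis or the Metron Forwarder holding data while MySQL is unavailable.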

How Platform-Emitted Data is Aggregated by PCF Healthwatch

In v1.2, all Firehose-emitted platform metrics that PCF Healthwatch ingests are aggregated according to predefined rules before being written to the datastore. This avoids the cost of storing raw data and, for gauge values, adds points of interest to the data.

  • Counter metrics: PCF Healthwatch stores the maximum counter value received during the one-minute aggregation window, from which a minute-to-minute rate is later derived. Each value is unique to the metric name and to the individual metric emitter, per instance, as applicable.

  • Gauge metrics: PCF Healthwatch aggregates the values received during the one-minute aggregation window and stores five calculated values per metric: avg, min, max, med, 95p. Each set of values is unique to the metric name and to the individual emitter, per instance, as applicable.
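The two aggregation rules can be sketched for a single one-minute window as follows. Two details are assumptions, not taken from this page: the 95th percentile uses the nearest-rank method, and the minute-to-minute rate is taken as the difference between consecutive window maximums. The function names are hypothetical.

```python
import statistics

def aggregate_gauge(values):
    """Return the five stored values for one window of gauge readings."""
    ordered = sorted(values)
    # Nearest-rank 95th percentile (assumed method), using integer math
    # to compute ceil(0.95 * n) without floating-point rounding issues.
    rank = (95 * len(ordered) + 99) // 100
    return {
        "avg": statistics.fmean(ordered),
        "min": ordered[0],
        "max": ordered[-1],
        "med": statistics.median(ordered),
        "95p": ordered[rank - 1],
    }

def aggregate_counter(values):
    """Return the maximum counter value seen in one window."""
    return max(values)

def counter_rate(previous_window_max, current_window_max):
    """Assumed derivation: per-minute rate as the change between window maximums."""
    return current_window_max - previous_window_max

gauge = aggregate_gauge([10, 20, 30, 40, 50])
rate = counter_rate(aggregate_counter([80, 90]), aggregate_counter([100, 105, 103]))
print(gauge, rate)
```

In practice each aggregate would be keyed by metric name and emitter instance, so the functions above would run once per key per window.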

Functional Apps Created by PCF Healthwatch

PCF Healthwatch creates the following Functional Apps:

  • BOSH Director health check: the bosh-health-check app
  • BOSH deployment task check: the bosh-task-check app
  • CLI command health check: the cf-health-check app
  • Canary app uptime and response check: the canaryapp-health-check app
  • Ops Manager uptime check: the opsmanager-health-check app
  • PCF Healthwatch self-monitor and report: the healthwatch-meta-monitor app
  • PCF Healthwatch event monitoring and alert publishing: the healthwatch-alerts app

For information about scaling these PCF Healthwatch resources, see PCF Healthwatch Resources.

Required Networking Rules for PCF Healthwatch

Before deploying PCF Healthwatch, the operator must verify that the network configuration allows PCF Healthwatch components to communicate with each other and with certain PCF components. The following table lists the PCF components that PCF Healthwatch needs to connect to, and why.

Key PCF Components  Why PCF Healthwatch Needs Access
------------------  --------------------------------
BOSH Director       Information about BOSH Director health and executing IaaS health checks
BOSH UAA            Authorization to access the Director
UAA                 Authorization for component metrics, CF Health Checks, and the PCF Healthwatch UI
Cloud Controller    CF Health Checks
Doppler             Metric ingestion and forwarding metrics to the Firehose

The following table lists the communication paths and ports between PCF Healthwatch components and other PCF Healthwatch and PCF components.

This Healthwatch component  Must communicate with           Default TCP port  Direction  Notes
--------------------------  ------------------------------  ----------------  ---------  -----
bosh-health-check           BOSH Director                   25555             One way
                            BOSH UAA                        8443
                            Healthwatch MySQL VM            3306
                            Healthwatch Forwarder VMs       13322
bosh-task-check             BOSH Director                   25555             One way
                            BOSH UAA                        8443
                            Healthwatch MySQL VM            3306
                            Healthwatch Forwarder VMs       13322
canary-health-check         Canary App via external route   443 or 80         One way
                            Healthwatch MySQL VM            3306
                            Healthwatch Forwarder VMs       13322
cf-health-check             CF CLI Access                   443               One way    CF CLI interactions. On AWS, Doppler connection is typically port 4443.
                            Doppler                         443
                            UAA                             443
                            Healthwatch MySQL VM            3306
                            Healthwatch Forwarder VMs       13322
healthwatch                 UAA                             443               One way
                            Healthwatch MySQL VM            3306
healthwatch-aggregator      Healthwatch MySQL VM            3306              One way
                            Healthwatch Forwarder VMs       13322
healthwatch-alerts          Healthwatch MySQL VM            3306              One way
                            Healthwatch Forwarder VMs       13322
healthwatch-api             Healthwatch MySQL VM            3306              One way
                            Healthwatch Forwarder VMs       13322
healthwatch-ingestor        Healthwatch Redis VM            6379              One way
                            Loggregator Firehose            443
                            Healthwatch MySQL VM            3306
                            Healthwatch Forwarder VMs       13322
healthwatch-worker          Healthwatch MySQL VM            3306              One way
                            Healthwatch Redis VM            6379
                            Healthwatch Forwarder VMs       13322
healthwatch-meta-monitor    Healthwatch MySQL VM            3306              One way
                            Healthwatch Forwarder VMs       13322
opsmanager-health-check     Ops Manager VM                  443               One way
                            Healthwatch MySQL VM            3306
                            Healthwatch Forwarder VMs       13322
ui-health-check             healthwatch via GoRouter        443               One way
                            Healthwatch MySQL VM            3306
                            Healthwatch Forwarder VMs       13322
Healthwatch Forwarder VM    Doppler via Metron Agent        8082              One way
                            Healthwatch MySQL VM            3306
Healthwatch MySQL VM        No outbound connections
Healthwatch Redis VM        No outbound connections
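One way to spot-check the paths in the table above is a simple TCP connection test. The sketch below uses only the Python standard library. The helper names are hypothetical, and the host names in the usage comment are placeholders that must be replaced with addresses from your own deployment. A successful connection shows only that the port is reachable from the machine running the script, not that the service behind it is healthy.

```python
import socket

def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False

def check_paths(paths, timeout=3.0):
    """Check a list of (label, host, port) tuples and map each label to a result."""
    return {label: can_connect(host, port, timeout) for label, host, port in paths}

# Example usage with placeholder hosts (replace with real addresses):
# results = check_paths([
#     ("Healthwatch MySQL VM", "mysql.healthwatch.example", 3306),
#     ("Healthwatch Redis VM", "redis.healthwatch.example", 6379),
#     ("BOSH Director", "bosh.example", 25555),
# ])
# for label, ok in results.items():
#     print(label, "reachable" if ok else "NOT reachable")
```

For the result to be meaningful, the check should run from the same network location as the component in the first column, because firewall rules usually depend on the source as well as the destination.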

Note: If you configure syslog forwarding for PCF Healthwatch, you must also ensure that a network path exists from each PCF Healthwatch VM to the syslog destination.

Note: PCF Healthwatch depends on the bosh-system-metrics-forwarder component in PAS. For this to work, the trafficcontroller VM must be able to communicate with the BOSH Director on port 25595.
