PCF Healthwatch Architecture
Warning: PCF Healthwatch v1.4 is no longer supported or available for download. PCF Healthwatch v1.4 has reached the End of General Support (EOGS) phase as defined by the Support Lifecycle Policy. To stay up to date with the latest software and security updates, upgrade to a supported version.
This topic describes the architecture of Pivotal Cloud Foundry (PCF) Healthwatch.
PCF Healthwatch Components
The diagram below shows the architecture of PCF Healthwatch, including the PCF components that PCF Healthwatch interacts with.
PCF Healthwatch deploys several apps as part of its installation process. These apps are responsible for creating the service UI and supporting functional health checks.
How Data Flows Through PCF Healthwatch
Data flows through PCF Healthwatch as follows:
In PCF, all platform metrics are forwarded to the Loggregator Firehose by default.
The PCF Healthwatch Ingestor app consumes the platform metrics from the Firehose.
The Ingestor forwards the platform metrics to Redis, which acts as a buffer.
The Worker app consumes raw data from Redis, aggregates it, and writes transformed data to the MySQL datastore.
The transformed data remains available in the MySQL datastore until it is purged.
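The steps above can be sketched as a small in-memory pipeline. This is an illustrative sketch only, not PCF Healthwatch code: the deque stands in for Redis, the dict stands in for the MySQL datastore, and the function names ingest and work, the (name, value) tuple shape, and the min/max aggregation are all assumptions made for the example.

```python
from collections import deque


def ingest(firehose_metrics, buffer):
    """Ingestor step: copy platform metrics from the Firehose into the buffer."""
    for metric in firehose_metrics:
        buffer.append(metric)


def work(buffer, datastore):
    """Worker step: drain the buffer, aggregate per metric name, write the result."""
    grouped = {}
    while buffer:
        name, value = buffer.popleft()
        grouped.setdefault(name, []).append(value)
    for name, values in grouped.items():
        # Hypothetical aggregation; the real rules are described elsewhere in this topic.
        datastore[name] = {"min": min(values), "max": max(values)}


redis_buffer = deque()   # stands in for Redis
mysql_datastore = {}     # stands in for the MySQL datastore

ingest([("cpu", 10), ("cpu", 30), ("mem", 5)], redis_buffer)
work(redis_buffer, mysql_datastore)
```

After the Worker step runs, the buffer is empty and the datastore holds only the transformed values, which mirrors the role of Redis as a temporary buffer rather than a store of record.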
How Product-Created Metrics Flow Through PCF Healthwatch
PCF Healthwatch also creates additional metrics of operational value and stores them in the super_value_metric table in the datastore. For more information, see PCF Healthwatch Metrics. These product-created platform metrics take two paths through the system: Contextual Assessments and Functional Apps.
Contextual Assessments
Contextual Assessments are derived from platform-emitted data, for example, Syslog Drain Binding Capacity. PCF Healthwatch handles this data as follows:
- The Aggregator app makes additional transformations to the data.
- The Aggregator app forwards the data to the Metron Forwarder.
- The Metron Forwarder writes the data to the MySQL datastore and also forwards it back into the Firehose for external consumers to use.
Note: PCF Healthwatch forwards back into the Firehose only these additional metrics. The service does not forward platform metrics that are already available to the Firehose consumers.
- The transformed data remains available in the MySQL datastore until it is purged.
All ingested and service-created data points are stored in the datastore for 25 hours and then pruned.
Functional Apps
Functional Apps execute Health and Uptime tests, for example, CLI Command Health. PCF Healthwatch handles this data as follows:
- Functional Apps forward their data to the Metron Forwarder.
- The Metron Forwarder writes that data to the MySQL datastore and also forwards it back into the Firehose for external consumers to use. This data is then available until it is purged.
Note: PCF Healthwatch forwards back into the Firehose only these additional metrics. The service does not forward platform metrics that are already available to the Firehose consumers.
All ingested and service-created data points are stored in the datastore for 25 hours and then pruned.
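The 25-hour retention rule can be illustrated with a short pruning function. This is a sketch under stated assumptions, not the product's implementation: the prune function, the "timestamp" key, and the choice to keep a point that is exactly 25 hours old are hypothetical.

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(hours=25)  # retention period stated in this topic


def prune(points, now):
    """Return only the data points that are still inside the retention window."""
    cutoff = now - RETENTION
    return [p for p in points if p["timestamp"] >= cutoff]


now = datetime(2020, 1, 2, 12, 0, tzinfo=timezone.utc)
points = [
    {"name": "recent", "timestamp": now - timedelta(hours=24)},
    {"name": "boundary", "timestamp": now - timedelta(hours=25)},
    {"name": "expired", "timestamp": now - timedelta(hours=26)},
]
kept = prune(points, now)
```

Passing the current time in as a parameter keeps the function deterministic, which makes the retention boundary easy to test.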
How PCF Healthwatch Adjusts Data Flow for Higher Availability
When a singleton component becomes temporarily unavailable, for example when a VM restarts because a new stemcell is being applied, PCF Healthwatch can adjust its data flow to provide higher availability. The adjusted data flow works as follows:
If Redis is temporarily unavailable, Firehose-based data buffers in the Ingestor until Redis becomes available.
If MySQL is temporarily unavailable, Firehose-based data queues in Redis, and data generated by PCF Healthwatch queues in the Metron Forwarder until MySQL becomes available.
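The buffering behavior described above follows a common pattern: hold items locally while the downstream component is unavailable, then flush them in order once it returns. The sketch below shows that general pattern only. BufferingForwarder and FakeStore are hypothetical names, and using ConnectionError to signal an unavailable component is an assumption, not how PCF Healthwatch is known to work internally.

```python
from collections import deque


class BufferingForwarder:
    """Queue items while the downstream component is down; flush in order later."""

    def __init__(self, send):
        self._send = send        # callable that raises ConnectionError when unavailable
        self._pending = deque()

    def forward(self, item):
        self._pending.append(item)
        return self.flush()

    def flush(self):
        while self._pending:
            try:
                self._send(self._pending[0])
            except ConnectionError:
                return False     # still unavailable; keep everything queued
            self._pending.popleft()
        return True

    @property
    def pending(self):
        return len(self._pending)


class FakeStore:
    """Stand-in for a datastore that can be switched off and on."""

    def __init__(self):
        self.available = True
        self.items = []

    def write(self, item):
        if not self.available:
            raise ConnectionError("store unavailable")
        self.items.append(item)


store = FakeStore()
forwarder = BufferingForwarder(store.write)

store.available = False
forwarder.forward("a")
forwarder.forward("b")
queued_while_down = forwarder.pending

store.available = True
forwarder.forward("c")
```

An item is removed from the queue only after a successful send, so a failure partway through a flush does not lose data or reorder it.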
How Platform-Emitted Data is Aggregated by PCF Healthwatch
All Firehose-emitted platform metrics that PCF Healthwatch ingests are aggregated according to predefined rules before being written to the datastore. This avoids the cost of storing raw data and, in the case of gauge values, adds points of interest to the data.
Counter metrics: Maximum counter value received for the one-minute aggregation window, from which a minute-to-minute rate is later derived. Unique to the metric name and to the individual metric emitter, per instance, as applicable.
Gauge metrics: Received values for the one-minute aggregation window, aggregated and stored with the following five calculated values per metric: avg, min, max, med, and 95p. Unique to the metric name and to the individual emitter, per instance, as applicable.
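The aggregation rules for one window of values can be sketched as follows. This is a minimal illustration, not the product's code. The function names are hypothetical, the nearest-rank method for the 95th percentile is an assumption (this topic does not say which percentile method is used), and deriving the rate as the difference between consecutive window maxima is likewise an assumption.

```python
import statistics


def aggregate_counter(values):
    """Counter rule: keep the maximum value received in the window."""
    return max(values)


def counter_rate(previous_max, current_max):
    """Assumed rate derivation: difference between consecutive window maxima."""
    return current_max - previous_max


def nearest_rank_percentile(ordered, pct):
    """Nearest-rank percentile over an already sorted, non-empty list."""
    rank = (pct * len(ordered) + 99) // 100  # integer ceiling of pct * n / 100
    return ordered[max(rank, 1) - 1]


def aggregate_gauge(values):
    """Gauge rule: store five calculated values for the window."""
    ordered = sorted(values)
    return {
        "avg": statistics.mean(ordered),
        "min": ordered[0],
        "max": ordered[-1],
        "med": statistics.median(ordered),
        "95p": nearest_rank_percentile(ordered, 95),
    }


gauge_summary = aggregate_gauge([4, 1, 5, 2, 3])
window_rate = counter_rate(aggregate_counter([100, 105]), aggregate_counter([110, 120]))
```

Storing five summary values per gauge per minute keeps the row count fixed regardless of how many raw values arrive in the window.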
Functional Apps Created by PCF Healthwatch
PCF Healthwatch creates the following Functional Apps:
- BOSH Director health check: the bosh-health-check app
- BOSH deployment task check: the bosh-task-check app
- CLI command health check: the cf-health-check app
- Canary app uptime and response check: the canaryapp-health-check app
- Ops Manager uptime check: the opsmanager-health-check app
- PCF Healthwatch self-monitor and report: the healthwatch-meta-monitor app
- PCF Healthwatch event monitoring and alert publishing: the healthwatch-alerts app
For information about scaling these PCF Healthwatch resources, see PCF Healthwatch Resources.
CF CLI Test Run Requirements
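When a test fails, the suite still attempts to execute the remaining tests that do not depend on it. For example, a push failure stops the whole suite, but a failure to receive logs should not stop the stop or delete tests. The following table lists the tests that depend on each CLI command.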
CLI Command | Dependent Tests |
---|---|
Login | Push, Start, Logs, Stop, Delete |
Push | Start, Logs, Stop, Delete |
Start | Logs, Stop |
Logs | None |
Stop | None |
Delete | None |
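The dependency table above can be expressed as a lookup that decides which tests still run after a failure. This is a sketch under stated assumptions: it assumes the suite runs in the order the table lists the commands and that a failed test causes its dependent tests to be skipped. The names DEPENDENT_TESTS, SUITE_ORDER, and run_suite are hypothetical.

```python
# Dependent tests per CLI command, copied from the table above.
DEPENDENT_TESTS = {
    "login": ["push", "start", "logs", "stop", "delete"],
    "push": ["start", "logs", "stop", "delete"],
    "start": ["logs", "stop"],
    "logs": [],
    "stop": [],
    "delete": [],
}

# Assumed execution order: the order of the rows in the table.
SUITE_ORDER = ["login", "push", "start", "logs", "stop", "delete"]


def run_suite(failing):
    """Return (executed, skipped) given the set of tests that would fail."""
    executed = []
    skipped = set()
    for test in SUITE_ORDER:
        if test in skipped:
            continue
        executed.append(test)
        if test in failing:
            skipped.update(DEPENDENT_TESTS[test])
    return executed, [t for t in SUITE_ORDER if t in skipped]


after_push_failure = run_suite({"push"})
after_logs_failure = run_suite({"logs"})
```

In this model a logs failure skips nothing, so the stop and delete tests still run, while a push failure skips every later test.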
Required Networking Rules for PCF Healthwatch
Prior to deploying PCF Healthwatch, the operator must verify the network configuration necessary to allow PCF Healthwatch components to communicate with each other and certain PCF components. The following table lists the PCF components that PCF Healthwatch needs to connect to, and why.
Key PCF Components | Why PCF Healthwatch Needs Access |
---|---|
BOSH Director | Information about BOSH Director health and executing IaaS health checks |
BOSH UAA | Authorization to access the Director |
UAA | Authorization for component metrics, CF Health Checks and PCF Healthwatch UI |
Cloud Controller | CF Health Checks |
Doppler | Metric Ingestion, Forwarding Metrics to the Firehose |
The following table lists the communication paths and ports between PCF Healthwatch components and other PCF Healthwatch and PCF components.
This Healthwatch component… | Must communicate with… | Default TCP Port | Communication direction(s) | Notes |
---|---|---|---|---|
bosh-health-check | | | One way | |
bosh-task-check | | | One way | |
canary-health-check | | | One way | |
cf-health-check | | | One way | CF CLI interactions. On AWS, Doppler connection is typically port 4443. |
healthwatch | | | One way | |
healthwatch-aggregator | | | One way | |
healthwatch-alerts | | | One way | |
healthwatch-api | | | One way | |
healthwatch-ingestor | | | One way | |
healthwatch-worker | | | One way | |
healthwatch-meta-monitor | | | One way | |
opsmanager-health-check | | | One way | |
ui-health-check | | | One way | |
Healthwatch Forwarder VM | | | One way | |
Healthwatch MySQL VM | No outbound connections | | | |
Healthwatch Redis VM | No outbound connections | | | |
Note: If you configure syslog forwarding for PCF Healthwatch, you must also ensure that a network path exists from each VM to the syslog destination.
Note: PCF Healthwatch depends on the bosh-system-metrics-forwarder component in PAS. For that to work, the trafficcontroller VM needs to communicate with the BOSH Director on port 25595.
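One way to verify a required network path is to attempt a TCP connection from the source VM to the destination port. The helper below is a generic sketch, not a PCF Healthwatch or BOSH tool. The function name can_connect is hypothetical, and the demonstration uses a local listener rather than a real BOSH Director.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Demonstration against a local listener on an OS-assigned port.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
local_port = listener.getsockname()[1]
reachable = can_connect("127.0.0.1", local_port)
listener.close()
```

To check the path described in the note above, an operator could run can_connect with the BOSH Director address and port 25595 from the trafficcontroller VM. A successful connection shows only that the port is reachable, not that authentication or the metrics stream itself works.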