Monitoring PAS

This guide describes how Pivotal Cloud Foundry (PCF) operators can monitor their Pivotal Application Service (PAS) deployments. For information about monitoring Pivotal Container Service (PKS) deployments, see Logging and Monitoring PKS.

For more information about logging and metrics in PCF, see Overview of Logging and Metrics.

Overview

This guide includes the following topics:

  • Key Performance Indicators: A list of Key Performance Indicators (KPIs) that operators may want to monitor in their PAS deployment to help ensure that it is in a good operational state.
  • Key Capacity Scaling Indicators: A list of capacity scaling indicators that operators may want to monitor to determine when they need to scale their PAS deployments.
  • Selecting and Configuring a Monitoring System: Guidance for setting up PAS with monitoring platforms to continuously monitor component metrics and trigger health alerts.

KPI Changes from PAS v2.0 to v2.1

The following entries highlight new, modified, and deleted KPIs in PAS v2.1.

Modified KPI: rep.UnhealthyCell

Added a recommended warning threshold (value = 1) to prompt further investigation. Previously, only a critical threshold was recommended.
Modified KPI: Number of Route Registration Messages Sent and Received

The route_emitter.MessagesEmitted metric has been deprecated in favor of route_emitter.HTTPRouteNATSMessagesEmitted. The suggested formula for this assessment has been updated.
Modified KPI: Log Transport Loss Rate (formerly Firehose Loss Rate)

In PAS v2.1, it is no longer necessary to include DopplerServer.doppler.shedEnvelopes or DopplerServer.listeners.totalReceivedMessageCount in the calculated values, as these metrics are now deprecated. The formula to calculate Loss Rate has been updated accordingly. Additionally, this measure of Loss Rate has been renamed to better distinguish it from future measures regarding Log Ingress.

Due to improvements in the underlying architecture, recommended thresholds are now:
  • Warning/Yellow ≥ 0.005
  • Critical/Red ≥ 0.01
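
As an illustration of how a loss rate might be evaluated against warning and critical thresholds, here is a minimal Python sketch. It assumes the loss rate is the ratio of dropped envelopes to ingress envelopes over the same time window; this guide does not reproduce the updated formula, so treat that ratio as an assumption. The names loss_rate, classify, and Thresholds are hypothetical and are not part of PAS. Thresholds are supplied by the caller rather than hard-coded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    """Alerting thresholds for a metric where higher values are worse."""
    warning: float
    critical: float

    def __post_init__(self):
        # A critical threshold below the warning threshold would make the
        # warning level unreachable, so reject that configuration early.
        if self.critical < self.warning:
            raise ValueError("critical threshold must be >= warning threshold")


def loss_rate(dropped: float, ingress: float) -> float:
    """Return the fraction of envelopes dropped.

    Assumption: loss rate = dropped / ingress, both measured over the
    same time window. With no ingress there is nothing to lose.
    """
    if ingress <= 0:
        return 0.0
    return dropped / ingress


def classify(rate: float, thresholds: Thresholds) -> str:
    """Map a loss rate to 'ok', 'warning', or 'critical'."""
    if rate >= thresholds.critical:
        return "critical"
    if rate >= thresholds.warning:
        return "warning"
    return "ok"
```

Checking the critical level first means a value that exceeds both thresholds is reported once, at the more severe level.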
Modified KPI: Scalable Syslog Drain Bindings Count

As of May 2018, the recommended scaling threshold for this metric has been lowered.
Modified KPI: Cloud Controller and Diego in Sync

As of May 2018, the recommended alerting threshold for this metric was modified.
New KPI: Doppler Message Rate Capacity

Indicates the need to scale based on the recommended maximum load for Doppler instances.
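
A capacity check of this kind could be sketched as follows. This is a hedged Python example, not the formula from the KPI reference: it assumes the indicator is the total message rate averaged across Doppler instances and compared with a per-instance maximum. The recommended maximum is not stated in this section, so it is a parameter here, and all function names are hypothetical.

```python
import math


def average_rate_per_instance(total_rate: float, instances: int) -> float:
    """Average message rate handled by each Doppler instance."""
    if instances <= 0:
        raise ValueError("instances must be positive")
    return total_rate / instances


def needs_scaling(total_rate: float, instances: int,
                  max_rate_per_instance: float) -> bool:
    """True when the average per-instance rate exceeds the given maximum."""
    return average_rate_per_instance(total_rate, instances) > max_rate_per_instance


def instances_needed(total_rate: float, max_rate_per_instance: float) -> int:
    """Smallest instance count that keeps the average at or below the maximum."""
    if max_rate_per_instance <= 0:
        raise ValueError("max_rate_per_instance must be positive")
    # Always keep at least one instance, even with no traffic.
    return max(1, math.ceil(total_rate / max_rate_per_instance))
```

Both the total rate and the maximum must use the same unit (for example, messages per second) for the comparison to be meaningful.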
Deleted KPI: Firehose Dropped Messages

With the addition of new Firehose metrics, this metric is no longer recommended as a high-value monitoring indicator.
Deleted KPI: Firehose Throughput

With the addition of new Firehose metrics, this metric is no longer recommended as a high-value monitoring indicator. Log Transport Loss Rate and Doppler Message Rate Capacity are the recommended indicators for scaling needs.

Monitor PCF Services

For information about KPIs and metrics for PCF services, see the monitoring topics in the documentation for each service.
