Monitoring Pivotal Cloud Foundry

This guide describes how Pivotal Cloud Foundry (PCF) operators can monitor their deployments.

In This Guide

This guide includes the following topics:

  • Key Performance Indicators: A list of Key Performance Indicators (KPIs) that operators may want to monitor in their PCF deployment to help ensure that it remains in a good operational state.
  • Key Capacity Scaling Indicators: A list of capacity scaling indicators that operators may want to monitor to determine when they need to scale their PCF deployments.
  • Configuring a Monitoring System: Guidance for setting up PCF with third-party monitoring platforms to continuously monitor component metrics and trigger health alerts.

For information about logging and metrics in PCF and about monitoring of services for PCF, see Additional Resources below.

KPI Changes from PCF v2.0 to v2.1

The following list highlights new, modified, and deleted KPIs in PCF v2.1.

Modified KPI: rep.UnhealthyCell

Added a recommended warning threshold (= 1) to prompt further investigation. Previously, only a critical threshold was recommended.
Modified KPI: Number of Route Registration Messages Sent and Received

The route_emitter.MessagesEmitted metric has been deprecated in favor of route_emitter.HTTPRouteNATSMessagesEmitted. The suggested formula for this assessment has been updated.
Modified KPI: Log Transport Loss Rate (formerly Firehose Loss Rate)

In PCF v2.1, the DopplerServer.doppler.shedEnvelopes and DopplerServer.listeners.totalReceivedMessageCount metrics are deprecated and no longer need to be included in the calculated values. The formula to calculate Loss Rate has been updated accordingly. This measure of Loss Rate has also been renamed to better distinguish it from future measures regarding Log Ingress.

Due to improvements in the underlying architecture, recommended thresholds are now:
  • Warning/Yellow ≥ 0.005
  • Critical/Red ≥ 0.01
New KPI: Doppler Message Rate Capacity

Indicates the need to scale based on the recommended maximum load for Doppler instances.
Deleted KPI: Firehose Dropped Messages

With the addition of new Firehose metrics, this metric is no longer recommended as a high-value monitoring indicator.
Deleted KPI: Firehose Throughput

With the addition of new Firehose metrics, this metric is no longer recommended as a high-value monitoring indicator. Log Transport Loss Rate and Doppler Message Rate Capacity are the recommended indicators for scaling needs.
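
As an illustration of how warning and critical thresholds, such as those recommended above for Log Transport Loss Rate, might be applied in an alerting script, here is a minimal Python sketch. The names Status, loss_rate, and classify are hypothetical. Computing loss rate as dropped messages divided by ingress messages over the same time window is an assumption made for this sketch; the authoritative formula is the one given in the KPI documentation. Thresholds are passed in as arguments rather than hard-coded, so the caller supplies the values recommended for the KPI being checked.

```python
from enum import Enum


class Status(Enum):
    OK = "ok"
    WARNING = "warning"
    CRITICAL = "critical"


def loss_rate(dropped: float, ingress: float) -> float:
    """Return the fraction of messages dropped in a time window.

    Assumption: loss rate = dropped / ingress, with both counts taken
    over the same window. Returns 0.0 when nothing was received.
    """
    if ingress <= 0:
        return 0.0
    return dropped / ingress


def classify(value: float, warning: float, critical: float) -> Status:
    """Map a KPI value to a status for a "higher is worse" indicator."""
    # A warning threshold above the critical one could never fire,
    # so treat it as a configuration error.
    if warning > critical:
        raise ValueError("warning threshold must not exceed critical threshold")
    if value >= critical:
        return Status.CRITICAL
    if value >= warning:
        return Status.WARNING
    return Status.OK
```

A monitoring job could call loss_rate on counters sampled from its metrics source, then pass the result to classify together with the thresholds it has configured for that KPI.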

Additional Resources

For information about logging and metrics in PCF, see the following topics:

  • Configuring System Logging in PAS: This topic explains how to configure the PCF Loggregator system to scale its maximum throughput and to forward logs to an external aggregator service.
  • Logging and Metrics: A guide to Loggregator, the system which aggregates and streams logs and metrics from user apps and system components in PAS.

For information about KPIs and metrics for PCF services, see the following topics:
