Monitoring Pivotal Cloud Foundry

This guide describes how Pivotal Cloud Foundry (PCF) operators can monitor their deployments.

In This Guide

This guide includes the following topics:

  • Key Performance Indicators: A list of Key Performance Indicators (KPIs) that operators may want to monitor in their PCF deployment to help ensure that it is in a good operational state.
  • Key Capacity Scaling Indicators: A list of capacity scaling indicators that operators may want to monitor to determine when they need to scale their PCF deployments.
  • Configuring a Monitoring System: Guidance for setting up PCF with third-party monitoring platforms to continuously monitor system metrics and trigger health alerts.

For information about logging and metrics in PCF and about monitoring of services for PCF, see Additional Resources below.

KPI Changes from PCF v1.10 to v1.11

This table highlights new, modified, and deleted KPIs in PCF v1.11.

Change | Which KPI and why? | See…
New KPI: Scalable Syslog Adapter Loss Rate

A new component and new functionality in the Loggregator system require additional monitoring. The loss rate of the scalable syslog adapters is derived from two new metrics: scalablesyslog.adapter.dropped and scalablesyslog.adapter.ingress.
Link
New KPI: Scalable Syslog Reverse Log Proxy Loss Rate

A new component and new functionality in the Loggregator system require additional monitoring. The loss rate of the scalable syslog Reverse Log Proxy (RLP) is derived from two new metrics: loggregator.rlp.dropped and loggregator.rlp.ingress.
Link
New KPI: scalablesyslog.scheduler.drains

New component functionality in the Loggregator system needs monitoring.
Link
New KPI: locket.ActiveLocks

New component functionality in the Diego system needs monitoring.
Link
New KPI: locket.ActivePresences

New component functionality in the Diego system needs monitoring.
Link
Modified KPI: auctioneer.AuctioneerFetchStatesDuration

Due to improvements in the underlying architecture, recommended thresholds are now:
  • Yellow 2 s (formerly 5 s)
  • Red 5 s (formerly 10 s)
Link
Modified KPI: rep.RepBulkSyncDuration

Due to improvements in the route emitter, recommended starting thresholds on this dynamic metric are now:
  • Yellow 5 s (formerly 10 s)
  • Red 10 s (formerly 20 s)
Link
Modified KPI: route_emitter.RouteEmitterSyncDuration

Due to improvements in the underlying architecture, recommended thresholds are now:
  • Yellow 5 s (formerly 10 s)
  • Red 10 s (formerly 20 s)
Link
Modified KPI: Firehose Dropped Messages

Monitoring Firehose dropped messages is now done with DopplerServer.doppler.shedEnvelopes alone.

DopplerServer.TruncatingBuffer.totalDroppedMessages is deprecated in PCF v1.11, so it no longer needs to be added to DopplerServer.doppler.shedEnvelopes to get the total number of dropped messages.
Link
Modified KPI: Firehose Loss Rate

In PCF v1.11, it is no longer necessary to add DopplerServer.TruncatingBuffer.totalDroppedMessages to DopplerServer.doppler.shedEnvelopes to calculate Firehose Dropped Messages. The formula for calculating Loss Rate has been updated accordingly.
Link
Deleted KPI: route_emitter.ConsulDownMode

This KPI was specific to detecting Consul problems that could make application routes unavailable. Continued improvements to the route emitter have removed the Consul dependency; the route emitter no longer communicates with Consul.
n/a
Deleted KPI: nsync_bulker.DesiredLRPSyncDuration

This metric is no longer relevant following the architectural changes that secure communication between Cloud Controller and Diego.
n/a
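
As a rough illustration of how the new loss-rate KPIs and the revised duration thresholds in this table might be evaluated in a monitoring script, the Python sketch below may help. It rests on assumptions that this page does not state: the loss rate is taken to be the dropped count divided by the ingress count over the same time window, and a value equal to a threshold is treated as having reached that level. The function names loss_rate and classify, and the THRESHOLDS_S table, are hypothetical helpers and not part of PCF.

```python
from typing import Optional

# Recommended v1.11 thresholds in seconds, copied from the table above
# as (yellow, red) pairs.
THRESHOLDS_S = {
    "auctioneer.AuctioneerFetchStatesDuration": (2.0, 5.0),
    "rep.RepBulkSyncDuration": (5.0, 10.0),
    "route_emitter.RouteEmitterSyncDuration": (5.0, 10.0),
}


def loss_rate(dropped: float, ingress: float) -> Optional[float]:
    """Return dropped / ingress for one time window.

    Assumption: the loss rate is the simple ratio of the two metrics,
    for example scalablesyslog.adapter.dropped over
    scalablesyslog.adapter.ingress. Returns None when there was no
    ingress, because the ratio is undefined in that case.
    """
    if ingress <= 0:
        return None
    return dropped / ingress


def classify(metric: str, seconds: float) -> str:
    """Map a duration in seconds to "green", "yellow", or "red".

    Assumption: a value equal to a threshold counts as reaching it.
    """
    yellow, red = THRESHOLDS_S[metric]
    if seconds >= red:
        return "red"
    if seconds >= yellow:
        return "yellow"
    return "green"
```

For example, loss_rate(5, 1000) gives 0.005, and classify("auctioneer.AuctioneerFetchStatesDuration", 2.5) gives "yellow". The same loss_rate helper could be applied to loggregator.rlp.dropped and loggregator.rlp.ingress under the same assumption.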

Additional Resources

For information about logging and metrics in PCF, see the following topics:

  • Configuring System Logging in Elastic Runtime: This topic explains how to configure the PCF Loggregator system to scale its maximum throughput and to forward logs to an external aggregator service.
  • Logging and Metrics: A guide to Loggregator, the system which aggregates and streams logs and metrics from user apps and system components in Elastic Runtime.

For information about KPIs and metrics for PCF services, see the following topics:
