Monitoring PCF Metrics
This topic explains how to monitor the health of the Pivotal Cloud Foundry (PCF) Metrics service using the logs, metrics, and Key Performance Indicators (KPIs) emitted by Cloud Foundry and the Metrics application itself.
For more information about monitoring PCF, see Monitoring Pivotal Cloud Foundry.
Healthwatch
The recommended way to monitor PCF Metrics is with Healthwatch.
After you install Healthwatch, navigate to the JobHealth dashboard to view the PCF Metrics deployment, which is named apmPostgres.
Healthwatch also supports alerting based on VM persistent disk percentage (system.disk.persistent.percent) and VM health (system.healthy).
Key Performance Indicators
KPIs for PCF Metrics are the metrics that operators find most useful for monitoring their PCF Metrics service. KPIs are high-signal-value metrics that can indicate emerging issues.
Pivotal provides the following KPIs as general alerting and response guidance for typical PCF Metrics installations. Pivotal recommends that operators continue to fine-tune the alert measures to their installation by observing historical trends. Pivotal also recommends that operators expand beyond this guidance and create new, installation-specific monitoring metrics, thresholds, and alerts based on learning from their own installations.
BOSH Metrics
All BOSH-deployed components generate the following metrics. Monitor them to ensure that they are not consuming excess resources.
system.mem.percent | |
---|---|
Description | Percentage of VM memory used for MySQL, Redis, and PostgreSQL. Use: High VM memory usage is likely to degrade data storage and access performance. Origin: bosh-system-metrics-forwarder. Type: percent. Frequency: 30 s (default), 10 s (configurable minimum). |
Recommended measurement | Average over last 10 minutes |
Recommended alert thresholds | Yellow warning: > 80%. Red critical: > 85%. |
Recommended response | Scale up as appropriate. |
disk.persistent.percent | |
---|---|
Description | Percentage of VM persistent disk used for MySQL, Redis, and PostgreSQL. Use: If the persistent disks of data services fill up, the result can be data loss and performance degradation. Origin: bosh-system-metrics-forwarder. Type: percent. Frequency: 30 s (default), 10 s (configurable minimum). |
Recommended measurement | Average over last 10 minutes |
Recommended alert thresholds | Yellow warning: > 70%. Red critical: > 80%. |
Recommended response | Scale up as appropriate. |
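The recommended measurement and thresholds in the two tables above can be expressed as a small check. The following is a minimal sketch, not part of PCF Metrics or Healthwatch: the function name `classify` and the `THRESHOLDS` mapping are hypothetical, and the code assumes you have already collected the percent samples for the last 10 minutes from your own monitoring tool.

```python
from statistics import mean

# Thresholds copied from the BOSH KPI tables above (percent values).
# The dictionary layout itself is an illustrative assumption.
THRESHOLDS = {
    "system.mem.percent": {"yellow": 80.0, "red": 85.0},
    "disk.persistent.percent": {"yellow": 70.0, "red": 80.0},
}


def classify(metric_name, samples):
    """Return 'red', 'yellow', or 'ok' for the average of the given samples.

    `samples` is assumed to hold the percent readings from the last
    10 minutes, which is the recommended measurement window.
    """
    if not samples:
        raise ValueError("no samples in the measurement window")
    limits = THRESHOLDS[metric_name]
    average = mean(samples)
    # Thresholds are strict: a value exactly at the limit does not alert.
    if average > limits["red"]:
        return "red"
    if average > limits["yellow"]:
        return "yellow"
    return "ok"
```

Averaging over the window before comparing, rather than alerting on a single sample, keeps short spikes from triggering an alert. Adjust the threshold values once you have observed the historical trends of your own installation.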
Component Metrics
All applications pushed using Cloud Foundry automatically emit the following application component metrics. PCF Metrics is itself a collection of Cloud Foundry applications, so it can be monitored by PCF Metrics, among other monitoring services. The following KPIs can indicate problems with your installation.
system.mem.percent | |
---|---|
Description | Percentage of application container memory used by PCF Metrics applications. Use: PCF Metrics applications that run out of memory are likely to perform poorly. Origin: rep. Type: percent. Frequency: Every minute. |
Recommended measurement | Average over last 10 minutes |
Recommended alert thresholds | Yellow warning: > 80%. Red critical: > 90%. |
Recommended response | Scale up as appropriate. |
disk.persistent.percent | |
---|---|
Description | Percentage of application container persistent disk used by PCF Metrics applications. Use: PCF Metrics applications that run out of disk are likely to perform poorly. Origin: Firehose. Type: percent. Frequency: Every minute. |
Recommended measurement | Average over last 10 minutes |
Recommended alert thresholds | Yellow warning: N/A. Red critical: > 80%. |
Recommended response | Scale up as appropriate. |
Custom Metrics
Installing the Metrics Forwarder tile lets you gather more fine-grained metrics from the PCF Metrics deployment.
All PCF Metrics applications are already set up to emit certain custom metrics to indicate application health. As long as the Metrics Forwarder is installed, custom metrics are viewable in PCF Metrics. The following KPIs can indicate problems with your installation.
metric-processor.envelopes-stored.rate.1-minute | |
---|---|
Description | The rate at which the metrics processor stores metric envelopes to its persistent data store for metrics-queue. Use: A zero-value rate indicates that no metrics have been stored, which is likely caused by major metrics processing errors or failures. Origin: metrics-forwarder. Type: gauge. Frequency: Every minute. |
Recommended measurement | Every minute for the past 30 minutes |
Recommended alert thresholds | A 0 value at any point in the past 30 minutes |
Recommended response | Consult the troubleshooting document for further guidance. |
log-processor.logstore.bulk-inserts.rate.1_minute | |
---|---|
Description | The rate at which the log processor stores log envelopes to its persistent data store for PCF Metrics applications. Use: A zero-value rate indicates that no logs have been stored, which is likely caused by major log processing errors or failures. Origin: metrics-forwarder. Type: gauge. Frequency: Every minute. |
Recommended measurement | Every minute for the past 30 minutes |
Recommended alert thresholds | A 0 value at any point in the past 30 minutes |
Recommended response | Consult the troubleshooting document for further guidance. |
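The alert rule for both custom metrics above ("a 0 value at any point in the past 30 minutes") could be checked as in the sketch below. The function name `has_zero_rate` is hypothetical, and the code assumes you already have one gauge reading per minute, ordered oldest to newest, from whatever tool you use to read the metrics.

```python
def has_zero_rate(samples_per_minute, window_minutes=30):
    """Return True if any reading in the most recent window is 0.

    `samples_per_minute` is assumed to hold one gauge value per minute,
    ordered oldest to newest. Only the last `window_minutes` readings
    are inspected; older readings are ignored.
    """
    recent = samples_per_minute[-window_minutes:]
    return any(value == 0 for value in recent)
```

Note that this sketch reports no alert when no readings are available. In practice, missing data for one of these metrics may itself be worth alerting on.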
If you have any further questions regarding monitoring PCF Metrics, refer to the PCF Metrics troubleshooting guide.