PCF Metrics v1.4

Monitoring and Troubleshooting Apps with PCF Metrics

This topic describes how developers can monitor and troubleshoot their apps using Pivotal Cloud Foundry (PCF) Metrics.

Overview

PCF Metrics helps you understand and troubleshoot the health and performance of your apps by displaying the following:

  • Container Metrics: Three graphs measuring CPU, memory, and disk usage percentages
  • Network Metrics: Three graphs measuring requests, HTTP errors, and response times
  • Custom Metrics: User-customizable graphs for measuring app performance, such as Spring Boot Actuator metrics
  • App Events: A graph of update, start, stop, crash, SSH, and staging failure events
  • Logs: A list of app logs that you can search, filter, and download
  • Trace Explorer: A dependency graph that traces a request as it flows through your apps and their endpoints, along with the corresponding logs

The following sections describe a standard workflow for using PCF Metrics to monitor or troubleshoot your apps.

View an App

In a browser, navigate to metrics.YOUR-SYSTEM-DOMAIN and log in with your User Account and Authentication (UAA) credentials. Choose an app for which you want to view metrics or logs. PCF Metrics respects UAA permissions such that you can view any app that runs in a space that you have access to.

PCF Metrics displays app data for a given time frame. See the sections below to Change the Time Frame for the dashboard, Interpret Metrics information on each graph, and Trace App Requests with the Trace Explorer.

Change the Time Frame

The graphs show time along the horizontal axis. You can change the time frame for all graphs and the logs by using the time selector at the top of the window. Adjust either end of the selector or click and drag.

Zoom: From within any graph, click and drag to zoom in on areas of interest. This adjusts all of the graphs, and the logs, to show data from that time frame.

Add, Edit, and Delete Charts

The PCF Metrics dashboard allows users to add, edit, and delete charts.

Add Chart: To add a new chart, follow the steps below.

  1. Click + ADD CHART at the top right of the dashboard.

  2. In the modal window, either select a metric from the dropdown menu or type the name of the metric into the search bar to filter results.

  3. Select an aggregation type. This determines how to combine the data from multiple instances.

Edit Chart: To change how instances are aggregated for an existing metric, click the pencil icon on the header of the metric chart. When the Edit Chart modal window appears, you can choose the aggregation type and click Save to apply changes.
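
To make the effect of the aggregation type concrete, the following Python sketch combines one metric's per-instance samples into a single series. The aggregation names (`avg`, `min`, `max`, `sum`), the `aggregate` helper, and the sample data are assumptions for illustration only; the options offered in the chart window may differ.

```python
from statistics import mean

# Hypothetical aggregation functions; the names are illustrative only.
AGGREGATIONS = {"avg": mean, "min": min, "max": max, "sum": sum}

def aggregate(samples_by_instance, kind):
    """Combine per-instance samples into one value per timestamp.

    samples_by_instance maps an instance index to {timestamp: value}.
    """
    combine = AGGREGATIONS[kind]
    timestamps = sorted({t for series in samples_by_instance.values() for t in series})
    result = {}
    for t in timestamps:
        # Use only the instances that reported a value at this timestamp.
        values = [series[t] for series in samples_by_instance.values() if t in series]
        result[t] = combine(values)
    return result

# CPU percentage for two instances of a made-up app.
cpu = {
    0: {60: 20.0, 120: 40.0},
    1: {60: 30.0, 120: 90.0},
}
print(aggregate(cpu, "avg"))  # {60: 25.0, 120: 65.0}
print(aggregate(cpu, "max"))  # {60: 30.0, 120: 90.0}
```

An average smooths out a single busy instance, while a maximum makes it stand out, which is why the choice of aggregation changes what a spike on the chart tells you.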

Delete Chart: To delete an existing chart on the dashboard, click the trash can icon on the header of the metric chart and then click Delete.

View and Reorder Metric Charts

Reorder: Each metric has its own chart. You can click and drag the chart header to change the ordering of charts.

Expand: To see more details in complex graphs, you can expand a chart by clicking the icon in the chart header.

You can collapse the chart by clicking the icon again.

View Metrics at App-Instance Level

PCF Metrics relays metric data at the app-instance level to support in-depth troubleshooting. You can view the app metrics for a specific instance index, which corresponds directly to the app instance indices shown in Apps Manager.

To view metrics at the app-instance level, turn the view instances toggle on.

To select or deselect a specific app instance, select the desired instance from the instance filter dropdown menu.

Alternatively, click an instance line on the metric chart that interests you to select the instance.

Interpret Metrics

See the following sections to understand how to use each of the views on the dashboard to monitor and troubleshoot your app.

Container Metrics

Three Container Metrics charts are available on the PCF Metrics dashboard:

  • CPU usage percentage: cf.system.cpu

    A spike in CPU might point to a process that is computationally heavy. Scaling app instances can relieve the immediate pressure, but you need to investigate the app to better understand and fix the root cause.

  • Memory usage percentage: cf.system.memory

    A spike in memory might mean a resource leak in the code. Scaling app memory can relieve the immediate pressure, but you need to find and resolve the underlying issue so that it does not occur again.

  • Disk usage percentage: cf.system.disk

    A spike in disk might mean the app is writing logs to files instead of STDOUT, caching data to local disk, or serializing large sessions to disk.

Network Metrics

Three Network Metrics charts are available on the PCF Metrics dashboard:

  • Number of network requests per minute: cf.system.request-count

    A spike in HTTP requests usually means more traffic is reaching your app. Scaling app instances can reduce the response time.

  • Number of network request errors per minute: cf.system.request-error-count

    A spike in HTTP errors means one or more 5xx errors have occurred. Check your app logs for more information.

  • Average latency of a request in milliseconds: cf.system.latency

    A spike in response time means your users are waiting longer. Scaling app instances can spread that workload over more resources and result in faster response times.
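
The three network metrics can be read as per-minute summaries of individual requests. The sketch below assumes a hypothetical list of `(timestamp_seconds, status_code, latency_ms)` records and a made-up `summarize_requests` helper; it does not describe how PCF Metrics computes its values internally, only what each chart measures.

```python
from collections import defaultdict

def summarize_requests(requests):
    """Group request records by minute and compute count, 5xx count, and mean latency."""
    buckets = defaultdict(list)
    for timestamp, status, latency_ms in requests:
        buckets[timestamp // 60].append((status, latency_ms))

    summary = {}
    for minute, records in sorted(buckets.items()):
        latencies = [latency for _, latency in records]
        summary[minute] = {
            "request_count": len(records),
            # Only 5xx responses count as request errors in this sketch.
            "request_error_count": sum(1 for status, _ in records if 500 <= status <= 599),
            "latency_ms": sum(latencies) / len(latencies),
        }
    return summary

# Hypothetical request records: (timestamp in seconds, HTTP status, latency in ms).
sample = [(5, 200, 10.0), (30, 500, 50.0), (65, 200, 20.0), (70, 503, 40.0), (110, 200, 30.0)]
for minute, stats in summarize_requests(sample).items():
    print(minute, stats)
```

Reading the three values together helps separate causes: a rising request count with rising latency suggests load, while rising errors at a steady request count points toward a fault in the app or one of its dependencies.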

Events

The Events graph shows the following app events: Crash, Fail (staging failures), Update, Stop, Start, and SSH.

Note: The SSH event corresponds to someone successfully using SSH to access a container that runs an instance of the app.

See the following topics for more information about app events:

Custom Metrics

Users can configure their apps to emit custom metrics through the Loggregator Firehose and then view these metrics on the PCF Metrics dashboard. For steps on how to set up your apps to emit custom metrics, see the Metrics Forwarder Documentation. If you have configured your apps correctly, the custom metrics become available automatically when you add a chart on the PCF Metrics dashboard.

In addition, Spring Boot apps with actuators implemented emit Spring Boot Actuator metrics out of the box, without any changes to source code.

Logs

The Logs view displays app log data ingested from the Loggregator Firehose, including a histogram that shows log frequency for the current time frame.

The green time needle visible on metrics charts and the logs histogram indicates the beginning of the logs. Depending on the sort order of your logs, you can see different results:

  • Sort by newest first (default)

    The logs drawer retrieves all logs in the selected time frame that are older than (to the left of) the needle. The log outlined in green is the newest log to the left of the needle.

  • Sort by oldest first

    The logs drawer retrieves all logs in the selected time frame that are newer than (to the right of) the needle. The log outlined in green is the oldest log to the right of the needle.
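
The relationship between the needle, the sort order, and the outlined log can be sketched in Python as follows. The `logs_for_needle` function and its data are hypothetical, and treating a log that sits exactly on the needle as included is an assumption.

```python
def logs_for_needle(logs, frame_start, frame_end, needle, newest_first=True):
    """Return the logs the drawer would list, in display order.

    logs is a list of (timestamp, message) tuples. The first item of the
    returned list corresponds to the log outlined in green.
    """
    in_frame = [log for log in logs if frame_start <= log[0] <= frame_end]
    if newest_first:
        # Logs at or to the left of the needle, newest at the top.
        selected = [log for log in in_frame if log[0] <= needle]
        return sorted(selected, key=lambda log: log[0], reverse=True)
    # Logs at or to the right of the needle, oldest at the top.
    selected = [log for log in in_frame if log[0] >= needle]
    return sorted(selected, key=lambda log: log[0])

logs = [(10, "start"), (20, "request A"), (30, "request B"), (40, "crash")]
print(logs_for_needle(logs, 0, 50, needle=25))
# [(20, 'request A'), (10, 'start')]
print(logs_for_needle(logs, 0, 50, needle=25, newest_first=False))
# [(30, 'request B'), (40, 'crash')]
```

In both orders the outlined log is the one closest to the needle, so moving the needle onto a spike in a chart brings the logs nearest that moment to the top of the list.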

To adjust the placement of the time needle, click the handle at the bottom of the needle and drag to reposition it. Alternatively, you can click anywhere along the x-axis of a metric chart or the logs histogram to snap the needle to that position.

You can interact with the Logs view in the following ways:

  • Keyword: Perform a keyword search. The histogram updates with blue bars based on what you enter. Hover over a histogram bar to view the number of logs for a specific time.
  • Highlight: Enter a term to highlight within your search. The histogram updates with yellow bars based on the results. Hover over a histogram bar to view the number of logs for a specific time that contain the highlighted term.
  • Sources: Choose which sources to display logs from. For more information, see Log Types and Their Messages.
  • Order: Modify the order in which logs appear.
  • Download: Download a file containing logs for the current search.
  • Copy: Click the copy icon to copy the text of the log.
  • View in Trace Explorer: Open a window to see the trace of the request associated with the log. See Trace App Requests.
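
As a rough illustration of how a keyword search relates to the histogram bars, the following Python sketch counts matching logs per time bucket. The `keyword_histogram` function, the one-minute bucket size, and case-insensitive matching are all assumptions, not documented PCF Metrics behavior.

```python
from collections import Counter

def keyword_histogram(logs, keyword, bucket_seconds=60):
    """Count logs containing `keyword` in each time bucket.

    logs is a list of (timestamp_seconds, message) tuples. Keys of the
    result are bucket start times in seconds.
    """
    term = keyword.lower()
    counts = Counter()
    for timestamp, message in logs:
        if term in message.lower():
            bucket_start = timestamp // bucket_seconds * bucket_seconds
            counts[bucket_start] += 1
    return dict(sorted(counts.items()))

logs = [
    (5, "GET /orders 200"),
    (42, "ERROR timeout"),
    (70, "error: db down"),
    (130, "GET /health 200"),
]
print(keyword_histogram(logs, "error"))  # {0: 1, 60: 1}
```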

Trace App Requests

A request to one of your apps initiates a workflow within the app or system of apps. The record of this workflow is a trace, which you can use to troubleshoot app failures and latency issues. In the Trace Explorer view, PCF Metrics displays an interactive graph of a trace and its corresponding logs. See the sections below to understand how to use the Trace Explorer.

For more information about traces, see What is a Trace? in the OpenTracing documentation.

Prerequisites

PCF Metrics constructs the Trace Explorer view using trace IDs shared across app logs. Before you use the Trace Explorer, review the following list to ensure that PCF Metrics can extract the necessary data from your app logs for your specific app type.

  • Spring: Follow the steps below.
    1. Ensure you are using Spring Boot v1.4.3 or later.
    2. Ensure you are using Spring Cloud Sleuth v1.0.12 or later.
    3. Add the following to your app dependency file (the example uses Gradle syntax):
      dependencies {
          compile "org.springframework.cloud:spring-cloud-starter-sleuth"
      }
  • Node.js, Go, and Python: Ensure that the servers associated with your app do not modify HTTP requests in a way that removes the X-B3-TraceId, X-B3-SpanId, and X-B3-ParentSpanId headers from a request. You also need to add Trace ID, Span ID, and Parent Span ID to the SLF4J MDC in your app logs to correlate logs within the Trace Explorer.
  • Ruby: Ruby servers that use a library depending on Rack modify HTTP request headers in a way that is incompatible with PCF Metrics. If you want to trace app requests for your Ruby apps, ensure that your framework does not rely on Rack. You may need to write a raw Ruby server that preserves the X-B3-TraceId, X-B3-SpanId, and X-B3-ParentSpanId headers in the request. You also need to add Trace ID, Span ID, and Parent Span ID to the SLF4J MDC in your app logs to correlate logs within the Trace Explorer.
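
For apps that handle the tracing headers themselves, the Python sketch below shows one way to carry trace context from an incoming request to an outbound call. It uses the header names from the Zipkin B3 propagation format (`X-B3-TraceId`, `X-B3-SpanId`, and `X-B3-ParentSpanId` for the parent span). The `child_headers` and `log_prefix` helpers are hypothetical, the lookup assumes headers arrive with exactly this casing, and the log prefix layout is illustrative rather than a format PCF Metrics is known to require.

```python
import secrets

B3_TRACE = "X-B3-TraceId"
B3_SPAN = "X-B3-SpanId"
B3_PARENT = "X-B3-ParentSpanId"

def child_headers(incoming):
    """Build B3 headers for an outbound call made while handling `incoming`.

    The trace ID is preserved, the current span becomes the parent, and a
    new 64-bit span ID (16 hex characters) is generated.
    """
    # Start a new trace only when the incoming request carries none.
    trace_id = incoming.get(B3_TRACE) or secrets.token_hex(8)
    headers = {B3_TRACE: trace_id, B3_SPAN: secrets.token_hex(8)}
    if incoming.get(B3_SPAN):
        headers[B3_PARENT] = incoming[B3_SPAN]
    return headers

def log_prefix(app_name, headers):
    """Format trace context for a log line (hypothetical layout)."""
    return "[{},{},{},{}]".format(
        app_name,
        headers[B3_TRACE],
        headers[B3_SPAN],
        headers.get(B3_PARENT, ""),
    )

incoming = {B3_TRACE: "463ac35c9f6413ad", B3_SPAN: "a2fb4a1d1a96d312"}
outbound = child_headers(incoming)
print(outbound[B3_TRACE])   # same trace ID as the incoming request
print(outbound[B3_PARENT])  # the incoming span ID
print(log_prefix("my-app", outbound))
```

Keeping the trace ID constant across every hop is what allows logs from different apps to be grouped into a single trace.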

Use the Trace Explorer

This section explains how to view the trace for a request received by your app and interact with the Trace Explorer.

  1. Select an app on the PCF Metrics dashboard.

  2. Click the Trace Explorer icon in a log for which you want to trace the request.

    • The Trace Explorer displays the apps and endpoints involved in completing a request, along with the corresponding logs. A request corresponds to a single trace ID, displayed in the top left corner. Each row includes an app in the left column and a span in the right column. A span is a particular endpoint within the app and the time it took to execute, in milliseconds. By default, the graph lists each app and endpoint in the order they were called.

      Note: If you do not have access to the space for an app involved in the request, you cannot see the spans or logs from that app.

    • You can click a span to show only logs from that span or any number of spans to toggle which logs appear. Clicking a span also creates a box with that particular span ID in the Logs view.
    • If you click APP APP-NAME within a log, PCF Metrics returns you to the dashboard view for that app, with the time frame focused on the time of the log that you clicked from.
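
The row layout and span filtering described above can be modeled with a small Python sketch. The `Span` fields, the `trace_rows` and `logs_for_spans` helpers, and the sample data are hypothetical; they approximate the behavior described here and are not the Trace Explorer's implementation.

```python
from dataclasses import dataclass

@dataclass
class Span:
    span_id: str
    app: str
    endpoint: str
    start_ms: int      # when the span began, relative to the start of the trace
    duration_ms: int   # how long the endpoint took to execute

def trace_rows(spans):
    """Return (app, endpoint, duration_ms) rows in the order the spans were called."""
    ordered = sorted(spans, key=lambda span: span.start_ms)
    return [(span.app, span.endpoint, span.duration_ms) for span in ordered]

def logs_for_spans(logs, selected_span_ids):
    """Keep only logs from the selected spans; with no selection, keep all logs."""
    if not selected_span_ids:
        return list(logs)
    return [log for log in logs if log["span_id"] in selected_span_ids]

spans = [
    Span("b2", "payments", "/charge", start_ms=12, duration_ms=30),
    Span("a1", "storefront", "/checkout", start_ms=0, duration_ms=55),
]
logs = [
    {"span_id": "a1", "message": "checkout started"},
    {"span_id": "b2", "message": "charging card"},
]
for row in trace_rows(spans):
    print(row)
print(logs_for_spans(logs, {"b2"}))
```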