PCF Metrics v1.3

PCF Metrics Product Architecture

This topic describes the product architecture of Pivotal Cloud Foundry (PCF) Metrics.

Overview

The diagram below displays the components of PCF Metrics in bold, as well as the Cloud Foundry components that the PCF Metrics system interacts with.

PCF Metrics deploys several Cloud Foundry apps as part of the install process. These components are the bold rectangles in the diagram. The cylinders represent the data storage components of PCF Metrics.

[Diagram: Architecture]

The following sections describe the processes that run within the PCF Metrics system.

How Data Flows from the Firehose to the Datastores

This section describes how PCF Metrics fills its datastores. PCF Metrics uses two datastores:

  • The MySQL component stores metric and event data from the apps running on your PCF deployment.
    • Examples of events are start and stop.
    • Examples of metrics are container metrics, such as CPU, and network metrics, such as Requests.
  • The Elasticsearch component stores logs from the apps running on your PCF deployment.

Components

The diagram below highlights the components involved in getting metric and log data into the Elasticsearch and MySQL datastores.

[Diagram: process one]

Process

The following table describes how the components act during each stage.

Stage Description
1 The metrics-ingestor app does the following:
  • Receives app logs from the Firehose and forwards them to both the elasticsearch-logqueue and mysql-logqueue apps
  • Receives container metrics and network metrics (HTTPStartStop events) from the Firehose and forwards them to the mysql-logqueue app
2 Each of the logqueues acts independently, writing information to the datastores:

Elasticsearch logqueue

The elasticsearch-logqueue app checks whether the Elasticsearch datastore is available and does the following:
  • If Elasticsearch is available: Buffers logs and writes them to the Elasticsearch datastore
  • If Elasticsearch is NOT available:
    1. Buffers logs and writes them to the Temporary datastore until Elasticsearch becomes available
    2. After Elasticsearch becomes available:
      • Retrieves logs from the Temporary datastore in groups of 1000, buffers them, and writes them to the Elasticsearch datastore
      • Continues buffering logs from the ingestor and writing them to the Elasticsearch datastore
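
The fallback behavior described above can be sketched roughly as follows. This is a minimal illustration, not the elasticsearch-logqueue implementation: the `LogQueueSketch` class, its method names, and the plain Python lists standing in for the Elasticsearch and Temporary datastores are all assumptions made for the example. Only the group size of 1000 comes from the text.

```python
BATCH_SIZE = 1000  # group size stated in the text above


class LogQueueSketch:
    """Hypothetical stand-in for the elasticsearch-logqueue fallback logic."""

    def __init__(self):
        self.elasticsearch = []  # stands in for the Elasticsearch datastore
        self.temporary = []      # stands in for the Temporary datastore

    def drain_temporary(self):
        """Move parked logs to Elasticsearch in groups of BATCH_SIZE.

        Returns the number of groups that were moved.
        """
        batches = 0
        while self.temporary:
            batch = self.temporary[:BATCH_SIZE]
            del self.temporary[:BATCH_SIZE]
            self.elasticsearch.extend(batch)
            batches += 1
        return batches

    def write(self, logs, elasticsearch_available):
        """Write a buffer of logs, falling back to the Temporary datastore."""
        if not elasticsearch_available:
            # Park the logs until Elasticsearch becomes available again.
            self.temporary.extend(logs)
            return 0
        # Elasticsearch is reachable: replay anything parked earlier,
        # then continue with the logs that just arrived from the ingestor.
        batches = self.drain_temporary()
        self.elasticsearch.extend(logs)
        return batches


queue = LogQueueSketch()
queue.write(["log-%d" % i for i in range(2500)], elasticsearch_available=False)
moved = queue.write(["new-log"], elasticsearch_available=True)
print(moved, len(queue.elasticsearch), len(queue.temporary))
```

In this sketch the parked logs are replayed before the new ones, so ordering is preserved. Whether the real app orders writes this way is not stated in this topic.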


MySQL logqueue

The mysql-logqueue app buffers incoming data and writes each data type to MySQL as follows:

  • Container metrics: Inserts messages into the container_metric table of MySQL
  • Network metrics: Inserts messages into the http_start_stop table of MySQL
  • App logs: Parses log messages for an app event name and inserts the message into the app_event table of MySQL
3 The metrics-aggregator app, which runs according to an AGGREGATE_FREQUENCY property, does the following to aggregate the data stored in MySQL:

  1. Retrieves container and network metrics from MySQL
  2. Aggregates the data for each app over the last four minutes, grouped by one-minute intervals
  3. Inserts the aggregated data into the app_metric_rollup table of the MySQL component
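
The aggregation in stage 3 can be pictured with a small sketch. It assumes each metric sample is a tuple of app ID, Unix timestamp in whole seconds, and value, and it assumes the aggregate is a mean. This topic does not state which aggregate function the metrics-aggregator app applies, and the `rollup` function and its data shapes are hypothetical.

```python
from collections import defaultdict

WINDOW_SECONDS = 4 * 60  # "the last four minutes"
BUCKET_SECONDS = 60      # "one-minute intervals"


def rollup(samples, now):
    """Group samples per app into one-minute buckets over the last four minutes.

    samples: iterable of (app_id, unix_ts, value) tuples (assumed shape).
    Returns {(app_id, bucket_start_ts): mean_value}.
    """
    totals = defaultdict(lambda: [0.0, 0])  # key -> [sum, count]
    for app_id, ts, value in samples:
        if not (now - WINDOW_SECONDS <= ts < now):
            continue  # sample falls outside the four-minute window
        bucket_start = ts - ts % BUCKET_SECONDS
        entry = totals[(app_id, bucket_start)]
        entry[0] += value
        entry[1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


samples = [
    ("app-a", 590, 10.0),
    ("app-a", 595, 20.0),
    ("app-a", 370, 4.0),
    ("app-a", 100, 99.0),  # older than four minutes, ignored
    ("app-b", 480, 1.0),
]
print(rollup(samples, now=600))
```

Each key in the returned dictionary corresponds to one row that a process like this would insert into a rollup table such as app_metric_rollup.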

How the PCF Metrics UI Retrieves Data from the Datastores

This section describes the flow of data through the system when you interact with the PCF Metrics UI.

Components

The diagram below highlights the components involved in this process.

[Diagram: process two]

Process

The following table describes how the components act during each stage.

Stage Description
1 A user launches metrics.SYSTEM-DOMAIN in a browser and enters their UAA credentials.
2 After the UAA authorizes the user, the browser does the following:

  1. Retrieves through the Cloud Controller API a list of apps that the user can access
  2. Displays a page in which the user can select any app returned by the Cloud Controller API
3 A user selects an app from the dropdown menu, which does the following:

  1. Opens a Server-Sent Events (SSE) connection to the metrics app (metrics API)
  2. Sends HTTP PUT requests to the metrics API to retrieve metrics and logs for the specified time frame
4 The metrics API receives the requests from the browser and does the following:

  1. Communicates with the UAA and Cloud Controller to confirm that the user can access data for the requested app
  2. Creates jobs on Redis channels that describe the type of metric, log, or event requested, as well as the time period
  Note: PCF Metrics uses Redis as a pub-sub mechanism between the metrics API and worker apps to marshal metrics and logs.

5 The worker-app-dev and worker-app-logs apps, which subscribe to the job channels on Redis, recognize the jobs created by the metrics API. The apps remove their corresponding jobs and do the following:

  1. Retrieve data from the datastores:
    1. worker-app-dev queries MySQL to retrieve any metrics and events requested for the time period.
    2. worker-app-logs queries Elasticsearch to retrieve the logs for the time period requested.
  2. Publish the data to Redis
6 Redis forwards the data to the metrics API.
7 The metrics API streams the data to the browser over SSE, and the PCF Metrics UI displays the data requested by the user.
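
The job hand-off in stages 4 through 6 follows a publish-subscribe pattern, which the sketch below imitates in memory. It does not use Redis or any Redis client. The `PubSubSketch` class, the channel names `jobs.metrics` and `results`, and the job and row fields are all hypothetical and chosen only to show the shape of the exchange.

```python
from collections import defaultdict


class PubSubSketch:
    """Tiny in-memory publish-subscribe hub standing in for Redis."""

    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, channel, handler):
        self.subscribers[channel].append(handler)

    def publish(self, channel, message):
        # Deliver the message to every handler subscribed to the channel.
        for handler in list(self.subscribers[channel]):
            handler(message)


def make_worker(hub, datastore, results_channel):
    """Build a handler that plays the role of a worker app."""
    def handle(job):
        # Query the (fake) datastore for the requested app and time period.
        rows = [
            row for row in datastore
            if row["app"] == job["app"] and job["start"] <= row["ts"] < job["end"]
        ]
        # Publish the data back so the API side can stream it to the browser.
        hub.publish(results_channel, {"job_id": job["job_id"], "rows": rows})
    return handle


hub = PubSubSketch()
fake_mysql = [
    {"app": "app-1", "ts": 100, "cpu": 0.2},
    {"app": "app-1", "ts": 200, "cpu": 0.4},
    {"app": "app-2", "ts": 150, "cpu": 0.9},
]

received = []  # what the metrics API side would forward over SSE
hub.subscribe("results", received.append)
hub.subscribe("jobs.metrics", make_worker(hub, fake_mysql, "results"))

# The API side creates a job describing the app and the time period.
hub.publish("jobs.metrics", {"job_id": 1, "app": "app-1", "start": 0, "end": 150})
print(received)
```

Decoupling the API from the workers this way means the API never queries MySQL or Elasticsearch directly. It only creates jobs and relays whatever the workers publish.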

How Worker Apps Monitor the System

The following table describes the two worker components that PCF Metrics uses to monitor other components in the system.

Worker Component Function
worker-health-check The health-check worker is an app that does the following every minute:

  • Checks whether the apps deployed by PCF Metrics can reach the MySQL, Elasticsearch, and Redis datastores
  • Records the number of MySQL connections and Redis channels
worker-reaper The reaper worker is an app that removes orphaned connections from the worker-app-dev and worker-app-logs apps to Redis.

PCF Metrics requires the reaper worker because Redis does not remove its connections to worker-app-dev and worker-app-logs if they restart.
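
As a rough illustration of what removing orphaned connections involves, the sketch below compares a table of open connections against the set of worker instances that are currently running. This topic does not describe how the reaper worker identifies orphans, so the `reap_orphans` function, the connection IDs, and the instance IDs are assumptions made for the example.

```python
def reap_orphans(connections, live_instances):
    """Remove connections whose owning worker instance no longer exists.

    connections: dict mapping connection ID -> owning instance ID (assumed shape).
    live_instances: set of instance IDs that are currently running.
    Mutates `connections` in place and returns the removed connection IDs.
    """
    orphaned = sorted(
        conn_id for conn_id, owner in connections.items()
        if owner not in live_instances
    )
    for conn_id in orphaned:
        del connections[conn_id]
    return orphaned


connections = {
    "conn-1": "worker-app-dev/old",   # left behind by an instance that restarted
    "conn-2": "worker-app-dev/new",
    "conn-3": "worker-app-logs/new",
}
live = {"worker-app-dev/new", "worker-app-logs/new"}
print(reap_orphans(connections, live))
print(sorted(connections))
```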