Sizing PCF Metrics for Your System

This topic describes how operators configure Pivotal Cloud Foundry (PCF) Metrics depending on their deployment size. Operators can use these procedures to optimize PCF Metrics for high capacity or to reduce resource usage for smaller deployment sizes.

After your deployment has been running for a while, use the information in this topic to scale your running deployment.

If you are not familiar with the PCF Metrics components, review PCF Metrics Product Architecture before reading this topic.

To configure resources for a running deployment, see the procedures below.

Suggested Sizing by Deployment Size

Use the following tables as a guide for configuring resources for your deployment.

Estimate the size of your deployment according to how many app instances you expect to run.

Size    Purpose          Approximate Number of App Instances
Small   Test use         100
Medium  Production use   5,000
Large   Production use   15,000

If you are using Metrics Forwarder and custom metrics, you might need to scale up the MySQL Server instance beyond what is indicated in the tables below. Pivotal recommends that you start with one of the following configurations and scale up as necessary by following the steps in Configuring the Metrics Datastore.

Deployment Resources for a Small Deployment

Example resource configuration to store approximately 14 days of data for a small deployment, about 100 application instances:

Job Instances Persistent Disk Type VM Type
PostgreSQL Data 1 (not configurable) 1 TB xlarge (cpu: 4, ram: 16 GB, disk: 8 GB)
Redis 1 (not configurable) 5 GB micro (cpu: 1, ram: 4 GB, disk: 8 GB)
MySQL Server 1 (not configurable) 500 GB small (cpu: 1, ram: 4 GB, disk: 8 GB)

Deployment Resources for a Medium Deployment

Example resource configuration to store approximately 14 days of data for a medium deployment, about 5,000 application instances:

Job Instances Persistent Disk Type VM Type
PostgreSQL Data 1 (not configurable) 12 TB 2xlarge (cpu: 8, ram: 32 GB, disk: 16 GB)
Redis 1 (not configurable) 20 GB medium (cpu: 4, ram: 16 GB, disk: 8 GB)
MySQL Server 1 (not configurable) 2 TB medium (cpu: 4, ram: 16 GB, disk: 8 GB)

Deployment Resources for a Large Deployment

Example resource configuration to store approximately 14 days of data for a large deployment, about 15,000 application instances:

Job Instances Persistent Disk Type VM Type
PostgreSQL Data 1 (not configurable) 32 TB 4xlarge (cpu: 16, ram: 64 GB, disk: 32 GB)
Redis 1 (not configurable) 60 GB 2xlarge (cpu: 8, ram: 64 GB, disk: 16 GB)
MySQL Server 1 (not configurable) 4 TB 2xlarge (cpu: 8, ram: 32 GB, disk: 16 GB)

Scale the Metrics Datastore

PCF Metrics stores metrics in a single MySQL node. For PCF deployments with a high metrics load, you can add memory and persistent disk to the MySQL server node.

Considerations for Scaling the Metrics Datastore

While the default configurations in Suggested Sizing by Deployment Size above are a good starting point for your MySQL server node, they do not take into account the additional load from custom metrics. Pivotal recommends evaluating performance over a period of time and scaling up as necessary. As long as you scale the persistent disk up, you do not lose any data by scaling. Note that PCF Metrics adds one day to your configured Metrics Retention Window to prevent pruning within your desired retention window. The retention window begins at UTC±00:00 of the current day and goes back the number of days you enter in this field, plus the one additional day added by PCF Metrics.
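
For example, the following minimal sketch shows the retention arithmetic described above, assuming a 14-day Metrics Retention Window (the function name and example value are illustrative, not part of PCF Metrics):

    # Minimal sketch of the retention arithmetic described above.
    # The function name and the 14-day example are illustrative, not part of PCF Metrics.
    from datetime import datetime, timedelta, timezone

    def earliest_retained(configured_retention_days):
        utc_midnight_today = datetime.now(timezone.utc).replace(
            hour=0, minute=0, second=0, microsecond=0)
        # Configured days plus the one extra day that PCF Metrics adds.
        return utc_midnight_today - timedelta(days=configured_retention_days + 1)

    print(earliest_retained(14))  # oldest timestamp retained for a 14-day window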

Procedure for Scaling

To scale up the MySQL server node, do the following:

  1. Determine how much memory and persistent disk are required for the MySQL server node.
  2. Navigate to the Ops Manager Installation Dashboard and click the Metrics tile.
  3. From the Settings tab of the Metrics tile, click Resource Config.
  4. Select the values for the Persistent Disk Type and VM Type.
  5. Click Save.

WARNING! If you are using PCF v1.9.x or earlier, there might be issues with the Ops Manager BOSH Director when using persistent disks larger than 2 TB.

Scale the Log Datastore

PCF Metrics uses Postgres to store logs.

Considerations for Scaling

Pivotal suggests estimating your Logs data storage needs using the equation below before configuring your Postgres instance.

The following calculation estimates Postgres resource requirements more precisely based on your logs load. This formula is only an approximation, and Pivotal suggests rounding the numbers up as a safety measure against undersizing Postgres:

  1. Determine how many logs the apps in your deployment emit per hour (R). To do this, Pivotal recommends multiplying your app instances’ loggregator.doppler.ingress rate by the number of Doppler instances. For more information on loggregator.doppler.ingress, see https://docs.pivotal.io/pivotalcf/2-2/monitoring/key-cap-scaling.html#doppler-message-rate-ksi
  2. Determine the average size of each log (S). As an estimate, each log inserted into Postgres takes approximately 224 bytes plus roughly two times the log line length in bytes.

          224 bytes + (logline in bytes x 2) = S

          For example, for an average log line size of 975 bytes, this would be

          224 bytes + (975 bytes x 2) = 2,174 bytes
  3. To avoid maxing out the Postgres logs data storage disk, include a 20% buffer (B = 1.2). You can remove this buffer if you prefer.
  4. Calculate the persistent disk size (D) that you need for the instance using the following formula (a scriptable version of this calculation appears after this list):

          R × S × 336 × B = D
  5. The formula assumes a log retention period of 336 hours (2 weeks) and that the number of Postgres instances is 1 (not configurable). For example:

          1,000,000 logs/hr × 2,174 bytes × 336 hr × 1.2 ≈ 877 GB
  6. Note that PCF Metrics adds one buffer day to your configured Logs Retention Window to prevent pruning within your desired retention window. The retention window begins at UTC±00:00 of the current day and goes back the number of days you enter in this field, plus the one buffer day added by PCF Metrics.
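
If you prefer to script this estimate, the following minimal sketch implements the formula above. The function names and example inputs are illustrative assumptions, not part of PCF Metrics:

    # Minimal sizing sketch implementing the formula above.
    # Function names and example inputs are illustrative, not part of PCF Metrics.

    def log_size_bytes(avg_logline_bytes):
        """S: approximately 224 bytes of overhead plus two times the log line length."""
        return 224 + 2 * avg_logline_bytes

    def postgres_disk_bytes(logs_per_hour, avg_logline_bytes,
                            retention_hours=336, buffer=1.2):
        """D = R x S x 336 x B."""
        return logs_per_hour * log_size_bytes(avg_logline_bytes) * retention_hours * buffer

    # Worked example from step 5: 1,000,000 logs/hr with 975-byte log lines.
    print(postgres_disk_bytes(1_000_000, 975) / 1e9)  # approximately 877 (GB)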

Procedure for Scaling

WARNING! When you vertically scale your Postgres instance, Postgres enters an unhealthy period during which it does not ingest any new logs data until the scaling operation has completed.

After determining the desired size for the Postgres instance needed for your deployment, perform the following steps to scale your nodes:

  1. Navigate to the Ops Manager Installation Dashboard and click the Metrics tile.
  2. From the Settings tab of the Metrics tile, click Resource Config.
  3. Locate the PostgreSQL job and select the values for the Persistent Disk Type and VM Type.
  4. Click Save.

Scale the Temporary Datastore (Redis)

PCF Metrics uses Redis to temporarily store ingested data from the Loggregator Firehose as well as cache data queried by the Metrics API. The former use case is to prevent major metrics and logs loss when the data stores (Postgres and MySQL) are unavailable. The latter is to potentially speed up front-end queries. See PCF Metrics Product Architecture for more information.

Considerations for Scaling

The default Redis configuration for your deployment size in Suggested Sizing by Deployment Size above should work for most cases. Redis stores all data in memory, so if your deployment size requires it, you can also consider scaling up the RAM for your Redis instance.

Procedure for Scaling

Follow these steps to configure the size of the Redis VM for the temporary datastore based on your calculations.

Note: If the temporary datastore becomes full, Redis uses the volatile-ttl eviction policy to continue storing incoming logs. For more information, see Eviction policies in Using Redis as an LRU cache.

  1. Navigate to the Ops Manager Installation Dashboard and click the Metrics tile.
  2. From the Settings tab, click Resource Config.
  3. Locate the Redis job and select the dropdown menus under Persistent Disk Type and VM Type to scale Redis up or down.
  4. Click Save.

Scale the Ingestor, Logqueues, Alerting, and Metrics API

The procedures for scaling the Metrics Ingestor, Logs Queue, Metrics Queue, and Metrics API instances are similar.

  • Metrics Ingestor — PCF Metrics deploys the Ingestor as an app, metrics-ingestor, within PCF. The Ingestor consumes logs and metrics from the Loggregator Firehose at the rate of loggregator.doppler.ingress, sending metrics and logs to their respective Logqueue apps.

    To customize PCF Metrics for high capacity, you can scale the number of Ingestor app instances and increase the amount of memory per instance.

  • Logs Queue — PCF Metrics deploys a Metrics Queue and a Logs Queue as apps, metrics-queue and logs-queue, within PCF. The Metrics Queue consumes metrics from the Ingestor and forwards them to MySQL. The Logs Queue consumes logs from the Ingestor and forwards them to Postgres.

    To customize PCF Metrics for high capacity, you can scale the number of queue app instances and increase the amount of memory per instance.

    The number of Metrics and Logs Queues needed depends on the rate at which the Ingestor forwards logs and metrics. As a general rule:

    • For every 45,000 logs per minute, add 2 Logs Queues.
    • For every 17,000 metrics per minute, add 1 Metrics Queue.

    The above is a general estimate, and you might need fewer instances depending on your deployment (see the sketch after this list for a scriptable version). To optimize resource allocation, provision fewer instances initially and add instances until you achieve the desired performance.

  • Metrics Alerting API — PCF Metrics deploys the app, metrics-alerting, within PCF. The Metrics Alerting app is responsible for creating notifications for the user-created metrics and events monitors.

    Note that PCF Metrics v1.6 supports only one instance of this API per installation.

  • Metrics API — PCF Metrics deploys the app, metrics, within PCF.

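For a rough, scriptable version of the sizing guidance above, the following sketch applies the queue ratios and the one-Ingestor-instance-per-Doppler-server rule. The function names and example rates are illustrative assumptions; only the ratios come from this topic:

    # Rough instance-count sketch based on the ratios above; illustrative only.
    import math

    def ingestor_instances(doppler_servers):
        # One Ingestor instance per Doppler server.
        return doppler_servers

    def logs_queue_instances(logs_per_minute):
        # Two Logs Queue instances for every 45,000 logs per minute.
        return max(1, 2 * math.ceil(logs_per_minute / 45_000))

    def metrics_queue_instances(metrics_per_minute):
        # One Metrics Queue instance for every 17,000 metrics per minute.
        return max(1, math.ceil(metrics_per_minute / 17_000))

    print(logs_queue_instances(90_000))     # 4 Logs Queue instances
    print(metrics_queue_instances(34_000))  # 2 Metrics Queue instances
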
Refer to this table to determine how many instances you need for each component.

Item Small Medium Large
Ingestor instance count Number of Doppler servers Number of Doppler servers Number of Doppler servers
Metrics Queue instance count 1 2 8
Logs Queue instance count 1 4 11
Metrics API instance count 1 2 6
Metrics Alerting API instance count 1 1 1

Find the number of Doppler servers in the Resource Config pane of the Pivotal Application Service tile.

Considerations for Scaling

Pivotal recommends starting with the configuration for your deployment size in Suggested Sizing by Deployment Size above, evaluating performance over a period of time, and scaling up if performance degrades.

Procedure for Scaling

WARNING! If you decrease the number of instances, you might lose data currently being processed on the instances you eliminate.

After determining the number of instances needed for your deployment, perform the following steps to scale:

  1. Target your Cloud Controller with the Cloud Foundry Command Line Interface (cf CLI). If you have not installed the cf CLI, see Installing the cf CLI.

    $ cf api api.YOUR-SYSTEM-DOMAIN
    Setting api endpoint to api.YOUR-SYSTEM-DOMAIN...
    OK
    API endpoint:   https://api.YOUR-SYSTEM-DOMAIN (API version: 2.54.0)
    Not logged in. Use 'cf login' to log in.
    

  2. Log in with your UAA administrator credentials. To retrieve these credentials, navigate to the Pivotal Application Service tile in the Ops Manager Installation Dashboard and click Credentials. Under UAA, click Link to Credential next to Admin Credentials and record the password.

    $ cf login
    API endpoint: https://api.YOUR-SYSTEM-DOMAIN

    Email> admin
    Password>
    Authenticating...
    OK

  3. When prompted, target the metrics-v1-6 space.

    Targeted org system

    Select a space (or press enter to skip):
    1. system
    2. notifications-with-ui
    3. autoscaling
    4. metrics-v1-6

    Space> 4
    Targeted space metrics-v1-6

    API endpoint:   https://api.YOUR-SYSTEM-DOMAIN (API version: 2.54.0)
    User:           admin
    Org:            system
    Space:          metrics-v1-6

  4. List the apps that are running in the metrics-v1-6 space.

    $ cf apps
    Getting apps in org system / space metrics-v1-6 as admin...
    OK
    name                    requested state   instances   memory   disk   urls
    metrics-queue-blue      stopped           0/1         512M     1G
    metrics-blue            stopped           0/1         1G       2G
    metrics-ui-blue         stopped           0/1         256M     1G
    metrics-alerting-blue   stopped           0/1         1G       2G
    metrics-ingestor-blue   stopped           0/2         384M     1G
    logs-queue-blue         stopped           0/1         256M     1G
    metrics-ingestor        started           2/2         384M     1G
    metrics-queue           started           1/1         512M     1G
    logs-queue              started           1/1         256M     1G
    metrics-ui              started           1/1         256M     1G     metrics.YOUR-SYSTEM-DOMAIN
    metrics-alerting        started           1/1         1G       2G
    metrics                 started           1/1         1G       2G     metrics.YOUR-SYSTEM-DOMAIN/api/v1

  5. Scale the app to the desired number of instances:

    cf scale APP-NAME -i INSTANCE-NUMBER

    Where APP-NAME is logs-queue, metrics, metrics-ingestor, or metrics-queue.
    For example, to scale all the apps:

    $ cf scale logs-queue -i 2
    $ cf scale metrics -i 2
    $ cf scale metrics-ingestor -i 2
    $ cf scale metrics-queue -i 2

  6. Evaluate the CPU and memory load on the instances:

    cf app APP-NAME

    For example,

    $ cf app metrics-ingestor
    Showing health and status for app metrics-ingestor in org system / space metrics as admin...
    OK
    
    requested state: started
    instances: 1/1
    usage: 1G x 1 instances
    urls:
    last uploaded: Sat Apr 23 16:11:29 UTC 2016
    stack: cflinuxfs2
    buildpack: binary_buildpack

         state     since                    cpu    memory        disk          details
    #0   running   2016-07-21 03:49:58 PM   2.9%   13.5M of 1G   12.9M of 1G

  7. If your average memory usage exceeds 50% or your CPU consistently averages over 85%, add more instances with cf scale APP-NAME -i INSTANCE-NUMBER.

    In general, you should scale the app by adding additional instances. However, you can also scale the app by increasing the amount of memory per instance:

    cf scale APP-NAME -m NEW-MEMORY-LIMIT
    

    For example,

    $ cf scale metrics-ingestor -m 2G

    For more information about scaling app instances, see Scaling an Application Using cf scale.