Loggregator Guide for PAS Operators

This topic contains information for Pivotal Application Service (PAS) deployment operators about how to configure the Loggregator system to avoid data loss with high volumes of logging and metrics data.

For more information about the Loggregator system, see Loggregator Architecture.

Loggregator Message Throughput and Reliability

You can measure both the message throughput and message reliability rates of your Loggregator system.

Measuring Message Throughput

To measure the message throughput of the Loggregator system, you can monitor the total number of egress messages from all Metrons on your platform using the metron.egress metric.

If you do not use a monitoring platform, you can measure the overall message throughput of your Loggregator system from the command line:

  1. Log in to the Cloud Foundry Command Line Interface (cf CLI) with your admin credentials by running:

    cf login
    
  2. Install the PAS Firehose plugin. For more information, see the Install the Plugin section of the Installing the Loggregator Firehose Plugin for cf CLI topic.

  3. Install Pipe Viewer by running:

    apt-get install pv
    
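  4. Stream the Firehose through Pipe Viewer, which counts lines (-l) and reports the rate (-r) every 10 seconds (-i 10), by running: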

    cf nozzle -n | pv -l -i 10 -r > /dev/null
    
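If Pipe Viewer is not available, a fixed-window sample can give a rough equivalent. The following is a minimal sketch, not part of the cf CLI or the Firehose plugin: sample_rate is a hypothetical helper, and it assumes GNU timeout is installed and that each Firehose message is printed as one line, which is the same assumption pv -l makes.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not shipped with the cf CLI or the Firehose plugin.
# Assumes GNU coreutils `timeout` and one Firehose message per output line.

# sample_rate SECONDS
# Reads stdin for at most SECONDS, then prints the average lines per second
# over that window (integer division, remainder dropped).
sample_rate() {
  local seconds="$1"
  local lines
  # timeout stops cat when the window ends; wc -l counts the lines that arrived.
  lines=$(timeout "$seconds" cat | wc -l)
  echo $(( lines / seconds ))
}
```

For example, `cf nozzle -n | sample_rate 60` would print the average number of messages per second over one minute. If the input ends before the window does, the result still divides by the full window, so short inputs understate the rate.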

Measuring Message Reliability

To measure the message reliability rate of your Loggregator system, you can run black-box tests. If you want to use this method, see the open-source cf-logmon app and the configuration instructions provided in the README.md file.

Scaling Loggregator

Most Loggregator configurations use preferred resource defaults. For more information about customizing these defaults and planning the capacity of your Loggregator system, see Key Capacity Scaling Indicators.

Scaling Nozzles

Nozzles are programs that consume data from the Loggregator Firehose. Nozzles can be configured to select, buffer, and transform data, and to forward it to other apps and services. You can scale a nozzle using the subscription ID specified when the nozzle connects to the Firehose. If you use the same subscription ID on each nozzle instance, the Firehose evenly distributes data across all instances of the nozzle.

For example, if you have two nozzle instances with the same subscription ID, the Firehose sends half of the data to one nozzle instance and half to the other. Similarly, if you have three nozzle instances with the same subscription ID, the Firehose sends one-third of the data to each instance.

If you want to scale a nozzle, the number of nozzle instances should match the number of Traffic Controller instances:

Number of nozzle instances = Number of Traffic Controller instances
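The even split and the instance-count rule above can be checked with simple arithmetic. This is an illustrative sketch only: per_nozzle_rate and nozzle_scale_ok are hypothetical helper names, and the total message rate is a value you supply yourself, for example from a throughput measurement.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of the cf CLI or Loggregator.

# per_nozzle_rate TOTAL INSTANCES
# Approximate messages per second each nozzle instance receives when all
# instances connect with the same subscription ID.
per_nozzle_rate() {
  local total="$1" instances="$2"
  echo $(( total / instances ))   # integer division; remainder is dropped
}

# nozzle_scale_ok NOZZLES TRAFFIC_CONTROLLERS
# Succeeds (exit 0) when the nozzle instance count matches the
# Traffic Controller instance count.
nozzle_scale_ok() {
  [ "$1" -eq "$2" ]
}
```

For example, `per_nozzle_rate 9000 3` prints 3000, and `nozzle_scale_ok 3 3` succeeds while `nozzle_scale_ok 2 3` fails.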

Stateless nozzles should handle scaling gracefully. If a nozzle buffers or caches the data, the nozzle author must test the results of scaling the number of nozzle instances up or down.

Note: You can disable the Firehose. In place of the Firehose, you can configure an aggregate log and metric drain for your foundation.

Slow Nozzle Alerts

The Traffic Controller alerts nozzles if they consume events too slowly. The following metrics can be used to identify slow consumers:

  • For v1 consumers: doppler_proxy.slow_consumer is incremented as consumers are disconnected for being slow.

  • For v2 consumers: doppler.dropped{direction="egress"} or rlp.dropped{direction="egress"} is incremented when a v2 consumer fails to keep up.

For more information about the Traffic Controller, see the Loggregator Architecture section of the Loggregator Architecture topic.

Forwarding Logs to an External Service

You can configure PAS to forward log data from apps to an external aggregator service. For information about how to bind apps to the external service and configure it to receive logs from PAS, see Streaming App Logs to Log Management Services.

Log Message Size Constraints

When a Diego Cell emits app logs to Metron, Diego breaks up log messages greater than approximately 60 KiB into multiple envelopes.
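To estimate how many envelopes a large log message produces, you can divide its size by the limit and round up. The sketch below assumes a limit of exactly 60 KiB (61440 bytes); the actual threshold is approximate, so treat the result as an estimate. envelopes_for_bytes is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Assumption: an exact 60 KiB limit. The real threshold is only approximate.
ENVELOPE_LIMIT_BYTES=$(( 60 * 1024 ))

# envelopes_for_bytes BYTES
# Prints the estimated envelope count: ceiling of BYTES / ENVELOPE_LIMIT_BYTES.
envelopes_for_bytes() {
  local bytes="$1"
  echo $(( (bytes + ENVELOPE_LIMIT_BYTES - 1) / ENVELOPE_LIMIT_BYTES ))
}
```

For example, `envelopes_for_bytes 150000` prints 3 under this assumption.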