App Log Rate Limiting
- Overview of App Log Rate Limiting
- Determining the Ideal App Log Rate Limit
- What Happens When App Instances Exceed the App Log Rate Limit
- How Diego Cells Determine When an App Instance Has Exceeded the App Log Rate Limit
- Configure an Alert for the AppInstanceExceededLogRateLimitCount Metric
- Identify Apps That Exceed the App Log Rate Limit
This topic describes app log rate limiting for apps in VMware Tanzu Application Service for VMs (TAS for VMs).
Overview of App Log Rate Limiting
In TAS for VMs, you can limit the number of log lines each app instance can generate per second by configuring the App log rate limit (beta) section in the App Containers pane of the TAS for VMs tile.
App log rate limiting is disabled by default. VMware recommends enabling this feature to prevent app instances from overloading the Loggregator Agent with logs, which can cause the Loggregator Agent to drop logs for other app instances on the same Diego Cell. Enabling this feature can also prevent inaccurate app metrics from appearing in the Cloud Foundry Command Line Interface (cf CLI), which can happen if Log Cache evicts metrics from the cache to store large volumes of logs. It can also limit the CPU usage of logging agents on the Diego Cell VM.
To configure app log rate limits, see Configure App Containers in Configuring TAS for VMs.
Note: In TAS for VMs, this rate limit is applied globally across all your apps. If only some of your apps require this setting, see App Containers in Installing Isolation Segment.
Determining the Ideal App Log Rate Limit
The ideal app log rate limit for a deployment depends on characteristics such as VM sizes and the number and type of apps in TAS for VMs. VMware recommends setting the limit to at least the default of 100 log lines per second. Lower limits can substantially delay logs, because logs above the limit wait in a buffer of limited size before Diego releases them.
When you enable app log rate limiting, Diego applies the rate limit to each app instance. For example, if there are five instances of an app running, Diego does not sum the logging rates of all five instances when determining if the rate limit has been exceeded. Instead, Diego evaluates the logging rate of each individual app instance and only limits instances that exceed the rate limit.
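The per-instance evaluation can be illustrated with a minimal Go sketch. It assumes a simplified model in which each instance's log lines are counted over one second and compared with the limit; the function `exceedingInstances` and the instance IDs are hypothetical, and this is not Diego's implementation.

```go
package main

import (
	"fmt"
	"sort"
)

// exceedingInstances returns the IDs of app instances whose own log rate is
// above limit. Rates are never summed across instances of the same app.
// linesPerSecond maps a hypothetical instance ID to the number of log lines
// that instance emitted during one second.
func exceedingInstances(linesPerSecond map[string]int, limit int) []string {
	var over []string
	for id, n := range linesPerSecond {
		if n > limit {
			over = append(over, id)
		}
	}
	sort.Strings(over) // map iteration order is random; sort for stable output
	return over
}

func main() {
	// Five instances of one app emit 330 lines/sec in total, but only
	// instance "3" is above a limit of 100 on its own.
	rates := map[string]int{"0": 40, "1": 60, "2": 90, "3": 120, "4": 20}
	fmt.Println(exceedingInstances(rates, 100)) // [3]
}
```

Although the five instances together emit more than three times the limit, only the one instance that is individually over the limit is rate limited.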
What Happens When App Instances Exceed the App Log Rate Limit
When an app instance exceeds the configured rate limit, Diego stores the app logs in a buffer and releases them into the logging stream at the per-second rate you configure in the App log rate limit (beta) section of the App Containers pane of the TAS for VMs tile. This buffer holds approximately 5Mb to 10Mb of logs. If an app's logs exceed the size of the buffer, the excess log lines are dropped and are not forwarded off the Diego Cell. While app logs remain in the buffer, a message indicating that the app is exceeding the rate limit appears in the app log stream once per second.
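The following Go sketch models the buffering behavior described above with toy sizes. It is a hypothetical simplification, not Diego's code: every line passes through a FIFO queue bounded in bytes, lines that do not fit are dropped, at most `ratePerSec` lines leave per simulated second, and one notice is added for each second in which lines are still waiting. It uses a fixed per-second quota rather than the token bucket in Go's `rate` package.

```go
package main

import "fmt"

// noticeFmt mirrors the wording of the notice shown later in this topic.
const noticeFmt = "app instance exceeded log rate limit (%d log-lines/sec) set by platform operator"

// logBuffer is a simplified, hypothetical model of one app instance's buffer:
// lines wait in a FIFO queue bounded by capacityBytes and are released at
// most ratePerSec lines per simulated second.
type logBuffer struct {
	capacityBytes int
	ratePerSec    int
	usedBytes     int
	queue         []string
	dropped       int
}

// push buffers a line, or drops it if it does not fit in the remaining space.
func (b *logBuffer) push(line string) {
	if b.usedBytes+len(line) > b.capacityBytes {
		b.dropped++
		return
	}
	b.queue = append(b.queue, line)
	b.usedBytes += len(line)
}

// tick simulates one second: it releases up to ratePerSec lines and, if lines
// are still waiting in the buffer afterwards, adds one rate-limit notice.
func (b *logBuffer) tick() []string {
	n := b.ratePerSec
	if n > len(b.queue) {
		n = len(b.queue)
	}
	out := make([]string, 0, n+1)
	for _, line := range b.queue[:n] {
		b.usedBytes -= len(line)
		out = append(out, line)
	}
	b.queue = b.queue[n:]
	if len(b.queue) > 0 {
		out = append(out, fmt.Sprintf(noticeFmt, b.ratePerSec))
	}
	return out
}

func main() {
	// Toy sizes: a 50-byte buffer drained at 2 lines/sec receives eight
	// 9-byte lines at once, so five are buffered and three are dropped.
	b := &logBuffer{capacityBytes: 50, ratePerSec: 2}
	for i := 0; i < 8; i++ {
		b.push(fmt.Sprintf("line-%04d", i))
	}
	fmt.Println("dropped:", b.dropped) // dropped: 3
	for sec := 1; len(b.queue) > 0; sec++ {
		fmt.Println(sec, b.tick())
	}
}
```

The small buffer makes the trade-off visible: a lower release rate keeps lines waiting longer, and a burst larger than the buffer loses lines outright.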
For more information about how Diego rate limits app logs, see package rate in the Go documentation.
How Diego Cells Determine When an App Instance Has Exceeded the App Log Rate Limit
The Diego Cell that hosts an app instance emits the AppInstanceExceededLogRateLimitCount counter metric when that app instance exceeds the rate limit, similar to the following example:
origin:"rep" eventType:CounterEvent timestamp:1582582740243576212 deployment:"cf" job:"diego-cell" index:"0e98fd00-47b2-4589-94f0-385f78b3a04d" ip:"10.0.1.12" tags:<key:"instance_id" value:"0e98fd00-47b2-4589-94f0-385f78b3a04d" > tags:<key:"source_id" value:"rep" > counterEvent:<name:"AppInstanceExceededLogRateLimitCount" delta:1 total:206 >
Each Diego Cell in a deployment has its own AppInstanceExceededLogRateLimitCount counter. The total value of the counter records how many times app instances on that Diego Cell have exceeded the rate limit since the Diego Cell was created. When no app instances are exceeding the rate limit, Diego Cells do not emit the AppInstanceExceededLogRateLimitCount metric.
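To work with these envelopes programmatically, the following Go sketch extracts the Diego Cell IP, metric name, `delta`, and `total` from a text-formatted counter envelope like the sample above. The regular expressions assume the exact field order and spacing of that sample; the names `parseCounter` and `counterSample` are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// counterRe matches the counterEvent portion of a text-formatted envelope.
// It assumes the field order (name, delta, total) shown in the sample above.
var counterRe = regexp.MustCompile(`counterEvent:<name:"([^"]+)" delta:(\d+) total:(\d+) >`)

// ipRe captures the IP address of the emitting Diego Cell.
var ipRe = regexp.MustCompile(`\bip:"([^"]+)"`)

// counterSample holds the fields of interest from one counter envelope.
type counterSample struct {
	CellIP string
	Name   string
	Delta  uint64
	Total  uint64
}

// parseCounter extracts a counterSample from one envelope line; ok is false
// when the line is not a counter event in the expected format.
func parseCounter(line string) (counterSample, bool) {
	m := counterRe.FindStringSubmatch(line)
	if m == nil {
		return counterSample{}, false
	}
	delta, err := strconv.ParseUint(m[2], 10, 64)
	if err != nil {
		return counterSample{}, false
	}
	total, err := strconv.ParseUint(m[3], 10, 64)
	if err != nil {
		return counterSample{}, false
	}
	s := counterSample{Name: m[1], Delta: delta, Total: total}
	if ip := ipRe.FindStringSubmatch(line); ip != nil {
		s.CellIP = ip[1]
	}
	return s, true
}

func main() {
	line := `origin:"rep" eventType:CounterEvent timestamp:1582582740243576212 deployment:"cf" job:"diego-cell" index:"0e98fd00-47b2-4589-94f0-385f78b3a04d" ip:"10.0.1.12" tags:<key:"instance_id" value:"0e98fd00-47b2-4589-94f0-385f78b3a04d" > tags:<key:"source_id" value:"rep" > counterEvent:<name:"AppInstanceExceededLogRateLimitCount" delta:1 total:206 >`
	if s, ok := parseCounter(line); ok {
		// Prints: AppInstanceExceededLogRateLimitCount on 10.0.1.12: delta=1 total=206
		fmt.Printf("%s on %s: delta=%d total=%d\n", s.Name, s.CellIP, s.Delta, s.Total)
	}
}
```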
For example, app-instanceA and app-instanceB are running on one Diego Cell, and app-instanceC and app-instanceD are running on a second Diego Cell. The current total for the AppInstanceExceededLogRateLimitCount metric is 125 on the first Diego Cell and 43 on the second Diego Cell. If app-instanceD exceeds the rate limit, the second Diego Cell emits the AppInstanceExceededLogRateLimitCount metric with an incremented total value of 44. However, the first Diego Cell does not emit the AppInstanceExceededLogRateLimitCount metric, and the total value for the metric on the first Diego Cell is still 125.
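The example above can be restated as a small Go model. This is a hypothetical sketch of the counter semantics only: each `cellCounter` keeps its own total, and only the cell whose app instance exceeds the limit increments and emits.

```go
package main

import "fmt"

// cellCounter is a hypothetical model of one Diego Cell's
// AppInstanceExceededLogRateLimitCount counter. Each cell keeps its own total.
type cellCounter struct {
	name  string
	total uint64
}

// recordExceed increments this cell's counter and returns the delta and total
// that the cell would emit. Other cells are unaffected and emit nothing.
func (c *cellCounter) recordExceed() (delta, total uint64) {
	c.total++
	return 1, c.total
}

func main() {
	first := &cellCounter{name: "first Diego Cell", total: 125}  // app-instanceA, app-instanceB
	second := &cellCounter{name: "second Diego Cell", total: 43} // app-instanceC, app-instanceD

	// app-instanceD exceeds the rate limit, so only the second cell emits.
	delta, total := second.recordExceed()
	fmt.Printf("%s emits delta=%d total=%d\n", second.name, delta, total)  // ... delta=1 total=44
	fmt.Printf("%s emits nothing, total still %d\n", first.name, first.total) // ... total still 125
}
```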
A Diego Cell emits the AppInstanceExceededLogRateLimitCount metric only when an app instance on that Diego Cell begins to exceed the rate limit. For example, app-instanceC and app-instanceD are on the same Diego Cell. Over a ten-minute period, app-instanceC exceeds the rate limit continually, while app-instanceD exceeds the rate limit during the first three minutes of each five-minute interval and then stops. The Diego Cell emits the AppInstanceExceededLogRateLimitCount metric three times within that ten-minute period: once when app-instanceC begins to exceed the limit, and once for each of the two times app-instanceD begins to exceed it.
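The emission count in this example follows from counting transitions into the exceeding state. The following Go sketch assumes one-minute granularity, which is a simplification for illustration; the function `emissions` is hypothetical and counts every transition from within the limit to exceeding it.

```go
package main

import "fmt"

// emissions counts how many times a Diego Cell would emit the metric, assuming
// one emission each time an app instance goes from within the limit to
// exceeding it. Each timeline holds one value per minute for one instance:
// true means the instance exceeded the rate limit during that minute.
func emissions(timelines map[string][]bool) int {
	count := 0
	for _, minutes := range timelines {
		prev := false
		for _, exceeding := range minutes {
			if exceeding && !prev {
				count++ // the instance begins to exceed the limit
			}
			prev = exceeding
		}
	}
	return count
}

func main() {
	const minutes = 10
	c := make([]bool, minutes) // app-instanceC: exceeds for all ten minutes
	d := make([]bool, minutes) // app-instanceD: first three minutes of each five-minute interval
	for m := 0; m < minutes; m++ {
		c[m] = true
		d[m] = m%5 < 3
	}
	fmt.Println(emissions(map[string][]bool{"app-instanceC": c, "app-instanceD": d})) // 3
}
```

app-instanceC contributes one transition at minute 0, and app-instanceD contributes two, at minutes 0 and 5.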
Configure an Alert for the AppInstanceExceededLogRateLimitCount Metric
If you are using a third-party log management service, you can configure an alert for when the aggregated sum of the AppInstanceExceededLogRateLimitCount metric across all Diego Cells on TAS for VMs has increased by more than a certain number of times, or by more than a certain percentage, over the last five or more minutes. When you configure this alert, consider the number of app instances running on TAS for VMs, the app log rate limit that you configured, and your other TAS for VMs configuration settings.
For more information about third-party log management services, see Streaming App Logs to Log Management Services.
Identify Apps That Exceed the App Log Rate Limit
Diego also logs when a noisy app instance exceeds the rate limit set in TAS for VMs. A log message similar to the example below appears in the log stream for the noisy app:
2020-02-24T12:42:18.90-0800 [APP/PROC/WEB/0] OUT app instance exceeded log rate limit (100 log-lines/sec) set by platform operator
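If you collect app log output as text, the following Go sketch recognizes this notice and extracts the bracketed source (for example `APP/PROC/WEB/0`) and the configured limit. The pattern is derived only from the sample line above, so other output formats may not match; `parseNotice` is a hypothetical name.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// noticeRe is derived only from the sample line above: it captures the
// bracketed source (for example APP/PROC/WEB/0) and the configured limit.
var noticeRe = regexp.MustCompile(`\[([^\]]+)\] OUT app instance exceeded log rate limit \((\d+) log-lines/sec\)`)

// parseNotice reports the source and limit from a rate-limit notice line;
// ok is false for ordinary app output.
func parseNotice(line string) (source string, limit int, ok bool) {
	m := noticeRe.FindStringSubmatch(line)
	if m == nil {
		return "", 0, false
	}
	n, err := strconv.Atoi(m[2])
	if err != nil {
		return "", 0, false
	}
	return m[1], n, true
}

func main() {
	line := "2020-02-24T12:42:18.90-0800 [APP/PROC/WEB/0] OUT app instance exceeded log rate limit (100 log-lines/sec) set by platform operator"
	if src, limit, ok := parseNotice(line); ok {
		// Prints: APP/PROC/WEB/0 hit the 100 log-lines/sec limit
		fmt.Printf("%s hit the %d log-lines/sec limit\n", src, limit)
	}
}
```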
To identify which app instances are exceeding the app log rate limit:
Note: The Firehose and Log Cache plugins were developed by the open-source Cloud Foundry community and are not supported by VMware.
Install the Firehose plugin by running:
cf install-plugin 'Firehose Plugin'
Install the Log Cache plugin by running:
cf install-plugin 'log-cache'
Filter your app log messages by running:
cf nozzle -f LogMessage | grep "app instance exceeded log rate limit"
The command returns all logs with log messages containing "app instance exceeded log rate limit", similar to the following example:
origin:"rep" eventType:LogMessage timestamp:1583859621886751670 deployment:"warp-drive" job:"diego-cell" index:"3a574bde-91df-48b8-ae21-1d6913da0908" ip:"10.0.1.33" tags:<key:"app_id" value:"34bcfafc-402b-4bb4-84db-aea5401b79eb" > tags:<key:"app_name" value:"app-2" > tags:<key:"instance_id" value:"0" > tags:<key:"organization_id" value:"a30f39c2-4ff3-48a1-a869-a9ed21812a61" > tags:<key:"organization_name" value:"test" > tags:<key:"process_id" value:"34bcfafc-402b-4bb4-84db-aea5401b79eb" > tags:<key:"process_instance_id" value:"92e2ee78-3a1d-41a6-4933-e47b" > tags:<key:"process_type" value:"web" > tags:<key:"source_id" value:"34bcfafc-402b-4bb4-84db-aea5401b79eb" > tags:<key:"space_id" value:"0e2d2d58-3ef5-43f3-b880-c8a30903a96b" > tags:<key:"space_name" value:"test-2" > logMessage:<message:"app instance exceeded log rate limit (100 log-lines/sec) set by platform operator" message_type:OUT timestamp:1583859621886751670 app_id:"34bcfafc-402b-4bb4-84db-aea5401b79eb" source_type:"APP/PROC/WEB" source_instance:"0" >
You can inspect these logs to identify the app instances that are exceeding the app log rate limit.