Using Debug Logs for App Autoscaler

Page last updated:

This topic describes how to monitor App Autoscaler and its supporting VMware Tanzu Application Service for VMs (TAS for VMs) components through debug logging.

Overview of Debug Logging in Autoscaler

Autoscaler does not record debug logs by default. When you configure Autoscaler to record debug logs, the list of logs that Autoscaler records includes debug logs along with its normal logs. Debug logs include DEBUG in their log lines.

You can use the debug logs that Autoscaler generates to do the following:

  • See whether developers of your apps are using Autoscaler to scale them, as well as measure the efficacy of apps that use or do not use Autoscaler. For more information, see Reviewing Autoscaler Usage below.

  • Determine if TAS for VMs components are failing to sufficiently support Autoscaler. For more information, see Failure Modes below.

To configure Autoscaler to record debug logs, see Configure Debug Logging for Autoscaler below.

To review the debug logs for Autoscaler, see Review Autoscaler Debug Logs below.

Configure Debug Logging for Autoscaler

To configure Autoscaler to record debug logs:

  1. In a terminal window, run:

    cf target -o system -s autoscaling
    
  2. Set the logging level for Autoscaler to debug by running:

    cf set-env autoscale LOG_LEVEL debug
    
  3. Restart Autoscaler by running:

    cf restart autoscale
    

    Note: Re-starting Autoscaler briefly interrupts any processes it is currently performing.

Review Autoscaler Debug Logs

To review the debug logs for Autoscaler:

  1. In a terminal window, run:

    cf target -o system -s autoscaling
    
  2. Run:

    cf logs autoscale
    

    In the terminal output, all logs for Autoscaler appear, including debug logs. Debug logs include DEBUG in their log lines.

Reviewing Autoscaler Usage

As an operator, you can use debug logs to see whether developers of your apps are using Autoscaler to scale them, as well as measure the efficacy of apps that use or do not use Autoscaler.

If you have configured Autoscaler to record debug logs, Autoscaler logs the following metrics:

  • autoscale.scale_up: the number of times Autoscaler has scaled an app up

  • autoscale.scale_down: the number of times Autoscaler has scaled an app down

  • autoscale.enabled_bindings: the number of Autoscaler service bindings that have been created

Note: The metrics that are listed above appear as log lines and are not emitted to Loggregator.

To view these metrics, see Review Autoscaler Debug Logs above.

You can also use one of the following methods to identify which apps are using Autoscaler:

  • You can review the overall number of running app instances that Diego reports for each Diego Cell to measure how many app instances are running at any given time. For more information, see Running App Instances, Rate of Change in Key Performance Indicators.

  • You can use cf curl to retrieve information about Autoscaler service instances from the Cloud Foundry API (CAPI). To identify apps that are using Autoscaler through CAPI, see Retrieve Autoscaler Service Instances from CAPI Endpoints below.

Because each team of developers may use different autoscaling rules for their apps, it may be difficult to accurately determine how effectively Autoscaler scales each app. However, you can gain a generalized understanding of the effect that Autoscaler has on the apps it scales by comparing the number of running app instances with other scaling metrics that the developers of your apps may be using. For more information about various scaling metrics you can use in your comparison review, see Key Performance Indicators.

Retrieve Autoscaler Service Instances from CAPI Endpoints

To identify apps that are using Autoscaler through CAPI:

  1. In a terminal window, retrieve the global unique identifier (GUID) of the service plan for Autoscaler by running:

    cf curl /v3/service_plans/:guid
    

    In the output for the above command, the value of guid is the GUID of the service plan for Autoscaler. For more information, see the CAPI documentation.

  2. Record the value of guid from the output of the command you ran in the previous step.

  3. Retrieve information about apps that are using Autoscaler using one of the following methods:

    • Retrieve a list of the names of Autoscaler service instances by running:

      cf curl /v3/service_plans?service_offering_names=app-autoscaler | jq -r '.resources[] | .guid' SERVICE-PLAN-GUID
      

      Where SERVICE-PLAN-GUID is the GUID of the service plan for Autoscaler that you recorded in the previous step. For more information, see the CAPI documentation.

      Note: To run the above command, you must have jq installed. To install jq, see the jq website.

    • Retrieve a list of the GUIDs of Autoscaler service instances by running:

      cf curl '/v3/service_instances?service_plan_guids=SERVICE-PLAN-GUID'
      

      Where SERVICE-PLAN-GUID is the GUID of the service plan for Autoscaler that you recorded in a previous step. For more information, see the CAPI documentation.

Failure Modes

Instead of always maintaining enough app instances to handle the maximum amount of traffic that you expect for your apps, Autoscaler allows you to provision only the number of instances that your apps require to handle the amount of traffic that you expect at any given time. However, if Autoscaler or the TAS for VMs components that support Autoscaler fail, Autoscaler cannot scale the number of app instances up enough to meet the needs of your apps.

If you are not using Autoscaler, TAS for VMs component failures might have little impact on your apps. Because Autoscaler depends on TAS for VMs components such as Log Cache, the Cloud Controller, and MySQL to scale apps, using Autoscaler greatly increases the impact of TAS for VMs component failures on those apps.

This section describes some of the ways in which the TAS for VMs components that support Autoscaler can fail, as well as examples of the debug log metrics, autoscaling events, and errors in Autoscaler debug logs that may appear when these TAS for VMs components fail.

For more information, see the sections below:

Log Cache

Autoscaler depends heavily on Log Cache, which provides the majority of metrics that Autoscaler uses to determine when to scale an app. If Log Cache fails or is insufficiently scaled, Autoscaler cannot retrieve metrics, so it cannot determine when to scale an app.

If you have configured Autoscaler to record debug logs, Autoscaler logs the following metrics:

  • autoscale.logcache.read_errors: errors that Autoscaler encounters when reading from Log Cache

  • autoscale.logcache_metric_fetch_duration: the time Autoscaler takes to retrieve metrics from Log Cache

Note: The metrics that are listed above appear as log lines and are not emitted to Loggregator.

To view these metrics, see Review Autoscaler Debug Logs above.

For more information about Log Cache, see Logging and Metrics Architecture and Log Components in Log and Metric Agent Architecture (Beta).

Log Cache Gateway is Unavailable

When Autoscaler cannot connect to the Log Cache gateway, you see an autoscaling event similar to the following example:

Autoscaler did not receive any metrics for memory during the scaling window. Scaling down will be deferred until these metrics are available.

You also see an error similar to the following example in the debug logs for Autoscaler:

[APP/PROC/WEB/2] OUT logcache walker: unexpected status code 502

To view autoscaling events for an app, see View Autoscaling Events in Using the App Autoscaler CLI. To review Autoscaler debug logs, see Review Autoscaler Debug Logs above.

Individual Log Cache Instance Fails

Log Cache holds the log and metric envelopes for an app on a single Log Cache instance. When that Log Cache instance fails, Autoscaler cannot retrieve those envelopes, so it cannot determine when to scale an app.

When the Log Cache instance that holds the envelopes for an app fails, you see an autoscaling event similar to the following example:

Autoscaler did not receive any metrics for memory during the scaling window. Scaling down will be deferred until these metrics are available.

You also see an error similar to the following example in the debug logs for Autoscaler:

[APP/PROC/WEB/1] OUT logcache walker: unexpected status code 503

To view autoscaling events for an app, see View Autoscaling Events in Using the App Autoscaler CLI. To review Autoscaler debug logs, see Review Autoscaler Debug Logs above.

Log Cache Instances are Unresponsive

When Log Cache instances do not respond to requests from Autoscaler, you see an autoscaling event similar to the following example:

Autoscaler did not receive any metrics for memory during the scaling window. Scaling down will be deferred until these metrics are available.

You also see an error similar to the following example in the debug logs for Autoscaler:

[APP/PROC/WEB/2] OUT logcache walker: Get "https://log-cache.sys.example.com/api/v1/read/85a6aff3-c739-4364-8df3-44f1aa50662b": context deadline exceeded (Client.Timeout exceeded while awaiting headers)

To view autoscaling events for an app, see View Event History in Scaling an App Using App Autoscaler or View Autoscaling Events in Using the App Autoscaler CLI. To review Autoscaler debug logs, see Review Autoscaler Debug Logs above.

Cloud Controller

Autoscaler reads the app metadata that the Cloud Controller stores to determine how many instances an app currently has. When Autoscaler scales the number of app instances up or down, it sends a request to the Cloud Controller. Then, the Cloud Controller directs the Diego Cell that runs the app to stage the number of app instances that Autoscaler has requested.

Autoscaler also uses the Cloud Controller to retrieve the credentials stored in RabbitMQ service bindings. Autoscaler uses these credentials to read the metadata for RabbitMQ queues.

The Cloud Controller also authorizes Log Cache to provide log and metric envelopes to Autoscaler.

If you have configured Autoscaler to record debug logs, Autoscaler logs the following metrics:

  • autoscale.cloud_controller.calls: the number of requests that Autoscaler has made to the Cloud Controller

  • autoscale.cloud_controller.request_errors: the number of requests that Autoscaler has made to the Cloud Controller that resulted in an error

  • autoscale.cloud_controller.scale_errors: the number of requests to scale an app that have resulted in an error

Note: The metrics that are listed above appear as log lines and are not emitted to Loggregator.

To view these metrics, see Review Autoscaler Debug Logs above.

For more information about the Cloud Controller, see Cloud Controller.

Quota has been reached

When scaling would cause the application to exceed a quota, you see an autoscaling event similar to the following example:

Unable to scale. App instance quota has been reached

Cloud Controller API is Stopped

When the Cloud Controller API is stopped, you see an autoscaling event similar to the following example:

Unable to scale due to Cloud Controller error.

Note: To determine the cause of this error, see Autoscaler Shows Error “Unable to scale due to Cloud Controller error” in the VMware Tanzu Knowledge Base.

You also see an error similar to the following example in the debug logs for Autoscaler:

[APP/PROC/WEB/0] OUT level=error msg="[Cloud Controller - app summary] CloudController - route not found"`

To view autoscaling events for an app, see View Autoscaling Events in Using the App Autoscaler CLI. To review Autoscaler debug logs, see Review Autoscaler Debug Logs above.

Cloud Controller is Unresponsive

When the Cloud Controller does not respond to requests from Autoscaler or Log Cache, you see an autoscaling event similar to the following example:

Unable to scale due to Cloud Controller error.

Note: To determine the cause of this error, see Autoscaler Shows Error “Unable to scale due to Cloud Controller error” in the VMware Tanzu Knowledge Base.

You also see an error similar to the following example in the debug logs for Autoscaler:

[APP/PROC/WEB/0] OUT level=error msg="[Cloud Controller - app summary] failure connecting to cloud controller. Error: Get \"https://api.sys.example.com/v2/apps/85a6aff3-c739-4364-8df3-44f1aa50662b/summary\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"
[APP/PROC/WEB/1] OUT level=error msg="[Cloud Controller - app summary] CloudController - Failure"

To view autoscaling events for an app, see View Autoscaling Events in Using the App Autoscaler CLI. To review Autoscaler debug logs, see Review Autoscaler Debug Logs above.

MySQL

Autoscaler uses a MySQL database to store persistent information about Autoscaler service bindings and the autoscaling rules you configure. If Autoscaler fails to select from or update the MySQL database, Autoscaler cannot scale your apps.

You see an error similar to the following example in the debug logs for Autoscaler:

[APP/PROC/WEB/0] OUT level=error msg="Unable to find any enabled service bindings by instance id" error="driver: bad connection" instance_id=0
[APP/PROC/WEB/2] ERR [mysql] packets.go:37: unexpected EOF

To review Autoscaler debug logs, see Review Autoscaler Debug Logs above.