Streaming Application Logs to Azure OMS Log Analytics (Beta)


WARNING: The OMS Log Analytics Firehose Nozzle is currently intended for evaluation and test purposes only. Do not use this product in a production environment.

This topic explains how to integrate your Cloud Foundry (CF) apps with OMS Log Analytics.

Operations Management Suite (OMS) Log Analytics is a monitoring service for Microsoft Azure. The OMS Log Analytics Firehose Nozzle is a CF component that forwards metrics from the Loggregator Firehose to OMS Log Analytics.

This topic assumes you are using the latest version of the Cloud Foundry Command Line Interface (cf CLI) and a working Elastic Runtime deployment on Azure. See Preparing to Deploy PCF on Azure for more information.

Step 1: Create an OMS Workspace in Azure

See Get started with Log Analytics in the Microsoft Azure documentation to create an OMS workspace.

Step 2: Deploy the Nozzle to Cloud Foundry

  1. Run cf login -a https://api.YOUR-DOMAIN -u YOUR-USERNAME --skip-ssl-validation, replacing YOUR-DOMAIN with your domain and YOUR-USERNAME with your CF username, to authenticate to your CF instance. For example:

    $ cf login -a https://api.example.com -u admin --skip-ssl-validation

  2. Follow the steps below to create a new Cloud Foundry user and grant it access to the Loggregator Firehose using the UAA CLI (UAAC). For more information, see Creating and Managing Users with the UAA CLI (UAAC) and Orgs, Spaces, Roles, and Permissions.

    1. Use uaac target uaa.YOUR-DOMAIN to target your UAA server:
      $ uaac target uaa.example.com --skip-ssl-validation
    2. Run the following command to obtain an access token for the admin client. When prompted, enter the admin client secret:
      $ uaac token client get admin
    3. Run cf create-user USERNAME PASSWORD, replacing USERNAME with a new username and PASSWORD with a password, to create a new user. For example:
      $ cf create-user firehose-user firehose-password
    4. Run uaac member add cloud_controller.admin USERNAME, replacing USERNAME with the new username, to grant the new user admin permissions. For example:
      $ uaac member add cloud_controller.admin firehose-user
    5. Run uaac member add doppler.firehose USERNAME, replacing USERNAME with the new username, to grant the new user permission to read logs from the Loggregator Firehose endpoint. For example:
      $ uaac member add doppler.firehose firehose-user
  3. Clone the OMS Log Analytics Firehose Nozzle repository from GitHub and navigate to the oms-log-analytics-firehose-nozzle directory:

    $ git clone https://github.com/Azure/oms-log-analytics-firehose-nozzle.git
    $ cd oms-log-analytics-firehose-nozzle
    
  4. Set the following environment variables in the env section of the OMS Log Analytics Firehose Nozzle manifest:

        applications:
        - name: oms_nozzle
          ...
          env:
            OMS_WORKSPACE: YOUR-WORKSPACE-ID
            OMS_KEY: YOUR-OMS-KEY
            OMS_POST_TIMEOUT: 10s
            OMS_BATCH_TIME: 10s
            OMS_MAX_MSG_NUM_PER_BATCH: 1000
            FIREHOSE_USER: YOUR-FIREHOSE-USER
            FIREHOSE_USER_PASSWORD: YOUR-FIREHOSE-PASSWORD
            API_ADDR: https://api.YOUR-DOMAIN
            DOPPLER_ADDR: wss://doppler.YOUR-DOMAIN:443
            EVENT_FILTER: YOUR-LIST
            IDLE_TIMEOUT: 60s
            SKIP_SSL_VALIDATION: TRUE-OR-FALSE
            LOG_LEVEL: INFO
            LOG_EVENT_COUNT: TRUE-OR-FALSE
            LOG_EVENT_COUNT_INTERVAL: 60s

    The following list describes each variable:

    • OMS_WORKSPACE and OMS_KEY: Enter the ID and key value for your OMS workspace.
    • OMS_POST_TIMEOUT: (Optional) Set the HTTP post timeout for sending events to OMS Log Analytics. The default value is 10 seconds.
    • OMS_BATCH_TIME: (Optional) Set the interval for posting a batch to OMS. The default value is 10 seconds.
    • OMS_MAX_MSG_NUM_PER_BATCH: (Optional) Set the maximum number of messages to include in an OMS batch. The default value is 1000.
    • FIREHOSE_USER and FIREHOSE_USER_PASSWORD: Enter the username and password of the Firehose user you created in step 2.
    • API_ADDR: Enter the URL of your API endpoint.
    • DOPPLER_ADDR: Enter the URL of your Loggregator traffic controller endpoint.
    • EVENT_FILTER: (Optional) Enter a comma-separated list of the event types you want to filter out. Valid event types are METRIC, LOG, and HTTP.
    • IDLE_TIMEOUT: (Optional) Set the keepalive duration for the Firehose connection. The default value is 60 seconds.
    • SKIP_SSL_VALIDATION: Set this value to TRUE to allow insecure connections to the UAA and the traffic controller, or to FALSE to block them.
    • LOG_LEVEL: (Optional) Change this value to increase or decrease the amount of logging. In order of increasing verbosity, the valid log levels are ERROR, INFO, and DEBUG. The default value is INFO.
    • LOG_EVENT_COUNT: Set this value to TRUE to log the total count of events that the nozzle has sent, received, and lost. OMS logs this value as CounterEvents.
    • LOG_EVENT_COUNT_INTERVAL: (Optional) Set the interval for logging the event count to OMS. The default value is 60 seconds.

    For more information about OMS_BATCH_TIME, OMS_MAX_MSG_NUM_PER_BATCH, LOG_EVENT_COUNT, and LOG_EVENT_COUNT_INTERVAL, see the Configure Additional Logging section below.
  5. Push the app:

    $ cf push
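To make the defaults and value formats from step 4 concrete, the following Python sketch shows one way a program could read these environment variables. It is not the nozzle's own configuration code: load_config, parse_duration, parse_event_filter, parse_flag, and NozzleConfig are hypothetical names. It also assumes that the variables not marked (Optional) are required (except the two TRUE-OR-FALSE flags, which it assumes default to false), that durations are whole numbers with an s, m, or h suffix, and that flags and event types are case-insensitive.

```python
from dataclasses import dataclass
from typing import FrozenSet, Mapping

# Hypothetical sketch; not the nozzle's own configuration code.
VALID_EVENT_TYPES = {"METRIC", "LOG", "HTTP"}   # documented in step 4
VALID_LOG_LEVELS = {"ERROR", "INFO", "DEBUG"}   # documented in step 4
# Assumption: variables not marked (Optional), minus the two flags, are required.
REQUIRED = ("OMS_WORKSPACE", "OMS_KEY", "FIREHOSE_USER",
            "FIREHOSE_USER_PASSWORD", "API_ADDR", "DOPPLER_ADDR")
# Assumption: durations look like "10s", "1m", or "2h" (whole numbers only).
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def parse_duration(text: str) -> int:
    """Convert a simple duration such as '10s' or '1m' to whole seconds."""
    text = text.strip()
    if len(text) < 2 or text[-1] not in _UNIT_SECONDS or not text[:-1].isdigit():
        raise ValueError(f"unsupported duration: {text!r}")
    return int(text[:-1]) * _UNIT_SECONDS[text[-1]]


def parse_event_filter(text: str) -> FrozenSet[str]:
    """Turn a comma-separated EVENT_FILTER value into the set of types to drop."""
    types = {t.strip().upper() for t in text.split(",") if t.strip()}
    unknown = types - VALID_EVENT_TYPES
    if unknown:
        raise ValueError(f"invalid EVENT_FILTER types: {sorted(unknown)}")
    return frozenset(types)


def parse_flag(text: str) -> bool:
    """Read a TRUE/FALSE flag; case-insensitive matching is an assumption."""
    return text.strip().lower() == "true"


@dataclass(frozen=True)
class NozzleConfig:
    post_timeout_s: int
    batch_time_s: int
    max_msgs_per_batch: int
    idle_timeout_s: int
    event_count_interval_s: int
    event_filter: FrozenSet[str]
    skip_ssl_validation: bool
    log_event_count: bool
    log_level: str


def load_config(env: Mapping[str, str]) -> NozzleConfig:
    """Read the nozzle variables, applying the defaults listed in step 4."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise ValueError(f"missing required variables: {missing}")
    log_level = env.get("LOG_LEVEL", "INFO").strip().upper()
    if log_level not in VALID_LOG_LEVELS:
        raise ValueError(f"invalid LOG_LEVEL: {log_level!r}")
    return NozzleConfig(
        post_timeout_s=parse_duration(env.get("OMS_POST_TIMEOUT", "10s")),
        batch_time_s=parse_duration(env.get("OMS_BATCH_TIME", "10s")),
        max_msgs_per_batch=int(env.get("OMS_MAX_MSG_NUM_PER_BATCH", "1000")),
        idle_timeout_s=parse_duration(env.get("IDLE_TIMEOUT", "60s")),
        event_count_interval_s=parse_duration(env.get("LOG_EVENT_COUNT_INTERVAL", "60s")),
        event_filter=parse_event_filter(env.get("EVENT_FILTER", "")),
        skip_ssl_validation=parse_flag(env.get("SKIP_SSL_VALIDATION", "false")),
        log_event_count=parse_flag(env.get("LOG_EVENT_COUNT", "false")),
        log_level=log_level,
    )


cfg = load_config({
    "OMS_WORKSPACE": "YOUR-WORKSPACE-ID", "OMS_KEY": "YOUR-OMS-KEY",
    "FIREHOSE_USER": "firehose-user", "FIREHOSE_USER_PASSWORD": "firehose-password",
    "API_ADDR": "https://api.example.com",
    "DOPPLER_ADDR": "wss://doppler.example.com:443",
    "EVENT_FILTER": "metric, HTTP",
    "OMS_BATCH_TIME": "5s",
})
print(cfg.batch_time_s, cfg.max_msgs_per_batch, sorted(cfg.event_filter))
# 5 1000 ['HTTP', 'METRIC']
```

Unset optional variables fall back to the documented defaults, so in this example only OMS_BATCH_TIME and EVENT_FILTER differ from the defaults.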
    

Step 3: View Logs in OMS Portal

Import the Cloud Foundry OMS view to your OMS Portal to view visualized logs and metrics. You can also create alert rules for specific events.

Note: The OMS view of Cloud Foundry is not yet available in the OMS Solutions Gallery. You can add it manually to view your logs in OMS Portal.

Import the OMS View

  1. From the main OMS Overview page, navigate to View Designer.
  2. Click Import.
  3. Click Browse.
  4. Select the Cloud Foundry (Preview).omsview file.
  5. Save the view. The main OMS Overview page displays a tile for the Cloud Foundry view.
  6. Click the tile to view visualized metrics.

See the OMS Log Analytics View Designer documentation for more information.

Create Alert Rules

See Understanding alerts in Log Analytics for more information about OMS Log Analytics alerts.

Set Alert Queries

This section includes example queries that operators can use to create alert rules in the OMS Portal.

  • The following query alerts the operator when the nozzle sends a slowConsumerAlert to OMS:

    Type=CF_ValueMetric_CL Name_s=slowConsumerAlert
    
  • The following query alerts the operator when Loggregator sends an LGR error message to indicate problems with the logging process:

    Type=CF_LogMessage_CL SourceType_s=LGR MessageType_s=ERR
    
  • The following query alerts the operator when the number of lost events reaches a threshold that you specify in the OMS Portal:

    Type=CF_CounterEvent_CL Job_s=nozzle Name_s=eventsLost
    
  • The following query alerts the operator when the nozzle receives the TruncatingBuffer.DroppedMessages CounterEvent:

    Type=CF_CounterEvent_CL Name_s="TruncatingBuffer.DroppedMessages"
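In the query syntax used above, space-separated Field=Value filters are combined with AND. The toy Python model below shows how these example queries select records and how a rule that fires when the number of results is greater than a threshold behaves. It is not the OMS query engine: the sample records, the helper names parse_query, matches, and should_alert, and the exact-string matching are all simplifying assumptions.

```python
import shlex
from typing import Dict, Iterable, List

Record = Dict[str, str]  # hypothetical: one OMS log record as field -> value


def parse_query(query: str) -> Dict[str, str]:
    """Split a 'Field=Value Field=Value' search into field/value pairs."""
    filters: Dict[str, str] = {}
    for token in shlex.split(query):  # shlex drops the quotes around "..." values
        field, _, value = token.partition("=")
        filters[field] = value
    return filters


def matches(record: Record, filters: Dict[str, str]) -> bool:
    """Space-separated filters combine with AND: every field must match exactly."""
    return all(record.get(field) == value for field, value in filters.items())


def should_alert(records: Iterable[Record], query: str, threshold: int = 0) -> bool:
    """Fire when the number of matching records is greater than the threshold."""
    filters = parse_query(query)
    hits: List[Record] = [r for r in records if matches(r, filters)]
    return len(hits) > threshold


records = [
    {"Type": "CF_ValueMetric_CL", "Name_s": "slowConsumerAlert"},
    {"Type": "CF_LogMessage_CL", "SourceType_s": "LGR", "MessageType_s": "ERR"},
    {"Type": "CF_LogMessage_CL", "SourceType_s": "APP/PROC/WEB", "MessageType_s": "OUT"},
]
print(should_alert(records, "Type=CF_ValueMetric_CL Name_s=slowConsumerAlert"))
# True
print(should_alert(records, "Type=CF_LogMessage_CL SourceType_s=LGR MessageType_s=ERR",
                   threshold=1))
# False
```

The second call returns False because exactly one record matches, which does not exceed a threshold of 1.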
    

(Optional) Step 4: Configure Additional Logging

The OMS Log Analytics Firehose Nozzle forwards metrics from the Loggregator Firehose to OMS with minimal processing, but it can also push additional metrics about its own operation to OMS, as described in the following sections.

Log Sent, Received, and Lost Events

If you set the LOG_EVENT_COUNT environment variable to TRUE in the manifest, the nozzle periodically sends the counts of sent, received, and lost events to OMS. The LOG_EVENT_COUNT_INTERVAL environment variable determines how often the nozzle sends these counts.

Note: The nozzle does not count CounterEvents themselves in the sent, received, or lost event count.

The nozzle sends each count as a CounterEvent with one of the following CounterKeys:

CounterKey                     Description
nozzle.stats.eventsReceived    The number of events the nozzle received from the Firehose during the interval
nozzle.stats.eventsSent        The number of events the nozzle successfully sent to OMS during the interval
nozzle.stats.eventsLost        The number of events the nozzle tried to send to OMS during the interval but failed to send after 4 attempts
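The table describes per-interval counts, while LOG_EVENT_COUNT is described as logging a total. The Python sketch below keeps both. It is a hypothetical tracker: the EventCounter class and its record layout are illustrative only, with the Delta and Total keys loosely modeled on the delta and total fields of a Loggregator CounterEvent. It is not how the nozzle stores or formats these values.

```python
from typing import Dict, List

KEYS = ("eventsReceived", "eventsSent", "eventsLost")


class EventCounter:
    """Hypothetical tracker for the nozzle.stats.* counts."""

    def __init__(self) -> None:
        self.totals: Dict[str, int] = {k: 0 for k in KEYS}
        self._at_last_report: Dict[str, int] = dict(self.totals)

    def add(self, key: str, n: int = 1) -> None:
        """Record n more received, sent, or lost events."""
        if key not in self.totals:
            raise KeyError(key)
        self.totals[key] += n

    def report(self) -> List[dict]:
        """Call once per LOG_EVENT_COUNT_INTERVAL; returns one record per CounterKey."""
        records = []
        for k in KEYS:
            records.append({
                "CounterKey": f"nozzle.stats.{k}",
                "Delta": self.totals[k] - self._at_last_report[k],  # this interval only
                "Total": self.totals[k],                            # since start-up
            })
        self._at_last_report = dict(self.totals)
        return records


counter = EventCounter()
counter.add("eventsReceived", 5)
counter.add("eventsSent", 3)
first = counter.report()
counter.add("eventsReceived", 2)
second = counter.report()
print(second[0])
# {'CounterKey': 'nozzle.stats.eventsReceived', 'Delta': 2, 'Total': 7}
```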

At any given time, the sum of eventsSent and eventsLost is usually less than eventsReceived, because the nozzle buffers some messages and posts them to OMS in batches. Operators can adjust the buffer size by changing the OMS_BATCH_TIME and OMS_MAX_MSG_NUM_PER_BATCH environment variables in the manifest.
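The following Python sketch models why eventsSent plus eventsLost trails eventsReceived. It assumes that a batch is flushed when OMS_MAX_MSG_NUM_PER_BATCH messages are buffered or OMS_BATCH_TIME has elapsed since the last flush, and that a batch is attempted up to 4 times before its events count as lost. The Batcher class and its post callback are hypothetical stand-ins for the nozzle's internals and the OMS endpoint. For brevity, the time check runs only when an event arrives, whereas a real implementation would likely also flush on a timer.

```python
from dataclasses import dataclass, field
from typing import Callable, List

MAX_ATTEMPTS = 4  # events still failing after 4 attempts are counted as lost


@dataclass
class Batcher:
    """Hypothetical model of buffering, batching, and the sent/lost counters."""
    post: Callable[[List[str]], bool]  # returns True if OMS accepted the batch
    batch_time: float = 10.0           # OMS_BATCH_TIME, in seconds
    max_msgs: int = 1000               # OMS_MAX_MSG_NUM_PER_BATCH
    received: int = 0
    sent: int = 0
    lost: int = 0
    buffer: List[str] = field(default_factory=list)
    last_flush: float = 0.0

    def on_event(self, event: str, now: float) -> None:
        """Buffer one Firehose event; flush if the batch is full or old enough."""
        self.received += 1
        self.buffer.append(event)
        if len(self.buffer) >= self.max_msgs or now - self.last_flush >= self.batch_time:
            self.flush(now)

    def flush(self, now: float) -> None:
        """Post the buffered batch, attempting up to MAX_ATTEMPTS times."""
        batch, self.buffer = self.buffer, []
        self.last_flush = now
        if not batch:
            return
        for _ in range(MAX_ATTEMPTS):
            if self.post(batch):
                self.sent += len(batch)
                return
        self.lost += len(batch)


ok = Batcher(post=lambda batch: True, max_msgs=3)
for i in range(7):
    ok.on_event(f"event-{i}", now=float(i))
print(ok.received, ok.sent, ok.lost, len(ok.buffer))
# 7 6 0 1   (one event is still buffered, so sent + lost < received)

attempts = [0]


def always_fail(batch: List[str]) -> bool:
    """Simulated OMS endpoint that rejects every post."""
    attempts[0] += 1
    return False


failing = Batcher(post=always_fail, max_msgs=2)
for i in range(4):
    failing.on_event(f"event-{i}", now=float(i))
print(failing.sent, failing.lost, attempts[0])
# 0 4 8
```

In this model, received always equals sent plus lost plus whatever is still buffered. A smaller batch size or batch interval therefore narrows the gap, at the cost of more posts to OMS.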

Log Slow Consumer Alerts

Note: The nozzle does not count ValueMetrics in the sent, received, or lost event count.

Loggregator alerts the nozzle that it is consuming events too slowly in either of the following ways:

  • The WebSocket connection closes with the error code ClosePolicyViolation (1008).
  • The nozzle receives a CounterEvent named TruncatingBuffer.DroppedMessages.

In either case, the nozzle sends the slowConsumerAlert event to OMS as the following ValueMetric:

MetricKey                         Value
nozzle.alert.slowConsumerAlert    1

See the Slow Nozzle Alerts section of the Loggregator Guide for Cloud Foundry Operators for more information.
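The decision described above can be sketched in Python as a single check on the two signals. The function name slow_consumer_metric and the dictionary keys are hypothetical. The sketch only restates the documented rule: a WebSocket close with code 1008 (Policy Violation in RFC 6455), or a CounterEvent named TruncatingBuffer.DroppedMessages, produces the nozzle.alert.slowConsumerAlert ValueMetric with value 1.

```python
from typing import Optional

CLOSE_POLICY_VIOLATION = 1008  # WebSocket close code ClosePolicyViolation (RFC 6455)
DROPPED_MESSAGES = "TruncatingBuffer.DroppedMessages"


def slow_consumer_metric(close_code: Optional[int] = None,
                         counter_name: Optional[str] = None) -> Optional[dict]:
    """Return the ValueMetric to send to OMS if either slow-consumer signal is present."""
    if close_code == CLOSE_POLICY_VIOLATION or counter_name == DROPPED_MESSAGES:
        return {"MetricKey": "nozzle.alert.slowConsumerAlert", "Value": 1}
    return None


print(slow_consumer_metric(close_code=1008))
# {'MetricKey': 'nozzle.alert.slowConsumerAlert', 'Value': 1}
print(slow_consumer_metric(close_code=1000))  # 1000 is a normal closure
# None
```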

(Optional) Step 5: Scale the Deployment

Scale the Nozzle

If the nozzle cannot keep up with processing logs from the Firehose, Loggregator alerts the nozzle. When the nozzle receives the alert, it sends a slowConsumerAlert to OMS. If this happens, scale up the nozzle to minimize data loss.

When an operator scales up the nozzle, the Firehose evenly distributes events across all instances of the nozzle. See the Scaling Nozzles section of the Loggregator Guide for Cloud Foundry Operators for more information.

Operators can create an alert rule for the slowConsumerAlert message.
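If you treat the Firehose's even distribution as simple round-robin, you can make a rough estimate of how many nozzle instances you need. The Python sketch below is a back-of-the-envelope model under that assumption, not a description of Loggregator's actual distribution algorithm. The helper names distribute and instances_needed are hypothetical, and the rates are placeholder numbers that you would measure in your own deployment.

```python
import math
from collections import Counter
from typing import Iterable


def distribute(events: Iterable[object], instances: int) -> Counter:
    """Round-robin model of even distribution; returns events per instance index."""
    if instances < 1:
        raise ValueError("need at least one nozzle instance")
    per_instance: Counter = Counter()
    for i, _ in enumerate(events):
        per_instance[i % instances] += 1
    return per_instance


def instances_needed(firehose_rate: float, per_instance_rate: float) -> int:
    """Smallest instance count whose combined throughput covers the Firehose rate."""
    if per_instance_rate <= 0:
        raise ValueError("per-instance rate must be positive")
    return max(1, math.ceil(firehose_rate / per_instance_rate))


print(distribute(range(10), 3))
# Counter({0: 4, 1: 3, 2: 3})
print(instances_needed(2500, 1000))  # hypothetical events per second
# 3
```

With these placeholder numbers, you would run three instances of the app named in the manifest, for example with cf scale oms_nozzle -i 3.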

Scale Loggregator

Loggregator sends LGR log messages to indicate problems with the logging process. See the Scaling Loggregator section of the Loggregator Guide for Cloud Foundry Operators for more information.

Operators can create an alert rule for the LGR message.
