Configuring Alerting

This topic explains how to configure alerting in Healthwatch.

Overview

In Healthwatch, you can configure Prometheus to send alerts to Alertmanager according to alerting rules you configure. Alertmanager then manages those alerts by removing duplicate alerts, grouping alerts together, and routing those groups to alert receiver integrations such as email, PagerDuty, or Slack. Alertmanager also silences and inhibits alerts according to the alerting rules you configure.

For more information, see Alertmanager in the Prometheus documentation.
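
As a rough illustration of what an alerting rule can look like, the sketch below uses the standard Prometheus rule-file format. The group name, alert name, threshold, and label values are hypothetical and are not taken from the Healthwatch rule files. Whether Healthwatch expects exactly this layout in its rules field is an assumption to verify against the files for your runtime.

```yaml
# Hypothetical alerting rule in Prometheus rule-file format (illustrative only).
groups:
- name: example-foundation-alerts      # hypothetical group name
  rules:
  - alert: ExampleInstanceDown         # hypothetical alert name
    expr: up == 0                      # true when a scrape target is unreachable
    for: 5m                            # condition must hold for 5 minutes before the alert fires
    labels:
      severity: critical               # label that routing and inhibit rules can match on
    annotations:
      summary: "Instance {{ $labels.instance }} is down"
```

Prometheus evaluates the expression, and once the alert fires it is handed to Alertmanager, which groups and routes it as described above.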

Configure Alerting

In the Alertmanager Configuration pane, you configure alerting rules, routing rules, and alert receivers for Alertmanager to use.

The Alertmanager Configuration pane configures the Alertmanager configuration file. For more information, see Configuration in the Prometheus documentation.

To configure alerting through the Alertmanager Configuration pane:

  1. Navigate to the Ops Manager Installation Dashboard.

  2. Click the Healthwatch tile.

  3. Select Alertmanager Configuration.

  4. For Alerting Rules YAML, provide in YAML format the rule statements that define which alerts Alertmanager sends to your alert receivers:

    1. YAML files that contain alerting rules are available for VMware Tanzu Application Service for VMs (TAS for VMs) and VMware Tanzu Kubernetes Grid Integrated Edition (TKGI). Choose the YAML file that corresponds to your runtime and replace OPS_MANAGER_URL with the fully qualified domain name (FQDN) of your Ops Manager deployment.
    2. Modify the YAML file according to the observability requirements for your foundation.
    3. Paste the contents of the YAML file into Alerting Rules YAML.

      For more information, see Alerting Rules in the Prometheus documentation.
  5. For Routing Rules YAML, provide the route block that defines where Alertmanager sends alerts, how frequently Alertmanager sends alerts, and how Alertmanager groups alerts together. This route block appears in the route section of the Alertmanager configuration file. You must define all route configuration parameters. For information about the parameters you must provide, see <route> in Configuration in the Prometheus documentation.

  6. (Optional) For Inhibit Rules YAML, provide in YAML format the rule statements that define which alerts Alertmanager does not send to your alert receivers. These rule statements appear in the inhibit_rules section of the Alertmanager configuration file. For more information, see <inhibit_rule> in Configuration in the Prometheus documentation.

  7. Under the receiver configuration section for each type of alert receiver you use, such as Email Receiver Configurations, configure the alert receivers that you specified in Routing Rules YAML in step 5. For more information, see Configure Alert Receivers below.
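
To make the routing and inhibit steps above more concrete, here is a minimal sketch of how the route and inhibit_rules sections might look in an Alertmanager configuration file. The receiver names are borrowed from the example in Combining Alert Receivers below, and the label values and timings are hypothetical. The sketch does not list every route parameter, and whether each Healthwatch field expects the top-level key (route:, inhibit_rules:) or only its contents is not stated here, so treat that as an assumption to check.

```yaml
# Sketch of the route and inhibit_rules sections (illustrative values only).
route:
  receiver: 'Foundation'          # default receiver; must match a configured Receiver Name
  group_by: ['alertname']         # alerts that share these label values are grouped together
  group_wait: 30s                 # wait before sending the first notification for a new group
  group_interval: 5m              # wait before notifying about new alerts added to a group
  repeat_interval: 4h             # wait before re-sending a notification for alerts still firing
  routes:
  - match:
      severity: warning           # hypothetical label; newer Alertmanager versions also accept matchers
    receiver: 'Clusters'          # alerts matching this child route go to a different receiver

inhibit_rules:
- source_match:
    severity: critical            # while a critical alert is firing...
  target_match:
    severity: warning             # ...suppress notifications for warning alerts...
  equal: ['alertname']            # ...that have the same alertname label value
```

Every receiver name that appears in the route block needs a matching receiver configured in the tile.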

Configure Alert Receivers

You can configure email, PagerDuty, Slack, and webhook alert receivers in the Healthwatch tile. These alert receiver configurations appear in the receivers section of the Alertmanager configuration file. For more information, see <receiver> in Configuration in the Prometheus documentation.

You can also configure custom alert receiver integrations that are not natively supported by Alertmanager through webhook receivers. For more information about configuring custom alert receiver integrations, see <webhook_config> in Configuration in the Prometheus documentation.

If you configure two or more alert receivers with the same name, Alertmanager merges them into a single alert receiver. For more information, see Combining Alert Receivers below.

The following sections describe how to configure each type of alert receiver.

Note: If you want to provide authentication and TLS communication settings for your alert receivers, you must provide them in the associated alert receiver configuration fields described in the sections below. If the base configuration YAML for your alert receivers includes fields for authentication and TLS communication settings, do not include them when you provide the configuration YAML for your alert receivers in the Receiver Configuration fields.

Configure an Email Alert Receiver

To configure an email alert receiver:

  1. Under Email Receiver Configurations, click Add.

  2. For Receiver Name, enter the name you want to give your email receiver. The name you enter in this field must match the name you specified in the route block you entered in the Routing Rules YAML field in Configure Alerting above.

  3. For Receiver Configuration, provide the configuration YAML for your email receiver. Do not prefix the configuration YAML with a dash. For more information about the YAML structure for this field, see <email_config> in Configuration in the Prometheus documentation.

  4. (Optional) To configure SMTP authentication between Alertmanager and your email receiver, configure the following fields:

    1. For Authentication Password, enter your SMTP authentication password.
    2. For Authentication Secret, enter your SMTP authentication secret.
  5. (Optional) To configure TLS communication between Alertmanager and your email receiver, configure the following fields:

    1. For TLS Config Certificate Authority, provide a certificate authority (CA) that signs the certificates you provide in the TLS Config Certificate and Private Key field below.
    2. For TLS Config Certificate and Private Key, provide at least one certificate and private key to enable TLS communication between Alertmanager and your email receiver.
    3. For TLS Config Server Name, enter the name of the SMTP server that facilitates TLS communication between Alertmanager and your email receiver.
    4. If the certificate you provided in the TLS Config Certificate and Private Key field is signed by a self-signed CA, or by a certificate that is itself signed by a self-signed CA, enable the TLS Config Skip SSL Validation checkbox to skip SSL validation during TLS handshakes.

      For more information about configuring TLS communication, see <tls_config> in Configuration in the Prometheus documentation.
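
As a sketch of what the Receiver Configuration field for an email receiver might contain, the fragment below uses fields from the Alertmanager email_config schema. The addresses and SMTP host are hypothetical. In line with the note above, it has no leading dash and leaves out authentication and TLS settings, which go in the dedicated fields.

```yaml
# Hypothetical Receiver Configuration body for an email receiver.
to: 'operator1@example.org'           # recipient address
from: 'alertmanager@example.org'      # hypothetical sender address
smarthost: 'smtp.example.org:587'     # hypothetical SMTP host and port
send_resolved: true                   # also notify when the alert resolves
```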

Configure a PagerDuty Alert Receiver

To configure a PagerDuty alert receiver:

  1. Under PagerDuty Receiver Configurations, click Add.

  2. For Receiver Name, enter the name you want to give your PagerDuty receiver. The name you enter in this field must match the name you specified in the route block you entered in the Routing Rules YAML field in Configure Alerting above.

  3. For Receiver Configuration, provide the configuration YAML for your PagerDuty receiver. Do not prefix the configuration YAML with a dash. For more information about the YAML structure for this field, see <pagerduty_config> in Configuration in the Prometheus documentation.

  4. Enter your PagerDuty integration key in one of the following fields:

    • If you selected Events API v2 as your integration type in PagerDuty, enter your PagerDuty integration key in Routing Key.
    • If you selected Prometheus as your integration type in PagerDuty, enter your PagerDuty integration key in Service Key.
  5. (Optional) To configure the HTTP client that your PagerDuty receiver uses to communicate with HTTP-based API services, configure one of the following options:

    • To have your HTTP client authenticate to API services using basic authentication, enter the username and either the password or the password file associated with your HTTP client in Basic Authorization.
    • To have your HTTP client authenticate to API services using a bearer token, enter either the bearer token or the file path of the bearer token file associated with your HTTP client in Bearer Token.

      For more information about configuring an HTTP client, see <http_config> in Configuration in the Prometheus documentation.
  6. (Optional) To configure TLS communication between Alertmanager and your PagerDuty receiver, configure the following fields:

    1. For TLS Config Certificate Authority, provide a CA that signs the certificates you provide in the TLS Config Certificate and Private Key field below.
    2. For TLS Config Certificate and Private Key, provide at least one certificate and private key to enable TLS communication between Alertmanager and your PagerDuty receiver.
    3. For TLS Config Server Name, enter the name of the PagerDuty server that facilitates TLS communication between Alertmanager and your PagerDuty receiver.
    4. If the certificate you provided in the TLS Config Certificate and Private Key field is signed by a self-signed CA, or by a certificate that is itself signed by a self-signed CA, enable the TLS Config Skip SSL Validation checkbox to skip SSL validation during TLS handshakes.

      For more information about configuring TLS communication, see <tls_config> in Configuration in the Prometheus documentation.
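
The fragment below is one possible Receiver Configuration body for a PagerDuty receiver, using fields from the Alertmanager pagerduty_config schema. The details entry is hypothetical. The integration key is deliberately absent because it is entered in Routing Key or Service Key, and HTTP client and TLS settings are left to their own fields.

```yaml
# Hypothetical Receiver Configuration body for a PagerDuty receiver.
description: '{{ template "pagerduty.default.description" . }}'   # default description template
severity: 'error'                     # severity reported to PagerDuty
send_resolved: true                   # also notify when the alert resolves
details:
  foundation: 'example-foundation'    # hypothetical extra key-value detail
```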

Configure a Slack Alert Receiver

To configure a Slack alert receiver:

  1. Under Slack Receiver Configurations, click Add.

  2. For Receiver Name, enter the name you want to give your Slack receiver. The name you enter in this field must match the name you specified in the route block you entered in the Routing Rules YAML field in Configure Alerting above.

  3. For Receiver Configuration, provide the configuration YAML for your Slack receiver. Do not prefix the configuration YAML with a dash. For more information about the YAML structure for this field, see <slack_config> in Configuration in the Prometheus documentation.

  4. For Slack API URL, enter the Slack webhook URL for your Slack receiver.

  5. (Optional) To configure the HTTP client that your Slack receiver uses to communicate with HTTP-based API services, configure one of the following options:

    • To have your HTTP client authenticate to API services using basic authentication, enter the username and either the password or the password file associated with your HTTP client in Basic Authorization.
    • To have your HTTP client authenticate to API services using a bearer token, enter either the bearer token or the file path of the bearer token file associated with your HTTP client in Bearer Token.

      For more information about configuring an HTTP client, see <http_config> in Configuration in the Prometheus documentation.
  6. (Optional) To configure TLS communication between Alertmanager and your Slack receiver, configure the following fields:

    1. For TLS Config Certificate Authority, provide a CA that signs the certificates you provide in the TLS Config Certificate and Private Key field below.
    2. For TLS Config Certificate and Private Key, provide at least one certificate and private key to enable TLS communication between Alertmanager and your Slack receiver.
    3. For TLS Config Server Name, enter the name of the Slack server that facilitates TLS communication between Alertmanager and your Slack receiver.
    4. If the certificate you provided in the TLS Config Certificate and Private Key field is signed by a self-signed CA, or by a certificate that is itself signed by a self-signed CA, enable the TLS Config Skip SSL Validation checkbox to skip SSL validation during TLS handshakes.

      For more information about configuring TLS communication, see <tls_config> in Configuration in the Prometheus documentation.
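
A Receiver Configuration body for a Slack receiver could look roughly like the following, using fields from the Alertmanager slack_config schema. The channel name and username are hypothetical, and the text template assumes your alerting rules set a summary annotation. The webhook URL is omitted because it is entered in Slack API URL.

```yaml
# Hypothetical Receiver Configuration body for a Slack receiver.
channel: '#healthwatch-alerts'                     # hypothetical channel
username: 'Alertmanager'                           # hypothetical display name for the message
send_resolved: true                                # also notify when the alert resolves
title: '{{ template "slack.default.title" . }}'    # default title template
text: '{{ .CommonAnnotations.summary }}'           # assumes alerts carry a summary annotation
```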

Configure a Webhook Alert Receiver

To configure a webhook alert receiver:

  1. Under Webhook Receiver Configurations, click Add.

  2. For Receiver Name, enter the name you want to give your webhook receiver. The name you enter in this field must match the name you specified in the route block you entered in the Routing Rules YAML field in Configure Alerting above.

  3. For Receiver Configuration, provide the configuration YAML for your webhook receiver. Do not prefix the configuration YAML with a dash. For more information about the YAML structure for this field, see <webhook_config> in Configuration in the Prometheus documentation.

  4. (Optional) To configure the HTTP client that your webhook receiver uses to communicate with HTTP-based API services, configure one of the following options:

    • To have your HTTP client authenticate to API services using basic authentication, enter the username and either the password or the password file associated with your HTTP client in Basic Authorization.
    • To have your HTTP client authenticate to API services using a bearer token, enter either the bearer token or the file path of the bearer token file associated with your HTTP client in Bearer Token.

      For more information about configuring an HTTP client, see <http_config> in Configuration in the Prometheus documentation.
  5. (Optional) To configure TLS communication between Alertmanager and your webhook receiver, configure the following fields:

    1. For TLS Config Certificate Authority, provide a CA that signs the certificates you provide in the TLS Config Certificate and Private Key field below.
    2. For TLS Config Certificate and Private Key, provide at least one certificate and private key to enable TLS communication between Alertmanager and your webhook receiver.
    3. For TLS Config Server Name, enter the name of the webhook server that facilitates TLS communication between Alertmanager and your webhook receiver.
    4. If the certificate you provided in the TLS Config Certificate and Private Key field is signed by a self-signed CA, or by a certificate that is itself signed by a self-signed CA, enable the TLS Config Skip SSL Validation checkbox to skip SSL validation during TLS handshakes.

      For more information about configuring TLS communication, see <tls_config> in Configuration in the Prometheus documentation.
  6. Click Save.
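
For a webhook receiver, the Receiver Configuration body can be very short. The sketch below uses fields from the Alertmanager webhook_config schema. The endpoint URL is hypothetical, and max_alerts may not be available in older Alertmanager versions, so treat its use here as an assumption.

```yaml
# Hypothetical Receiver Configuration body for a webhook receiver.
url: 'https://alerts.example.org/hooks/healthwatch'   # hypothetical endpoint that receives the POST
send_resolved: true                                   # also notify when the alert resolves
max_alerts: 0                                         # 0 includes all alerts of a group in one message
```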

Combining Alert Receivers

If you configure two or more alert receivers with the same name, Alertmanager merges them into a single alert receiver. For example, if you configure:

  • Two email receivers named “Foundation” with distinct email addresses
  • One PagerDuty receiver named “Foundation”
  • One email receiver named “Clusters”

Then Alertmanager merges them into the following alert receivers:

  • One alert receiver named “Foundation” containing two email configurations and a PagerDuty configuration

  • One alert receiver named “Clusters” containing one email configuration

The example below shows how Alertmanager combines the alert receivers described above in its configuration file:

# Values such as global.smtp_from are placeholders for the corresponding global defaults.
receivers:
- name: 'Foundation'
  email_configs:
  - to: 'operator1@example.org'
    from: global.smtp_from
    smarthost: global.smtp_smarthost
    hello: global.smtp_hello
    html: '{{ template "email.default.html" . }}'
    text: "This is an alert."
  - to: 'operator2@example.org'
    from: global.smtp_from
    smarthost: global.smtp_smarthost
    hello: global.smtp_hello
    html: '{{ template "email.default.html" . }}'
    text: "This is an alert."
  pagerduty_configs:
  - service_key: operator-1-key
    url: global.pagerduty_url
    client: '{{ template "pagerduty.default.client" . }}'
    client_url: '{{ template "pagerduty.default.clientURL" . }}'
    description: '{{ template "pagerduty.default.description" . }}'
    severity: 'error'

- name: 'Clusters'
  email_configs:
  - to: 'operator1@example.org'
    from: global.smtp_from
    smarthost: global.smtp_smarthost
    hello: global.smtp_hello
    html: '{{ template "email.default.html" . }}'
    text: "This is an alert."

Silence Alerts

Alertmanager includes a command-line tool called amtool. You can use amtool to temporarily silence Alertmanager alerts without modifying your alerting rules. For more information about how to use amtool, see the Alertmanager documentation in the Prometheus repository on GitHub.

You can also use the Alertmanager UI to view and silence alerts. To access the Alertmanager UI, see Viewing the Alertmanager UI in Troubleshooting Healthwatch.

To silence alerts using amtool:

  1. SSH into one of the Prometheus VMs deployed by the Healthwatch tile. Alertmanager replicates any changes you make in one Prometheus VM to all other Prometheus VMs. To SSH into one of the Prometheus VMs, see BOSH SSH in Advanced Troubleshooting with the BOSH CLI.

  2. Navigate to the amtool directory by running:

    cd /var/vcap/jobs/alertmanager/packages/alertmanager/bin
    
  3. View all of your currently running alerts by running:

    amtool -o extended alert
    

    This command returns a list of all currently running alerts that includes detailed information about each alert, including the name of the alert and the Prometheus instance on which it runs.

    You can also query the list of alerts by name and instance to view specific alerts.

    • To query alerts by name, run:

      amtool -o extended alert query alertname="ALERT-NAME"
      

      Where ALERT-NAME is the name of the alert you want to silence. You can query the exact name of the alert with the = operator, or you can query a partial name with the =~ operator and the regular expression .* to see all alerts whose names begin with the partial name, such as in the following example:

      amtool -o extended alert query alertname=~"Test.*"
      
    • To query alerts by instance, run:

      amtool -o extended alert query instance=~".+INSTANCE-NUMBER"
      

      Where INSTANCE-NUMBER is the number of the Prometheus instance for which you want to silence alerts.

    • To query alerts by name and instance, run:

      amtool -o extended alert query alertname=~"ALERT-NAME" instance=~".+INSTANCE-NUMBER"
      

      Where:

      • ALERT-NAME is the name of the alert you want to silence.
      • INSTANCE-NUMBER is the number of the Prometheus instance for which you want to silence an alert.
  4. Run one of the following commands to silence either a specific alert or all alerts for a specified amount of time:

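    • To silence a specific alert for a specified amount of time, run:

      amtool silence add alertname=ALERT-NAME instance=~".+INSTANCE-NUMBER" -d TIME-TO-SILENCE -c 'COMMENT'
      

      Where:

      • ALERT-NAME is the name of the alert you want to silence.
      • INSTANCE-NUMBER is the number of the Prometheus instance for which you want to silence an alert.
      • TIME-TO-SILENCE is the amount of time in minutes or hours you want to silence the alert. For example, 30m or 4h.
      • COMMENT is any notes about this silence you want to add.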
    • To silence all alerts for a specified amount of time, run:

      amtool silence add 'alertname=~.+' -d TIME-TO-SILENCE -c 'COMMENT' --alertmanager.url http://localhost:10401
      

      Where:

      • TIME-TO-SILENCE is the amount of time in minutes or hours you want to silence alerts. For example, 30m or 4h.
      • COMMENT is any notes about this silence you want to add.

        Note: .+ is a regular expression that matches every alert name, so the silence you set includes all alerts.

  5. Record the ID string from the output. You can use this ID to end the silence early. For more information, run amtool --help or see the Alertmanager documentation in the Prometheus repository on GitHub.
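
If you run amtool often, a configuration file can save you from repeating flags. The sketch below uses keys that amtool documents for its config file, which it looks for at $HOME/.config/amtool/config.yml or /etc/amtool/config.yml. The URL is the one from the silence command above. The author value is hypothetical, and whether creating this file suits a Healthwatch-managed VM is an assumption you should check for your environment.

```yaml
# Hypothetical amtool configuration file, for example $HOME/.config/amtool/config.yml.
alertmanager.url: "http://localhost:10401"   # same URL as --alertmanager.url in the command above
author: operator@example.org                 # hypothetical default author for new silences
comment_required: true                       # reject silences that do not include a comment
output: extended                             # same effect as passing -o extended
```

With a file like this in place, the flags it covers can be left off individual amtool commands.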