Configuring Alerting

This topic explains how to configure alerting in Healthwatch for VMware Tanzu.

Overview

In Healthwatch, you can configure the Prometheus instance to send alerts to Alertmanager according to alerting rules you configure. Alertmanager then manages those alerts by removing duplicate alerts, grouping alerts together, and routing those groups to alert receiver integrations such as email, PagerDuty, or Slack. Alertmanager also silences and inhibits alerts according to the silences and inhibit rules you configure.

For more information, see the Prometheus documentation.

Configure Alerting

In the Alertmanager pane, you configure alerting rules, routing rules, and alert receivers for Alertmanager to use.

The values that you configure in the Alertmanager pane also configure their corresponding properties in the Alertmanager configuration file. For more information, see Overview of Configuration Files in Healthwatch in Configuration File Reference Guide, Configuring the Alertmanager Configuration File in Configuration File Reference Guide, and the Prometheus documentation.

To configure alerting through the Alertmanager pane:

  1. Navigate to the Ops Manager Installation Dashboard.

  2. Click the Healthwatch tile.

  3. Select Alertmanager.

  4. For Alerting rules, provide in YAML format the rule statements that define which alerts Prometheus sends to Alertmanager, and therefore which alerts can reach your alert receivers:

    1. The following YAML files contain alerting rules for VMware Tanzu Application Service for VMs (TAS for VMs) and VMware Tanzu Kubernetes Grid Integrated Edition (TKGI). Choose the YAML file below that corresponds to your runtime, and replace OPS_MANAGER_URL with the fully qualified domain name (FQDN) of your Ops Manager deployment:
    2. Modify the YAML file according to the observability requirements for your Ops Manager foundation.
    3. Paste the contents of the YAML file into Alerting rules.

      For more information, see the Prometheus documentation.
  5. For Routing rules, provide in YAML format the route block that defines where Alertmanager sends alerts, how frequently Alertmanager sends alerts, and how Alertmanager groups alerts together. The following example shows a possible set of routing rules:

    receiver: 'example-receiver'
    group_wait: 30s
    group_interval: 5m
    repeat_interval: 4h
    group_by: [cluster, alertname]
    

    Note: group_by combines all alerts that share the same values for the listed labels into a single notification. For example, including cluster in the group_by property groups together all alerts from the same cluster. You can see the labels for the metrics that Healthwatch collects, such as cluster, index, deployment, and origin, within the braces at the end of each metric.

    You must define all route configuration parameters. For more information about the parameters you must provide, see the Prometheus documentation.

  6. (Optional) For Inhibit rules, provide in YAML format the rule statements that define which alerts Alertmanager mutes, instead of sending them to your alert receivers, while certain other alerts are firing. For more information, see the Prometheus documentation.

  7. Configure the alert receivers that you specified in Routing rules in a previous step. For more information, see Configure Alert Receivers below.
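
For reference, a single alerting rule in the standard Prometheus rule-file format might look like the following sketch. It assumes that the Alerting rules field accepts a standard Prometheus rule file with a top-level groups key; compare it with the YAML file for your runtime before relying on that structure. The group name, alert name, duration, and severity value are illustrations, not rules that ship with Healthwatch.

```yaml
# Hypothetical rule group, for illustration only.
groups:
- name: example-foundation-alerts
  rules:
  - alert: ExampleScrapeTargetDown
    # "up" is the metric Prometheus records for every scrape target;
    # a value of 0 means the most recent scrape failed.
    expr: up == 0
    # The condition must hold for 5 minutes before the alert fires.
    for: 5m
    labels:
      # Routing rules and inhibit rules can match on this label.
      severity: critical
    annotations:
      summary: 'Scrape target {{ $labels.instance }} is down'
      description: 'Prometheus has failed to scrape {{ $labels.job }} on {{ $labels.instance }} for 5 minutes.'
```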
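
The routing example in step 5 sends every alert to a single receiver. A route can also contain child routes that send matching alerts to a different receiver. The following sketch assumes a second receiver with the hypothetical name example-pager, which you would also need to configure as described in Configure Alert Receivers below, and an Alertmanager version that supports the matchers syntax (0.22 or later; older versions use match instead).

```yaml
# Root route: alerts that no child route matches go to example-receiver.
receiver: 'example-receiver'
group_wait: 30s
group_interval: 5m
repeat_interval: 4h
group_by: [cluster, alertname]
routes:
# Hypothetical child route: alerts labeled severity="critical" go to
# example-pager instead of example-receiver.
- receiver: 'example-pager'
  matchers:
  - severity="critical"
```

Child routes inherit group_by and the timing parameters from the parent route unless they set their own values.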
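
An inhibit rule mutes target alerts while a matching source alert is firing. The following sketch mutes warning-level alerts while a critical alert with the same alertname and cluster labels is firing. It assumes that the Inhibit rules field takes a list of rules in the same form as the inhibit_rules section of the Alertmanager configuration file, that your alerting rules set a severity label, and that your Alertmanager version supports source_matchers and target_matchers (0.22 or later; older versions use source_match and target_match).

```yaml
# Hypothetical inhibit rule, for illustration only.
- source_matchers:
  - severity="critical"
  target_matchers:
  - severity="warning"
  # Inhibit only when the source and target alerts share these label values.
  equal: [alertname, cluster]
```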

Configure Alert Receivers

You can configure email, PagerDuty, Slack, and webhook alert receivers in the Healthwatch tile. For more information, see the Prometheus documentation.

You can also configure custom alert receiver integrations that are not natively supported by Alertmanager through webhook receivers. For more information about configuring custom alert receiver integrations, see the Prometheus documentation.

If you configure two or more alert receivers with the same name, Alertmanager merges them into a single alert receiver. For more information, see Combining Alert Receivers below.

The following sections describe how to configure each type of alert receiver:

Note: If you want to provide authentication and TLS communication settings for your alert receivers, you must provide them in the associated alert receiver configuration fields described in the sections below. If the base configuration YAML for your alert receivers includes fields for authentication or TLS communication settings, remove those fields before you paste the YAML into the Alert receiver configuration parameters field.

Configure an Email Alert Receiver

To configure an email alert receiver:

  1. Under Email alert receivers, click Add.

  2. For Alert receiver name, enter the name you want to give your email alert receiver. The name you enter in this field must match the name you specified in the route block you entered in the Routing rules field in Configure Alerting above.

  3. For Alert receiver configuration parameters, provide the configuration parameters for your email alert receiver in YAML format. Do not prefix the YAML with a dash. The following example shows a possible set of configuration parameters:

    to: 'operator1@example.org'
    from: 'healthwatch@example.org'
    smarthost: smtp.example.org:587
    headers:  { subject: "[ALERT] - [{{ .CommonLabels.severity }}]  - {{ .CommonAnnotations.summary }}"  }
    html: '{{ template "email.default.html" . }}'
    text: "This is an alert."
    

    At minimum, your configuration parameters must include the to, from, and smarthost properties. The other properties you must include depend on both the SMTP server for which you are configuring an alert receiver and the needs of your Ops Manager foundation. For more information about the properties you can include in this configuration, see the Prometheus documentation.

    Notes:
    • If you include the html property and leave it blank, Healthwatch automatically populates it with a default template.
    • Do not include the auth_password property, the auth_secret property, or the <tls_config> section in the configuration parameters for your email alert receiver. You can configure these properties in the next steps of this procedure.

  4. (Optional) To configure SMTP authentication between Alertmanager and your email alert receiver, configure one of the following fields:

    • If your SMTP server uses basic authentication, enter the authentication password for your SMTP server in SMTP server authentication password.
    • If your SMTP server uses CRAM-MD5 authentication, enter the authentication secret for your SMTP server in SMTP server authentication secret.
  5. (Optional) To allow Alertmanager to communicate with your email alert receiver over TLS, configure the following fields:

    1. For Certificate and private key for TLS, provide a certificate and private key for Alertmanager to use for TLS connections to your SMTP server.
    2. For CA certificate for TLS, provide a certificate for the certificate authority (CA) that your SMTP server uses to verify TLS certificates.
    3. For SMTP server name, enter the name of the SMTP server as it appears on the server’s TLS certificate.
    4. If the certificate you provided in Certificate and private key for TLS is signed by a self-signed CA certificate or a certificate that is signed by a self-signed CA certificate, activate the Skip TLS certificate verification checkbox. When this checkbox is activated, Alertmanager does not verify the identity of your SMTP server. This checkbox is deactivated by default.

      For more information about configuring TLS communication for Alertmanager, see the Prometheus documentation.
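
The SMTP authentication fields above hold only a password or a secret. Alertmanager pairs that value with the auth_username property, which, because the tile provides no separate username field, presumably belongs in Alert receiver configuration parameters together with the other properties. The following sketch shows such a set of parameters; the addresses and host are placeholders.

```yaml
to: 'operator1@example.org'
from: 'healthwatch@example.org'
smarthost: smtp.example.org:587
# Username for SMTP authentication. Alertmanager uses it with the password
# (PLAIN or LOGIN) or the CRAM-MD5 secret entered in the separate tile field.
auth_username: 'healthwatch@example.org'
# Also send a notification when an alert resolves (email defaults to false).
send_resolved: true
```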

Configure a PagerDuty Alert Receiver

To configure a PagerDuty alert receiver:

  1. Under PagerDuty alert receivers, click Add.

  2. For Alert receiver name, enter the name you want to give your PagerDuty alert receiver. The name you enter in this field must match the name you specified in the route block you entered in the Routing rules field in Configure Alerting above.

  3. For Alert receiver configuration parameters, provide the configuration parameters for your PagerDuty alert receiver in YAML format. Do not prefix the YAML with a dash. The following example shows a possible set of configuration parameters:

    url: https://api.pagerduty.com/api/v2/alerts
    client: '{{ template "pagerduty.default.client" . }}'
    client_url: '{{ template "pagerduty.default.clientURL" . }}'
    description: '{{ template "pagerduty.default.description" . }}'
    severity: 'error'
    

    The properties you must include depend on both the PagerDuty instance for which you are configuring an alert receiver and the needs of your Ops Manager foundation. For more information about the properties you can include in this configuration, see the Prometheus documentation.

    Note: Do not include the routing_key property, the service_key property, the <http_config> section, or the <tls_config> section in the configuration parameters for your PagerDuty alert receiver. You can configure these properties in the next steps of this procedure.

  4. Enter your PagerDuty integration key in one of the following fields:

    • If you selected Events API v2 as your integration type in PagerDuty, enter your PagerDuty integration key in Routing key.
    • If you selected Prometheus as your integration type in PagerDuty, enter your PagerDuty integration key in Service key.
  5. (Optional) To configure an HTTP client for Alertmanager to use to communicate with the PagerDuty API, configure one of the following options:

    • To configure the HTTP client to authenticate to the PagerDuty API using basic authentication, enter the username and password for the HTTP client in Basic authentication credentials.
    • To configure the HTTP client to authenticate to the PagerDuty API using a bearer token, enter the bearer token for the HTTP client in Bearer token.

      For more information about configuring an HTTP client for Alertmanager, see the Prometheus documentation.
  6. (Optional) To allow Alertmanager to communicate with your PagerDuty alert receiver over TLS, configure the following fields:

    1. For Certificate and private key for TLS, provide a certificate and private key for Alertmanager to use for TLS connections to the PagerDuty API server.
    2. For CA certificate for TLS, provide a certificate for the CA that the PagerDuty API server uses to verify TLS certificates.
    3. For PagerDuty server name, enter the name of the PagerDuty API server as it appears on the server’s TLS certificate.
    4. If the certificate you provided in Certificate and private key for TLS is signed by a self-signed CA certificate or a certificate that is signed by a self-signed CA certificate, activate the Skip TLS certificate verification checkbox. When this checkbox is activated, Alertmanager does not verify the identity of the PagerDuty API server. This checkbox is deactivated by default.

      For more information about configuring TLS communication for Alertmanager, see the Prometheus documentation.
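
To see why those keys stay out of the configuration parameters, it can help to picture the pagerduty_configs entry that ends up in the Alertmanager configuration file. The following sketch is an approximation: the uppercase values stand for what you enter in the tile, the file paths are hypothetical, and the tile may lay out the authentication keys differently. In the Alertmanager configuration format, the TLS settings for a PagerDuty receiver sit inside http_config.

```yaml
pagerduty_configs:
# Uppercase values are placeholders for what you enter in the tile.
- routing_key: ROUTING-KEY
  # Copied from Alert receiver configuration parameters.
  url: https://api.pagerduty.com/api/v2/alerts
  severity: 'error'
  http_config:
    # From Basic authentication credentials.
    basic_auth:
      username: USERNAME
      password: PASSWORD
    tls_config:
      # Hypothetical paths to the certificates entered in the tile.
      cert_file: /path/to/client.crt
      key_file: /path/to/client.key
      ca_file: /path/to/ca.crt
      # From PagerDuty server name.
      server_name: PAGERDUTY-SERVER-NAME
      # Set by the Skip TLS certificate verification checkbox.
      insecure_skip_verify: false
```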

Configure a Slack Alert Receiver

To configure a Slack alert receiver:

  1. Under Slack alert receivers, click Add.

  2. For Alert receiver name, enter the name you want to give your Slack alert receiver. The name you enter in this field must match the name you specified in the route block you entered in the Routing rules field in Configure Alerting above.

  3. For Alert receiver configuration parameters, provide the configuration parameters for your Slack alert receiver in YAML format. Do not prefix the YAML with a dash. The following example shows a possible set of configuration parameters:

    channel: '#operators'
    username: 'Example Alerting Integration'
    

    The properties you must include depend on both the Slack instance for which you are configuring an alert receiver and the needs of your Ops Manager foundation. For more information about the properties you can include in this configuration, see the Prometheus documentation.

    Note: Do not include the api_url property, the api_url_file property, the <http_config> section, or the <tls_config> section in the configuration parameters for your Slack alert receiver. You can configure these properties in the next steps of this procedure.

  4. For Slack API URL, enter the webhook URL for your Slack instance from your Slack app directory.

  5. (Optional) To configure an HTTP client for Alertmanager to use to communicate with the server for your Slack instance, configure one of the following options:

    • To configure the HTTP client to authenticate to the server for your Slack instance using basic authentication, enter the username and password for the HTTP client in Basic authentication credentials.
    • To configure the HTTP client to authenticate to the server for your Slack instance using a bearer token, enter the bearer token for the HTTP client in Bearer token.

      For more information about configuring an HTTP client for Alertmanager, see the Prometheus documentation.
  6. (Optional) To allow Alertmanager to communicate with your Slack alert receiver over TLS, configure the following fields:

    1. For Certificate and private key for TLS, provide a certificate and private key for Alertmanager to use for TLS connections to the server for your Slack instance.
    2. For CA certificate for TLS, provide a certificate for the CA that the server for your Slack instance uses to verify TLS certificates.
    3. For Slack server name, enter the name of the server for your Slack instance as it appears on the server’s TLS certificate.
    4. If the certificate you provided in Certificate and private key for TLS is signed by a self-signed CA certificate or a certificate that is signed by a self-signed CA certificate, activate the Skip TLS certificate verification checkbox. When this checkbox is activated, Alertmanager does not verify the identity of the server for your Slack instance. This checkbox is deactivated by default.

      For more information about configuring TLS communication for Alertmanager, see the Prometheus documentation.
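
Beyond channel and username, you can shape the Slack message with Alertmanager's template language. The following sketch is one possible set of configuration parameters; the title and text templates are illustrations and assume that your alerting rules set a summary annotation.

```yaml
channel: '#operators'
username: 'Example Alerting Integration'
# Post a follow-up message when an alert resolves (Slack defaults to false).
send_resolved: true
# Group status in upper case, followed by the alertname shared by the group.
title: '[{{ .Status | toUpper }}] {{ .CommonLabels.alertname }}'
# One line per alert in the group; {{ end }} closes the range loop.
text: |-
  {{ range .Alerts }}{{ .Labels.alertname }} on {{ .Labels.instance }}: {{ .Annotations.summary }}
  {{ end }}
```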

Configure a Webhook Alert Receiver

To configure a webhook alert receiver:

  1. Under Webhook alert receivers, click Add.

  2. For Alert receiver name, enter the name you want to give your webhook alert receiver. The name you enter in this field must match the name you specified in the route block you entered in the Routing rules field in Configure Alerting above.

  3. For Alert receiver configuration parameters, provide the configuration parameters for your webhook alert receiver in YAML format. Do not prefix the YAML with a dash. The following example shows a possible set of configuration parameters:

    url: https://example.com/data/12345
    max_alerts: 0
    

    The properties you must include depend on both the webhook for which you are configuring an alert receiver and the needs of your Ops Manager foundation. For more information about the properties you can include in this configuration, see the Prometheus documentation.

    Notes:
    • Do not include the <http_config> section or the <tls_config> section in the configuration parameters for your webhook alert receiver. You can configure these properties in the next steps of this procedure.
    • You can also configure custom alert receiver integrations that are not natively supported by Alertmanager through webhook alert receivers. For more information about configuring custom alert receiver integrations, see the Prometheus documentation.

  4. (Optional) To configure an HTTP client for Alertmanager to use to communicate with the server that processes your webhook, configure one of the following options:

    • To configure the HTTP client to authenticate to the server that processes your webhook using basic authentication, enter the username and password for the HTTP client in Basic authentication credentials.
    • To configure the HTTP client to authenticate to the server that processes your webhook using a bearer token, enter the bearer token for the HTTP client in Bearer token.

      For more information about configuring an HTTP client for Alertmanager, see the Prometheus documentation.
  5. (Optional) To allow Alertmanager to communicate with your webhook alert receiver over TLS, configure the following fields:

    1. For Certificate and private key for TLS, provide a certificate and private key for Alertmanager to use for TLS connections to the server that processes your webhook.
    2. For CA certificate for TLS, provide a certificate for the CA that the server that processes your webhook uses to verify TLS certificates.
    3. For Webhook server name, enter the name of the server that processes your webhook as it appears on the server’s TLS certificate.
    4. If the certificate you provided in Certificate and private key for TLS is signed by a self-signed CA certificate or a certificate that is signed by a self-signed CA certificate, activate the Skip TLS certificate verification checkbox. When this checkbox is activated, Alertmanager does not verify the identity of the server that processes your webhook. This checkbox is deactivated by default.

      For more information about configuring TLS communication for Alertmanager, see the Prometheus documentation.
  6. Click Save.
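
For a custom integration, the server behind the webhook URL receives each notification as an HTTP POST request with a JSON body that describes the alert group, including an alerts array. The following sketch shows two further parameters that affect those requests; the URL is the placeholder from the example above, and the limit of 20 is an arbitrary illustration.

```yaml
url: https://example.com/data/12345
# Notify only when alerts fire; webhook receivers also get resolved
# notifications by default.
send_resolved: false
# Include at most 20 alerts in each request body; Alertmanager truncates
# the rest. The default, 0, includes every alert in the group.
max_alerts: 20
```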

Combining Alert Receivers

If you configure two or more alert receivers with the same name, Alertmanager merges them into a single alert receiver. For example, if you configure:

  • Two email receivers named “Foundation” with distinct email addresses

  • One PagerDuty receiver named “Foundation”

  • One email receiver named “Clusters”

Then Alertmanager merges them into the following alert receivers:

  • One alert receiver named “Foundation” containing two email configurations and a PagerDuty configuration

  • One alert receiver named “Clusters” containing one email configuration

The example below shows how Alertmanager combines the alert receivers described above in its configuration file:

receivers:
- name: 'Foundation'
  email_configs:
  - to: 'operator1@example.org'
    from: 'healthwatch@example.org'
    smarthost: smtp.example.org:587
    headers:  { subject: "[ALERT] - [{{ .CommonLabels.severity }}]  - {{ .CommonAnnotations.summary }}"  }
    html: '{{ template "email.default.html" . }}'
    text: "This is an alert."
  - to: 'operator2@example.org'
    from: 'healthwatch@example.org'
    smarthost: smtp.example.org:587
    headers:  { subject: "[ALERT] - [{{ .CommonLabels.severity }}]  - {{ .CommonAnnotations.summary }}"  }
    html: '{{ template "email.default.html" . }}'
    text: "This is an alert."
  pagerduty_configs:
  - url: https://api.pagerduty.com/api/v2/alerts
    client: '{{ template "pagerduty.default.client" . }}'
    client_url: '{{ template "pagerduty.default.clientURL" . }}'
    description: '{{ template "pagerduty.default.description" . }}'
    severity: 'error'

- name: 'Clusters'
  email_configs:
  - to: 'operator1@example.org'
    from: 'healthwatch@example.org'
    smarthost: smtp.example.org:587
    headers:  { subject: "[ALERT] - [{{ .CommonLabels.severity }}]  - {{ .CommonAnnotations.summary }}"  }
    html: '{{ template "email.default.html" . }}'
    text: "This is an alert."

Silence Alerts

Alertmanager includes a command-line tool called amtool. You can use amtool to temporarily silence Alertmanager alerts without modifying your alerting rules. For more information about how to use amtool, see the Alertmanager documentation in the Prometheus repository on GitHub.

You can also use the Alertmanager UI to view and silence alerts. To access the Alertmanager UI, see Viewing the Alertmanager UI in Troubleshooting Healthwatch.

To silence alerts using amtool:

  1. SSH into one of the Prometheus VMs deployed by the Healthwatch tile. Alertmanager replicates any changes you make on one Prometheus VM to all other Prometheus VMs. For instructions on how to SSH into a Prometheus VM, see the Ops Manager documentation.

  2. Navigate to the amtool directory by running:

    cd /var/vcap/jobs/alertmanager/packages/alertmanager/bin
    
  3. View all of your currently running alerts by running:

    amtool -o extended alert --alertmanager.url http://localhost:10401
    

    This command returns a list of all currently running alerts that includes detailed information about each alert, including the name of the alert and the Prometheus instance on which it runs.

    You can also query the list of alerts by name and instance to view specific alerts.

    • To query alerts by name, run:

      amtool -o extended alert query alertname="ALERT-NAME" --alertmanager.url http://localhost:10401
      

      Where ALERT-NAME is the name of the alert you want to silence. You can query the exact name of the alert, or you can use the =~ regular-expression operator with a partial name followed by .* to see all alerts whose names begin with that partial name, such as in the following example:

      amtool -o extended alert query alertname=~"Test.*" --alertmanager.url http://localhost:10401
      
    • To query alerts by instance, run:

      amtool -o extended alert query instance=~".+INSTANCE-NUMBER" --alertmanager.url http://localhost:10401
      

      Where INSTANCE-NUMBER is the number of the Prometheus instance for which you want to silence alerts.

    • To query alerts by name and instance, run:

      amtool -o extended alert query alertname=~"ALERT-NAME" instance=~".+INSTANCE-NUMBER" --alertmanager.url http://localhost:10401
      

      Where:

      • ALERT-NAME is the name of the alert you want to silence.
      • INSTANCE-NUMBER is the number of the Prometheus instance for which you want to silence an alert.
  4. Run one of the following commands to silence either a specific alert or all alerts for a specified amount of time:

    • To silence a specific alert for a specified amount of time, run:

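      amtool silence add alertname=ALERT-NAME instance=~".+INSTANCE-NUMBER" -d TIME-TO-SILENCE -c 'COMMENT' --alertmanager.url http://localhost:10401
      

      Where:

      • ALERT-NAME is the name of the alert you want to silence.
      • INSTANCE-NUMBER is the number of the Prometheus instance for which you want to silence an alert.
      • TIME-TO-SILENCE is the amount of time in minutes or hours you want to silence the alert. For example, 30m or 4h.
      • COMMENT is a note about this silence, such as the reason for it.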
    • To silence all alerts for a specified amount of time, run:

      amtool silence add 'alertname=~.+' -d TIME-TO-SILENCE -c 'COMMENT' --alertmanager.url http://localhost:10401
      

      Where:

      • TIME-TO-SILENCE is the amount of time in minutes or hours you want to silence alerts. For example, 30m or 4h.
      • COMMENT is a note about this silence, such as the reason for it.

        Note: In 'alertname=~.+', =~ is the regular-expression match operator and .+ matches any non-empty alert name, so the silence applies to all alerts.

  5. Record the ID string from the output. You can use this ID to end the silence early. For more information, run amtool --help or see the Alertmanager documentation in the Prometheus repository on GitHub.