Configuring TKGI Cluster Discovery

This topic describes how to configure Tanzu Kubernetes Grid Integrated Edition (TKGI) cluster discovery in Healthwatch.

Overview of TKGI Cluster Discovery

In the TKGI Cluster Discovery pane of the Healthwatch tile, you configure the Prometheus instance in the Healthwatch tile to detect on-demand Kubernetes clusters created through the TKGI API and create scrape jobs for them. You only need to configure this pane if you have Ops Manager foundations with TKGI installed.

The Prometheus instance detects and scrapes TKGI clusters by connecting to the Kubernetes API through the TKGI API using a UAA client. To allow this, you must configure the Healthwatch tile, the Prometheus instance in the Healthwatch tile, the UAA client that the Prometheus instance uses to connect to the TKGI API, and the TKGI tile.
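
As a rough illustration of what "connecting using a UAA client" can involve, the sketch below builds (but does not send) an OAuth2 client-credentials token request of the kind UAA accepts. It is not taken from Healthwatch itself. The port 8443, the client name healthwatch-client, and the secret shown are assumptions for illustration only; substitute the values from your own TKGI installation.

```python
import base64
import urllib.parse
import urllib.request


def build_token_request(tkgi_api_domain: str, client_id: str, client_secret: str) -> urllib.request.Request:
    """Build, without sending, a UAA client-credentials token request."""
    # Assumption: the UAA server for TKGI listens on port 8443 of the TKGI API domain.
    url = f"https://{tkgi_api_domain}:8443/oauth/token"
    body = urllib.parse.urlencode({"grant_type": "client_credentials"}).encode("ascii")
    # The client ID and secret are sent as HTTP Basic credentials.
    credentials = base64.b64encode(f"{client_id}:{client_secret}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Basic {credentials}",
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "application/json",
        },
    )


# Hypothetical values, for illustration only.
request = build_token_request("api.tkgi.example.com", "healthwatch-client", "example-secret")
print(request.get_method(), request.full_url)
```

To send the request, you could pass it to urllib.request.urlopen together with an SSL context that trusts the CA for your TKGI API certificate.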

To configure TKGI cluster discovery:

  1. Configure the TKGI Cluster Discovery pane in the Healthwatch tile. For more information, see Configure TKGI Cluster Discovery in Healthwatch below.

  2. Configure TKGI to allow the Prometheus instance to scrape metrics from TKGI clusters. For more information, see Configure TKGI below.

If TKGI cluster discovery fails after you have completed both parts of the procedure in this topic, see Troubleshooting TKGI Cluster Discovery Failure below.

Note: To collect additional BOSH system metrics related to TKGI and view them in the Grafana UI, you must install and configure the Healthwatch Exporter for TKGI on your foundations with TKGI installed. To install the Healthwatch Exporter for TKGI tile, see Installing a Tile Manually. To configure the Healthwatch Exporter for TKGI tile, see Configuring Healthwatch Exporter for TKGI.

Configure TKGI Cluster Discovery in Healthwatch

In the TKGI Cluster Discovery pane of the Healthwatch tile, you configure TKGI cluster discovery, including the UAA client that the Prometheus instance uses to connect to the Kubernetes API through the TKGI API.

To configure the TKGI Cluster Discovery pane:

  1. Navigate to the Ops Manager Installation Dashboard.

  2. Click the Healthwatch tile.

  3. Select TKGI Cluster Discovery.

  4. Under TKGI cluster discovery, select one of the following options:

    • Configure: This option allows TKGI cluster discovery and reveals the configuration fields described in the steps below.
    • Do not configure: This option disallows TKGI cluster discovery. TKGI cluster discovery is disallowed by default.
  5. For Scrape port, enter the port on which the Healthwatch tile exposes the endpoint that the Prometheus instance scrapes for metrics about the health of the TKGI cluster discovery process. These metrics appear in the Healthwatch - Exporter Troubleshooting dashboard in the Grafana UI.

  6. For TKGI API domain, enter the TKGI API domain you configured in the API Hostname (FQDN) field in the TKGI API pane of the TKGI tile. For example, api.tkgi.example.com. For more information, see Routing to the TKGI API VM in TKGI API Authentication in the TKGI documentation.

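  7. For UAA client username, enter the username of the TKGI UAA client whose secret you enter in the next step. This is either the TKGI management admin client or a separate UAA client that you created with access to the TKGI API.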

  8. For UAA client secret, enter one of the following secrets for your TKGI UAA client:

    • Enter the TKGI management admin client secret:
      1. Return to the Ops Manager Installation Dashboard.
      2. Click the Tanzu Kubernetes Grid Integrated Edition tile.
      3. Select the Credentials tab.
      4. In the Pks Uaa Management Admin Client row, click Link to Credential.
      5. Record the value of secret.
      6. Return to the Ops Manager Installation Dashboard.
      7. Click the Healthwatch tile.
      8. Select TKGI Cluster Discovery.
      9. For UAA client secret, enter the secret you recorded from the Pks Uaa Management Admin Client row.
    • Create a separate UAA client with access to the TKGI API and enter the secret you specify for that UAA client. For more information, see Grant Tanzu Kubernetes Grid Integrated Edition Access to a Client in Managing Tanzu Kubernetes Grid Integrated Edition Users with UAA in the TKGI documentation.
  9. If you configured UAA as the OIDC provider for TKGI in the UAA pane of the TKGI tile, enter the TKGI UAA admin password in UAA admin password. Otherwise, do not configure this field. For more information, see Grant Tanzu Kubernetes Grid Integrated Edition Access to a Client in Managing Tanzu Kubernetes Grid Integrated Edition Users with UAA in the TKGI documentation.

  10. For SLI test frequency, enter in seconds how frequently you want the TKGI service level indicator (SLI) test to run. The TKGI SLI test monitors the health of the TKGI API by logging into the TKGI API server, listing all TKGI clusters, and logging out of the TKGI API server.

  11. (Optional) To allow the Prometheus instance to communicate with the TKGI API over TLS, configure one of the following options:

    • To configure the Prometheus instance to use a self-signed CA certificate or a certificate that is signed by a self-signed CA certificate when communicating with the TKGI API over TLS, provide the certificate for the CA in CA certificate for TLS. If you provide a self-signed CA certificate, it must be for the same CA that signs the certificate in the TKGI API.
    • If you do not provide a self-signed CA certificate or a certificate that is signed by a self-signed CA certificate, you can activate the Skip TLS certificate verification checkbox. When this checkbox is activated, the Prometheus instance does not verify the identity of the TKGI API. This checkbox is deactivated by default. VMware does not recommend skipping TLS certificate verification in a production environment.
  12. Click Save.
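
The SLI test described above logs in to the TKGI API, lists clusters, and logs out. To reproduce a similar check by hand, the following sketch assembles the equivalent tkgi CLI commands. It only approximates the test manually and is not how Healthwatch implements it. The function name and example values are hypothetical, and the TLS handling mirrors the two options in the TLS step: a CA certificate, or skipping verification.

```python
import shlex
from typing import List, Optional


def sli_check_commands(
    api_domain: str,
    client_name: str,
    client_secret: str,
    ca_cert_path: Optional[str] = None,
    skip_tls_verify: bool = False,
) -> List[List[str]]:
    """Return the login, list, and logout commands as argument lists."""
    login = [
        "tkgi", "login",
        "-a", api_domain,
        "--client-name", client_name,
        "--client-secret", client_secret,
    ]
    if ca_cert_path is not None:
        # Verify the TKGI API certificate against the supplied CA.
        login += ["--ca-cert", ca_cert_path]
    elif skip_tls_verify:
        # Not recommended in production: the API identity is not verified.
        login.append("-k")
    return [login, ["tkgi", "clusters"], ["tkgi", "logout"]]


# Hypothetical values, for illustration only.
for command in sli_check_commands(
    "api.tkgi.example.com", "healthwatch-client", "example-secret",
    ca_cert_path="/path/to/ca.pem",
):
    print(shlex.join(command))
```

Each argument list can be passed to subprocess.run(command, check=True) on a machine with the tkgi CLI installed. Note that a secret passed on the command line is visible to other users on that machine.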

Configure TKGI

After you configure TKGI cluster discovery in the Healthwatch tile, you must configure TKGI to allow the Prometheus instance to scrape metrics from TKGI clusters.

To configure TKGI:

  1. Return to the Ops Manager Installation Dashboard.

  2. Click the Tanzu Kubernetes Grid Integrated Edition tile.

  3. Select Host Monitoring.

  4. Under Enable Telegraf Outputs?, select Yes.

  5. Activate the Include etcd metrics checkbox to allow TKGI to send etcd server and debugging metrics to Healthwatch.

  6. For Setup Telegraf Outputs, provide the following TOML configuration file:

    [[outputs.prometheus_client]]
      listen = ":10200"
    

    You must use 10200 as the listening port to allow the Prometheus instance to scrape Telegraf metrics from your TKGI clusters. For more information about creating a configuration file in TKGI, see Create a Configuration File in Configuring Telegraf in TKGI in the TKGI documentation.

  7. Click Save.

  8. For each plan you want to monitor:

    1. Select the plan you want to monitor. For example, Plan 2.
    2. For (Optional) Add-ons - Use with caution, enter the following YAML snippet to create the roles required to allow the Prometheus instance to scrape metrics from your TKGI clusters:

    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRole
    metadata:
      name: healthwatch
    rules:
    - resources:
        - pods/proxy
        - pods
        - nodes
        - nodes/proxy
        - namespace/pods
        - endpoints
        - services
      verbs:
        - get
        - watch
        - list
      apiGroups:
        - ""
    - nonResourceURLs: ["/metrics"]
      verbs: ["get"]
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRoleBinding
    metadata:
      name: healthwatch
    roleRef:
      apiGroup: ""
      kind: ClusterRole
      name: healthwatch
    subjects:
    - apiGroup: ""
      kind: User
      name: healthwatch
    

    If (Optional) Add-ons - Use with caution already contains other API resource definitions, append the above YAML snippet to the end of the existing resource definitions, followed by a newline character.

  9. Click Save.

  10. Select Errands.

  11. Ensure that the Upgrade all clusters errand is set to run. Running this errand configures your TKGI clusters with the roles you created in a previous step in the (Optional) Add-ons - Use with caution field of each plan you monitor.

  12. Click Save.

Troubleshooting TKGI Cluster Discovery Failure

TKGI cluster discovery can fail if the Prometheus instance fails to scrape metrics from your TKGI clusters. To troubleshoot TKGI cluster discovery failure, see Troubleshooting Missing TKGI Cluster Metrics in Troubleshooting Healthwatch.
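
Before working through the troubleshooting topic, one quick manual check is whether a cluster node's Telegraf endpoint returns any samples at all. The sketch below counts samples per metric name in a response body that uses the Prometheus text exposition format. It assumes you have already fetched the body yourself, for example from port 10200 on a node (Telegraf's prometheus_client output serves /metrics by default). The metric names in the sample text are hypothetical.

```python
from collections import Counter


def count_samples(exposition_text: str) -> Counter:
    """Count samples per metric name in Prometheus text-format output."""
    counts: Counter = Counter()
    for raw_line in exposition_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # The metric name ends at the first "{" (labels) or whitespace (value).
        name = line.split("{", 1)[0].split(None, 1)[0]
        counts[name] += 1
    return counts


# Hypothetical response body, for illustration only.
sample_body = """
# HELP cpu_usage_idle Telegraf collected metric
# TYPE cpu_usage_idle gauge
cpu_usage_idle{cpu="cpu0"} 98.5
cpu_usage_idle{cpu="cpu1"} 97.1
mem_used 1024
"""
print(count_samples(sample_body))
```

An empty result suggests the endpoint is reachable but exporting nothing, which points at the Telegraf configuration rather than at cluster discovery.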