Troubleshooting Windows Diego Cells

This topic describes how to troubleshoot Windows Diego Cells deployed by VMware Tanzu Application Service for VMs [Windows] (TAS for VMs [Windows]).

Installation Issues

This section describes issues that may occur during the installation process.

Missing Local Certificates for Windows File System Injector

Symptom

You run the winfs-injector and see the following error about certificates:

Get https://auth.docker.io/token?service=registry.docker.io&
scope=repository:cloudfoundry/windows2016fs:pull: x509:
failed to load system roots and no roots provided

Explanation

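The winfs-injector tool pulls the Windows file system image from Docker Hub over HTTPS, so it needs trusted root certificates on your local machine. The x509 error means that the tool could not find any root certificates.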

Solution

Install the necessary root certificates on your local machine. On Ubuntu, the ca-certificates package provides them. To install it, run sudo apt-get install ca-certificates.
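Before you rerun winfs-injector, you can check whether your machine has a root certificate bundle at all. The error text above is the one that Go's crypto/x509 package returns, so the following Python sketch assumes that winfs-injector loads roots the way Go does on Linux. The candidate paths are a small subset of Go's search list and are assumptions. Go also scans certificate directories, and this file-only check does not.

```python
import os
from typing import Iterable, Mapping, Optional

# Assumption: a few common Linux CA bundle files. Go's crypto/x509 package
# searches a longer list of this kind when it loads system roots.
DEFAULT_CANDIDATES = (
    "/etc/ssl/certs/ca-certificates.crt",  # Debian/Ubuntu (ca-certificates package)
    "/etc/pki/tls/certs/ca-bundle.crt",    # Fedora/RHEL
    "/etc/ssl/cert.pem",                   # Alpine Linux
)


def find_ca_bundle(candidates: Iterable[str] = DEFAULT_CANDIDATES,
                   env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the first existing, non-empty CA bundle file, or None.

    If SSL_CERT_FILE is set, only that file is checked, because Go uses it
    in place of its default list of bundle files.
    """
    env = os.environ if env is None else env
    override = env.get("SSL_CERT_FILE")
    paths = [override] if override else list(candidates)
    for path in paths:
        if os.path.isfile(path) and os.path.getsize(path) > 0:
            return path
    return None


if __name__ == "__main__":
    bundle = find_ca_bundle()
    if bundle is None:
        print("No CA bundle found; install the ca-certificates package.")
    else:
        print(f"Found CA bundle: {bundle}")
```

Run it with python3 on the machine where you run winfs-injector. A result of None suggests that installing the ca-certificates package is the fix.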

Outdated Version for Windows File System Injector

Symptom

You run the winfs-injector and see the following error about a missing file or directory:

open ...windows2016fs-release/VERSION: no such file or directory

Explanation

You are using an outdated version of the winfs-injector.

Solution

From the VMware Tanzu Application Service for VMs [Windows] page on VMware Tanzu Network, download the version of the Windows File System Injector tool that is recommended for your version of the tile.

Missing Container Image

Symptom

You click the + icon in Ops Manager to add the TAS for VMs [Windows] tile to the Installation Dashboard and see the following error:

Error invalid release uninjected tile

Explanation

The product file that you are trying to upload does not contain the Windows Server container base image.

Solution

  1. Delete the product file listing from Ops Manager by clicking its trash can icon under Import a Product.

  2. Follow the TAS for VMs [Windows] installation instructions to run the winfs-injector tool locally on the product file. This step adds the Windows Server container base image to the product file, requires internet access, and can take up to 20 minutes. For more information, see Install the Tile in Installing and Configuring TAS for VMs [Windows].

  3. Click Import a Product to upload the injected product file.

  4. Click the + icon next to the product listing to add the TAS for VMs [Windows] tile to the Installation Dashboard.
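Before you import a product file, you can check whether it was injected. A .pivotal product file is a ZIP archive that stores its BOSH release tarballs under releases/, so the following Python sketch lists those tarballs. The looks_injected helper and its default marker, windows2019fs, are hypothetical. The marker assumes that the injected release is named after the Windows file system release, so list the tarballs first to confirm the name for your tile version.

```python
import zipfile
from typing import BinaryIO, List, Union

TileSource = Union[str, BinaryIO]


def release_tarballs(tile: TileSource) -> List[str]:
    """List the BOSH release tarballs under releases/ in a .pivotal tile.

    A .pivotal product file is a ZIP archive, so zipfile can read it directly.
    """
    with zipfile.ZipFile(tile) as archive:
        return sorted(
            name for name in archive.namelist()
            if name.startswith("releases/") and name.endswith(".tgz")
        )


def looks_injected(tile: TileSource, marker: str = "windows2019fs") -> bool:
    """Hypothetical heuristic: True if any release tarball name contains marker.

    The default marker is an assumption about how the injected file system
    release is named; confirm it against release_tarballs() for your tile.
    """
    return any(marker in name for name in release_tarballs(tile))
```

For example, print(release_tarballs("INJECTED-TILE.pivotal")), where INJECTED-TILE.pivotal is a placeholder for the file that winfs-injector wrote.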

Upgrade Issues

This section describes issues that may occur during the upgrade process.

Failure to Create Containers When Upgrading with Shared Microsoft Base Image

Symptom

The pre-start script for a windowsfs job, such as windows1803fs, fails, and the upgrade fails with output similar to the following:

Task 308031 | 13:47:04 | Preparing deployment: Preparing deployment (00:00:03)
Task 308031 | 13:47:11 | Preparing package compilation: Finding packages to compile (00:00:00)
Task 308031 | 13:47:21 | Updating instance windows_diego_cell: windows_diego_cell/44c5841f-7580-4e9c-9856-89fcbe08ab0d (2) (canary) (00:00:35)
                     L Error: Action Failed get_task: Task 59ba76d1-14c5-4d7b-681c-08b9ec4bd64d result: 1 of 10 pre-start scripts failed. Failed Jobs: windows1803fs. Successful Jobs: set_kms_host, groot, loggregator_agent_windows, bosh-dns-windows, rep_windows, winc-network-1803, set_password, enable_ssh, enable_rdp.
Task 308031 | 13:47:56 | Error: Action Failed get_task: Task 59ba76d1-14c5-4d7b-681c-08b9ec4bd64d result: 1 of 10 pre-start scripts failed. Failed Jobs: windows1803fs. Successful Jobs: set_kms_host, groot, loggregator_agent_windows, bosh-dns-windows, rep_windows, winc-network-1803, set_password, enable_ssh, enable_rdp.

Alternatively, the post-start script for the rep_windows job fails, and the upgrade fails with output similar to the following:

Task 8192 | 21:12:30 | Updating instance windows2019-cell: windows2019-cell/bd6d70b9-ed1f-412f-9d49-8045627f4ab3 (0) (canary) (00:17:24)
                     L Error: Action Failed get_task: Task a9555020-1a3b-40c7-677c-d6fc392ce135 result: 1 of 3 post-start scripts failed. Failed Jobs: rep_windows. Successful Jobs: route_emitter_windows, bosh-dns-windows.
Task 8192 | 21:29:55 | Error: Action Failed get_task: Task a9555020-1a3b-40c7-677c-d6fc392ce135 result: 1 of 3 post-start scripts failed. Failed Jobs: rep_windows. Successful Jobs: route_emitter_windows, bosh-dns-windows.

Explanation

When upgrading between versions of Windows rootfs that have a shared Microsoft base layer, TAS for VMs [Windows] may fail to create containers.

Solution

For available workarounds, see Failure to create containers when upgrading with shared Microsoft base image in the Knowledge Base.
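When a canary fails, the job to investigate is the one named after Failed Jobs: in the task output, such as windows1803fs or rep_windows in the examples above. If you save that output, for example by running bosh -e ENV-NAME task TASK-ID, where TASK-ID is a task number such as 308031, the following Python sketch extracts the failed jobs for each lifecycle phase. The regular expression assumes the message wording shown above and may need adjusting for other BOSH versions.

```python
import re
from typing import Dict, List

# Matches the summary that BOSH prints when lifecycle scripts fail, such as
# "1 of 10 pre-start scripts failed. Failed Jobs: windows1803fs. Successful Jobs: ..."
_SCRIPT_FAILURE = re.compile(
    r"\d+ of \d+ ([a-z-]+) scripts failed\. Failed Jobs: ([^.]+)\."
)


def failed_jobs(task_output: str) -> Dict[str, List[str]]:
    """Map each failing script phase (for example, pre-start) to its failed jobs.

    BOSH often repeats the same error line, so duplicates are dropped while
    the original order is kept.
    """
    result: Dict[str, List[str]] = {}
    for match in _SCRIPT_FAILURE.finditer(task_output):
        phase, job_list = match.group(1), match.group(2)
        jobs = result.setdefault(phase, [])
        for job in (j.strip() for j in job_list.split(",")):
            if job and job not in jobs:
                jobs.append(job)
    return result
```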

Forward Windows Diego Cell Logs

You can use Windows Diego Cell logs to troubleshoot Windows Diego Cells. Windows Diego Cells generate the following types of logs:

  • BOSH job logs, such as rep_windows and consul_agent_windows. These logs stream to the syslog server configured in the System Logging pane of the TAS for VMs [Windows] tile, along with other Ops Manager component logs. The names of these BOSH job logs correspond to the names of the logs emitted by Linux Diego Cells.

  • Windows event logs. These logs stream to the syslog server configured in the System Logging pane of the TAS for VMs [Windows] tile.

You can forward BOSH job logs and Windows event logs to an external syslog server in the following ways:

  • Configure a BOSH add-on to forward BOSH job logs. For more information, see the BOSH jobs logs step in Install the Tile in Installing and Configuring TAS for VMs [Windows].

  • Configure TAS for VMs [Windows] to forward Windows event logs. For more information, see Forward Windows Event Logs to a Syslog Server.

You can also download BOSH job logs and Windows event logs from the TAS for VMs [Windows] tile. For more information, see Download Windows Diego Cell Logs.

Forward Windows Event Logs to a Syslog Server

To forward Windows event logs to an external syslog server:

  1. Navigate to the Ops Manager Installation Dashboard.

  2. Click the TAS for VMs [Windows] tile.

  3. Select System Logging.


  4. Under Enable syslog for VM logs?, select Enable.

  5. Under Address, enter the hostname or IP address of your syslog server.

  6. Under Port, enter the port of your syslog server. The default port is 514.

    Note: The host must be reachable from the TAS for VMs network. Ensure that your syslog server listens on external interfaces.

  7. Under Protocol, select the transport protocol to use when forwarding logs.

  8. Select the Enable system metrics checkbox. For a list of the VM metrics that the System Metrics Agent emits, see VM Metrics in the System Metrics repository on GitHub.

  9. Click Save.
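After you save this configuration, you may want to confirm that the syslog server accepts traffic from the network where your Windows Diego Cells run. The following Python sketch sends one test message with the standard library's logging.handlers.SysLogHandler. Run it from a host on that network. The host name and message text are placeholders. Over UDP, a send succeeds even when nothing is listening, so check the server for the message. Over TCP, the handler raises OSError if it cannot connect, which is itself a useful reachability signal.

```python
import logging
import logging.handlers
import socket


def send_test_message(host: str, port: int = 514, tcp: bool = False,
                      text: str = "TAS for VMs [Windows] syslog test") -> None:
    """Send one syslog message to host:port over UDP (default) or TCP."""
    socktype = socket.SOCK_STREAM if tcp else socket.SOCK_DGRAM
    # For TCP, the handler connects here and raises OSError if it cannot.
    handler = logging.handlers.SysLogHandler(address=(host, port), socktype=socktype)
    logger = logging.getLogger("syslog-test")
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    try:
        logger.info(text)
    finally:
        # Detach and close so that repeated calls do not stack handlers.
        logger.removeHandler(handler)
        handler.close()
```

For example, send_test_message("syslog.example.com", 514), where syslog.example.com stands in for the address that you entered in step 5.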

Download Windows Diego Cell Logs

To download Windows Diego Cell logs:

  1. Navigate to the Ops Manager Installation Dashboard.

  2. Click the TAS for VMs [Windows] tile.

  3. Click the Status tab.

  4. Under the Logs column, click the download icon for the Windows Diego Cell for which you want to retrieve logs.

  5. Click the Logs tab.

  6. When the logs are ready, click the filename to download them.

  7. Unzip the file to examine the contents. Each component on the Diego Cell has its own logs directory:

    • /consul_agent_windows/
    • /garden-windows/
    • /metron_agent_windows/
    • /rep_windows/
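If you download logs from several Windows Diego Cells, a short script can confirm that each archive contains the component directories you expect. The following Python sketch assumes that the download is a ZIP archive whose top-level entries are those component directories. The expected list is copied from the step above and varies by release. For example, the upgrade output earlier in this topic shows a loggregator_agent_windows job, so adjust EXPECTED to match your deployment.

```python
import zipfile
from typing import BinaryIO, Iterable, List, Union

# Component directories listed in the step above; adjust for your release.
EXPECTED = ("consul_agent_windows", "garden-windows", "metron_agent_windows", "rep_windows")


def component_dirs(archive: Union[str, BinaryIO]) -> List[str]:
    """Return the sorted top-level directory names inside a logs ZIP archive."""
    tops = set()
    with zipfile.ZipFile(archive) as logs:
        for name in logs.namelist():
            parts = name.lstrip("/").split("/", 1)
            # Only entries nested under a directory contribute a name.
            if len(parts) == 2 and parts[0]:
                tops.add(parts[0])
    return sorted(tops)


def missing_components(archive: Union[str, BinaryIO],
                       expected: Iterable[str] = EXPECTED) -> List[str]:
    """Return the expected component directories that the archive lacks."""
    present = set(component_dirs(archive))
    return [name for name in expected if name not in present]
```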

Connect to a Windows Diego Cell

To connect to a Windows Diego Cell to run diagnostics:

  1. Download and install a Remote Desktop Protocol (RDP) client.

  2. To log in to your BOSH Director, follow the procedure in Authenticate with the BOSH Director VM in Advanced Troubleshooting with the BOSH CLI. The steps vary depending on whether your Ops Manager deployment uses internal authentication or an external user store. In TAS for VMs [Windows] v2.6 and later, use of the bosh ssh command is enabled by default.

  3. Retrieve the IP address of your Windows Diego Cell using the BOSH CLI by running:

    bosh -e ENV-NAME -d DEPLOYMENT-NAME vms

    Where:

    • ENV-NAME is the alias that you assigned to your BOSH Director.
    • DEPLOYMENT-NAME is the name of your TAS for VMs [Windows] deployment. To list the deployments on your BOSH Director, run bosh -e ENV-NAME deployments.

    For example:

    c:\Users\admin> bosh -e my-environ -d garden-windows vms
    Using environment '192.0.2.6' as client 'admin'

    In the output, find your Windows Diego Cell instance and note the address in its IPs column.

  4. Retrieve the administrator password for your Windows Diego Cell by following the steps for your IaaS:

    • vSphere: Retrieve the value of WINDOWS_PASSWORD in the consumer-vars.yml file that you used when you built the stemcell.
    • Amazon Web Services (AWS): Navigate to the AWS EC2 console. Right-click your Windows Diego Cell and select Get Windows Password from the dropdown. Provide the local path to the ops_mgr.pem private key file that you used when installing Ops Manager and click Decrypt password to obtain the administrator password for your Windows Diego Cell.
    • Google Cloud Platform (GCP): Navigate to the Compute Engine Dashboard. Under VM Instances, select the instance of the Windows VM. At the top of the page, click Create or reset Windows password. When prompted, enter Administrator under Username and click Set. GCP provides a one-time password for the Windows Diego Cell.
    • Azure: You cannot RDP into Windows Diego Cells on Azure.
  5. Open your RDP client. The examples below use the Microsoft Remote Desktop app.

  6. Click New and enter your connection information:

    • Connection name: Enter a name for this connection.
    • PC name: Enter the IP address of your Windows Diego Cell.
    • User name: Enter Administrator.
    • Password: Enter the password for your Windows Diego Cell that you obtained above.
  7. Mount a directory on your local machine as a drive in the Windows Diego Cell:

    1. From the same Edit Remote Desktops window as above, click Redirection.
    2. Click the plus icon at the bottom left.
    3. For Name, enter the name of the drive as it will appear in the Windows Diego Cell. For Path, enter the path of the local directory.
    4. Click OK.
  8. Close the Edit Remote Desktops window and double-click the newly added connection under My Desktops to open an RDP connection to the Windows Diego Cell.

  9. In the RDP session, you can use the Consul CLI to diagnose problems with your Windows Diego Cell. For more information, see Consul CLI.
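To script step 3, you can ask the BOSH CLI for machine-readable output with its global --json flag. The following Python sketch assumes that bosh vms returns a JSON document shaped like {"Tables": [{"Rows": [{"instance": ..., "ips": ...}]}]} when you pass --json. Confirm the key names against your BOSH CLI version. The windows name fragment is also an assumption, based on instance group names such as windows_diego_cell and windows2019-cell in the upgrade output earlier in this topic.

```python
import json
import subprocess
from typing import Dict


def bosh_vms_json(env: str, deployment: str) -> str:
    """Run `bosh vms` with the global --json flag and return its stdout."""
    completed = subprocess.run(
        ["bosh", "--json", "-e", env, "-d", deployment, "vms"],
        capture_output=True, text=True, check=True,
    )
    return completed.stdout


def windows_cell_ips(bosh_json: str, name_fragment: str = "windows") -> Dict[str, str]:
    """Map instance names that contain name_fragment to their IPs column.

    Assumes the shape {"Tables": [{"Rows": [{"instance": ..., "ips": ...}]}]}.
    """
    document = json.loads(bosh_json)
    ips: Dict[str, str] = {}
    for table in document.get("Tables", []):
        for row in table.get("Rows", []):
            instance = row.get("instance", "")
            if name_fragment in instance:
                ips[instance] = row.get("ips", "")
    return ips
```

For example, windows_cell_ips(bosh_vms_json("my-environ", "garden-windows")) would return the Windows Diego Cell addresses for the example deployment in step 3.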

Consul CLI

To use the Consul CLI on your Windows Diego Cell to diagnose problems with your Consul cluster:

  1. In your RDP session, open a PowerShell window.

  2. Navigate to the directory that contains the Consul CLI binary by running:

    cd CONSUL-CLI-DIR\bin
    

    Where CONSUL-CLI-DIR is the Consul CLI package’s directory path.

    For example:

    PS C:\Users\admin> cd C:\var\vcap\packages\consul-windows\bin\
    

  3. List the members of your Consul cluster by running:

    consul.exe members
    

    For example:

    PS C:\var\vcap\packages\consul-windows\bin> consul.exe members
    Node                       Address          Status  Type    Build  Protocol  DC
    cell-windows-0             10.0.0.111:8301  alive   client  0.6.4  2         dc1
    cloud-controller-0         10.0.0.94:8301   alive   client  0.6.4  2         dc1
    cloud-controller-worker-0  10.0.0.99:8301   alive   client  0.6.4  2         dc1
    consul-server-0            10.0.0.96:8301   alive   server  0.6.4  2         dc1
    diego-brain-0              10.0.0.109:8301  alive   client  0.6.4  2         dc1
    diego-cell-0               10.0.0.103:8301  alive   client  0.6.4  2         dc1
    diego-cell-1               10.0.0.104:8301  alive   client  0.6.4  2         dc1
    diego-cell-2               10.0.0.107:8301  alive   client  0.6.4  2         dc1
    diego-database-0           10.0.0.92:8301   alive   client  0.6.4  2         dc1
    ha-proxy-0                 10.0.0.254:8301  alive   client  0.6.4  2         dc1
    nfs-server-0               10.0.0.100:8301  alive   client  0.6.4  2         dc1
    router-0                   10.0.0.105:8301  alive   client  0.6.4  2         dc1
    uaa-0                      10.0.0.93:8301   alive   client  0.6.4  2         dc1
    

  4. Examine the output to ensure that the cell-windows-0 node is registered in the Consul cluster and has a status of alive. If it is missing or has any other status, your Windows Diego Cell cannot communicate with your Ops Manager deployment, and developers cannot push .NET apps to it. In that case, verify the configuration of your Consul cluster and ensure that your certificates are not missing or misconfigured.
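If you check several cells, you can copy the consul.exe members output to your workstation and test it with a script instead of reading the table by eye. The following Python sketch splits each row on whitespace, which works because the columns shown above contain no spaces, and looks up a node by name. The default node name, cell-windows-0, comes from the example output. Pass the name of your own cell.

```python
from typing import Dict


def member_status(members_output: str) -> Dict[str, str]:
    """Map each node name in `consul members` output to its Status column."""
    lines = [line for line in members_output.splitlines() if line.strip()]
    if not lines:
        return {}
    header = lines[0].split()
    node_col = header.index("Node")
    status_col = header.index("Status")
    statuses: Dict[str, str] = {}
    for line in lines[1:]:
        fields = line.split()
        # Skip malformed rows that are too short to hold both columns.
        if len(fields) > max(node_col, status_col):
            statuses[fields[node_col]] = fields[status_col]
    return statuses


def cell_is_alive(members_output: str, node: str = "cell-windows-0") -> bool:
    """True only if the node is listed and its status is exactly 'alive'."""
    return member_status(members_output).get(node) == "alive"
```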