Troubleshooting Windows Diego Cells
This topic describes how to troubleshoot Windows Diego Cells deployed by Pivotal Application Service for Windows (PASW).
Installation Issues
This section describes issues that may occur during the installation process.
Missing Local Certificates for Windows File System Injector
Symptom
You run the winfs-injector tool and see the following error about certificates:
Get https://auth.docker.io/token?service=registry.docker.io&scope=repository:cloudfoundry/windows2016fs:pull: x509: failed to load system roots and no roots provided
Explanation
Local certificates are needed to communicate with Docker Hub.
Solution
Install the necessary certificates on your local machine. On Ubuntu, you can install certificates with the ca-certificates package.
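Before rerunning the tool, you can check whether your machine has a CA bundle at all. The error text matches the message that Go's crypto/x509 package emits when it finds no root certificates, so the following minimal sketch assumes the injector resolves roots the way Go does on Linux: it checks the SSL_CERT_FILE environment variable first, then a few common bundle locations. The function name find_ca_bundle is hypothetical, and the path list is a subset of the locations Go checks, not an exhaustive one.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first non-empty CA bundle found, or fail.
# Assumption: the injector looks up roots like Go's crypto/x509 on Linux,
# honoring SSL_CERT_FILE before the distribution default paths below.
find_ca_bundle() {
  local candidate
  for candidate in \
    "${SSL_CERT_FILE:-}" \
    /etc/ssl/certs/ca-certificates.crt \
    /etc/pki/tls/certs/ca-bundle.crt \
    /etc/ssl/cert.pem; do
    # Skip the empty entry produced when SSL_CERT_FILE is unset,
    # and skip files that exist but are empty.
    if [ -n "$candidate" ] && [ -s "$candidate" ]; then
      echo "$candidate"
      return 0
    fi
  done
  echo "no CA bundle found; on Ubuntu, install the ca-certificates package" >&2
  return 1
}
```

If the function reports no bundle on Ubuntu, `sudo apt-get install ca-certificates` followed by `sudo update-ca-certificates` installs the package and regenerates /etc/ssl/certs/ca-certificates.crt.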
Outdated Version for Windows File System Injector
Symptom
You run the winfs-injector tool and see the following error about a missing file or directory:
open ...windows2016fs-release/VERSION: no such file or directory
Explanation
You are using an outdated version of the winfs-injector tool.
Solution
From the Pivotal Application Service for Windows page on Pivotal Network, download the version of the Windows File System Injector tool recommended for your version of the tile.
Missing Container Image
Symptom
You click the + icon in Ops Manager to add the PASW tile to the Installation Dashboard, and you see the error:
Explanation
The product file you are trying to upload does not contain the Windows Server container base image.
Solution
Delete the product file listing from Ops Manager by clicking its trash can icon under Import a Product.
Follow the PASW installation instructions to run the winfs-injector tool locally on the product file. This step requires Internet access, can take up to 20 minutes, and adds the Windows Server container base image to the product file. For more information, see Install in Installing and Configuring PASW.
Click Import a Product to upload the “injected” product file.
Click the + icon next to the product listing to add the PASW tile to the Installation Dashboard.
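If you inject product files often, a small wrapper can guard against uploading the original, un-injected file by mistake. The following sketch assumes the injector accepts --input-tile and --output-tile flags, as described in the PASW installation instructions; confirm the flags against the version you downloaded. The function name inject_tile and the WINFS_INJECTOR variable (defaulting to winfs-injector on your PATH, since the downloaded binary may carry a platform suffix) are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the Windows File System Injector.
# Assumptions: the injector binary is named by $WINFS_INJECTOR (default
# "winfs-injector") and accepts --input-tile and --output-tile.
inject_tile() {
  local input="$1"
  # Default output: same path with "-injected" before the .pivotal suffix.
  local output="${2:-${input%.pivotal}-injected.pivotal}"

  if [ ! -f "$input" ]; then
    echo "input tile not found: $input" >&2
    return 1
  fi
  if [ "$input" = "$output" ]; then
    echo "output tile must differ from input tile" >&2
    return 1
  fi

  # Requires Internet access and can take up to 20 minutes.
  "${WINFS_INJECTOR:-winfs-injector}" --input-tile "$input" --output-tile "$output" || return 1

  # Print the path of the product file to upload with Import a Product.
  echo "$output"
}
```

For example, `inject_tile ~/Downloads/pasw.pivotal` (a hypothetical filename) prints the path of the injected product file on success, which is the file to select under Import a Product.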
Upgrade Issues
This section describes issues that may occur during the upgrade process.
Failure to Create Containers When Upgrading with Shared Microsoft Base Image
Symptom
The pre-start script for the windowsfs job (windows1803fs in this example) fails, and the upgrade fails with output like the following:
Task 308031 | 13:47:04 | Preparing deployment: Preparing deployment (00:00:03)
Task 308031 | 13:47:11 | Preparing package compilation: Finding packages to compile (00:00:00)
Task 308031 | 13:47:21 | Updating instance windows_diego_cell: windows_diego_cell/44c5841f-7580-4e9c-9856-89fcbe08ab0d (2) (canary) (00:00:35)
L Error: Action Failed get_task: Task 59ba76d1-14c5-4d7b-681c-08b9ec4bd64d result: 1 of 10 pre-start scripts failed. Failed Jobs: windows1803fs. Successful Jobs: set_kms_host, groot, loggregator_agent_windows, bosh-dns-windows, rep_windows, winc-network-1803, set_password, enable_ssh, enable_rdp.
Task 308031 | 13:47:56 | Error: Action Failed get_task: Task 59ba76d1-14c5-4d7b-681c-08b9ec4bd64d result: 1 of 10 pre-start scripts failed. Failed Jobs: windows1803fs. Successful Jobs: set_kms_host, groot, loggregator_agent_windows, bosh-dns-windows, rep_windows, winc-network-1803, set_password, enable_ssh, enable_rdp.
Alternatively, the post-start script for the rep_windows job fails, and the upgrade fails with output like the following:
Task 8192 | 21:12:30 | Updating instance windows2019-cell: windows2019-cell/bd6d70b9-ed1f-412f-9d49-8045627f4ab3 (0) (canary) (00:17:24)
L Error: Action Failed get_task: Task a9555020-1a3b-40c7-677c-d6fc392ce135 result: 1 of 3 post-start scripts failed. Failed Jobs: rep_windows. Successful Jobs: route_emitter_windows, bosh-dns-windows.
Task 8192 | 21:29:55 | Error: Action Failed get_task: Task a9555020-1a3b-40c7-677c-d6fc392ce135 result: 1 of 3 post-start scripts failed. Failed Jobs: rep_windows. Successful Jobs: route_emitter_windows, bosh-dns-windows.
Explanation
When upgrading between versions of Windows rootfs that have a shared Microsoft base layer, PASW may fail to create containers. The cause is currently unknown.
Solution
For available workarounds, see Failure to create containers when upgrading with shared Microsoft base image in the Pivotal Knowledge Base.
Forward Windows Diego Cell Logs
You can use Windows cell logs to troubleshoot Windows Diego Cells. Windows cells generate the following types of logs:
- BOSH job logs, such as rep_windows and consul_agent_windows. These logs stream to the syslog server configured in the System Logging pane of the PAS tile, along with other Pivotal Platform component logs. The names of these BOSH job logs correspond to the names of the logs emitted by Linux Diego Cells.
- Windows event logs. These logs stream to the syslog server configured in the System Logging pane of the PASW tile.
You can forward BOSH job logs and Windows event logs to an external syslog server in the following ways:
Configure a BOSH add-on to forward BOSH job logs. For more information, see the BOSH jobs logs step in Step 2: Install the Tile in Installing and Configuring PASW.
Configure PASW to forward Windows event logs. For more information, see Forward Windows Event Logs to a Syslog Server.
You can also download BOSH job logs and Windows event logs from the PASW tile. For more information, see Download Windows Cell Logs.
Forward Windows Event Logs to a Syslog Server
To forward Windows event logs to an external syslog server:
Navigate to the Ops Manager Installation Dashboard.
Click the PASW tile.
Select System Logging.
Under Enable syslog for VM logs?, select Enable.
Under Address, enter the hostname or IP address of your syslog server.
Under Port, enter the port of your syslog server. The default port is 514.
Note: The host must be reachable from the PAS network. Ensure your syslog server listens on external interfaces.
Under Protocol, select the transport protocol to use when forwarding logs.
Select the Enable system metrics checkbox. For a list of the VM metrics that the System Metric Agent emits, see VM Metrics on GitHub.
Click Save.
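To confirm that the syslog server accepts messages, you can send a test line from a Linux host on the same network as PAS. The following sketch is a hypothetical helper, not part of PASW: it uses the util-linux logger command, whose --server, --port, --tcp, --udp, and --tag options select the destination, transport, and message tag. It handles only TCP and UDP; the function name send_syslog_test and the tag and message text are assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: send one test message with util-linux logger.
# Usage: send_syslog_test HOST [PORT] [tcp|udp]
send_syslog_test() {
  local host="$1" port="${2:-514}" proto="${3:-tcp}" flag
  case "$proto" in
    tcp) flag="--tcp" ;;
    udp) flag="--udp" ;;
    *)
      echo "unsupported protocol: $proto (expected tcp or udp)" >&2
      return 1
      ;;
  esac
  # Tag the message so it is easy to find on the syslog server.
  logger --server "$host" --port "$port" "$flag" --tag pasw-syslog-test \
    "PASW syslog connectivity test"
}
```

For example, `send_syslog_test syslog.example.com 514 tcp`. With TCP, logger typically reports an error when it cannot connect; with UDP it cannot confirm delivery, so check the syslog server for a message tagged pasw-syslog-test.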
Download Windows Cell Logs
To download Windows cell logs:
Navigate to the Ops Manager Installation Dashboard.
Click the PASW tile.
Click the Status tab.
Under the Logs column, click the download icon for the Windows cell you want to retrieve logs from.
Click the Logs tab.
When the logs are ready, click the filename to download them.
Unzip the file to examine the contents. Each component on the Diego Cell has its own logs directory:
/consul_agent_windows/
/garden-windows/
/metron_agent_windows/
/rep_windows/
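After unzipping the download, a quick inventory shows which component directories actually contain files. The function below is a hypothetical convenience, not an Ops Manager feature; it assumes the component directories sit at the top level of the extracted folder, as in the listing above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each top-level component directory in an
# extracted Windows cell log bundle with the number of files it contains.
summarize_cell_logs() {
  local root="${1:-.}" dir count
  for dir in "$root"/*/; do
    # Skip the literal pattern when the root has no subdirectories.
    [ -d "$dir" ] || continue
    count=$(find "$dir" -type f | wc -l)
    # Arithmetic expansion strips any padding that wc adds.
    printf '%s %d\n' "$(basename "$dir")" "$((count))"
  done
}
```

For example, `unzip -q LOGS-FILE.zip -d cell-logs && summarize_cell_logs cell-logs`, where LOGS-FILE is the name of the downloaded file.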
Troubleshoot Windows Compilation VMs
BOSH automatically deletes a compilation VM after it fails. In a vSphere environment, use one of the following procedures to troubleshoot compilation VM issues with Windows stemcell v2019.7 and later:
- Troubleshoot a Slowly-Deleted Windows Compilation VM
- Troubleshoot a Quickly-Deleted Windows Compilation VM
Troubleshoot a Slowly-Deleted Windows Compilation VM
The easiest method to troubleshoot a Windows compilation VM is to bosh ssh to the VM before BOSH deletes it.
To troubleshoot a compilation VM from an SSH session:
Open the vSphere UI.
Open two different BOSH CLI terminal sessions.
Open Ops Manager.
Enable the following two settings in Ops Manager:
- Select Keep Unreachable Director VMs from BOSH Director tile > Director config.
- Select enable BOSH-native SSH support on all VMs from PASW tile > VM options.
Click Apply Changes for the PASW tile.
From the first BOSH CLI terminal, monitor the deployment's instances for a compilation VM:
watch -n 5 "bosh -d TAS-WINDOWS-DEPLOYMENT is --details | grep compilation"
Where TAS-WINDOWS-DEPLOYMENT is the name of your PASW deployment.
Wait until the compilation VM CID appears in the output.
From the second BOSH CLI terminal, SSH to the Windows compilation VM:
bosh -d TAS-WINDOWS-DEPLOYMENT ssh COMPILATION-NAME
Where:
- TAS-WINDOWS-DEPLOYMENT is the name of your PASW deployment.
- COMPILATION-NAME is the name of your Windows compilation VM.
To prevent BOSH from deleting the compilation VM after the compilation VM fails, search for the compilation VM CID in the vSphere UI and rename it.
You can now troubleshoot within this session.
After troubleshooting, delete the VM manually.
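The monitoring and SSH steps above can be combined into one polling loop, which helps when the compilation VM is short-lived. The following sketch is an assumption, not a BOSH feature: it polls `bosh -d DEPLOYMENT instances --details`, treats the first whitespace-separated column of the first line containing "compilation" as the instance name (verify this against your BOSH CLI output), and prints it. The function name wait_for_compilation_vm and the POLL_INTERVAL and MAX_POLLS variables are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll a deployment until a compilation VM appears,
# then print its instance name (assumed to be the first column).
wait_for_compilation_vm() {
  local deployment="$1"
  local interval="${POLL_INTERVAL:-5}" max="${MAX_POLLS:-360}" i name
  for ((i = 0; i < max; i++)); do
    name=$(bosh -d "$deployment" instances --details 2>/dev/null \
      | awk '/compilation/ { print $1; exit }')
    if [ -n "$name" ]; then
      echo "$name"
      return 0
    fi
    sleep "$interval"
  done
  echo "no compilation VM found in $deployment" >&2
  return 1
}
```

For example: `vm=$(wait_for_compilation_vm TAS-WINDOWS-DEPLOYMENT) && bosh -d TAS-WINDOWS-DEPLOYMENT ssh "$vm"`. You still need to rename the VM in the vSphere UI to keep BOSH from deleting it.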
Troubleshoot a Quickly-Deleted Windows Compilation VM
In some situations, BOSH deletes the Windows compilation VM so quickly that you cannot bosh ssh to the VM before it is gone.
To troubleshoot a quickly-deleted compilation VM:
Download an Ubuntu desktop ISO image for Xenial (Ubuntu 16.04) from the Ubuntu Releases site.
Upload the Ubuntu desktop image into your vSphere datastore.
Open the vSphere UI.
Open a BOSH CLI terminal session.
Open Ops Manager.
Enable the following two settings in Ops Manager:
- Select Keep Unreachable Director VMs from BOSH Director tile > Director config.
- Select enable BOSH-native SSH support on all VMs from PASW tile > VM options.
Click Apply Changes in Ops Manager.
From the BOSH CLI terminal, monitor the deployment's instances for a compilation VM:
watch -n 5 "bosh -d TAS-WINDOWS-DEPLOYMENT is --details | grep compilation"
Where TAS-WINDOWS-DEPLOYMENT is the name of your PASW deployment.
Wait until the compilation VM CID appears in the output.
From the vSphere UI:
- Locate the compilation VM CID in the vSphere UI.
- To prevent BOSH from deleting the compilation VM after the compilation VM fails, rename the compilation VM.
- On the Windows compilation VM, go to Edit settings, add a CD/DVD drive device, browse to Datastore ISO file, select the Ubuntu desktop ISO, and then select Connect at Power On.
- Go to Edit settings > VM options tab > Boot Options.
- Increase the Boot Delay to 10000 milliseconds.
- Select Force BIOS Setup.
- Select Start/Restart to restart the VM.
On the BIOS setup screen, boot from the CD-ROM drive.
After the Ubuntu desktop starts, select Try Ubuntu and open a terminal.
In the terminal, run:
sudo fdisk -l
sudo mkdir /mnt/windows
sudo mount /dev/sda1 /mnt/windows
You can now troubleshoot within this session by exploring the contents of the Windows VM’s file system under /mnt/windows.
After troubleshooting, delete the VM manually.
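The mount step above assumes the Windows file system is on /dev/sda1. Windows disks can begin with a small system-reserved partition, so it is worth confirming the target from the fdisk -l output. As a sketch under a stated heuristic (the largest NTFS partition holds the Windows file system, which may not hold for every stemcell), the hypothetical function largest_ntfs_partition reads the output of `lsblk -r -n -b -o NAME,FSTYPE,SIZE` (raw format, no headings, sizes in bytes) and prints the matching device path.

```shell
#!/usr/bin/env bash
# Hypothetical helper for the Ubuntu live session.
# Reads `lsblk -r -n -b -o NAME,FSTYPE,SIZE` output on stdin and prints
# /dev/NAME for the largest NTFS partition (heuristic: assumed to be the
# Windows system volume). Returns 1 if no NTFS partition is listed.
largest_ntfs_partition() {
  local best
  # Keep NTFS rows as "SIZE NAME", sort numerically by size, take the largest.
  # Rows with an empty FSTYPE have SIZE in field 2, so they never match.
  best=$(awk '$2 == "ntfs" { print $3, $1 }' | sort -n | tail -n 1 | awk '{ print $2 }')
  [ -n "$best" ] || return 1
  echo "/dev/$best"
}
```

For example, in the Ubuntu terminal: `part=$(lsblk -r -n -b -o NAME,FSTYPE,SIZE | largest_ntfs_partition) && sudo mount -o ro "$part" /mnt/windows`. Mounting read-only is a cautious choice when you only need to inspect files before deleting the VM.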