Pivotal Cloud Foundry v1.9

Troubleshooting Windows Cells

This topic describes how to troubleshoot Windows cells on a Pivotal Cloud Foundry (PCF) deployment.

To perform the troubleshooting procedures in this topic, you must first log in to BOSH and set your current deployment to Garden Windows by performing the following steps:

  1. Follow the steps in the Log into BOSH section of the Advanced Troubleshooting with the BOSH CLI topic to target and log in to your BOSH Director. The steps vary slightly depending on whether your PCF deployment uses internal authentication or an external user store.

  2. If necessary, download the manifest of your Garden Windows deployment:

    $ bosh download manifest garden-windows garden-windows.yml
    

  3. Set your deployment to garden-windows:

    $ bosh deployment garden-windows.yml
    

  4. Continue to the Retrieve Logs or Connect to the Windows Cell section below.
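
Once the deployment is set, the table that `bosh vms garden-windows` prints can be filtered in a script, for example to look up the IP address you need later when connecting to the Windows cell. The following is a minimal sketch, not part of the BOSH CLI: `vm_ips` is a hypothetical function name, and it assumes the table layout shown in the sample output in the Connect to the Windows Cell section, with the VM name in the first column and the IP address in the fifth.

```shell
# Hypothetical helper (not a BOSH command): read the table printed by
# `bosh vms` on stdin and print "JOB/INDEX IP" for each VM whose name starts
# with the given prefix.
# Usage: bosh vms garden-windows | vm_ips cell_windows
vm_ips() {
    awk -F'|' -v prefix="$1" '
        NF >= 6 {
            vm = $2
            ip = $6
            # Trim the padding that the table adds around each cell.
            gsub(/^[ \t]+|[ \t]+$/, "", vm)
            gsub(/^[ \t]+|[ \t]+$/, "", ip)
            # Keep only the job/index part, dropping the "(uuid)" suffix.
            split(vm, parts, " ")
            if (index(parts[1], prefix) == 1) {
                print parts[1], ip
            }
        }
    '
}
```

Border lines and status lines contain no `|` characters, so the filter skips them. The header row is skipped because its first column does not start with the prefix.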

Retrieve Logs

Perform the following steps to retrieve the logs for the Windows cell:

  1. Download the logs, replacing YOUR-LOGS-DIR with your destination directory:

    $ bosh logs cell_windows 0 --dir YOUR-LOGS-DIR
    

  2. Your logs appear as a tarball in the destination directory you specified. Change into that directory and extract the tarball, replacing the example file name below with the name of your tarball:

    $ tar xvf cell_windows.0.2016-10-04-15-52-37.tgz
    

  3. Examine the logs. Each component on the cell has its own logs directory:

    • /consul_agent_windows/
    • /garden-windows/
    • /metron_agent_windows/
    • /rep_windows/
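
Steps 2 and 3 can be combined in a small script. The following is a minimal sketch, not part of the BOSH CLI: `extract_cell_logs` is a hypothetical helper name, and it assumes that the tarball is gzip-compressed and that each component's logs sit in a top-level directory of the archive, as in the list above.

```shell
# Hypothetical helper (not a BOSH command): extract a log tarball into a
# destination directory and print the name of each top-level directory found.
# Usage: extract_cell_logs TARBALL DEST_DIR
extract_cell_logs() {
    tarball="$1"
    dest="$2"

    mkdir -p "$dest" || return 1
    # -z assumes the .tgz file is gzip-compressed.
    tar -xzf "$tarball" -C "$dest" || return 1

    # Print one directory name per line, for example rep_windows.
    for dir in "$dest"/*/; do
        if [ -d "$dir" ]; then
            basename "$dir"
        fi
    done
}
```

For example, `extract_cell_logs cell_windows.0.2016-10-04-15-52-37.tgz ./cell_windows_logs` extracts the tarball named in step 2 into a new directory and lists the component log directories it contains.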

Connect to the Windows Cell

Perform the following steps to connect to your Windows cell to run diagnostics:

  1. Download and install a Remote Desktop Protocol (RDP) client.

    • For Mac OS X, download the Microsoft Remote Desktop app from the Mac App Store.
    • For Windows, download the Microsoft Remote Desktop app from Microsoft.
    • For Linux/UNIX, download an RDP client such as rdesktop.
  2. Retrieve the IP address of your Windows cell:

    $ bosh vms garden-windows
    Acting as user 'director' on deployment 'garden-windows' on 'p-bosh-1170e9b438cb29ff7c63'
    Director task 274
    Task 274 done
    +-------------------------------------------------------+---------+---------+---------+--------------+
    | VM                                                    | State   | AZ      | VM Type | IPs          |
    +-------------------------------------------------------+---------+---------+---------+--------------+
    | cell_windows/0 (03e221b3-3222-5e1e-eedd-b92221ff88e1) | running | default | xlarge  | 198.51.100.1 |
    +-------------------------------------------------------+---------+---------+---------+--------------+

    VMs total: 1

  3. Retrieve the Administrator password for your Windows cell by following the steps for your IaaS:

    • On vSphere, this is the value of WINDOWS_PASSWORD in the consumer-vars.yml file you used to build a stemcell in the Building a Windows Stemcell topic.
    • On Amazon Web Services (AWS), navigate to the AWS EC2 console. Right-click on your Windows cell and select Get Windows Password from the drop-down menu. Provide the local path to the ops_mgr.pem private key file you used when installing Ops Manager and click Decrypt password to obtain the Administrator password for your Windows cell.
  4. Open your RDP client. The examples below use the Microsoft Remote Desktop app.

  5. Click New and enter your connection information:

    • Connection name: Enter a name for this connection.
    • PC name: Enter the IP address of your Windows cell.
    • User name: Enter Administrator.
    • Password: Enter the password of your Windows cell that you obtained above.
  6. To mount a directory on your local machine as a drive in the Windows cell, perform the following steps:

    1. From the same Edit Remote Desktops window as above, click Redirection.
    2. Click the plus icon at the bottom left.
    3. For Name, enter the name of the drive as it will appear in the Windows cell. For Path, enter the path of the local directory.
    4. Click OK.
  7. Close the Edit Remote Desktops window and double-click the newly added connection under My Desktops to open an RDP connection to the Windows cell.

  8. In the RDP session, you can use the following tools to diagnose problems with your Windows cell:

Hakim

Hakim is a diagnostic tool that reveals common configuration issues with Windows cells. Perform the following steps to use Hakim:

  1. The Hakim binary is included in the DiegoWindows zip file in the Pivotal Cloud Foundry Elastic Runtime product on Pivotal Network. You can place the Hakim binary on your Windows cell in one of two ways:

    • Download the DiegoWindows zip file, unzip it, and place hakim.exe into a local directory that you mount as a drive on the Windows cell by following the steps above.
    • In the RDP session, open Internet Explorer and log in to Pivotal Network to download the DiegoWindows zip file directly to your Windows cell.
  2. Open a PowerShell window and change into the directory that contains hakim.exe:

    PS C:\Users\Administrator> cd Downloads 
    

  3. Run hakim.exe:

    PS C:\Users\Administrator\Downloads> .\hakim.exe 
    

    Hakim prints output to the PowerShell window only if it detects errors. Refer to the Hakim Error Messages section below for a list of error messages and their possible solutions.

Hakim Error Messages

Processes

The following processes are not running

This usually indicates a failed deployment. Try redeploying the BOSH Release for Windows.


NTP

There was an error detecting ntp synchronization on your machine. An accurate system clock is essential for internal Cloud Foundry metric reports. Please configure your NTP settings, if not already done. We recommend that your firewall have outbound rules set for UDP on port 123. In addition, ensure that your 'DnsCache' service is running

If NTP is not configured, clock skew with other PCF components can occur. Clock skew can result in odd errors, such as not receiving any metrics from apps running on the affected machine. Ensure that you are using the same NTP server on your Windows cell as the rest of your PCF deployment.


Firewall

Windows firewall service is not enabled. The Windows firewall is required in order to enforce Application Security Group rules. Running without the firewall is possible, but strongly not recommended.

Garden Windows enforces PCF security group settings for apps running on the Windows cell through the Windows firewall. Apps can run without the firewall, but security group rules are not enforced and apps have unrestricted network access.

To resolve this error, enable the Windows firewall. Perform the following steps in your RDP session to access the Windows firewall configuration:

  1. Open the Server Manager from the task bar.
  2. Click Tools in the upper right and select Windows Firewall with Advanced Security.
  3. Configure and enable the Windows firewall.


Fair Share

Fair Share CPU Scheduling must be disabled

You must disable Fair Share CPU scheduling for your Windows cell to function properly. Perform the following steps in your RDP session:

  1. Open the Registry Editor at C:\Windows\regedit.exe.
  2. Navigate to HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\Quota System\.
  3. Double-click the EnableCpuQuota value.
  4. Change the Value data from 1 to 0.
  5. Click OK.


Container

Failed to create container

This usually indicates an issue with the Windows containerization service. Contact Pivotal Support and provide the full output of this error.


Consul

Failed to resolve consul host

This usually indicates interference with DNS resolution on your Windows cell. To resolve this error, perform the following steps in your RDP session to set 127.0.0.1 as the primary DNS server for the active network adapter:

  1. Open the Control Panel.
  2. Click Network and Internet.
  3. Click Network and Sharing Center.
  4. Click Change adapter settings on the left.
  5. Double-click your active network adapter.
  6. Click Properties.
  7. Select Internet Protocol Version 4 (TCP/IPv4).
  8. Click Properties.
  9. Ensure that Use the following DNS server addresses is selected and enter 127.0.0.1 for Preferred DNS server.
  10. Click OK.


Consul CLI

Perform the following steps to use the Consul CLI on your Windows cell to diagnose problems with your Consul cluster:

  1. In your RDP session, open a PowerShell window.
  2. Change into the directory that contains the Consul CLI binary:
    PS C:\Users\Administrator> cd C:\var\vcap\packages\consul-windows\bin\ 
    
  3. Use the Consul CLI to list the members of your Consul cluster:
    PS C:\var\vcap\packages\consul-windows\bin> .\consul.exe members
    Node                       Address          Status  Type    Build  Protocol  DC
    cell-windows-0             10.0.0.111:8301  alive   client  0.6.4  2         dc1
    cloud-controller-0         10.0.0.94:8301   alive   client  0.6.4  2         dc1
    cloud-controller-worker-0  10.0.0.99:8301   alive   client  0.6.4  2         dc1
    consul-server-0            10.0.0.96:8301   alive   server  0.6.4  2         dc1
    diego-brain-0              10.0.0.109:8301  alive   client  0.6.4  2         dc1
    diego-cell-0               10.0.0.103:8301  alive   client  0.6.4  2         dc1
    diego-cell-1               10.0.0.104:8301  alive   client  0.6.4  2         dc1
    diego-cell-2               10.0.0.107:8301  alive   client  0.6.4  2         dc1
    diego-database-0           10.0.0.92:8301   alive   client  0.6.4  2         dc1
    ha-proxy-0                 10.0.0.254:8301  alive   client  0.6.4  2         dc1
    nfs-server-0               10.0.0.100:8301  alive   client  0.6.4  2         dc1
    router-0                   10.0.0.105:8301  alive   client  0.6.4  2         dc1
    uaa-0                      10.0.0.93:8301   alive   client  0.6.4  2         dc1
    
  4. Examine the output to ensure that the cell-windows-0 node is listed as a member of the Consul cluster and that its status is alive. If it is not, your Windows cell cannot communicate with your PCF deployment and developers cannot push .NET apps to the Windows cell. In that case, check the configuration of your Consul cluster and ensure that your certificates are not missing or misconfigured.
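
The check in step 4 can also be scripted. The following is a minimal sketch, not part of the Consul CLI: `consul_member_status` is a hypothetical function name, and it assumes the column layout shown in the sample output above, with the node name in the first column and the status in the third. It runs in a POSIX shell, so it also assumes that you have saved the output of `.\consul.exe members` as plain ASCII text, for example with `Out-File -Encoding ascii` into the directory you mounted from your local machine.

```shell
# Hypothetical helper (not part of the Consul CLI): read `consul members`
# output on stdin and print the Status column for the named node, or
# "missing" if no row for that node is present.
# Usage: consul_member_status NODE_NAME < members.txt
consul_member_status() {
    awk -v node="$1" '
        $1 == node { print $3; found = 1 }
        END { if (!found) print "missing" }
    '
}
```

A result other than `alive` for `cell-windows-0` points to the Consul cluster configuration or certificate problems described in step 4.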