Advanced Troubleshooting with the BOSH CLI

To perform advanced troubleshooting, you must log into the BOSH Director. From there, you can run specific commands using the BOSH Command Line Interface (CLI). BOSH Director diagnostic commands have access to information about your entire Pivotal Cloud Foundry (PCF) installation.

The BOSH Director runs on the virtual machine (VM) that Ops Manager deploys when you first install the Ops Manager Director tile.

Note: For more troubleshooting information, refer to the Troubleshooting Guide.

Note: Verify that no BOSH Director tasks are running on the Ops Manager VM before you run any commands. Do not proceed with troubleshooting until all BOSH Director tasks have completed or you have ended them. See BOSH CLI Commands for more information.
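
To confirm that the Director is idle, you can list its tasks with bosh tasks once you have targeted and logged in to the BOSH Director as described later in this topic. This is a minimal check and does not replace confirming in Ops Manager that no installation or update is in progress:

    $ bosh tasks

If the command reports no tasks in process, it is safe to proceed.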

Prepare to Use the BOSH CLI

This section guides you through preparing to use the BOSH CLI.

Gather Information

Before you begin troubleshooting with the BOSH CLI, collect the information you need from the Ops Manager interface.

  1. Open the Ops Manager interface by navigating to the Ops Manager fully qualified domain name (FQDN). Ensure that there are no installations or updates in progress.

  2. Click the Ops Manager Director tile and select the Status tab.

  3. Record the IP address for the Director job. This is the IP address of the VM where the BOSH Director runs.

  4. Select the Credentials tab.

  5. Click Link to Credential to view and record the Director Credentials.

  6. Return to the Installation Dashboard.

  7. (Optional) To prepare to troubleshoot the job VM for any other product, click the product tile and repeat the procedure above to record the IP address and VM credentials for that job VM.

  8. Log out of Ops Manager.

Note: You must log out of the Ops Manager interface to use the BOSH CLI.

SSH into Ops Manager

Use SSH to connect to the Ops Manager web application VM.

To SSH into the Ops Manager VM:

vSphere:

You need the credentials used to import the PCF .ova or .ovf file into your virtualization system.

  1. From a command line, run ssh ubuntu@OPS-MANAGER-FQDN.

  2. When prompted, enter the password that you set during the .ova deployment into vCenter:

    $ ssh ubuntu@OPS-MANAGER-FQDN
    Password: ***********
    

AWS, Azure, and OpenStack:

  1. Locate the Ops Manager FQDN on the AWS EC2 instances page, the Azure portal, or the OpenStack Access & Security page.

  2. Change the permissions on the .pem file to be more restrictive:

    $ chmod 600 ops_mgr.pem
    
  3. Run the ssh command:

    ssh -i ops_mgr.pem ubuntu@OPS-MANAGER-FQDN
    

Log into BOSH

Log into the BOSH Director using one of the options below:

Internal UAAC Login

  1. Target the UAA on the BOSH Director with the uaac target command:

    $ uaac target --ca-cert /var/tempest/workspaces/default/root_ca_certificate https://DIRECTOR-IP-ADDRESS:8443
    
  2. Run bosh target DIRECTOR-IP-ADDRESS to target the BOSH Director using the BOSH CLI.

  3. Retrieve the UAA admin user password from the Ops Manager Director > Credentials tab. Alternatively, launch a browser and visit the following URL to obtain the password: https://{OPSMANAGER}/api/v0/deployed/director/credentials/director_credentials. A curl alternative is sketched after this list.

  4. Log in using the BOSH Director credentials:

    $ bosh --ca-cert /var/tempest/workspaces/default/root_ca_certificate target DIRECTOR-IP-ADDRESS
    Target set to 'DIRECTOR_UUID'
    Your username: director
    Enter password: (DIRECTOR_CREDENTIAL)
    Logged in as 'director'
    
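
If you prefer the command line to a browser for retrieving the Director credentials in step 3, the sketch below calls the same Ops Manager API endpoint with curl. It assumes you already hold an Ops Manager UAA access token, exported here as the illustrative variable UAA_ACCESS_TOKEN; the -k flag skips certificate verification and is appropriate only for self-signed certificates.

    $ curl -k "https://OPS-MANAGER-FQDN/api/v0/deployed/director/credentials/director_credentials" \
        -H "Authorization: Bearer $UAA_ACCESS_TOKEN"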

External User Store Login via SAML

To log into the BOSH Director, you need browser access to the BOSH Director in order to obtain a UAA passcode. If you have browser access, skip to step 1 below.

If you do not have browser access to the BOSH Director, consider running sshuttle on your local workstation (Linux only). This permits you to browse the BOSH Director IP as if it were a local address.

$ git clone https://github.com/apenwarr/sshuttle.git
$ cd sshuttle
$ ./sshuttle -r username@opsmanagerIP 0.0.0.0/0 -vv

  1. Log in to your identity provider and use the information below to configure SAML Service Provider Properties:

    • Service Provider Entity ID: bosh-uaa
    • ACS URL: https://BOSH-DIRECTOR-IP:8443/saml/SSO/alias/bosh-uaa
    • Binding: HTTP Post
    • SLO URL: https://BOSH-DIRECTOR-IP:8443/saml/SSO/alias/bosh-uaa
    • Binding: HTTP Redirect
    • Name ID: Email Address
  2. Log into BOSH using your SAML credentials.

    $ bosh login
    Email: admin
    Password:
    One Time Code (Get one at https://192.0.2.16:8888/passcode):
    
  3. Click Login with organization credentials (SAML).

  4. Copy the Temporary Authentication Code that appears in your browser.

  5. You see a login confirmation. For example:

    Logged in as admin@example.org
    

Select a Product Deployment to Troubleshoot

When you import and install a product using Ops Manager, you deploy an instance of the product described by a YAML file. Examples include Elastic Runtime, MySQL, and any other product or service that you have imported and installed.

Perform the following steps to select a product deployment to troubleshoot:

  1. Identify the YAML file that describes the deployment you want to troubleshoot.

    You identify the YAML file that describes a deployment by its filename. For example, to identify Elastic Runtime deployments, run the following command:

    find /var/tempest/workspaces/default/deployments -name "cf-*.yml"

    The table below shows the naming conventions for deployment files.

    Product          Deployment Filename Convention
    Elastic Runtime  cf-<20-character_random_string>.yml
    MySQL Dev        cf_services-<20-character_random_string>.yml
    Other            <20-character_random_string>.yml

    Note: Where there is more than one installation of the same product, record the release number shown on the product tile in Operations Manager. Then, from the YAML files for that product, find the deployment that specifies the same release version as the product tile.

  2. Run bosh status and record the UUID value.

  3. Open the DEPLOYMENT-FILENAME.yml file in a text editor and compare the director_uuids value in this file with the UUID value that you recorded (a command-line comparison sketch appears after this list). If the values do not match, perform the following steps:

    1. Replace the director_uuids value with the UUID value.
    2. Run bosh deployment DEPLOYMENT-FILENAME.yml to reset the file for your deployment.
  4. Run bosh deployment DEPLOYMENT-FILENAME.yml to instruct the BOSH Director to apply BOSH CLI commands against the deployment described by the YAML file that you identified:

    $ bosh deployment /var/tempest/workspaces/default/deployments/cf-cca1234abcd.yml
    
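
To compare the UUID from step 2 with the director_uuids value from step 3 on the command line, you can grep for both values (a sketch; the deployment filename below is an illustrative placeholder, and the grep pattern assumes the key is named director_uuids as described above):

    $ bosh status | grep UUID
    $ grep -A1 director_uuids /var/tempest/workspaces/default/deployments/cf-cca1234abcd.yml

Confirm that both commands report the same UUID.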

Use the BOSH CLI for Troubleshooting

This section describes three BOSH CLI commands commonly used during troubleshooting.

  • VMS: Lists all VMs in a deployment
  • Cloudcheck: Runs a cloud consistency check and interactive repair
  • SSH: Starts an interactive session or executes commands on a VM

BOSH VMS

bosh vms provides an overview of the virtual machines that BOSH manages as part of the current deployment.

$ bosh vms
Acting as user 'director' on 'p-bosh-e11111e1e023e2ee1e11'
RSA 1024 bit CA certificates are loaded due to old openssl compatibility
Deployment 'cf-33e333333eebbb3b33b3'
Director task 2002
Task 2002 done
+-------------------------------------------------------------------------------------------------------+---------+-----+--------------------------------------------------------------+------------+
| VM                                                                                                    | State   | AZ  | VM Type                                                      | IPs        |
+-------------------------------------------------------------------------------------------------------+---------+-----+--------------------------------------------------------------+------------+
| clock_global-partition-3333e3ee3332221e222e/0 (abc31111-111e-1ec3-bb3e-ccc111111111)                  | running | n/a | clock_global-partition-9965d7cc1758828b974f                  | 10.0.16.20 |
| cloud_controller-partition-3333e3ee3332221e222e/0 (abc31111-111e-1ec3-bb3e-ccc222222222)              | running | n/a | cloud_controller-partition-3333e3ee3332221e222e              | 10.0.16.19 |
| cloud_controller_worker-partition-3333e3ee3332221e222e/0 (abc31111-111e-1ec3-bb3e-ccc333333333)       | running | n/a | cloud_controller_worker-partition-3333e3ee3332221e222e       | 10.0.16.21 |
| consul_server-partition-3333e3ee3332221e222e/0 (abc31111-111e-1ec3-bb3e-ccc444444444)                 | running | n/a | consul_server-partition-3333e3ee3332221e222e                 | 10.0.16.11 |
| diego_brain-partition-3333e3ee3332221e222e/0 (abc31111-111e-1ec3-bb3e-ccc555555555)                   | running | n/a | diego_brain-partition-3333e3ee3332221e222e                   | 10.0.16.23 |
| diego_cell-partition-3333e3ee3332221e222e/0 (abc31111-111e-1ec3-bb3e-ccc666666666)                    | running | n/a | diego_cell-partition-3333e3ee3332221e222e                    | 10.0.16.24 |
| diego_cell-partition-3333e3ee3332221e222e/1 (abc31111-111e-1ec3-bb3e-ccc777777777)                    | running | n/a | diego_cell-partition-3333e3ee3332221e222e                    | 10.0.16.25 |
| diego_cell-partition-3333e3ee3332221e222e/2 (abc31111-111e-1ec3-bb3e-ccc888888888)                    | running | n/a | diego_cell-partition-3333e3ee3332221e222e                    | 10.0.16.26 |
| diego_database-partition-3333e3ee3332221e222e/0 (abc31111-111e-1ec3-bb3e-ccc999999999)                | running | n/a | diego_database-partition-3333e3ee3332221e222e                | 10.0.16.14 |
| doppler-partition-3333e3ee3332221e222e/0 (abc31111-111e-1ec3-bb3e-ddd111111111)                       | running | n/a | doppler-partition-3333e3ee3332221e222e                       | 10.0.16.27 |
| etcd_server-partition-3333e3ee3332221e222e/0 (abc31111-111e-1ec3-bb3e-eee111111111)                   | running | n/a | etcd_server-partition-3333e3ee3332221e222e                   | 10.0.16.13 |
| loggregator_trafficcontroller-partition-3333e3ee3332221e222e/0 (abc31111-111e-1ec3-bb3e-fff223111111) | running | n/a | loggregator_trafficcontroller-partition-3333e3ee3332221e222e | 10.0.16.28 |
| mysql-partition-3333e3ee3332221e222e/0 (abc31111-111e-1ec3-bb3e-ggg111111111)                         | running | n/a | mysql-partition-3333e3ee3332221e222e                         | 10.0.16.18 |
| mysql_proxy-partition-3333e3ee3332221e222e/0 (abc31111-111e-1ec3-bb3e-hhh111111111)                   | running | n/a | mysql_proxy-partition-3333e3ee3332221e222e                   | 10.0.16.17 |
| nats-partition-3333e3ee3332221e222e/0 (abc31111-111e-1ec3-bb3e-iii111111111)                          | running | n/a | nats-partition-3333e3ee3332221e222e                          | 10.0.16.12 |
| nfs_server-partition-3333e3ee3332221e222e/0 (abc31111-111e-1ec3-bb3e-jjj111111111)                    | running | n/a | nfs_server-partition-3333e3ee3332221e222e                    | 10.0.16.15 |
| router-partition-3333e3ee3332221e222e/0 (abc31111-111e-1ec3-bb3e-kkk111111111)                        | running | n/a | router-partition-3333e3ee3332221e222e                        | 10.0.16.16 |
| uaa-partition-3333e3ee3332221e222e/0 (abc31111-111e-1ec3-bb3e-lll111111111)                           | running | n/a | uaa-partition-3333e3ee3332221e222e                           | 10.0.16.22 |
+-------------------------------------------------------------------------------------------------------+---------+-----+--------------------------------------------------------------+------------+
VMs total: 18

When troubleshooting an issue with your deployment, bosh vms may show a VM in an unknown state. Run bosh cloudcheck to instruct BOSH to diagnose problems with VMs in an unknown state.

You can also run bosh vms to identify VMs in your deployment, then use the bosh ssh command to SSH into an identified VM for further troubleshooting.

bosh vms supports the following arguments:

  • --details: Report also includes Cloud ID, Agent ID, and whether or not the BOSH Resurrector has been enabled for each VM

  • --vitals: Report also includes load, CPU, memory usage, swap usage, system disk usage, ephemeral disk usage, and persistent disk usage for each VM

  • --dns: Report also includes the DNS A record for each VM
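
These flags can be combined into a single report. For example (a usage sketch; output omitted):

    $ bosh vms --details --vitals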

Note: The Status tab of the Elastic Runtime product tile displays information similar to the bosh vms output.

BOSH Cloudcheck

Run the bosh cloudcheck command to instruct BOSH to detect differences between the VM state database maintained by the BOSH Director and the actual state of the VMs. For each difference detected, bosh cloudcheck can offer the following repair options:

  • Reboot VM: Instructs BOSH to reboot a VM. Rebooting can resolve many transient errors.
  • Ignore problem: Instructs BOSH to do nothing. You may want to ignore a problem in order to run bosh ssh and attempt troubleshooting directly on the machine.
  • Reassociate VM with corresponding instance: Updates the BOSH Director state database. Use this option if you believe that the BOSH Director state database is in error and that a VM is correctly associated with a job.
  • Recreate VM using last known apply spec: Instructs BOSH to destroy the server and recreate it from the deployment manifest that the installer provides. Use this option if a VM is corrupted.
  • Delete VM reference: Instructs BOSH to delete a VM reference in the Director state database. If a VM reference exists in the state database, BOSH expects to find an agent running on the VM. Select this option only if you know that this reference is in error. Once you delete the VM reference, BOSH can no longer control the VM.
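
To list detected problems without entering the interactive repair prompts, the BOSH CLI also accepts a report-only flag (a sketch; availability of this flag depends on your BOSH CLI version):

    $ bosh cloudcheck --report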

Example Scenarios

Unresponsive Agent

  $ bosh cloudcheck
  ccdb/0 (vm-3e37133c-bc33-450e-98b1-f86d5b63502a) is not responding:

  - Ignore problem
  - Reboot VM
  - Recreate VM using last known apply spec
  - Delete VM reference (DANGEROUS!)

Missing VM

  $ bosh cloudcheck
  VM with cloud ID `vm-3e37133c-bc33-450e-98b1-f86d5b63502a' missing:

  - Ignore problem
  - Recreate VM using last known apply spec
  - Delete VM reference (DANGEROUS!)

Unbound Instance VM

  $ bosh cloudcheck
  VM `vm-3e37133c-bc33-450e-98b1-f86d5b63502a' reports itself as `ccdb/0' but does not have a bound instance:

  - Ignore problem
  - Delete VM (unless it has persistent disk)
  - Reassociate VM with corresponding instance

Out of Sync VM

  $ bosh cloudcheck
  VM `vm-3e37133c-bc33-450e-98b1-f86d5b63502a' is out of sync:
  expected `cf-d7293430724a2c421061: ccdb/0', got `cf-d7293430724a2c421061: nats/0':

  - Ignore problem
  - Delete VM (unless it has persistent disk)

BOSH SSH

Use bosh ssh to SSH into the VMs in your deployment.

Follow the steps below to use bosh ssh:

  1. Run ssh-keygen -t rsa to provide BOSH with the correct public key.

  2. Accept the defaults.

  3. Run bosh ssh.

  4. Select a VM to access.

  5. Create a password for the temporary user that the bosh ssh command creates. Use this password if you need sudo access in this session.

Example:

$ bosh ssh
RSA 1024 bit CA certificates are loaded due to old openssl compatibility
1. diego_brain-partition-3333e3ee3332221e222e/0 (abc31111-111e-1ec3-bb3e-ccc555555555)
2. uaa-partition-3333e3ee3332221e222e/0 (abc31111-111e-1ec3-bb3e-lll111111111)
3. cloud_controller_worker-partition-3333e3ee3332221e222e/0 (abc31111-111e-1ec3-bb3e-ccc333333333)
4. cloud_controller-partition-3333e3ee3332221e222e/0 (abc31111-111e-1ec3-bb3e-ccc222222222)
5. diego_cell-partition-3333e3ee3332221e222e/0 (abc31111-111e-1ec3-bb3e-ccc666666666)
6. diego_cell-partition-3333e3ee3332221e222e/1 (abc31111-111e-1ec3-bb3e-ccc777777777)
7. diego_cell-partition-3333e3ee3332221e222e/2 (abc31111-111e-1ec3-bb3e-ccc888888888)
8. router-partition-3333e3ee3332221e222e/0 (abc31111-111e-1ec3-bb3e-kkk111111111)
9. loggregator_trafficcontroller-partition-3333e3ee3332221e222e/0 (abc31111-111e-1ec3-bb3e-fff223111111)
10. nats-partition-3333e3ee3332221e222e/0 (abc31111-111e-1ec3-bb3e-iii111111111) 
11. clock_global-partition-3333e3ee3332221e222e/0 (abc31111-111e-1ec3-bb3e-ccc111111111)
12. mysql_proxy-partition-3333e3ee3332221e222e/0 (abc31111-111e-1ec3-bb3e-hhh111111111)
13. diego_database-partition-3333e3ee3332221e222e/0 (abc31111-111e-1ec3-bb3e-ccc999999999)
14. etcd_server-partition-3333e3ee3332221e222e/0 (abc31111-111e-1ec3-bb3e-eee111111111)
15. mysql-partition-3333e3ee3332221e222e/0 (abc31111-111e-1ec3-bb3e-ggg111111111)
16. consul_server-partition-3333e3ee3332221e222e/0 (abc31111-111e-1ec3-bb3e-ccc444444444)
17. doppler-partition-3333e3ee3332221e222e/0 (abc31111-111e-1ec3-bb3e-ddd111111111)
18. nfs_server-partition-3333e3ee3332221e222e/0 (abc31111-111e-1ec3-bb3e-jjj111111111)
Choose an instance:

  Choose an instance: 5
  Enter password (use it to sudo on remote host): *******
  Target deployment 'cf-33e333333eebbb3b33b3'

  Setting up ssh artifacts

Standard SSH

In most cases, operators should use the bosh ssh command in the BOSH CLI to SSH into the BOSH Director and other VMs in their deployment. However, operators can also use standard ssh by performing the procedures below.

  1. Locate the IP address of your BOSH Director and your BOSH Director credentials by following the steps above.
  2. SSH into the BOSH Director with the private key you used with bosh-init to deploy the BOSH Director:
    $ ssh -i PATH-TO-PRIVATE-KEY BOSH-DIRECTOR-IP
  3. Enter your BOSH Director credentials to log in.

From the BOSH Director, you can SSH into the other VMs in your deployment by performing the following steps:

  1. Identify the private IP address of the component VM you want to SSH into by doing one of the following:
    • Perform the steps above to use the BOSH CLI to log in to your BOSH Director and use bosh vms to list the IP addresses of your component VMs.
    • Navigate to your IaaS console and locate the IP address of the VM. For example, Amazon Web Services users can locate the IP addresses of component VMs in the VPC Dashboard of the AWS Console.
  2. SSH into the component VM:
    $ ssh COMPONENT-VM-PRIVATE-IP