Advanced Troubleshooting with the BOSH CLI

To perform advanced troubleshooting, you must log in to the BOSH Director. From there, you can run specific commands using the BOSH Command Line Interface (CLI). BOSH Director diagnostic commands have access to information about your entire Pivotal Cloud Foundry (PCF) installation.

The BOSH Director runs on the virtual machine (VM) that Ops Manager deploys on the first install of the Ops Manager Director tile.

Note: For more troubleshooting information, refer to the Troubleshooting Guide.

Note: Verify that no BOSH Director tasks are running on the Ops Manager VM before running any commands. Do not proceed with troubleshooting until all BOSH Director tasks have completed or you have ended them. See BOSH CLI Commands for more information.
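
Once you are logged in to the BOSH Director (see the login procedures below), one quick check is the BOSH tasks command, which lists tasks in progress; an empty list means it is safe to proceed. A minimal sketch, assuming the example alias gcp used later in this topic:

    $ bosh2 -e gcp tasks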

About the BOSH CLI

The BOSH CLI is available in two major versions, v1 and v2. You can use either BOSH CLI, but Pivotal recommends that you use the BOSH CLI v2 since v1 will be deprecated. BOSH CLI v2 is installed on Ops Manager as /usr/local/bin/bosh2.

Ops Manager includes both versions for backward compatibility. Manifests generated with the BOSH CLI v2 cannot be deployed with the BOSH CLI v1.
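
To confirm which binary you are invoking on the Ops Manager VM, check the version that each CLI reports. This is a minimal check; the exact version strings vary by Ops Manager release:

    $ bosh --version
    $ bosh2 --version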

This topic provides examples of using each version of the BOSH CLI.

Prepare to Use the BOSH CLI

This section guides you through preparing to use the BOSH CLI.

Gather Information

Before you begin troubleshooting with the BOSH CLI, collect the information you need from the Ops Manager interface.

  1. Open the Ops Manager interface by navigating to the Ops Manager fully qualified domain name (FQDN). Ensure that there are no installations or updates in progress.

  2. Click the Ops Manager Director tile and select the Status tab.

  3. Record the IP address for the Director job. This is the IP address of the VM where the BOSH Director runs.

  4. Select the Credentials tab.

  5. Click Link to Credential to view and record the Director Credentials.

  6. Return to the Installation Dashboard.

  7. (Optional) To prepare to troubleshoot the job VM for any other product, click the product tile and repeat the procedure above to record the IP address and VM credentials for that job VM.

  8. Log out of Ops Manager.

Note: You must log out of the Ops Manager interface to use the BOSH CLI.

SSH into Ops Manager

Use SSH to connect to the Ops Manager web application VM.

To SSH into the Ops Manager VM:

vSphere:

You need the credentials used to import the PCF .ova or .ovf file into your virtualization system.

  1. From a command line, run ssh ubuntu@OPS-MANAGER-FQDN.

  2. When prompted, enter the password that you set during the .ova deployment into vCenter:

    $ ssh ubuntu@OPS-MANAGER-FQDN
    Password: ***********
    

AWS, Azure, and OpenStack:

  1. Locate the Ops Manager FQDN on the AWS EC2 instances page or the OpenStack Access & Security page.

  2. Change the permissions on the .pem file to be more restrictive:

    $ chmod 600 ops_mgr.pem
    
  3. Run the ssh command:

    ssh -i ops_mgr.pem ubuntu@OPS-MANAGER-FQDN
    

GCP:

  1. Confirm that you have installed the gcloud CLI. If you do not have the gcloud CLI, see the Google Cloud Platform documentation.

  2. Run gcloud config set project MY-PROJECT to configure your Google Cloud Platform project. For example:

    $ gcloud config set project gcp
    

  3. Run gcloud auth login MY-GCP-ACCOUNT. For example:

    $ gcloud auth login user@example.com
    

  4. Run gcloud compute ssh MY-INSTANCE --zone MY-ZONE. For example:

    $ gcloud compute ssh om-pcf-1a --zone us-central1-b
    

  5. Run sudo su - ubuntu to switch to the ubuntu user.

Log in to the BOSH Director

Depending on the BOSH CLI version you are using, follow the procedure for BOSH CLI v1 or BOSH CLI v2.

BOSH CLI v1

Target the BOSH Director

  1. Target the UAA on the BOSH Director with the UAAC command uaac target:

    $ uaac target --ca-cert /var/tempest/workspaces/default/root_ca_certificate https://DIRECTOR-IP-ADDRESS:8443
    
  2. Run bosh target DIRECTOR-IP-ADDRESS to target the BOSH Director using the BOSH CLI.

  3. Log in to the BOSH Director using one of the following options:

Log in to the BOSH Director with UAA

  1. Retrieve the Director password from the Ops Manager Director > Credentials tab. Alternatively, launch a browser and visit the following URL to obtain the password:

    https://OPS-MANAGER-FQDN/api/v0/deployed/director/credentials/director_credentials
    
  2. Log in using the BOSH Director credentials:

    $ bosh --ca-cert /var/tempest/workspaces/default/root_ca_certificate target DIRECTOR-IP-ADDRESS
    Target set to 'DIRECTOR_UUID'
    Email: director
    Enter password: (DIRECTOR_CREDENTIAL)
    Logged in as 'director'
    

Log in to the BOSH Director with SAML

  1. Log in to your identity provider and use the following information to configure SAML Service Provider Properties:

    • Service Provider Entity ID: bosh-uaa
    • ACS URL: https://DIRECTOR-IP-ADDRESS:8443/saml/SSO/alias/bosh-uaa
    • Binding: HTTP Post
    • SLO URL: https://DIRECTOR-IP-ADDRESS:8443/saml/SSO/alias/bosh-uaa
    • Binding: HTTP Redirect
    • Name ID: Email Address
  2. Log in to BOSH using your SAML credentials:

    $ bosh login
    Email: admin
    Password:
    One Time Code (Get one at https://10.0.0.3:8888/passcode):
    

    If you do not have browser access to the BOSH Director, run sshuttle on a local Linux workstation to browse the BOSH Director IP as if it were a local address. Retrieve a UAA passcode using the browser:

    $ git clone https://github.com/apenwarr/sshuttle.git
    $ cd sshuttle
    $ ./sshuttle -r username@opsmanagerIP 0.0.0.0/0 -vv
    

  3. Click Log in with organization credentials (SAML).

  4. Copy the Temporary Authentication Code that appears in your browser.

  5. You see a login confirmation. For example:

    Logged in as admin@example.org
    

BOSH CLI v2

Create a Local BOSH Director Alias

  1. Run the following command to create a local alias for the BOSH Director using the BOSH CLI: bosh2 alias-env MY-ENV -e DIRECTOR-IP-ADDRESS --ca-cert /var/tempest/workspaces/default/root_ca_certificate

    Replace the placeholder text with the following:

    • MY-ENV: Enter an alias for the BOSH Director, such as gcp.
    • DIRECTOR-IP-ADDRESS: Enter the IP address of your Ops Manager Director VM. For example:
      $ bosh2 alias-env gcp -e 10.0.0.3 --ca-cert /var/tempest/workspaces/default/root_ca_certificate
  2. Log in to the BOSH Director using one of the following options:

Log in to the BOSH Director with UAA

  1. Retrieve the Director password from the Ops Manager Director > Credentials tab. Alternatively, launch a browser and visit https://OPS-MANAGER-FQDN/api/v0/deployed/director/credentials/director_credentials to obtain the password. Replace OPS-MANAGER-FQDN with the fully qualified domain name of Ops Manager.
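
    As an alternative to the browser, the same endpoint can be queried with curl; a minimal sketch, assuming you have already obtained an Ops Manager UAA access token (UAA-ACCESS-TOKEN is a placeholder):

    $ curl "https://OPS-MANAGER-FQDN/api/v0/deployed/director/credentials/director_credentials" \
        -H "Authorization: Bearer UAA-ACCESS-TOKEN"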

  2. Run bosh2 -e MY-ENV log-in to log in to the BOSH Director. Replace MY-ENV with the alias for your BOSH Director. For example:

    $ bosh2 -e gcp log-in

    Follow the BOSH CLI prompts and enter the Ops Manager Director credentials to log in to the BOSH Director.

Log in to the BOSH Director with SAML

  1. Log in to your identity provider and use the following information to configure SAML Service Provider Properties:

    • Service Provider Entity ID: bosh-uaa
    • ACS URL: https://DIRECTOR-IP-ADDRESS:8443/saml/SSO/alias/bosh-uaa
    • Binding: HTTP Post
    • SLO URL: https://DIRECTOR-IP-ADDRESS:8443/saml/SSO/alias/bosh-uaa
    • Binding: HTTP Redirect
    • Name ID: Email Address
  2. Run bosh2 -e MY-ENV log-in to log in to the BOSH Director. Replace MY-ENV with the alias for your BOSH Director. For example:

    $ bosh2 -e gcp log-in

    Follow the BOSH CLI prompts and enter your SAML credentials to log in to the BOSH Director.

    If you do not have browser access to the BOSH Director, run sshuttle on a local Linux workstation to browse the BOSH Director IP address as if it were a local address. Retrieve a UAA passcode using the browser:

    $ git clone https://github.com/apenwarr/sshuttle.git
    $ cd sshuttle
    $ ./sshuttle -r username@opsmanagerIP 0.0.0.0/0 -vv
    

  3. Click Log in with organization credentials (SAML).

  4. Copy the Temporary Authentication Code that appears in your browser.

  5. You see a login confirmation. For example:

    Logged in as admin@example.org
    
  6. Depending on the version of the BOSH CLI you are using, continue to either the BOSH CLI v1 or BOSH CLI v2 instructions.

Troubleshoot a Deployment

BOSH CLI v1

  1. Run bosh status and record the UUID value.
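
    The UUID appears in the Director block of the bosh status output. For example (values shown are illustrative):

    $ bosh status
    Director
      Name       p-bosh
      URL        https://10.0.0.3:25555
      User       director
      UUID       1234abcd-5678-ef90-1234-567890abcdef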

  2. Open the DEPLOYMENT-FILENAME.yml file in a text editor and compare the director_uuids value in this file with the UUID value that you recorded. If the values do not match, perform the following steps:

    1. Replace the director_uuids value with the UUID value.
    2. Run bosh deployment DEPLOYMENT-FILENAME.yml to reset the file for your deployment.
  3. Run bosh deployment DEPLOYMENT-FILENAME.yml to instruct the BOSH Director to apply BOSH CLI commands against the deployment described by the YAML file that you identified:

    $ bosh deployment /var/tempest/workspaces/default/deployments/cf-cca1234abcd.yml
    

Select a Product Deployment to Troubleshoot

When you import and install a product using Ops Manager, you deploy an instance of the product described by a YAML file. Examples of available products include Elastic Runtime, MySQL, and any other service that you imported and installed.

Perform the following steps to select a product deployment to troubleshoot:

  1. Identify the YAML file that describes the deployment you want to troubleshoot.

    You identify the YAML file that describes a deployment by its filename. For example, to identify Elastic Runtime deployments, run the following command:

    find /var/tempest/workspaces/default/deployments -name cf-*.yml

    The table below shows the naming conventions for deployment files.

    Product          Deployment Filename Convention
    Elastic Runtime  cf-<20-character_random_string>.yml
    MySQL Dev        cf_services-<20-character_random_string>.yml
    Other            <20-character_random_string>.yml

    Note: Where there is more than one installation of the same product, record the release number shown on the product tile in Operations Manager. Then, from the YAML files for that product, find the deployment that specifies the same release version as the product tile.
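
    For example, to identify MySQL Dev deployments using the convention above, run:

    find /var/tempest/workspaces/default/deployments -name cf_services-*.yml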

BOSH CLI v2

  1. Run bosh2 -e MY-ENV environment, replacing MY-ENV with the alias you set for the BOSH Director. For example:

    $ bosh2 -e gcp environment

    The output of the command includes the BOSH Director UUID. Record the UUID value.
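
    The output looks similar to the following (values shown are illustrative):

    Using environment '10.0.0.3' as user 'director'

    Name      p-bosh
    UUID      1234abcd-5678-ef90-1234-567890abcdef
    User      director

    Succeeded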

Use the BOSH CLI for Troubleshooting

This section describes three BOSH CLI commands commonly used during troubleshooting.

  • VMs: Lists the VMs in a deployment
  • Cloud Check: Runs a cloud consistency check and interactive repair
  • SSH: Starts an interactive session or runs commands on a VM

BOSH VMs

bosh vms provides an overview of the virtual machines that BOSH manages as part of the current deployment.

When troubleshooting an issue with your deployment, bosh vms may show a VM in an unknown state. Run bosh cloudcheck on a VM in an unknown state to instruct BOSH to diagnose problems with the VM.

You can also run bosh vms to identify VMs in your deployment, then use the bosh ssh command to SSH into an identified VM for further troubleshooting.

If you use BOSH CLI v1, run bosh vms.

$ bosh vms
Acting as user 'director' on 'p-bosh-e11111e1e023e2ee1e11'
RSA 1024 bit CA certificates are loaded due to old openssl compatibility
Deployment 'cf-33e333333eebbb3b33b3'
Director task 2002
Task 2002 done
+-------------------------------------------------------------------------------------------------------+---------+-----+--------------------------------------------------------------+------------+
| VM                                                                                                    | State   | AZ  | VM Type                                                      | IPs        |
+-------------------------------------------------------------------------------------------------------+---------+-----+--------------------------------------------------------------+------------+
| clock_global-partition-3333e3ee3332221e222e/0 (abc31111-111e-1ec3-bb3e-ccc111111111)                  | running | n/a | clock_global-partition-9965d7cc1758828b974f                  | 10.0.16.20 |
| cloud_controller-partition-3333e3ee3332221e222e/0 (abc31111-111e-1ec3-bb3e-ccc222222222)              | running | n/a | cloud_controller-partition-3333e3ee3332221e222e              | 10.0.16.19 |
| cloud_controller_worker-partition-3333e3ee3332221e222e/0 (abc31111-111e-1ec3-bb3e-ccc333333333)       | running | n/a | cloud_controller_worker-partition-3333e3ee3332221e222e       | 10.0.16.21 |
| consul_server-partition-3333e3ee3332221e222e/0 (abc31111-111e-1ec3-bb3e-ccc444444444)                 | running | n/a | consul_server-partition-3333e3ee3332221e222e                 | 10.0.16.11 |
| diego_brain-partition-3333e3ee3332221e222e/0 (abc31111-111e-1ec3-bb3e-ccc555555555)                   | running | n/a | diego_brain-partition-3333e3ee3332221e222e                   | 10.0.16.23 |
| diego_cell-partition-3333e3ee3332221e222e/0 (abc31111-111e-1ec3-bb3e-ccc666666666)                    | running | n/a | diego_cell-partition-3333e3ee3332221e222e                    | 10.0.16.24 |
| diego_cell-partition-3333e3ee3332221e222e/1 (abc31111-111e-1ec3-bb3e-ccc777777777)                    | running | n/a | diego_cell-partition-3333e3ee3332221e222e                    | 10.0.16.25 |
| diego_cell-partition-3333e3ee3332221e222e/2 (abc31111-111e-1ec3-bb3e-ccc888888888)                    | running | n/a | diego_cell-partition-3333e3ee3332221e222e                    | 10.0.16.26 |
| diego_database-partition-3333e3ee3332221e222e/0 (abc31111-111e-1ec3-bb3e-ccc999999999)                | running | n/a | diego_database-partition-3333e3ee3332221e222e                | 10.0.16.14 |
| doppler-partition-3333e3ee3332221e222e/0 (abc31111-111e-1ec3-bb3e-ddd111111111)                       | running | n/a | doppler-partition-3333e3ee3332221e222e                       | 10.0.16.27 |
| etcd_server-partition-3333e3ee3332221e222e/0 (abc31111-111e-1ec3-bb3e-eee111111111)                   | running | n/a | etcd_server-partition-3333e3ee3332221e222e                   | 10.0.16.13 |
| loggregator_trafficcontroller-partition-3333e3ee3332221e222e/0 (abc31111-111e-1ec3-bb3e-fff223111111) | running | n/a | loggregator_trafficcontroller-partition-3333e3ee3332221e222e | 10.0.16.28 |
| mysql-partition-3333e3ee3332221e222e/0 (abc31111-111e-1ec3-bb3e-ggg111111111)                         | running | n/a | mysql-partition-3333e3ee3332221e222e                         | 10.0.16.18 |
| mysql_proxy-partition-3333e3ee3332221e222e/0 (abc31111-111e-1ec3-bb3e-hhh111111111)                   | running | n/a | mysql_proxy-partition-3333e3ee3332221e222e                   | 10.0.16.17 |
| nats-partition-3333e3ee3332221e222e/0 (abc31111-111e-1ec3-bb3e-iii111111111)                          | running | n/a | nats-partition-3333e3ee3332221e222e                          | 10.0.16.12 |
| nfs_server-partition-3333e3ee3332221e222e/0 (abc31111-111e-1ec3-bb3e-jjj111111111)                    | running | n/a | nfs_server-partition-3333e3ee3332221e222e                    | 10.0.16.15 |
| router-partition-3333e3ee3332221e222e/0 (abc31111-111e-1ec3-bb3e-kkk111111111)                        | running | n/a | router-partition-3333e3ee3332221e222e                        | 10.0.16.16 |
| uaa-partition-3333e3ee3332221e222e/0 (abc31111-111e-1ec3-bb3e-lll111111111)                           | running | n/a | uaa-partition-3333e3ee3332221e222e                           | 10.0.16.22 |
+-------------------------------------------------------------------------------------------------------+---------+-----+--------------------------------------------------------------+------------+
VMs total: 18

If you use BOSH CLI v2, run bosh2 -e MY-ENV -d MY-DEPLOYMENT vms. Replace MY-ENV with your environment alias and MY-DEPLOYMENT with the deployment name. The -d MY-DEPLOYMENT flag is optional.
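
For example, reusing the environment alias and deployment name shown in the v1 output above:

$ bosh2 -e gcp -d cf-33e333333eebbb3b33b3 vms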

bosh vms supports the following arguments:

  • --details: Report also includes Cloud ID, Agent ID, and whether or not the BOSH Resurrector has been enabled for each VM

  • --vitals: Report also includes load, CPU, memory usage, swap usage, system disk usage, ephemeral disk usage, and persistent disk usage for each VM

  • --dns: Report also includes the DNS A record for each VM
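
For example, BOSH CLI v1 accepts these flags in combination:

$ bosh vms --details --vitals --dns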

Note: The Status tab of the Elastic Runtime product tile displays information similar to the bosh vms output.

BOSH Cloudcheck

Run the bosh cloudcheck command to instruct BOSH to detect differences between the VM state database maintained by the BOSH Director and the actual state of the VMs. For each difference detected, bosh cloudcheck can offer the following repair options:

  • Reboot VM: Instructs BOSH to reboot a VM. Rebooting can resolve many transient errors.
  • Ignore problem: Instructs BOSH to do nothing. You may want to ignore a problem in order to run bosh ssh and attempt troubleshooting directly on the machine.
  • Reassociate VM with corresponding instance: Updates the BOSH Director state database. Use this option if you believe that the BOSH Director state database is in error and that a VM is correctly associated with a job.
  • Recreate VM using last known apply spec: Instructs BOSH to destroy the server and recreate it from the deployment manifest that the installer provides. Use this option if a VM is corrupted.
  • Delete VM reference: Instructs BOSH to delete a VM reference in the Director state database. If a VM reference exists in the state database, BOSH expects to find an agent running on the VM. Select this option only if you know that this reference is in error. Once you delete the VM reference, BOSH can no longer control the VM.

If you use BOSH CLI v1, run bosh cloudcheck.

If you use BOSH CLI v2, run bosh2 -e MY-ENV -d MY-DEPLOYMENT cloud-check. Replace MY-ENV with your environment, and MY-DEPLOYMENT with your deployment.
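
For example, reusing the environment alias from earlier:

$ bosh2 -e gcp -d cf-33e333333eebbb3b33b3 cloud-check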

Example Scenarios

Unresponsive Agent

  $ bosh cloudcheck
  ccdb/0 (vm-3e37133c-bc33-450e-98b1-f86d5b63502a) is not responding:

  - Ignore problem
  - Reboot VM
  - Recreate VM using last known apply spec
  - Delete VM reference (DANGEROUS!)

Missing VM

  $ bosh cloudcheck
  VM with cloud ID `vm-3e37133c-bc33-450e-98b1-f86d5b63502a' missing:

  - Ignore problem
  - Recreate VM using last known apply spec
  - Delete VM reference (DANGEROUS!)

Unbound Instance VM

  $ bosh cloudcheck
  VM `vm-3e37133c-bc33-450e-98b1-f86d5b63502a' reports itself as `ccdb/0' but does not have a bound instance:

  - Ignore problem
  - Delete VM (unless it has persistent disk)
  - Reassociate VM with corresponding instance

Out of Sync VM

  $ bosh cloudcheck
  VM `vm-3e37133c-bc33-450e-98b1-f86d5b63502a' is out of sync:
  expected `cf-d7293430724a2c421061: ccdb/0', got `cf-d7293430724a2c421061: nats/0':

  - Ignore problem
  - Delete VM (unless it has persistent disk)

BOSH SSH

Use bosh ssh to SSH into the VMs in your deployment. Depending on the version of the BOSH CLI you are using, continue to either the BOSH CLI v1 or BOSH CLI v2 instructions.

BOSH CLI v1

Follow the steps below to use bosh ssh:

  1. Run ssh-keygen -t rsa to provide BOSH with the correct public key.

  2. Accept the defaults.

  3. Run bosh ssh.

  4. Select a VM to access.

  5. Create a password for the temporary user that the bosh ssh command creates. Use this password if you need sudo access in this session.

Example:

$ bosh ssh
RSA 1024 bit CA certificates are loaded due to old openssl compatibility
1. diego_brain-partition-3333e3ee3332221e222e/0 (abc31111-111e-1ec3-bb3e-ccc555555555)
2. uaa-partition-3333e3ee3332221e222e/0 (abc31111-111e-1ec3-bb3e-lll111111111)
3. cloud_controller_worker-partition-3333e3ee3332221e222e/0 (abc31111-111e-1ec3-bb3e-ccc333333333)
4. cloud_controller-partition-3333e3ee3332221e222e/0 (abc31111-111e-1ec3-bb3e-ccc222222222)
5. diego_cell-partition-3333e3ee3332221e222e/0 (abc31111-111e-1ec3-bb3e-ccc666666666)
6. diego_cell-partition-3333e3ee3332221e222e/1 (abc31111-111e-1ec3-bb3e-ccc777777777)
7. diego_cell-partition-3333e3ee3332221e222e/2 (abc31111-111e-1ec3-bb3e-ccc888888888)
8. router-partition-3333e3ee3332221e222e/0 (abc31111-111e-1ec3-bb3e-kkk111111111)
9. loggregator_trafficcontroller-partition-3333e3ee3332221e222e/0 (abc31111-111e-1ec3-bb3e-fff223111111)
10. nats-partition-3333e3ee3332221e222e/0 (abc31111-111e-1ec3-bb3e-iii111111111) 
11. clock_global-partition-3333e3ee3332221e222e/0 (abc31111-111e-1ec3-bb3e-ccc111111111)
12. mysql_proxy-partition-3333e3ee3332221e222e/0 (abc31111-111e-1ec3-bb3e-hhh111111111)
13. diego_database-partition-3333e3ee3332221e222e/0 (abc31111-111e-1ec3-bb3e-ccc999999999)
14. etcd_server-partition-3333e3ee3332221e222e/0 (abc31111-111e-1ec3-bb3e-eee111111111)
15. mysql-partition-3333e3ee3332221e222e/0 (abc31111-111e-1ec3-bb3e-ggg111111111)
16. consul_server-partition-3333e3ee3332221e222e/0 (abc31111-111e-1ec3-bb3e-ccc444444444)
17. doppler-partition-3333e3ee3332221e222e/0 (abc31111-111e-1ec3-bb3e-ddd111111111)
18. nfs_server-partition-3333e3ee3332221e222e/0 (abc31111-111e-1ec3-bb3e-jjj111111111)
Choose an instance: 5
Enter password (use it to sudo on remote host): *******
Target deployment 'cf-33e333333eebbb3b33b3'

Setting up ssh artifacts

BOSH CLI v2

Follow the steps below to use bosh2 ssh:

  1. Identify a VM to SSH into. Run bosh2 -e MY-ENV -d MY-DEPLOYMENT vms to list the VMs in the given deployment. Replace MY-ENV with your environment alias and MY-DEPLOYMENT with the deployment name.

  2. Run bosh2 -e MY-ENV -d MY-DEPLOYMENT ssh VM-NAME/GUID. For example:

    $ bosh2 -e example-env -d example-deployment ssh diego-cell/abcd0123-a012-b345-c678-9def01234567
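
    To run a single command instead of starting an interactive session, BOSH CLI v2 also accepts a --command flag. A sketch, using the standard monit binary location on BOSH-deployed VMs:

    $ bosh2 -e example-env -d example-deployment ssh diego-cell/abcd0123-a012-b345-c678-9def01234567 \
        --command 'sudo /var/vcap/bosh/bin/monit summary'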
