Advanced Troubleshooting with the BOSH CLI
To perform advanced troubleshooting, you must log into the BOSH Director. From there, you can run specific commands using the BOSH Command Line Interface (CLI). BOSH Director diagnostic commands have access to information about your entire Pivotal Cloud Foundry (PCF) installation.
The BOSH Director runs on the virtual machine (VM) that Ops Manager deploys on the first install of the Ops Manager Director tile.
Note: For more troubleshooting information, refer to the Troubleshooting Guide.
Note: Verify that no BOSH Director tasks are running on the Ops Manager VM before running any commands. You should not proceed with troubleshooting until all BOSH Director tasks have completed or you have ended them. See BOSH CLI Commands for more information.
This section guides you through preparing to use the BOSH CLI.
Before you begin troubleshooting with the BOSH CLI, collect the information you need from the Ops Manager interface.
Open the Ops Manager interface by navigating to the Ops Manager fully qualified domain name (FQDN). Ensure that there are no installations or updates in progress.
Click the Ops Manager Director tile and select the Status tab.
Record the IP address for the Director job. This is the IP address of the VM where the BOSH Director runs.
Select the Credentials tab.
Click Link to Credential to view and record the Director Credentials.
Return to the Installation Dashboard.
(Optional) To prepare to troubleshoot the job VM for any other product, click the product tile and repeat the procedure above to record the IP address and VM credentials for that job VM.
Log out of Ops Manager.
Note: You must log out of the Ops Manager interface to use the BOSH CLI.
Use SSH to connect to the Ops Manager web application VM.
To SSH into the Ops Manager VM:
vSphere:
You need the credentials used to import the PCF .ova or .ovf file into your virtualization system.
From a command line, run ssh ubuntu@OPS-MANAGER-FQDN.
When prompted, enter the password that you set during the .ova deployment into vCenter:
$ ssh ubuntu@OPS-MANAGER-FQDN
Password: ***********
AWS, Azure, and OpenStack:
Locate the Ops Manager FQDN on the AWS EC2 instances page or the OpenStack Access & Security page.
Change the permissions on the .pem file to be more restrictive:
$ chmod 600 ops_mgr.pem
SSH into the Ops Manager VM:
$ ssh -i ops_mgr.pem ubuntu@OPS-MANAGER-FQDN
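The two AWS, Azure, and OpenStack steps above can be wrapped in a small helper so that the key permissions are always tightened before connecting. This is a minimal sketch that assumes a bash shell on your workstation. The function names secure_key and ssh_opsman are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers wrapping the chmod and ssh steps above.

# Restrict the key file to owner read/write (mode 600). OpenSSH ignores
# private keys that other users can read.
secure_key() {
  local key="$1"
  if [ ! -f "$key" ]; then
    echo "key file not found: $key" >&2
    return 1
  fi
  chmod 600 "$key"
}

# Tighten the key permissions, then SSH to the Ops Manager VM as ubuntu.
ssh_opsman() {
  local key="$1" fqdn="$2"
  secure_key "$key" || return 1
  ssh -i "$key" "ubuntu@${fqdn}"
}

# Usage (placeholders from this guide):
#   ssh_opsman ops_mgr.pem OPS-MANAGER-FQDN
```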
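Log in to the BOSH Director using one of the following options:
- Internal User Store Login via UAA: use the internal UAA user store to log in to BOSH.
- External User Store Login via SAML: use an external user store to log in to BOSH.
Internal User Store Login via UAA: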
From the Ops Manager VM, target the BOSH UAA with the following UAAC command:
$ uaac target --ca-cert /var/tempest/workspaces/default/root_ca_certificate https://DIRECTOR-IP-ADDRESS:8443
Run bosh target DIRECTOR-IP-ADDRESS to target your BOSH Director using the BOSH CLI.
Retrieve the UAA admin user password from the Ops Manager Director > Credentials tab. Alternatively, launch a browser and visit the following URL to obtain the password:
Log in using the BOSH Director credentials:
$ bosh --ca-cert /var/tempest/workspaces/default/root_ca_certificate target 192.0.2.6
Target set to 'DIRECTOR_UUID'
Your username: director
Enter password: (DIRECTOR_CREDENTIAL)
Logged in as 'director'
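The internal UAA login steps above can be collected into one function that first checks the root CA certificate exists. This is a minimal sketch that assumes uaac and the BOSH v1 CLI are on the PATH of the Ops Manager VM. The helper names login_director and director_uaa_url are hypothetical.

```shell
#!/usr/bin/env bash
# Minimal sketch of the internal UAA login steps, run on the Ops Manager VM.

# Root CA certificate path used throughout this guide.
ROOT_CA=/var/tempest/workspaces/default/root_ca_certificate

# The Director's UAA listens on port 8443 (see the uaac target command above).
director_uaa_url() {
  printf 'https://%s:8443\n' "$1"
}

login_director() {
  local director_ip="$1"
  if [ ! -r "$ROOT_CA" ]; then
    echo "root CA certificate not readable: $ROOT_CA" >&2
    return 1
  fi
  uaac target --ca-cert "$ROOT_CA" "$(director_uaa_url "$director_ip")" || return 1
  # bosh prompts for the username (director) and the Director credential.
  bosh --ca-cert "$ROOT_CA" target "$director_ip"
}

# Usage:
#   login_director 192.0.2.6
```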
External User Store Login via SAML:
To log in to the BOSH Director with SAML, you need browser access to the BOSH Director to get a UAA passcode. If you have browser access, skip ahead to configuring your identity provider.
If you do not have browser access to the BOSH Director, consider running sshuttle on your local workstation (Linux only). This lets you browse to the BOSH Director IP (192.0.2.16 in this example) as if it were a local address:
$ git clone https://github.com/apenwarr/sshuttle.git
$ cd sshuttle
$ ./sshuttle -r username@opsmanagerIP 0.0.0.0/0 -vv
Log in to your identity provider and use the information below to configure the SAML Service Provider properties:
- Service Provider Entity ID: bosh-uaa
- ACS URL: https://BOSH-DIRECTOR-IP:8443/saml/SSO/alias/bosh-uaa
- Binding: HTTP POST
- SLO URL: https://BOSH-DIRECTOR-IP:8443/saml/SingleLogout/alias/bosh-uaa
- Binding: HTTP Redirect
- Name ID: Email Address
Log in to BOSH using your SAML credentials:
$ bosh login
Email: admin
Password:
One Time Code (Get one at https://192.0.2.16:8443/passcode):
In a browser, open the passcode URL shown in the prompt and click Login with organization credentials (SAML).
Copy the Temporary Authentication Code that appears in your browser and enter it at the One Time Code prompt.
You see a login confirmation. For example:
Logged in as email@example.com
When you import and install a product using Ops Manager, you deploy an instance of the product described by a YAML file. Examples of available products include Elastic Runtime, MySQL, or any other service that you imported and installed.
Perform the following steps to select a product deployment to troubleshoot:
Identify the YAML file that describes the deployment you want to troubleshoot.
You identify the YAML file that describes a deployment by its filename. For example, to identify Elastic Runtime deployments, run the following command:
$ find /var/tempest/workspaces/default/deployments -name 'cf-*.yml'
The table below shows the naming conventions for deployment files.
Product         | Deployment Filename Convention
Elastic Runtime | cf-<20-character_random_string>.yml
MySQL Dev       | cf_services-<20-character_random_string>.yml
Other           | <20-character_random_string>.yml
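The naming conventions in the table can be applied mechanically to label every manifest in the deployments directory. This is a heuristic sketch that looks only at filename prefixes, so anything that is not cf- or cf_services- falls into Other, and the note below about multiple installations still applies. The helper names product_for_manifest and list_manifests are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: label each deployment manifest with its likely product, using only
# the filename prefixes from the naming-convention table.

DEPLOYMENTS_DIR=/var/tempest/workspaces/default/deployments

# Map a manifest filename to a product by its prefix.
product_for_manifest() {
  case "$(basename "$1")" in
    cf_services-*.yml) echo "MySQL Dev" ;;
    cf-*.yml)          echo "Elastic Runtime" ;;
    *)                 echo "Other" ;;
  esac
}

# Print "PRODUCT<TAB>PATH" for every .yml file in the directory.
list_manifests() {
  local dir="${1:-$DEPLOYMENTS_DIR}" f
  for f in "$dir"/*.yml; do
    [ -e "$f" ] || continue   # the glob matched nothing
    printf '%s\t%s\n' "$(product_for_manifest "$f")" "$f"
  done
}

# Usage on the Ops Manager VM:
#   list_manifests
```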
Note: Where there is more than one installation of the same product, record the release number shown on the product tile in Operations Manager. Then, from the YAML files for that product, find the deployment that specifies the same release version as the product tile.
Run bosh status and record the UUID value.
Open the DEPLOYMENT-FILENAME.yml file in a text editor and compare the director_uuid value in this file with the UUID value that you recorded. If the values do not match, perform the following steps:
- Replace the director_uuid value with the UUID value.
- Run bosh deployment DEPLOYMENT-FILENAME.yml to reset the file for your deployment.
Run bosh deployment DEPLOYMENT-FILENAME.yml to instruct the BOSH Director to apply BOSH CLI commands against the deployment described by the YAML file that you identified:
$ bosh deployment /var/tempest/workspaces/default/deployments/cf-cca1234abcd.yml
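The UUID comparison above can be scripted so that a deployment is only selected when its manifest points at the Director you are targeting. This sketch assumes the BOSH v1 CLI, whose bosh status --uuid prints just the Director UUID, and a top-level director_uuid key in the manifest. It reports a mismatch instead of editing the file for you. The helper names manifest_director_uuid and select_deployment are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: select a deployment only if its director_uuid matches the
# targeted Director.

# Print the director_uuid value from a manifest, with quotes and spaces removed.
manifest_director_uuid() {
  sed -n 's/^director_uuid:[[:space:]]*//p' "$1" | tr -d "\"' " | head -n 1
}

select_deployment() {
  local manifest="$1" uuid
  uuid="$(bosh status --uuid)" || return 1
  if [ "$(manifest_director_uuid "$manifest")" = "$uuid" ]; then
    bosh deployment "$manifest"
  else
    echo "director_uuid in $manifest does not match $uuid; update it first" >&2
    return 1
  fi
}

# Usage:
#   select_deployment /var/tempest/workspaces/default/deployments/cf-cca1234abcd.yml
```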
This section describes three BOSH CLI commands commonly used during troubleshooting.
- VMs: Lists all VMs in a deployment
- Cloudcheck: Runs a cloud consistency check and interactive repair
- SSH: Starts an interactive session or executes commands on a VM
bosh vms provides an overview of the virtual machines that BOSH manages as part of the current deployment.
$ bosh vms
Acting as user 'director' on 'p-bosh-e11111e1e023e2ee1e11'
RSA 1024 bit CA certificates are loaded due to old openssl compatibility
Deployment 'cf-33e333333eebbb3b33b3'

Director task 2002

Task 2002 done

+-------------------------------------------------------------------------------------------------------+---------+-----+--------------------------------------------------------------+------------+
| VM                                                                                                    | State   | AZ  | VM Type                                                      | IPs        |
+-------------------------------------------------------------------------------------------------------+---------+-----+--------------------------------------------------------------+------------+
| clock_global-partition-3333e3ee3332221e222e/0 (abc31111-111e-1ec3-bb3e-ccc111111111)                  | running | n/a | clock_global-partition-9965d7cc1758828b974f                  | 10.0.16.20 |
| cloud_controller-partition-3333e3ee3332221e222e/0 (abc31111-111e-1ec3-bb3e-ccc222222222)              | running | n/a | cloud_controller-partition-3333e3ee3332221e222e              | 10.0.16.19 |
| cloud_controller_worker-partition-3333e3ee3332221e222e/0 (abc31111-111e-1ec3-bb3e-ccc333333333)       | running | n/a | cloud_controller_worker-partition-3333e3ee3332221e222e       | 10.0.16.21 |
| consul_server-partition-3333e3ee3332221e222e/0 (abc31111-111e-1ec3-bb3e-ccc444444444)                 | running | n/a | consul_server-partition-3333e3ee3332221e222e                 | 10.0.16.11 |
| diego_brain-partition-3333e3ee3332221e222e/0 (abc31111-111e-1ec3-bb3e-ccc555555555)                   | running | n/a | diego_brain-partition-3333e3ee3332221e222e                   | 10.0.16.23 |
| diego_cell-partition-3333e3ee3332221e222e/0 (abc31111-111e-1ec3-bb3e-ccc666666666)                    | running | n/a | diego_cell-partition-3333e3ee3332221e222e                    | 10.0.16.24 |
| diego_cell-partition-3333e3ee3332221e222e/1 (abc31111-111e-1ec3-bb3e-ccc777777777)                    | running | n/a | diego_cell-partition-3333e3ee3332221e222e                    | 10.0.16.25 |
| diego_cell-partition-3333e3ee3332221e222e/2 (abc31111-111e-1ec3-bb3e-ccc888888888)                    | running | n/a | diego_cell-partition-3333e3ee3332221e222e                    | 10.0.16.26 |
| diego_database-partition-3333e3ee3332221e222e/0 (abc31111-111e-1ec3-bb3e-ccc999999999)                | running | n/a | diego_database-partition-3333e3ee3332221e222e                | 10.0.16.14 |
| doppler-partition-3333e3ee3332221e222e/0 (abc31111-111e-1ec3-bb3e-ddd111111111)                       | running | n/a | doppler-partition-3333e3ee3332221e222e                       | 10.0.16.27 |
| etcd_server-partition-3333e3ee3332221e222e/0 (abc31111-111e-1ec3-bb3e-eee111111111)                   | running | n/a | etcd_server-partition-3333e3ee3332221e222e                   | 10.0.16.13 |
| loggregator_trafficcontroller-partition-3333e3ee3332221e222e/0 (abc31111-111e-1ec3-bb3e-fff223111111) | running | n/a | loggregator_trafficcontroller-partition-3333e3ee3332221e222e | 10.0.16.28 |
| mysql-partition-3333e3ee3332221e222e/0 (abc31111-111e-1ec3-bb3e-ggg111111111)                         | running | n/a | mysql-partition-3333e3ee3332221e222e                         | 10.0.16.18 |
| mysql_proxy-partition-3333e3ee3332221e222e/0 (abc31111-111e-1ec3-bb3e-hhh111111111)                   | running | n/a | mysql_proxy-partition-3333e3ee3332221e222e                   | 10.0.16.17 |
| nats-partition-3333e3ee3332221e222e/0 (abc31111-111e-1ec3-bb3e-iii111111111)                          | running | n/a | nats-partition-3333e3ee3332221e222e                          | 10.0.16.12 |
| nfs_server-partition-3333e3ee3332221e222e/0 (abc31111-111e-1ec3-bb3e-jjj111111111)                    | running | n/a | nfs_server-partition-3333e3ee3332221e222e                    | 10.0.16.15 |
| router-partition-3333e3ee3332221e222e/0 (abc31111-111e-1ec3-bb3e-kkk111111111)                        | running | n/a | router-partition-3333e3ee3332221e222e                        | 10.0.16.16 |
| uaa-partition-3333e3ee3332221e222e/0 (abc31111-111e-1ec3-bb3e-lll111111111)                           | running | n/a | uaa-partition-3333e3ee3332221e222e                           | 10.0.16.22 |
+-------------------------------------------------------------------------------------------------------+---------+-----+--------------------------------------------------------------+------------+

VMs total: 18
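In a large deployment, filtering out healthy rows makes problem VMs easier to spot. This sketch parses the table layout shown above (BOSH v1 CLI) and prints every VM whose State column is not running. It reads the table on standard input so that it can be tried without a Director. The function name non_running_vms is hypothetical, and the table layout is an assumption that may differ in other CLI versions.

```shell
#!/usr/bin/env bash
# Sketch: print "VM: STATE" for every row of `bosh vms` output whose
# State column is not "running". Border lines (starting with "+") are skipped.
non_running_vms() {
  awk -F'|' '
    /^\|/ {
      vm = $2; state = $3
      gsub(/^ +| +$/, "", vm)      # trim column padding
      gsub(/^ +| +$/, "", state)
      if (state != "State" && state != "running") print vm ": " state
    }'
}

# Usage against a live Director:
#   bosh vms | non_running_vms
```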
When you troubleshoot an issue with your deployment, bosh vms may show a VM in an unknown state. If it does, run bosh cloudcheck to instruct BOSH to diagnose problems with the VM.
You can also run bosh vms to identify VMs in your deployment, then use the bosh ssh command to SSH into an identified VM for further troubleshooting.
bosh vms supports the following arguments:
--details: Report also includes Cloud ID, Agent ID, and whether or not the BOSH Resurrector has been enabled for each VM
--vitals: Report also includes load, CPU, memory usage, swap usage, system disk usage, ephemeral disk usage, and persistent disk usage for each VM
--dns: Report also includes the DNS A record for each VM
Note: The Status tab of the Elastic Runtime product tile displays information similar to the bosh vms output.
Use the bosh cloudcheck command to instruct BOSH to detect differences between the VM state database maintained by the BOSH Director and the actual state of the VMs. For each difference detected, bosh cloudcheck can offer the following repair options:
Reboot VM: Instructs BOSH to reboot a VM. Rebooting can resolve many transient errors.
Ignore problem: Instructs BOSH to do nothing. You may want to ignore a problem in order to run bosh ssh and attempt troubleshooting directly on the machine.
Reassociate VM with corresponding instance: Updates the BOSH Director state database. Use this option if you believe that the BOSH Director state database is in error and that a VM is correctly associated with a job.
Recreate VM using last known apply spec: Instructs BOSH to destroy the server and recreate it from the deployment manifest that the installer provides. Use this option if a VM is corrupted.
Delete VM reference: Instructs BOSH to delete a VM reference in the Director state database. If a VM reference exists in the state database, BOSH expects to find an agent running on the VM. Select this option only if you know that this reference is in error. Once you delete the VM reference, BOSH can no longer control the VM.
Unresponsive Agent
$ bosh cloudcheck
ccdb/0 (vm-3e37133c-bc33-450e-98b1-f86d5b63502a) is not responding:
- Ignore problem
- Reboot VM
- Recreate VM using last known apply spec
- Delete VM reference (DANGEROUS!)
Missing VM
$ bosh cloudcheck
VM with cloud ID `vm-3e37133c-bc33-450e-98b1-f86d5b63502a' missing:
- Ignore problem
- Recreate VM using last known apply spec
- Delete VM reference (DANGEROUS!)
Unbound Instance VM
$ bosh cloudcheck
VM `vm-3e37133c-bc33-450e-98b1-f86d5b63502a' reports itself as `ccdb/0' but does not have a bound instance:
- Ignore problem
- Delete VM (unless it has persistent disk)
- Reassociate VM with corresponding instance
Out of Sync VM
$ bosh cloudcheck
VM `vm-3e37133c-bc33-450e-98b1-f86d5b63502a' is out of sync: expected `cf-d7293430724a2c421061: ccdb/0', got `cf-d7293430724a2c421061: nats/0':
- Ignore problem
- Delete VM (unless it has persistent disk)
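Because some repair options, such as Delete VM reference, cannot be undone, it can help to review the detected problems before choosing any repair. This sketch runs a report-only pass and then asks before starting the interactive repair. It assumes the BOSH v1 CLI accepts a --report option for bosh cloudcheck. Confirm this with bosh help cloudcheck on your CLI version. The function name cloudcheck_safely is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: report-only cloud check first, interactive repair only on request.
# Assumption: `bosh cloudcheck --report` lists problems without repairing them.
cloudcheck_safely() {
  local answer
  echo "== Report only: no repairs are attempted =="
  bosh cloudcheck --report
  read -r -p "Start interactive repair? [y/N] " answer
  case "$answer" in
    y|Y) bosh cloudcheck ;;
    *)   echo "No repairs attempted." ;;
  esac
}

# Usage, after selecting a deployment with `bosh deployment`:
#   cloudcheck_safely
```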
Use bosh ssh to SSH into the VMs in your deployment.
Follow the steps below to use bosh ssh:
Run ssh-keygen -t rsa to provide BOSH with the correct public key.
Accept the defaults.
Select a VM to access.
Create a password for the temporary user that the bosh ssh command creates. Use this password if you need sudo access in this session.
$ bosh ssh
RSA 1024 bit CA certificates are loaded due to old openssl compatibility
1. diego_brain-partition-3333e3ee3332221e222e/0 (abc31111-111e-1ec3-bb3e-ccc555555555)
2. uaa-partition-3333e3ee3332221e222e/0 (abc31111-111e-1ec3-bb3e-lll111111111)
3. cloud_controller_worker-partition-3333e3ee3332221e222e/0 (abc31111-111e-1ec3-bb3e-ccc333333333)
4. cloud_controller-partition-3333e3ee3332221e222e/0 (abc31111-111e-1ec3-bb3e-ccc222222222)
5. diego_cell-partition-3333e3ee3332221e222e/0 (abc31111-111e-1ec3-bb3e-ccc666666666)
6. diego_cell-partition-3333e3ee3332221e222e/1 (abc31111-111e-1ec3-bb3e-ccc777777777)
7. diego_cell-partition-3333e3ee3332221e222e/2 (abc31111-111e-1ec3-bb3e-ccc888888888)
8. router-partition-3333e3ee3332221e222e/0 (abc31111-111e-1ec3-bb3e-kkk111111111)
9. loggregator_trafficcontroller-partition-3333e3ee3332221e222e/0 (abc31111-111e-1ec3-bb3e-fff223111111)
10. nats-partition-3333e3ee3332221e222e/0 (abc31111-111e-1ec3-bb3e-iii111111111)
11. clock_global-partition-3333e3ee3332221e222e/0 (abc31111-111e-1ec3-bb3e-ccc111111111)
12. mysql_proxy-partition-3333e3ee3332221e222e/0 (abc31111-111e-1ec3-bb3e-hhh111111111)
13. diego_database-partition-3333e3ee3332221e222e/0 (abc31111-111e-1ec3-bb3e-ccc999999999)
14. etcd_server-partition-3333e3ee3332221e222e/0 (abc31111-111e-1ec3-bb3e-eee111111111)
15. mysql-partition-3333e3ee3332221e222e/0 (abc31111-111e-1ec3-bb3e-ggg111111111)
16. consul_server-partition-3333e3ee3332221e222e/0 (abc31111-111e-1ec3-bb3e-ccc444444444)
17. doppler-partition-3333e3ee3332221e222e/0 (abc31111-111e-1ec3-bb3e-ddd111111111)
18. nfs_server-partition-3333e3ee3332221e222e/0 (abc31111-111e-1ec3-bb3e-jjj111111111)
Choose an instance: 5
Enter password (use it to sudo on remote host): *******
Target deployment 'cf-33e333333eebbb3b33b3'

Setting up ssh artifacts
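If you already use an RSA key pair for other hosts, rerunning ssh-keygen at the default path asks whether to overwrite it. This sketch only generates a key when no public key exists yet. It assumes the ssh-keygen default location ~/.ssh/id_rsa, which is what accepting the defaults produces. The function name ensure_ssh_key is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: make sure an RSA key pair exists before running `bosh ssh`,
# without overwriting a key you already use.
ensure_ssh_key() {
  local key="${1:-$HOME/.ssh/id_rsa}" dir
  if [ -f "${key}.pub" ]; then
    echo "Using existing public key ${key}.pub"
    return 0
  fi
  dir="$(dirname "$key")"
  # Create the key directory with owner-only permissions if it is missing.
  [ -d "$dir" ] || mkdir -p -m 700 "$dir"
  # Interactive: ssh-keygen prompts for an optional passphrase.
  ssh-keygen -t rsa -f "$key"
}

# Usage:
#   ensure_ssh_key && bosh ssh
```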
In most cases, operators should use the bosh ssh command in the BOSH CLI to SSH into the VMs in their deployment. However, operators can also use standard ssh to connect to the BOSH Director and, from there, to other VMs by performing the procedures below.
- Locate the IP address of your BOSH Director and your BOSH Director credentials by following the steps above.
- SSH into the BOSH Director with the private key that matches the public key you used with bosh-init to deploy the BOSH Director:
$ ssh BOSH-DIRECTOR-IP -i PATH-TO-PRIVATE-KEY
- Enter your BOSH Director credentials to log in.
From the BOSH Director, you can SSH into the other VMs in your deployment by performing the following steps:
- Identify the private IP address of the component VM you want to SSH into by doing one of the following:
- Perform the steps above to use the BOSH CLI to log in to your BOSH Director and use bosh vms to list the IP addresses of your component VMs.
- Navigate to your IaaS console and locate the IP address of the VM. For example, Amazon Web Services users can locate the IP addresses of component VMs in the VPC Dashboard of the AWS Console.
- SSH into the component VM:
$ ssh COMPONENT-VM-PRIVATE-IP
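To find the private IP for a given job without scanning the whole table, you can filter the bosh vms output by job name. This sketch assumes the BOSH v1 table layout, with the VM name in the first column and the IP in the fifth. It matches names of the form JOB/INDEX or JOB-partition-.../INDEX. A job with several instances, such as diego_cell, prints one IP per line. The function name vm_ip is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: print the private IP(s) of a job from `bosh vms` table output.
vm_ip() {
  local job="$1"
  awk -F'|' -v job="$job" '
    /^\|/ {
      vm = $2; ip = $6
      gsub(/^ +| +$/, "", vm)
      gsub(/^ +| +$/, "", ip)
      # Match "JOB/INDEX" or "JOB-partition-..." at the start of the VM name.
      if (index(vm, job "/") == 1 || index(vm, job "-partition-") == 1) print ip
    }'
}

# On the Ops Manager VM, after logging in and selecting a deployment:
#   bosh vms | vm_ip router
# Then, from the BOSH Director:
#   ssh 10.0.16.16
```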