Troubleshooting for Ops Manager Operators

This topic provides information for operators about troubleshooting on-demand services.

How to Retrieve a Service Instance GUID

You need the GUID of your service instance to run some BOSH commands. To retrieve the GUID, run the command:

cf service SERVICE-INSTANCE-NAME --guid

If you do not know the name of the service instance, run cf services to see a listing of all service instances in the space. The service instances are listed in the name column.

Troubleshoot Errors

This section provides information about how to troubleshoot specific errors or error messages.


Failed Installation

Symptom The service fails to install.
Cause Reasons for a failed installation include:
  • Certificate issues: The on-demand broker (ODB) requires valid certificates.
  • Deploy fails: This can happen for a variety of reasons.
  • Networking problems:
    • Cloud Foundry cannot reach the on-demand service broker
    • Cloud Foundry cannot reach the service instances
    • The service network cannot access the BOSH director
  • The Register broker errand fails.
  • The smoke test errand fails.
  • Resource sizing issues: These occur when the resource sizes selected for a given plan are less than the on-demand service requires to function.
  • Other service-specific issues.
Solution To troubleshoot:
  • Certificate issues: Ensure that your certificates are valid and generate new ones if necessary. To generate new certificates, contact Support.
  • Deploy fails: View the logs using Ops Manager to determine why the deploy is failing.
  • Networking problems: For how to troubleshoot, see Networking problems.
  • Register broker errand fails: For how to troubleshoot, see Register broker errand.
  • Resource sizing issues: Check your resource configuration in Ops Manager and ensure that the configuration matches that recommended by the service.


Cannot Create or Delete Service Instances

Symptom If developers report errors such as:
Instance provisioning failed: There was a problem completing your request. Please contact your operations team providing the following information: service: redis-acceptance, service-instance-guid: ae9e232c-0bd5-4684-af27-1b08b0c70089, broker-request-id: 63da3a35-24aa-4183-aec6-db8294506bac, task-id: 442, operation: create
Cause Reasons include:
  • Problems with the deployment manifest
  • Authentication errors
  • Network errors
  • Quota errors
Solution To troubleshoot:
  1. If the BOSH error shows a problem with the deployment manifest, open the manifest in a text editor to inspect it.

  2. To continue troubleshooting, log in to BOSH and target the on-demand service instance using the instructions on parsing a Cloud Foundry error message.

  3. Retrieve the BOSH task ID from the error message and run the following command:

    bosh task TASK-ID
  4. If you need more information, access the broker logs and search them for the broker-request-id from the error message above. Check for authentication, network, and quota errors.


Broker Request Timeouts

Symptom If developers report errors such as:
Server error, status code: 504, error code: 10001, message: The request to the service broker timed out: https://BROKER-URL/v2/service_instances/e34046d3-2379-40d0-a318-d54fc7a5b13f/service_bindings/aa635a3b-ef6d-41c3-a23f-55752f3f651b
Cause Cloud Foundry might not be connected to the service broker, or there might be a large number of queued tasks.
Solution To troubleshoot:
  1. Confirm that Cloud Foundry (CF) is connected to the service broker.
  2. Check the BOSH queue size:
    1. Log in to BOSH as an admin.
    2. Run:
      bosh tasks
    If there are a large number of queued tasks, the system may be under too much load. BOSH is configured with two workers and one status worker, which might not be sufficient resources for the level of load.
  3. If the task queue is long, advise app developers to try again once the system is under less load.
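
As a rough sketch of the queue check above, the helper below counts queued tasks in the output of bosh tasks. The function name is hypothetical, and it assumes that the task state is the second whitespace-separated column of the plain-text table; adjust it if your BOSH CLI version prints a different layout.

```shell
# Hypothetical helper: count queued tasks in the plain-text table printed by
# `bosh tasks`, read on stdin.
# Assumption: the task state is the second whitespace-separated column.
count_queued_tasks() {
  awk '$2 == "queued" { n++ } END { print n + 0 }'
}

# Usage (requires a logged-in BOSH CLI, so it is left commented out):
# bosh tasks | count_queued_tasks
```

A consistently high count suggests that the BOSH workers cannot keep up with the current load.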


Instance Does Not Exist

Symptom If developers report errors such as:
Server error, status code: 502, error code: 10001, message: Service broker error: instance does not exist
Cause The instance might have been deleted.
Solution To troubleshoot:
  1. Obtain the GUID of the service instance from CF by running:

    cf service MY-INSTANCE --guid
  2. Using the GUID obtained above, confirm that the on-demand service instance exists in BOSH by running:

    bosh -d service-instance_GUID vms

If the BOSH deployment is not found, it has been deleted from BOSH. Contact Support for further assistance.
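
The two steps above can be combined in a small helper. This is a minimal sketch: the function name is hypothetical, and it only assumes the service-instance_GUID deployment naming convention used throughout this topic.

```shell
# Hypothetical helper: build the BOSH deployment name for a service instance
# GUID, following the service-instance_GUID convention used in this topic.
deployment_for_guid() {
  printf 'service-instance_%s\n' "$1"
}

# Usage (requires the cf and BOSH CLIs, so it is left commented out):
# guid="$(cf service MY-INSTANCE --guid)"
# bosh -d "$(deployment_for_guid "$guid")" vms
```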


Cannot Bind to or Unbind from Service Instances

Symptom If developers report errors such as:
Server error, status code: 502, error code: 10001, message: Service broker error: There was a problem completing your request. Please contact your operations team providing the following information: service: example-service, service-instance-guid: 8d69de6c-88c6-4283-b8bc-1c46103714e2, broker-request-id: 15f4f87e-200a-4b1a-b76c-1c4b6597c2e1, operation: bind
Cause This might be due to authentication or network errors.
Solution To find out the exact issue with the binding process:
  1. Access the service broker logs.

  2. Search the logs for the broker-request-id string listed in the error message above.

  3. Check for authentication or network errors.

  4. Contact Support for further assistance if you are unable to resolve the problem.


Cannot Connect to a Service Instance

Symptom Developers report that their app cannot use service instances that they have successfully created and bound.
Cause The error might originate from the service or be network related.
Solution To solve this issue, ask the user to send application logs that show the connection error. If the error originates from the service, then follow service-specific instructions. If the issue appears to be network-related, then:
  1. Check that application security groups are configured correctly. Access should be configured for the service network that the tile is deployed to.

  2. Ensure that the network the PAS tile is deployed to has network access to the service network. You can find the network definition for this service network in the BOSH Director tile.

  3. In Ops Manager, open the service tile and note the service network configured in the Networks tab.

  4. In Ops Manager, open the PAS tile and note the network it is assigned to. Make sure that these networks can access each other.


Cannot Update a Service Instance

Symptom If developers report errors such as the following when trying to run cf update-service:
FAILED
Server error, status code: 502, error code: 10001, message:
Service broker error: Service cannot be updated at this time,
please try again later or contact your operator for more information.
Cause Their service instance might not be running the latest service offering.
Solution

Operators must run the upgrade-all-service-instances errand after upgrading to ensure all existing service instances are upgraded to the latest service offering. See Upgrade All Service Instances.

If maintenance_info is configured with a compatible CF version, app developers can also run cf update-service SERVICE-INSTANCE-NAME --upgrade to upgrade an individual service instance to the latest service offering.

Until you resolve this issue, app developers cannot set parameters or change plans. For more information, see Upgrade All Service Instances and (Optional) Configure Maintenance Information.


Upgrade All Service Instances Errand Fails

Symptom The upgrade-all-service-instances errand fails.
Cause There might be a problem with a particular instance.
Solution To troubleshoot:
  1. Look at the errand output in the Ops Manager log.
  2. If an instance has failed to upgrade, debug and fix it before running the errand again to prevent any failure issues from spreading to other on-demand instances.
  3. After the Ops Manager log no longer lists the deployment as failing, re-run the errand to upgrade the rest of the instances.


Missing Logs and Metrics

Symptom No logs are being emitted by the on-demand broker.
Cause Syslog might not be configured correctly, or you might have network access issues.
Solution To troubleshoot:
  1. Ensure you have configured syslog for the tile.

  2. Check that your syslog forwarding address is correct in Ops Manager.
  3. Ensure that you have network connectivity between the networks that the tile is using and the syslog destination. If the destination is external, you need to use the public IP VM extension feature available in your Ops Manager tile configuration settings.

  4. Verify that Loggregator is emitting metrics:

    1. Install the cf log-stream plugin. For instructions, see the Log Stream CLI Plugin GitHub repository.

    2. Find the GUID for your service instance by running:

      cf service SERVICE-INSTANCE --guid
    3. Find logs from your service instance by running:

      cf log-stream | grep "SERVICE-GUID"
    4. If no metrics appear within five minutes, verify that the broker network has access to the Loggregator system on all required ports.
  5. If you are unable to resolve the issue, contact Support.

Troubleshoot Components

This section provides information about troubleshooting on-demand broker components.

BOSH Problems

Large BOSH Queue

On-demand service brokers add tasks to the BOSH request queue, which can back up and cause delay under heavy loads. An app developer who requests a new On-Demand Services SDK instance sees create in progress in the Cloud Foundry Command Line Interface (cf CLI) until BOSH processes the queued request.

Ops Manager currently deploys two BOSH workers to process its queue. Future versions of Ops Manager will let users configure the number of BOSH workers.

Configuration

Service Instances in Failing State

The VM or Disk type that you configured in the plan page of the tile in Ops Manager might not be large enough for the On-Demand Services SDK service instance to start. See tile-specific guidance on resource requirements.

Authentication

UAA Changes

If you have rotated any UAA user credentials, you might see authentication issues in the service broker logs.

To resolve this, redeploy the On-Demand Services SDK tile in Ops Manager. This provides the broker with the latest configuration.

Note: You must ensure that any changes to UAA credentials are reflected in the Ops Manager credentials tab of the Pivotal Application Service tile.

Networking

Common issues with networking include:

Issue | Solution
Latency when connecting to the On-Demand Services SDK service instance to create or delete a binding. | Try again or improve network performance.
Firewall rules are blocking connections from the On-Demand Services SDK service broker to the service instance. | Open the On-Demand Services SDK tile in Ops Manager and check the two networks configured in the Networks pane. Ensure that these networks allow access to each other.
Firewall rules are blocking connections from the service network to the BOSH director network. | Ensure that service instances can access the Director so that the BOSH agents can report in.
Apps cannot access the service network. | Configure Cloud Foundry application security groups to allow runtime access to the service network.
Problems accessing BOSH’s UAA or the BOSH director. | Follow network troubleshooting and check that the BOSH director is online.

Validate Service Broker Connectivity to Service Instances

To validate connectivity, do the following:

  1. To SSH into the On-Demand Services SDK service broker, run the following command, where DEPLOYMENT-NAME is the name of the broker deployment:

    bosh -d DEPLOYMENT-NAME ssh
  2. If no BOSH task-id appears in the error message, look in the broker log using the broker-request-id from the task.

Validate App Access to Service Instance

Use cf ssh to access the app container, then try connecting to the On-Demand Services SDK service instance using the binding included in the VCAP_SERVICES environment variable.

Quotas

Plan Quota Issues

If developers report errors such as:

Message: Service broker error: The quota for this service plan has been exceeded.
Please contact your Operator for help.
  1. Check your current plan quota.
  2. Increase the plan quota:
    1. Log in to Ops Manager.
    2. Reconfigure the quota on the plan page.
    3. Deploy the tile.
  3. Find out who is using the plan quota and take the appropriate action.

Global Quota Issues

If developers report errors such as:

Message: Service broker error: The quota for this service has been exceeded.
Please contact your Operator for help.
  1. Check your current global quota.
  2. Increase the global quota:
    1. Log in to Ops Manager.
    2. Reconfigure the quota on the on-demand settings page.
    3. Deploy the tile.
  3. Find out who is using the quota and take the appropriate action.

Failing Jobs and Unhealthy Instances

To determine whether there is an issue with the On-Demand Services SDK deployment, inspect the VMs. To do so, run the following command:

bosh -d service-instance_GUID vms --vitals

For additional information, run the following command:

bosh instances --ps --vitals

If the VM is failing, follow the service-specific information. Any unadvised corrective actions (such as running bosh restart on a VM) can cause issues in the service instance.

Techniques for Troubleshooting

This section provides general techniques for troubleshooting, which might include the following:

  • Interacting with the on-demand service broker
  • Interacting with on-demand service instance BOSH deployments
  • Performing general maintenance and housekeeping tasks

Parse a Cloud Foundry (CF) Error Message

Failed operations (create, update, bind, unbind, delete) result in an error message. You can retrieve the error message later by running the cf CLI command cf service INSTANCE-NAME.

$ cf service myservice

Service instance: myservice
Service: super-db
Bound apps:
Tags:
Plan: dedicated-vm
Description: Dedicated Instance
Documentation url:
Dashboard:

Last Operation
Status: create failed
Message: Instance provisioning failed: There was a problem completing your request.
     Please contact your operations team providing the following information:
     service: redis-acceptance,
     service-instance-guid: ae9e232c-0bd5-4684-af27-1b08b0c70089,
     broker-request-id: 63da3a35-24aa-4183-aec6-db8294506bac,
     task-id: 442,
     operation: create
Started: 2017-03-13T10:16:55Z
Updated: 2017-03-13T10:17:58Z

Use the information in the Message field to debug further. Provide this information to Support when filing a ticket.

The task-id field maps to the BOSH task ID. For more information on a failed BOSH task, run bosh task TASK-ID.

The broker-request-id field maps to the portion of the on-demand broker log containing the failed step. Access the broker log through your syslog aggregator, or access BOSH logs for the broker by running bosh -d DEPLOYMENT-NAME logs broker/0. If you have more than one broker instance, repeat this process for each instance.
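
The fields in the Message text can be pulled out with standard text tools. The following is a minimal sketch, not a supported utility: the function name is hypothetical, and it assumes that each field appears as "key: value" and that values contain no spaces or commas, as in the example above.

```shell
# Hypothetical helper: print the value of one "key: value" field (for example
# task-id or broker-request-id) from an error message read on stdin.
# Assumption: values contain no spaces or commas, as in the example message.
error_field() {
  grep -o -e "$1: [^, ]*" | head -n 1 | sed "s/^$1: //"
}

# Usage (requires the cf and BOSH CLIs, so it is left commented out):
# task_id="$(cf service myservice | error_field task-id)"
# bosh task "$task_id"
```

The same helper can extract the broker-request-id value to use as a search string in the broker log.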

Access Broker and Instance Logs and VMs

Before following the procedures below, log in to the cf CLI and the BOSH CLI.

Access Broker Logs and VM(s)

You can access logs using Ops Manager by clicking on the Logs tab in the tile and downloading the broker logs.

To access logs using the BOSH CLI, do the following:

  1. Identify the on-demand broker (ODB) deployment by running the following command:

    bosh deployments
  2. View VMs in the deployment by running the following command:

    bosh -d DEPLOYMENT-NAME instances
  3. SSH onto the VM by running the following command:

    bosh -d DEPLOYMENT-NAME ssh
  4. Download the broker logs by running the following command:

    bosh -d DEPLOYMENT-NAME logs

The archive generated by BOSH or Ops Manager includes the following logs:

Log Name | Description
broker.log | Requests to the on-demand broker and the actions the broker performs while orchestrating the request (e.g. generating a manifest and calling BOSH). Start here when troubleshooting.
broker_ctl.log | Control script logs for starting and stopping the on-demand broker.
post-start.stderr.log | Errors that occur during post-start verification.
post-start.stdout.log | Post-start verification.
drain.stderr.log | Errors that occur while running the drain script.

Access Service Instance Logs and VMs

  1. To target an individual service instance deployment, retrieve the GUID of your service instance with the following cf CLI command:

    cf service MY-SERVICE --guid
  2. To view VMs in the deployment, run the following command:

    bosh -d service-instance_GUID instances
  3. To SSH into a VM, run the following command:

    bosh -d service-instance_GUID ssh
  4. To download the instance logs, run the following command:

    bosh -d service-instance_GUID logs

Run Service Broker Errands to Manage Brokers and Instances

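From the BOSH CLI, you can run service broker errands that manage the service brokers and perform mass operations on the service instances that the brokers created. These service broker errands include:

  • register-broker
  • deregister-broker
  • upgrade-all-service-instances
  • delete-all-service-instances
  • orphan-deployments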

To run an errand, run the following command:

bosh -d DEPLOYMENT-NAME run-errand ERRAND-NAME

For example:

bosh -d my-deployment run-errand deregister-broker

Register Broker

The register-broker errand does the following:

  • Registers the service broker with Cloud Controller.
  • Enables service access for any plans that are enabled on the tile.
  • Disables service access for any plans that are disabled on the tile.
  • Does nothing for any plans that are set to manual on the tile.

You should run this errand whenever the broker is re-deployed with new catalog metadata to update the Marketplace.

Plans with disabled service access are only visible to admin Cloud Foundry users. Non-admin Cloud Foundry users, including Org Managers and Space Managers, cannot see these plans.

Deregister Broker

This errand deregisters a broker from Cloud Foundry.

The errand does the following:

  • Deletes the service broker from Cloud Controller
  • Fails if there are any service instances, with or without bindings

Use the Delete All Service Instances errand to delete any existing service instances.

To run the errand, run the following command:

bosh -d DEPLOYMENT-NAME run-errand deregister-broker

Upgrade All Service Instances

The upgrade-all-service-instances errand does the following:

  • Collects all of the service instances that the on-demand broker has registered.
  • Issues an upgrade command to the on-demand broker, regenerates the service instance manifest based on the latest configuration from the tile, deploys the new manifest for the service instance, and waits for this operation to complete, then proceeds to the next instance.
  • Adds to a retry list any instances that have ongoing BOSH tasks at the time of upgrade. Retries any instances in the retry list until all instances are upgraded.

When you make changes to the plan configuration, upgrade all the On-Demand Services SDK service instances to the latest version of the plan.

If any instance fails to upgrade, the errand fails immediately. This prevents systemic problems from spreading to the rest of your service instances.

Delete All Service Instances

This errand uses the Cloud Controller API to delete all instances of your broker’s service offering in every Cloud Foundry org and space. It only deletes instances the Cloud Controller knows about. It does not delete orphan BOSH deployments.

Note: Orphan BOSH deployments do not correspond to a known service instance. While rare, orphan deployments can occur. Use the orphan-deployments errand to identify them.

The delete-all-service-instances errand does the following:

  1. Unbinds all apps from the service instances.
  2. Deletes all service instances sequentially. Each service instance deletion includes:
    1. Running any pre-delete errands
    2. Deleting the BOSH deployment of the service instance
    3. Removing any ODB-managed secrets from BOSH CredHub
    4. Checking for instance deletion failure, which results in the errand failing immediately
  3. Determines whether any instances have been created while the errand was running. If new instances are detected, the errand returns an error. In this case, VMware recommends running the errand again.

Warning: Use extreme caution when running this errand. You should only use it when you want to totally destroy all of the on-demand service instances in an environment.

To run the errand, run the following command:

bosh -d DEPLOYMENT-NAME run-errand delete-all-service-instances

Detect Orphaned Service Instances

A service instance is defined as “orphaned” when the BOSH deployment for the instance is still running, but the service is no longer registered in Cloud Foundry.

The orphan-deployments errand collates a list of service deployments that have no matching service instances in Cloud Foundry and returns the list to the operator. It is then up to the operator to remove the orphaned BOSH deployments.

To run the errand, run the following command:

bosh -d DEPLOYMENT-NAME run-errand orphan-deployments

If orphan deployments exist, the errand script does the following:

  • Exits with exit code 10
  • Outputs a list of deployment names under a [stdout] header
  • Provides a detailed error message under a [stderr] header

For example:

[stdout]
[{"deployment_name":"service-instance_80e3c5a7-80be-49f0-8512-44840f3c4d1b"}]

[stderr]
Orphan BOSH deployments detected with no corresponding service instance in Cloud Foundry. Before deleting any deployment it is recommended to verify the service instance no longer exists in Cloud Foundry and any data is safe to delete.

Errand 'orphan-deployments' completed with error (exit code 10)

These details are also available through the BOSH /tasks/ API endpoint for use in scripting:

$ curl 'https://bosh-user:bosh-password@bosh-url:25555/tasks/task-id/output?type=result' | jq .
{
  "exit_code": 10,
  "stdout": "[{\"deployment_name\":\"service-instance_80e3c5a7-80be-49f0-8512-44840f3c4d1b\"}]\n",
  "stderr": "Orphan BOSH deployments detected with no corresponding service instance in Cloud Foundry. Before deleting any deployment it is recommended to verify the service instance no longer exists in Cloud Foundry and any data is safe to delete.\n",
  "logs": {
    "blobstore_id": "d830c4bf-8086-4bc2-8c1d-54d3a3c6d88d"
  }
}

If no orphan deployments exist, the errand script does the following:

  • Exits with exit code 0
  • Outputs an empty list of deployments under [stdout]
  • Outputs None under [stderr]

For example:

[stdout]
[]

[stderr]
None

Errand 'orphan-deployments' completed successfully (exit code 0)

If the errand encounters an error while running, the errand script does the following:

  • Exits with exit code 1
  • Outputs nothing under [stdout]
  • Outputs any error messages under [stderr]
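
For scripting, the three outcomes above can be told apart from the errand output. The helpers below are a sketch under stated assumptions: all function names are hypothetical, the exit code is read from the final "(exit code N)" line shown in the examples, and any code other than 0 or 10 is treated as an errand error.

```shell
# Hypothetical helper: read errand output on stdin and print the exit code
# from the final "(exit code N)" line.
errand_exit_code() {
  sed -n 's/.*(exit code \([0-9][0-9]*\)).*/\1/p' | tail -n 1
}

# Hypothetical helper: describe an orphan-deployments exit code.
# Assumption: any code other than 0 or 10 is treated as an errand error.
describe_orphan_exit() {
  case "$1" in
    0)  echo "no orphan deployments" ;;
    10) echo "orphan deployments detected" ;;
    *)  echo "errand error" ;;
  esac
}

# Hypothetical helper: list deployment names found in the [stdout] JSON,
# read on stdin. Prints nothing when the list is empty.
orphan_deployment_names() {
  grep -o 'service-instance_[0-9a-f-]*' || true
}

# Usage (requires a logged-in BOSH CLI, so it is left commented out):
# output="$(bosh -d DEPLOYMENT-NAME run-errand orphan-deployments)"
# describe_orphan_exit "$(printf '%s\n' "$output" | errand_exit_code)"
# printf '%s\n' "$output" | orphan_deployment_names
```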

To clean up orphaned instances, run the following command for each orphaned deployment:

WARNING: Running this command may leave IaaS resources in an unusable state.

bosh -d service-instance_SERVICE-INSTANCE-GUID delete-deployment

Get Admin Credentials for a Service Instance

To retrieve the admin credentials for a service instance from BOSH CredHub:

  1. Use the cf CLI to determine the GUID associated with the service instance for which you want to retrieve credentials by running:
    cf service SERVICE-INSTANCE-NAME --guid
    For example:
    $ cf service my-service-instance --guid
    
    12345678-90ab-cdef-1234-567890abcdef
    If you do not know the name of the service instance, you can list service instances in the space with cf services.
  2. To SSH into the Ops Manager VM, follow the steps in Gather Credential and IP Address Information and Log In to the Ops Manager VM with SSH in Advanced Troubleshooting with the BOSH CLI.
  3. From the Ops Manager VM, log in to your BOSH Director with the BOSH CLI. See Authenticate with the BOSH Director VM in Advanced Troubleshooting with the BOSH CLI.
  4. Find the values for BOSH_CLIENT and BOSH_CLIENT_SECRET:

    1. In the Ops Manager Installation Dashboard, click the BOSH Director tile.
    2. Click the Credentials tab.
    3. In the BOSH Director section, click the link to the BOSH Commandline Credentials.
    4. Record the values for BOSH_CLIENT and BOSH_CLIENT_SECRET.
  5. Set the API target of the CredHub CLI to your BOSH CredHub server by running:
    credhub api https://BOSH-DIRECTOR-IP:8844 \
          --ca-cert=/var/tempest/workspaces/default/root_ca_certificate
    Where BOSH-DIRECTOR-IP is the IP address of the BOSH Director VM.

    For example:
    $ credhub api https://10.0.0.5:8844 \
          --ca-cert=/var/tempest/workspaces/default/root_ca_certificate
  6. Log in to CredHub by running:
    credhub login \
        --client-name=BOSH-CLIENT \
        --client-secret=BOSH-CLIENT-SECRET

    For example:

    $ credhub login \
          --client-name=credhub \
          --client-secret=abcdefghijklm123456789
  7. Use the CredHub CLI to retrieve the credentials:

    • Retrieve the password for the admin user by running:
      credhub get -n /p-bosh/service-instance_GUID/admin_password
      In the output, the password appears under value. Record the password.
      For example:
      $ credhub get \
        -n /p-bosh/service-instance_70d30bb6-7f30-441a-a87c-05a5e4afff26/admin_password 
      id: d6e5bd10-3b60-4a1a-9e01-c76da688b847
      name: /p-bosh/service-instance_70d30bb6-7f30-441a-a87c-05a5e4afff26/admin_password
      type: password
      value: UMF2DXsqNPPlCNWMdVMcNv7RC3Wi10
      version_created_at: 2018-04-02T23:16:09Z
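
To capture only the password in a script, the value line can be filtered out of the credhub get output. This is a minimal sketch: the function name is hypothetical, and it assumes that the output lists one field per line and that the password contains no spaces.

```shell
# Hypothetical helper: print the value field from `credhub get` output read on stdin.
# Assumptions: one "key: value" field per line, and a value without spaces.
credhub_value() {
  awk '$1 == "value:" { print $2; exit }'
}

# Usage (requires a logged-in CredHub CLI, so it is left commented out):
# credhub get -n /p-bosh/service-instance_GUID/admin_password | credhub_value
```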

Identify Apps using a Service Instance

To identify which apps are using a specific service instance from the name of the BOSH deployment:

  1. Take the deployment name and strip the service-instance_ prefix, leaving the GUID.
  2. Log in to CF as an admin.
  3. Obtain a list of all service bindings by running the following:
    cf curl /v2/service_instances/GUID/service_bindings
  4. The output from the above curl gives you a list of resources, with each item referencing a service binding, which contains the APP-URL. To find the name, org, and space for the app, run the following:
    1. cf curl APP-URL and record the app name under entity.name.
    2. cf curl SPACE-URL to obtain the space, using the entity.space_url from the above curl. Record the space name under entity.name.
    3. cf curl ORGANIZATION-URL to obtain the org, using the entity.organization_url from the above curl. Record the organization name under entity.name.

Note: When running cf curl ensure that you query all pages, because the responses are limited to a certain number of bindings per page. The default is 50. To find the next page curl the value under next_url.
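
The first and third steps above can be sketched as two small helpers. The function names are hypothetical; the prefix and the API path are the ones given in the steps.

```shell
# Hypothetical helper: strip the service-instance_ prefix from a BOSH
# deployment name, leaving the service instance GUID.
guid_from_deployment() {
  printf '%s\n' "${1#service-instance_}"
}

# Hypothetical helper: build the service bindings path for cf curl.
bindings_path() {
  printf '/v2/service_instances/%s/service_bindings\n' "$(guid_from_deployment "$1")"
}

# Usage (requires a logged-in cf CLI, so it is left commented out):
# cf curl "$(bindings_path service-instance_GUID)"
```

Remember to follow next_url in each response, because this sketch only builds the path for the first page.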

View BOSH Resource Saturation and Scaling

To view usage statistics for any service, do the following:

  1. Run the following command:

    bosh -d DEPLOYMENT-NAME vms --vitals
  2. To view process-level information, run:

    bosh -d DEPLOYMENT-NAME instances --ps

Monitor Quota Saturation and Service Instance Count

Quota saturation and total number of service instances are available through ODB metrics emitted to Loggregator. The metric names are shown below:

Metric Name | Description
on-demand-broker/SERVICE-NAME-MARKETPLACE/quota_remaining | Global quota remaining for all instances across all plans
on-demand-broker/SERVICE-NAME-MARKETPLACE/PLAN-NAME/quota_remaining | Quota remaining for a particular plan
on-demand-broker/SERVICE-NAME-MARKETPLACE/total_instances | Total instances created across all plans
on-demand-broker/SERVICE-NAME-MARKETPLACE/PLAN-NAME/total_instances | Total instances created for a given plan

Note: Quota metrics are not emitted if no quota has been set.

Reinstall a Tile

To reinstall a tile in the same environment where it was previously uninstalled:

  1. Ensure that the previous tile was correctly uninstalled as follows:
    1. Log in as an admin by running:
      cf login
    2. Confirm that the Marketplace does not list On-Demand Services SDK by running:
      cf m
    3. Log in to BOSH as an admin by running:
      bosh log-in
    4. Display your BOSH deployments to confirm that the output does not show the On-Demand Services SDK deployment by running:
      bosh deployments
    5. Run the delete-all-service-instances errand to delete every instance of the service.
    6. Run the deregister-broker errand to delete the service broker.
    7. Delete the service broker BOSH deployment by running:
      bosh -d BROKER-DEPLOYMENT-NAME delete-deployment
  2. Reinstall the tile.

Knowledge Base (Community)

Find the answer to your question and browse product discussions and solutions by searching the VMware Tanzu Knowledge Base.

File a Support Ticket

You can file a ticket with Support. Be sure to provide the error message from cf service YOUR-SERVICE-INSTANCE.

To expedite troubleshooting, provide your service broker logs and your service instance logs. If your cf service YOUR-SERVICE-INSTANCE output includes a task-id, provide the BOSH task output.