Troubleshooting On-Demand Instances

This topic provides techniques that app developers can use to begin troubleshooting on-demand instances.

Troubleshoot Errors

Start here if you are responding to a specific error or error messages.

Common Service Errors

The following errors are common to on-demand services.


No Metrics from Log Cache

Symptom You receive no metrics when running the cf tail command.
Cause This might be because the Firehose is disabled in the TAS for VMs tile.
Solution Ask your operator to ensure that the V2 Firehose checkbox is enabled, and the Enable Log Cache syslog ingestion checkbox is disabled in the TAS for VMs tile. For more information about configuring these checkboxes, see Enable Syslog Forwarding.
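
Before contacting your operator, it can help to confirm that cf tail really returns nothing for the instance. The following is a minimal sketch, not an official procedure: the helper name count_envelopes and the instance name my-instance are placeholders, and the --envelope-class and --lines options assume that the Log Cache cf CLI plugin is installed.

```shell
# Count the non-blank lines read from stdin.
# A result of 0 means the command produced no output at all.
count_envelopes() {
  grep -c '[^[:space:]]' || true
}

# Usage (assumes the Log Cache cf CLI plugin and an instance named my-instance):
#   cf tail my-instance --envelope-class metrics --lines 20 | count_envelopes
# Some plugin versions may print a status header, so inspect the raw
# output as well if the count is unexpectedly small.
```

A count of 0 is consistent with the symptom described above, but it does not by itself prove that the Firehose setting is the cause.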


When Using Service-Gateway Access, create-service or update-service Fails

Symptom

When you run cf create-service or cf update-service with {"enable_external_access": true}, you receive an error like this:

Service broker error: contact your operator,
service configuration issue occurred
Cause

When off-platform access is set up for a foundation, a range of TCP ports is reserved for MySQL traffic. Each service instance for which service-gateway access is enabled requires one port.

If all the ports in the range have been assigned to other service instances, then you cannot create or update service instances to use service-gateway access.

Solution

To resolve this error, confirm that the port range is exhausted and, if so, have the range increased:

  1. Review the BOSH logs on the MySQL service broker VM and, in the broker.stdout.log file, look for this error message: Failed to update manifest: There are no free ports in range: [… For information about how to download the service broker logs, see Access Broker Logs and VMs.
  2. Ask the operator to increase the external TCP port range for off-platform access by editing the Settings pane on the Tanzu SQL for VMs tile. For information about the Settings pane, see Enable Service-Gateway Access in Enabling Service-Gateway Access.
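
The log check in step 1 can be sketched as a small helper. The function name find_port_exhaustion and the example path are placeholders chosen for illustration. The only text taken from this topic is the fixed part of the error message.

```shell
# Print any port-exhaustion errors found in a downloaded broker log.
# Returns 0 if at least one matching line is found, non-zero otherwise.
find_port_exhaustion() {
  grep -F 'There are no free ports in range' "$1"
}

# Usage (the path is a placeholder for wherever you unpacked the broker logs):
#   find_port_exhaustion ./broker/broker.stdout.log
```

If the helper prints nothing, the failure probably has a different cause, and increasing the port range is unlikely to help.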

If Instances or Database are Inaccessible

You might experience the following in a leader-follower or Multi‑Site Replication topology, or during upgrades:


Temporary Outages

Symptom VMware Tanzu SQL with MySQL for VMs service instances can become temporarily inaccessible during upgrades and VM or network failures.
Solution For more information, see Service Interruptions.


Apps Cannot Write to the Database

Symptom You have a leader-follower or Multi‑Site Replication topology, and your apps can no longer write to the database.
Cause

If you have a leader-follower topology, the leader VM might be read-only. If your apps can no longer read from the database either, your persistent disk might be full.

If you have a Multi‑Site Replication topology, your leader VM might be down.

Solution

If you have a leader-follower topology and the leader VM is read-only, see Both Leader and Follower Instances Are Read-Only.

If your apps can no longer read from the database either, your persistent disk might be full. For more information about troubleshooting inoperable apps, see Apps Are Inoperable.

If you have a Multi‑Site Replication topology and your leader VM is down, you can trigger a failover to the follower VM. For more information, see Triggering Multi-Site Replication Failover and Switchover.


Apps Are Inoperable

Symptom Your apps become inoperable. Read, write, and cf CLI operations do not work.
Cause Your persistent disk might be full.
Solution Contact your operator to check if your persistent disk is full. For more information about troubleshooting this problem, see Persistent Disk is Full.


Apps Cannot Connect to the Database

Symptom Apps can fail to connect to the database.
Cause This can happen in either of the following cases:
  • Your app uses MySQL Connector/J v5.1.41 or earlier.
  • Your app uses mutual TLS (mTLS).
Solution See the section below that matches your case.


MySQL Connector/J v5.1.41 or Earlier

Symptom

Apps cannot connect to the database when TLS is enabled and the apps are using MySQL Connector/J v5.1.41 or earlier.

Cause

You see errors about certificates.

For example:

Caused by: javax.net.ssl.SSLHandshakeException: Received fatal alert: bad_certificate
  at sun.security.ssl.Alerts.getSSLException(Alerts.java:192) ~[na:1.8.0_152]
Solution

If you cannot update MySQL Connector/J, apply the workaround in How to disable KeyManager and TrustManager in Container Security Provider Framework in the Java buildpack in the Support knowledge base.
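
To decide whether this section applies to your app, you can compare the connector version against 5.1.41. The sketch below is illustrative only: the function name connector_needs_update is hypothetical, and it relies on sort -V, which assumes GNU coreutils or a compatible sort.

```shell
# Return 0 if the given Connector/J version is 5.1.41 or earlier.
connector_needs_update() {
  # Sort the two versions; if 5.1.41 sorts last, the given version is <= 5.1.41.
  newest=$(printf '%s\n' "$1" 5.1.41 | sort -V | tail -n 1)
  [ "$newest" = "5.1.41" ]
}

# Usage (the version is typically visible in the connector JAR file name):
#   if connector_needs_update 5.1.38; then
#     echo "Connector/J is 5.1.41 or earlier"
#   fi
```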


Mutual TLS

Symptom Apps cannot connect to the database when TLS is enabled and your apps use mTLS.
Cause You see network errors in your app logs.
Solution To resolve this issue, disable mTLS in your apps.

Failed Backup or Restore with the adbr Plugin

If you get errors when working with the ApplicationDataBackupRestore (adbr) plugin for the Cloud Foundry Command Line Interface (cf CLI) tool, see the following sections.


“400” Error during Backup or Restore

Symptom

When running cf adbr backup or cf adbr restore, an error occurs.

For example:

$ cf adbr backup myDB
  Failed to backup service instance "myDB": failed due to server error, status code: 400
Cause The broker on the VM is not running or is in an unhealthy state.
Solution Verify the health of the broker VM and review the logs for the broker.


“500” Error during Backup or Restore

Symptom

When running cf adbr backup or cf adbr restore, an error occurs.

For example:

$ cf adbr backup myDB
  Failed to backup service instance "myDB": failed due to server error, status code: 500
Cause The service instance agent is not running or is in an unhealthy state.
Solution Verify the health of the service instance VM and review the logs for the service instance.


“502” Error during Backup or Restore

Symptom

When running cf adbr backup or cf adbr restore, an error occurs.

For example:

$ cf adbr backup myDB
  Failed to backup service instance "myDB": failed due to server error, status code: 502
Cause The VM is down, stopped, or in an unhealthy state.
Solution Verify the health of the broker VM and review the logs for the broker.
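
The three status codes above can be summarized as a lookup. This is a sketch for triage only: the function name adbr_hint is hypothetical, and the fallback branch for other codes is an assumption that does not come from this topic.

```shell
# Map the status code in an adbr error message to the component to check first.
adbr_hint() {
  # Extract the digits that follow "status code: " in the message.
  code=$(printf '%s\n' "$1" | sed -n 's/.*status code: \([0-9][0-9]*\).*/\1/p')
  case "$code" in
    400) echo "broker not running or unhealthy: check the broker VM and broker logs" ;;
    500) echo "service instance agent not running or unhealthy: check the service instance VM and logs" ;;
    502) echo "VM down, stopped, or unhealthy: check the broker VM and broker logs" ;;
    # Fallback for codes this topic does not cover (an assumption).
    *)   echo "unrecognized status: review broker and service instance logs" ;;
  esac
}

# Usage:
#   adbr_hint 'Failed to backup service instance "myDB": failed due to server error, status code: 500'
```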


“Status: Restore failed” after adbr Restore

Symptom

When running cf adbr get-status after restoring to a service instance, adbr returns Restore failed.

For example:

$ cf adbr get-status myTargetDB
Getting status of service instance myTargetDB in org my-org / space system as admin...
[Thu Feb 25 22:33:58 UTC 2021] Status: Restore failed
Cause A possible cause is that the database on the new service instance is not empty. For more information, see The New Database Must Be Empty in Backing Up and Restoring VMware Tanzu SQL with MySQL for VMs.
Solution

To resolve the error and complete the restore:

  1. Determine if the database is empty by reviewing the log /var/vcap/sys/log/mysql-restore/mysql-restore.stderr.log on the service instance.
  2. If any GTIDs (global transaction identifiers) are printed in the logs, then the database is not empty.
    Delete the service instance and create a new service instance to restore the backup to.
  3. If the log does not contain any GTIDs, then the restore failed for another reason. Review other logs on the service instance and, if necessary, contact Support.
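
The GTID check in steps 1 and 2 can be sketched as follows. The function name log_has_gtids is hypothetical, and the pattern assumes that the log prints GTIDs in the usual MySQL form of a source UUID, a colon, and a transaction number or range. If your log uses a different form, adjust the pattern.

```shell
# Return 0 if the file contains something shaped like a MySQL GTID,
# for example 3e11fa47-71ca-11e1-9e33-c80aa9429562:1-5.
log_has_gtids() {
  grep -Eq '[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}:[0-9]+(-[0-9]+)?' "$1"
}

# Usage (run on the service instance VM):
#   if log_has_gtids /var/vcap/sys/log/mysql-restore/mysql-restore.stderr.log; then
#     echo "database is not empty: restore to a new service instance"
#   fi
```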

For general information about the adbr plugin, see Backing Up and Restoring VMware Tanzu SQL with MySQL for VMs.

Persistent Disk Usage Is Increasing

If you have set the optimize_for_short_words parameter to true and you are alerted that persistent disk usage is high, then you might need to optimize indexed tables.


Symptom

You have set the optimize_for_short_words optional parameter to true and the persistent disk is filling up.

For information about the parameter, see Optimize for Short Words.
Cause Over time, data has been deleted from your database and the full-text index has become too large.
Solution Remove full-text entries for deleted or old records by following the instructions in Optimize for Short Words.

For information about monitoring disk usage, see Monitoring and KPIs.

Techniques for Troubleshooting

See the following sections for troubleshooting techniques when using the Cloud Foundry Command-Line Interface (cf CLI) to perform basic operations on a Tanzu SQL for VMs service instance.

Basic cf CLI operations include create, update, bind, unbind, and delete.

Understand a Cloud Foundry (CF) Error Message

Failed operations (create, update, bind, unbind, delete) result in an error message. You can retrieve the error message later by running the cf CLI command cf service INSTANCE-NAME.

$ cf service myservice

Service instance: myservice
Service: super-db
Bound apps:
Tags:
Plan: dedicated-vm
Description: Dedicated Instance
Documentation url:
Dashboard:

Last Operation
Status: create failed
Message: Instance provisioning failed: There was a problem completing your request.
     Please contact your operations team providing the following information:
     service: redis-acceptance,
     service-instance-guid: ae9e232c-0bd5-4684-af27-1b08b0c70089,
     broker-request-id: 63da3a35-24aa-4183-aec6-db8294506bac,
     task-id: 442,
     operation: create
Started: 2017-03-13T10:16:55Z
Updated: 2017-03-13T10:17:58Z

Use the information in the Message field to debug further. Provide this information to Support when filing a ticket.

The task-id field maps to the BOSH task ID. For more information about a failed BOSH task, run bosh task TASK-ID.

The broker-request-id field maps to the portion of the On-Demand Broker log containing the failed step. Access the broker log through your syslog aggregator, or access BOSH logs for the broker by typing bosh logs broker 0. If you have more than one broker instance, repeat this process for each instance.
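
To pull a single value such as task-id out of the Message field, a small filter is enough. The sketch below uses a hypothetical function name, message_field, and assumes that each field appears on its own line as in the sample output above.

```shell
# Print the value of a "key: value," line read from stdin.
# Matching is case-sensitive, so "service" does not match "Service".
message_field() {
  awk -v key="$1:" '$1 == key { sub(/,$/, "", $2); print $2; exit }'
}

# Usage (myservice is the instance name from the sample above):
#   task_id=$(cf service myservice | message_field task-id)
#   bosh task "$task_id"
```

The same filter works for broker-request-id and service-instance-guid, which are the values an operator usually asks for first.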

Find Information about Your Service Instance

You might need to find the name, GUID, or other information about a service instance. To find this information, do the following:

  1. Log in to the space containing the instance or failed instance.

    $ cf login
    

  2. If you do not know the name of the service instance, run cf services to see a listing of all service instances in the space. The service instances are listed in the name column.

    $ cf services
    Getting services in org my-org / space my-space as user@example.com...
    OK
    name          service      plan        bound apps    last operation
    my-instance   p.mysql      db-small                  create succeeded
    

  3. To retrieve more information about a specific instance, run cf service SERVICE-INSTANCE-NAME.

  4. To retrieve the GUID of the instance, run cf service SERVICE-INSTANCE-NAME --guid.

The GUID is useful for debugging.
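
When a space contains many instances, it can be quicker to filter the cf services listing for failed operations and then look up the GUID of each one. This is a rough sketch: the function name failed_instances is hypothetical, and it assumes that instance names contain no spaces and that the last operation column uses the words create, update, or delete followed by failed.

```shell
# Print the names of service instances whose last operation failed,
# reading cf services output from stdin.
failed_instances() {
  awk '/(create|update|delete) failed/ { print $1 }'
}

# Usage:
#   cf services | failed_instances
#   cf service SERVICE-INSTANCE-NAME --guid
```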

Use the Knowledge Base (Community)

Find the answer to your question and browse product discussions and solutions by searching the VMware Tanzu Knowledge Base.

File a Support Ticket

If you cannot resolve the problem, file a support ticket. Be sure to provide the error message from cf service YOUR-SERVICE-INSTANCE.

To expedite troubleshooting, if possible, provide your service broker logs, service instance logs, and BOSH task output. Your cloud operator should be able to obtain these from your error message.