Troubleshooting MySQL for PCF
- Troubleshooting Errors
  - Deprecated Service Bindings Found (Upgrade Error)
  - Failed Install
  - Cannot Create or Delete Service Instances
  - Broker Request Timeouts
  - Cannot Bind to or Unbind from Service Instances
  - Cannot Connect to a Service Instance
  - Upgrade All Instances Fails
  - Missing Logs and Metrics
  - Unable to Determine Leader and Follower (Errand Error)
  - Both Leader and Follower Instances Are Writable (Errand Error)
  - Both Leader and Follower Instances Are Read-Only
  - Persistent Disk is Full
  - Unresponsive Node in a Highly Available Cluster
  - Many Replication Errors in Logs for Highly Available Clusters
- Troubleshooting Components
- Techniques for Troubleshooting
  - Parse a Cloud Foundry (CF) Error Message
  - Access Broker and Instance Logs and VMs
  - Retrieve Admin and Read-Only Admin Credentials for a Service Instance
  - Reinstall a Tile
  - View Resource Saturation and Scaling
  - Identify Service Instance Owner
  - Monitor Quota Saturation and Service Instance Count
- Techniques for Troubleshooting Highly Available Clusters
  - Force a Node to Rejoin a Highly Available Cluster Manually
  - Re-create a Corrupted VM in a Highly Available Cluster
  - Check Replication Status in a Highly Available Cluster
- Tools for Troubleshooting
- Knowledge Base (Community)
- File a Support Ticket
Warning: MySQL for PCF v2.5 is no longer supported because it has reached the End of General Support (EOGS) phase as defined by the Support Lifecycle Policy. To stay up to date with the latest software and security updates, upgrade to a supported version.
This topic provides operators with basic instructions for troubleshooting on-demand MySQL for PCF. For information about temporary MySQL for PCF service interruptions, see Service Interruptions.
Troubleshooting Errors
This section provides information about how to troubleshoot specific errors or error messages.
Deprecated Service Bindings Found (Upgrade Error)
This problem happens when upgrading to MySQL for PCF v2.5 if there are service bindings that are still using IP addresses.
Symptom
You see in the Ops Manager Installation Dashboard that the upgrade fails while running the validate-before-upgrade-instances errand.
The error output in the logs contains a table of bindings or service keys which must be updated:

stdout
+---------------------------+--------------------------------------+------------------------+--------------------------+--------------------+-------------------+-----------------------------+
| service                   | service guid                         | org                    | space                    | app or service key | type              | reason                      |
+---------------------------+--------------------------------------+------------------------+--------------------------+--------------------+-------------------+-----------------------------+
| upgrade-outdated-instance | 34f26746-fb46-4f14-87bc-e1ddce26f340 | mysql-org-upgrade-test | mysql-space-upgrade-test | cs-accept          | appbinding        | no dns: hostname="10.0.8.5" |
| tlsdb                     | a999db0b-176e-4ac8-8342-d72b338d1f0c | mysql-org-upgrade-test | mysql-space-upgrade-test | cs-accept-tls      | appbinding        | no dns: hostname="10.0.8.6" |
| tlsdb                     | a999db0b-176e-4ac8-8342-d72b338d1f0c | mysql-org-upgrade-test | mysql-space-upgrade-test | test-cli           | servicekeybinding | no dns: hostname="10.0.8.6" |
+---------------------------+--------------------------------------+------------------------+--------------------------+--------------------+-------------------+-----------------------------+

errand 'validate-before-upgrade-instances' completed with error (exit code 1)
exit code 1

stderr
2018/12/18 02:51:51 deprecated service instances found, try upgrade it manually before upgrading tile
Explanation
In MySQL for PCF v2.5, if you run the `upgrade-all-service-instances` errand while there are service bindings that use IP addresses, app downtime can occur. This is because bindings are still directing apps to connect with IP addresses, but the TLS certificate is signed with a DNS hostname. This prevents apps that validate the server hostname in the TLS certificate from establishing a TLS connection to the database.
In MySQL for PCF v2.5, the `validate-before-upgrade-instances` errand checks that there are no service bindings or service keys using IP addresses. If there are, it outputs the above error and prevents the `upgrade-all-service-instances` errand from running.
For general information about errands, including the `upgrade-all-service-instances` errand, see MySQL for PCF Errands.
Solution
To solve this issue, do the following:
- Do the procedures in Remove IP-Based Bindings.
  After you restart all of your apps, your apps can connect to the database and resume service. However, your Ops Manager installation is still in a failed state.
- To finish the installation, do the following:
  - In the Ops Manager Installation Dashboard, click Review Pending Changes.
  - In the Review Pending Changes pane, ensure that the Validate no IP-based bindings in use before upgrade-all-service-instances and Upgrade all On-demand MySQL Service Instances errands are selected.
  - After reviewing the pending changes, click Apply Changes to complete the installation.
Failed Install
Reasons for a failed MySQL for PCF installation include:
- Certificate issues: The on-demand broker (ODB) requires valid certificates. Ensure that your certificates are valid and generate new ones if necessary. To generate new certificates, contact Pivotal Support.
- Deploy fails: Deploys can fail for a variety of reasons. View the logs using Ops Manager to determine why the deploy is failing.
- Networking problems:
  - Cloud Foundry cannot reach the MySQL for PCF service broker.
  - Cloud Foundry cannot reach the service instances.
  - The service network cannot access the BOSH Director.
- The Register broker errand fails.
- The smoke test errand fails.
- Resource sizing issues: These occur when the resource sizes selected for a given plan are less than the MySQL for PCF service requires to function. Check your resource configuration in Ops Manager and ensure that the configuration matches that recommended by the service.
- Other service-specific issues.
Cannot Create or Delete Service Instances
If developers report errors such as:
Instance provisioning failed: There was a problem completing your request. Please contact your operations team providing the following information: service: redis-acceptance, service-instance-guid: ae9e232c-0bd5-4684-af27-1b08b0c70089, broker-request-id: 63da3a35-24aa-4183-aec6-db8294506bac, task-id: 442, operation: create
Follow these steps:
- If the BOSH error shows a problem with the deployment manifest, open the manifest in a text editor to inspect it.
- To continue troubleshooting, log in to BOSH and target the MySQL for PCF service instance using the instructions on parsing a Cloud Foundry error message.
- Retrieve the BOSH task ID from the error message and run the following command:

  bosh task TASK-ID

- If you need more information, access the broker logs and use the `broker-request-id` from the error message above to search the logs. Check for errors the broker hit while orchestrating the request, for example during manifest generation or the BOSH deploy, as shown in the sketch below.
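A rough sketch of that search follows. It assumes you have SSHed onto the broker VM as described in Access Broker and Instance Logs and VMs below, and that the broker writes to /var/vcap/sys/log/broker/broker.log; the exact log path can vary by release, so confirm it on your VM.

```
# On the on-demand broker VM: list every log line for the failed request.
# The GUID is the broker-request-id from the cf error message; the path is an assumption.
grep "63da3a35-24aa-4183-aec6-db8294506bac" /var/vcap/sys/log/broker/broker.log
```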
Broker Request Timeouts
If developers report errors such as:
Server error, status code: 504, error code: 10001, message: The request to the service broker timed out: https://BROKER-URL/v2/service_instances/e34046d3-2379-40d0-a318-d54fc7a5b13f/service_bindings/aa635a3b-ef6d-41c3-a23f-55752f3f651b
Follow these steps:
- Confirm that Cloud Foundry (CF) is connected to the service broker.
- Check the BOSH queue size:
  - Log in to BOSH as an admin.
  - Run `bosh tasks`.
- If there are a large number of queued tasks, the system may be under too much load. BOSH is configured with two workers and one status worker, which may not be sufficient resources for the level of load. Advise app developers to try again once the system is under less load.
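A minimal sketch for spot-checking the queue follows. The exact column layout of `bosh tasks` output can vary by CLI version, so treat the grep pattern as an assumption:

```
# List currently queued and processing Director tasks, then count the queued ones.
bosh tasks
bosh tasks | grep -c queued
```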
Cannot Bind to or Unbind from Service Instances
Instance Does Not Exist
If developers report errors such as:
Server error, status code: 502, error code: 10001, message: Service broker error: instance does not exist
Follow these steps:
- Confirm that the MySQL for PCF service instance exists in BOSH. First, obtain its GUID by running the following cf CLI command:

  cf service MY-INSTANCE --guid

- Using the GUID obtained above, run the following BOSH CLI command:

  bosh -d service-instance_GUID vms

  If the BOSH deployment is not found, it has been deleted from BOSH. Contact Pivotal Support for further assistance.
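If you prefer a single step, the two commands above can be combined. This is only a sketch; it assumes a Bourne-compatible shell and that you are logged in to both the cf CLI and the BOSH CLI:

```
# Look up the service-instance deployment directly from the cf service name.
bosh -d "service-instance_$(cf service MY-INSTANCE --guid)" vms
```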
Other Errors
If developers report errors such as:
Server error, status code: 502, error code: 10001, message: Service broker error: There was a problem completing your request. Please contact your operations team providing the following information: service: example-service, service-instance-guid: 8d69de6c-88c6-4283-b8bc-1c46103714e2, broker-request-id: 15f4f87e-200a-4b1a-b76c-1c4b6597c2e1, operation: bind
To find out the exact issue with the binding process:
- Search the logs for the `broker-request-id` string listed in the error message above, and check the broker log entries for that request for the underlying error.
- If you are unable to resolve the problem, contact Pivotal Support for further assistance.
Cannot Connect to a Service Instance
If developers report that their app cannot use service instances that they have successfully created and bound:
Ask the user to send application logs that show the connection error. If the error originates from the service, then follow MySQL for PCF-specific instructions. If the issue appears to be network-related, then:
- Check that application security groups are configured correctly. Access should be configured for the service network that the tile is deployed to (see the sketch after this list).
- Ensure that the network the Pivotal Application Service tile is deployed to has network access to the service network. You can find the definition of the service network in the BOSH Director tile.
- In Ops Manager, open the service tile and note the service network configured in the Networks pane.
- In Ops Manager, open the Pivotal Application Service tile and note the network it is assigned to. Make sure that these two networks can access each other.
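The following is a minimal sketch of checking and adding an application security group with the cf CLI. The network CIDR, port, and file name are illustrative assumptions; use the actual service network range from the BOSH Director tile (3306 is only the default MySQL port):

```
# List the security groups currently applied.
cf security-groups

# Example rule file allowing app containers to reach the service network on 3306.
cat > mysql-service-network.json <<'EOF'
[
  { "protocol": "tcp", "destination": "10.0.8.0/24", "ports": "3306" }
]
EOF

# Create the group and bind it to the org and space that hosts the app.
cf create-security-group mysql-service-network mysql-service-network.json
cf bind-security-group mysql-service-network MY-ORG MY-SPACE
```

Security group changes typically take effect only after the app is restarted.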
Service instances can also become temporarily inaccessible during upgrades and VM or network failures. See Service Interruptions for more information.
Upgrade All Instances Fails
If the `upgrade-all-service-instances` errand fails, look at the errand output in the Ops Manager log.
If an instance fails to upgrade, debug and fix it before running the errand again to prevent any failure issues from spreading to other on-demand instances.
Once the Ops Manager log no longer lists the deployment as failing, re-run the errand to upgrade the rest of the instances.
Missing Logs and Metrics
If no logs are being emitted by the on-demand broker, check that your syslog forwarding address is correct in Ops Manager.
- Ensure that you have configured syslog for the tile.
- Ensure that you have network connectivity between the networks that the tile is using and the syslog destination. If the destination is external, you need to use the public IP VM extension feature available in your Ops Manager tile configuration settings.
- Verify that Loggregator is emitting metrics:
  - Install the `cf log-stream` plugin. For instructions, see the Log Stream CLI Plugin GitHub repository.
  - Find the GUID for your service instance by running:

    cf service SERVICE-INSTANCE --guid

  - Find logs from your service instance by running:

    cf log-stream | grep "SERVICE-GUID"

  - If no metrics appear within five minutes, verify that the broker network has access to the Loggregator system on all required ports.
Contact Pivotal support if you are unable to resolve the issue.
Unable to Determine Leader and Follower (Errand Error)
This problem happens when the `configure-leader-follower` errand fails because it cannot determine the VM roles.
Symptom
The `configure-leader-follower` errand exits with 1, and the errand logs contain the following:

Unable to determine leader and follower based on transaction history.
Explanation
Something has happened to the instances, such as a failure or manual intervention. As a result, there is not enough information available to determine the correct state and topology without operator intervention to resolve the issue.
Solution
Use the `inspect` errand to determine which instance should be the leader. Then, using the orchestration errands and backup/restore, you can put the service instance into a safe topology, and then rerun the `configure-leader-follower` errand. The example below shows one outcome that the `inspect` errand can return.
- Use the `inspect` errand to retrieve relevant information about the two VMs:

  $ bosh -e my-env -d my-dep run-errand inspect
  [...]
  Instance   mysql/4ecad54b-0704-47eb-8eef-eb228cab9724
  Exit Code  0
  Stdout     -
  Stderr     2017/12/11 18:25:54 Started executing command: inspect
             2017/12/11 18:25:54 Started GET https://127.0.0.1:8443/status
             2017/12/11 18:25:54 Has Data: false
             Read Only: true
             GTID Executed: 1d774323-de9e-11e7-be01-42010a001014:1-25
             Replication Configured: false

  Instance   mysql/e0b94ade-0114-4d49-a929-ce1616d8beda
  Exit Code  0
  Stdout     -
  Stderr     2017/12/11 18:25:54 Started executing command: inspect
             2017/12/11 18:25:54 Started GET https://127.0.0.1:8443/status
             2017/12/11 18:25:54 Has Data: true
             Read Only: true
             GTID Executed: 1d774323-de9e-11e7-be01-42010a001014:1-25
             Replication Configured: true

  2 errand(s)

  Succeeded

  In this scenario, the first instance is missing data and does not have replication configured. The second instance has data and has replication configured. The steps below resolve this by copying data to the first instance and resuming replication.
- Take a backup of the second instance using the Manual Backup steps.
- Restore the backup artifact to the first instance using the Manual Restore steps. At this point, the instances have equivalent data.
- Run the `configure-leader-follower` errand to reconfigure replication:

  bosh -e ENVIRONMENT -d DEPLOYMENT run-errand configure-leader-follower --instance=mysql/GUID-OF-LEADER

  For example:

  $ bosh -e my-env -d my-dep \
      run-errand configure-leader-follower \
      --instance=mysql/4ecad54b-0704-47eb-8eef-eb228cab9724
Both Leader and Follower Instances Are Writable (Errand Error)
This problem happens when the `configure-leader-follower` errand fails because both VMs are writable and the VMs might hold differing data.
Symptom
The `configure-leader-follower` errand exits with 1, and the errand logs contain the following:

Both mysql instances are writable. Please ensure no divergent data and set one instance to read-only mode.
Explanation
MySQL for PCF tries to ensure that there is only one writable instance of the leader-follower pair at any given time. However, in certain situations, such as network partitions, or manual intervention outside of the provided bosh errands, it is possible for both instances to be writable. The service instances remain in this state until an operator resolves the issue to ensure that the correct instance is promoted and reduce the potential for data divergence.
Solution
1. Use the `inspect` errand to retrieve the GTID Executed set for each VM:

   $ bosh -e my-env -d my-dep run-errand inspect
   [...]
   Instance   mysql/4ecad54b-0704-47eb-8eef-eb228cab9724
   Exit Code  0
   Stdout     -
   Stderr     2017/12/11 18:25:54 Started executing command: inspect
              2017/12/11 18:25:54 Started GET https://127.0.0.1:8443/status
              2017/12/11 18:25:54 Has Data: true
              Read Only: false
              GTID Executed: 1d774323-de9e-11e7-be01-42010a001014:1-23
              Replication Configured: false

   Instance   mysql/e0b94ade-0114-4d49-a929-ce1616d8beda
   Exit Code  0
   Stdout     -
   Stderr     2017/12/11 18:25:54 Started executing command: inspect
              2017/12/11 18:25:54 Started GET https://127.0.0.1:8443/status
              2017/12/11 18:25:54 Has Data: true
              Read Only: false
              GTID Executed: 1d774323-de9e-11e7-be01-42010a001014:1-25
              Replication Configured: false

   2 errand(s)

   Succeeded

   If the GTID Executed sets for both instances are the same, continue to Step 2. If they are different, continue to Step 4.
2. Look at the value of GTID Executed for both instances.
   - If the range after the GUID is equivalent, either instance can be made read-only, as described in Step 3.
   - If one instance has a range that is a subset of the other, the instance with the subset must be made read-only, as described in Step 3.
3. Based on the information you gathered in the step above, run the `make-read-only` errand to make the appropriate instance read-only:

   bosh -e ENVIRONMENT -d DEPLOYMENT run-errand make-read-only --instance=mysql/MYSQL-SUBSET-INSTANCE

   For example:

   $ bosh -e my-env -d my-dep \
       run-errand make-read-only \
       --instance=mysql/e0b94ade-0114-4d49-a929-ce1616d8beda
   [...]
   succeeded

4. If the GTID Executed sets are neither equivalent nor subsets, data has diverged and you must determine what data has diverged as part of the procedure below:
   - Use the `make-read-only` errand to set both instances to read-only to prevent further data divergence:

     bosh -e ENVIRONMENT -d DEPLOYMENT run-errand make-read-only --instance=mysql/MYSQL-INSTANCE

     For example:

     $ bosh -e my-env -d my-dep \
         run-errand make-read-only \
         --instance=mysql/e0b94ade-0114-4d49-a929-ce1616d8beda
     [...]
     succeeded

   - Take a backup of both instances using the Manual Backup steps.
   - Manually inspect the data on each instance to determine the discrepancies, and put the data on the instance that is further ahead. This instance has the higher GTID Executed set and will be the new leader.
   - Migrate all appropriate data to the new leader instance.
   - After putting all data on the leader, SSH onto the follower:

     bosh -e ENVIRONMENT -d DEPLOYMENT ssh mysql/GUID-OF-FOLLOWER

     For example:

     $ bosh -e my-env -d my-dep ssh mysql/e0b94ade-0114-4d49-a929-ce1616d8beda

   - Become root with the command `sudo su`.
   - Stop the mysql process with the command `monit stop mysql`.
   - Delete the data directory of the follower with the command `rm -rf /var/vcap/store/mysql`.
   - Start the mysql process with the command `monit start mysql`.
   - Use the `configure-leader-follower` errand to copy the leader's data to the follower and resume replication:

     bosh -e ENVIRONMENT -d DEPLOYMENT run-errand configure-leader-follower --instance=mysql/GUID-OF-LEADER

     For example:

     $ bosh -e my-env -d my-dep \
         run-errand configure-leader-follower \
         --instance=mysql/4ecad54b-0704-47eb-8eef-eb228cab9724
Both Leader and Follower Instances Are Read-Only
In a leader-follower topology, the leader VM is writable and the follower VM is read-only. However, if both VMs are read-only, apps cannot write to the database.
Symptom
Developers report that apps cannot write to the database.
Explanation
This problem happens if the leader VM fails and the BOSH Resurrector is enabled. When the leader is resurrected, it is set as read-only.
Solution
- Use the `inspect` errand to confirm that both VMs are in a read-only state:

  bosh -e ENVIRONMENT -d DEPLOYMENT run-errand inspect

- Examine the output and locate the information about the leader-follower MySQL VMs:

  Instance   mysql/4eexample54b-0704-47eb-8eef-eb2example724
  Exit Code  0
  Stdout     -
  Stderr     2017/12/11 18:25:54 Started executing command: inspect
             2017/12/11 18:25:54 Started GET https://999.0.0.1:8443/status
             2017/12/11 18:25:54 Has Data: true
             Read Only: true
             GTID Executed: 1d779999-de9e-11e7-be01-42010a009999:1-23
             Replication Configured: true

  Instance   mysql/e0exampleade-0114-4d49-a929-cexample8beda
  Exit Code  0
  Stdout     -
  Stderr     2017/12/11 18:25:54 Started executing command: inspect
             2017/12/11 18:25:54 Started GET https://999.0.0.1:8443/status
             2017/12/11 18:25:54 Has Data: true
             Read Only: true
             GTID Executed: 1d779999-de9e-11e7-be01-42010a009999:1-25
             Replication Configured: false

  2 errand(s)

  Succeeded

- If Read Only is set to `true` for both VMs, make the leader writable by running the following command:

  bosh -e ENVIRONMENT -d DEPLOYMENT run-errand configure-leader-follower --instance=mysql/GUID-OF-LEADER

  For example, if the second instance above is the leader:

  $ bosh -e my-env -d my-dep \
      run-errand configure-leader-follower \
      --instance=mysql/e0exampleade-0114-4d49-a929-cexample8beda
Persistent Disk is Full
If your persistent disk is full, apps become inoperable. In this state, read, write, and Cloud Foundry Command-Line Interface (cf CLI) operations do not work.
Symptom
Developers report that read, write, and cf CLI operations do not work. Developers cannot upgrade to a larger MySQL service plan to free up disk space.
Explanation
This problem happens if your persistent disk is full.
When you use the BOSH CLI to target your deployment, you see that instances are at 100% persistent disk usage.
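For example, you can check persistent disk usage for a service instance with the BOSH CLI, using the same command documented in View Resource Saturation and Scaling below:

```
# Look at the persistent disk usage column for each mysql instance.
bosh -d service-instance_GUID vms --vitals
```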
Available disk space can be increased by deleting log files.
After deleting logs, you can then upgrade to a larger MySQL service plan.
Solution
To resolve this issue, you must free up disk space by deleting binary logs.
For instructions for deleting binary logs, see MySQL for PCF hangs when server VM persistent disk is full in the Pivotal Support knowledge base.
Warning: Deleting binary logs is a destructive procedure and can result in MySQL data loss. Only do this procedure with the assistance of Pivotal Support.
Unresponsive Node in a Highly Available Cluster
Symptom
A client connected to a MySQL cluster node reports the following error:
WSREP has not yet prepared this node for application use
Some clients might instead return the following:
unknown error
Explanation
If the client is connected to a MySQL cluster node and that node loses connection to the rest of the cluster, the node stops accepting writes. If the connection to this node is made through the proxy, the proxy automatically re-routes further connections to a different node.
Solutions
A node can become unresponsive for a number of reasons. For solutions, see the following:
- Network Latency: If network latency causes a node to become unresponsive, the node drops but eventually rejoins. The node automatically rejoins only if one node has left the cluster. Consult your IaaS network settings to reduce your network latency.
- MySQL Process Failure: If the MySQL process crashes, `monit` and BOSH should restore the process. If the process is not restored, run the `download-logs` tool and consult the error logs it generates. For more information, see the download-logs section below.
- Firewall Rule Change: If your firewall rules change, they might prevent a node from communicating with the rest of the cluster. This causes the node to become unresponsive. In this case, the logs show the node leaving the cluster but do not show network latency errors. To confirm that the node is unresponsive because of a firewall rule change, SSH from a responsive node to the unresponsive node (see the connectivity sketch after this list). If you cannot connect, the node is unresponsive due to a firewall rule change. Change your firewall rules to enable the unresponsive node to rejoin the cluster.
- VM Failure: If you cannot SSH into a node and you are not detecting either network latency or firewall issues, your node might be down due to VM failure. To confirm that the node is unresponsive and re-create the VM, see Re-create a Corrupted VM in a Highly Available Cluster below.
- Node Unable to Rejoin: If a detached existing node fails to join the cluster, its `sequence_number` might be higher than those of the nodes with quorum. A higher `sequence_number` on the detached node indicates that it has recent changes to the data that the primary component lacks. You can check this by looking at the node's error log at `/var/vcap/sys/log/pxc-mysql/mysql.err.log`.
  To restore the cluster, do one of the following:
  - If the detached node has a higher sequence number than the primary component, do the procedures in Bootstrapping.
  - If bootstrapping does not restore the cluster, you can manually force the node to rejoin the cluster. This removes all of the unsynchronized data from the detached node and creates a new copy of the cluster data on the node. For more information, see Force a Node to Rejoin a Highly Available Cluster Manually below.
Warning: Forcing a node to rejoin the cluster is a destructive procedure. Only do this procedure with the assistance of Pivotal Support.
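The following is a minimal connectivity check, not an official procedure. It assumes `nc` is available on the stemcell and that the cluster uses the usual Galera ports (4567 for replication, 4568 for incremental state transfer, 4444 for state snapshot transfer); substitute the unresponsive node's IP address:

```
# From a responsive node (bosh -d DEPLOYMENT ssh mysql/RESPONSIVE-GUID),
# test whether the unresponsive node's MySQL and Galera ports are reachable.
for port in 3306 4567 4568 4444; do
  nc -z -v -w 3 10.0.8.6 "$port"   # 10.0.8.6 is a placeholder IP
done
```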
Many Replication Errors in Logs for Highly Available Clusters
This problem happens when there are errors in SQL statements.
Symptom
You see many replication errors in the MySQL logs, like the following:
160318  9:25:16 [Warning] WSREP: RBR event 1 Query apply warning: 1, 16992456
160318  9:25:16 [Warning] WSREP: Ignoring error for TO isolated action: source: abcd1234-abcd-1234-abcd-1234abcd1234 version: 3 local: 0 state: APPLYING flags: 65 conn_id: 246804 trx_id: -1 seqnos (l: 865022, g: 16992456, s: 16992455, d: 16992455, ts: 2530660989030983)
160318  9:25:16 [ERROR] Slave SQL: Error 'Duplicate column name 'number'' on query. Default database: 'cf_0123456_1234_abcd_1234_abcd1234abcd'. Query: 'ALTER TABLE ...'
Explanation and Solution
For solutions for replication errors in MySQL log files, see the table below.
| Additional Error | Solution |
|---|---|
| `ALTER TABLE` errors | Fix the `ALTER TABLE` error. This error can occur when an app issues an invalid data definition statement. Other nodes log this problem as a replication error because they fail to replicate the `ALTER TABLE`. |
| Increased persistent disk usage or running out of working memory | Decode the `GRA_*.log` files and look for errors. See How to decode Galera GRA log files in the Pivotal Support knowledge base. The GRA log files contain failing DDL statements. |
If you see replication errors, but no `ALTER TABLE`, persistent disk, or memory issues, you can ignore the replication errors.
Troubleshooting Components
This section provides guidance on checking for and fixing issues in on-demand service components.
BOSH Problems
Large BOSH Queue
On-demand service brokers add tasks to the BOSH request queue, which can back up and cause delay under heavy loads. An app developer who requests a new MySQL for PCF instance sees `create in progress` in the Cloud Foundry Command Line Interface (cf CLI) until BOSH processes the queued request.
Ops Manager currently deploys two BOSH workers to process its queue. Future versions of Ops Manager will let users configure the number of BOSH workers.
Configuration
Service Instances in Failing State
You may have configured a VM or disk type in the tile plan page in Ops Manager that is too small for the MySQL for PCF service instance to start. See the tile-specific guidance on resource requirements, and ensure your configuration matches it.
Authentication
UAA Changes
If you have rotated any UAA user credentials, you may see authentication issues in the service broker logs.
To resolve this, redeploy the MySQL for PCF tile in Ops Manager. This provides the broker with the latest configuration.
Note: You must ensure that any changes to UAA credentials are reflected in the Credentials tab of the Pivotal Application Service tile in Ops Manager.
Networking
Common issues with networking include:
Issue | Solution |
---|---|
Latency when connecting to the MySQL for PCF service instance to create or delete a binding. | Try again or improve network performance. |
Firewall rules are blocking connections from the MySQL for PCF service broker to the service instance. | Open the MySQL for PCF tile in Ops Manager and check the two networks configured in the Networks pane. Ensure that these networks allow access to each other. |
Firewall rules are blocking connections from the service network to the BOSH director network. | Ensure that service instances can access the Director so that the BOSH agents can report in. |
Apps cannot access the service network. | Configure Cloud Foundry application security groups to allow runtime access to the service network. |
Problems accessing BOSH's UAA or the BOSH Director. | Follow network troubleshooting and check that the BOSH Director is online. |
Validate Service Broker Connectivity to Service Instances
To validate connectivity, do the following:
- To SSH into the MySQL for PCF service broker, run the following command, where DEPLOYMENT-NAME is the on-demand broker deployment you identified with `bosh deployments`:

  bosh -d DEPLOYMENT-NAME ssh

- If no BOSH `task-id` appears in the error message, look in the broker log using the `broker-request-id` from the task.
Validate App Access to Service Instance
Use `cf ssh` to access the app container, then try connecting to the MySQL for PCF service instance using the binding included in the `VCAP_SERVICES` environment variable.
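A minimal sketch of that check follows. It assumes the binding appears in VCAP_SERVICES under the `p.mysql` service name with `hostname` and `port` fields, and that `jq` and `nc` are available in the container; check your own VCAP_SERVICES output for the exact layout and read the values manually if the tools are missing:

```
# Open a shell inside the app container.
cf ssh MY-APP

# Inside the container: show the binding credentials, then test TCP reachability.
echo "$VCAP_SERVICES" | jq '.["p.mysql"][0].credentials | {hostname, port}'
nc -z -v -w 3 HOSTNAME 3306   # replace HOSTNAME and port with the values shown above
```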
Quotas
Plan Quota Issues
If developers report errors such as:
Message: Service broker error: The quota for this service plan has been exceeded. Please contact your Operator for help.
- Check your current plan quota.
- Increase the plan quota:
  - Log in to Ops Manager.
  - Reconfigure the quota on the plan page.
  - Deploy the tile.
- Find who is using the plan quota and take the appropriate action.
Global Quota Issues
If developers report errors such as:
Message: Service broker error: The quota for this service has been exceeded. Please contact your Operator for help.
- Check your current global quota.
- Increase the global quota:
  - Log in to Ops Manager.
  - Reconfigure the quota on the on-demand settings page.
  - Deploy the tile.
- Find out who is using the quota and take the appropriate action.
Failing Jobs and Unhealthy Instances
To determine whether there is an issue with the MySQL for PCF service deployment, inspect the VMs. To do so, run the following command:
bosh -d service-instance_GUID vms --vitals
For additional process-level information, run the following command:
bosh -d service-instance_GUID instances --ps --vitals
If the VM is failing, follow the service-specific information.
Any unadvised corrective actions (such as running `bosh restart` on a VM) can cause issues in the service instance.
A failing process or failing VM might come back automatically after a temporary service outage. See VM Process Failure and VM Failure.
AZ or Region Failure
Failures at the IaaS level, such as Availability Zone (AZ) or region failures, can interrupt service and require manual restoration. See AZ Failure and Region Failure.
Techniques for Troubleshooting
This section gives instructions on interacting with the on-demand service broker and on-demand service instance BOSH deployments, and on performing general maintenance and housekeeping tasks.
Parse a Cloud Foundry (CF) Error Message
Failed operations (create, update, bind, unbind, delete) result in an error message.
You can retrieve the error message later by running the cf CLI command `cf service INSTANCE-NAME`. For example:

$ cf service myservice

Service instance: myservice
Service: super-db
Bound apps:
Tags:
Plan: dedicated-vm
Description: Dedicated Instance
Documentation url:
Dashboard:

Last Operation
Status: create failed
Message: Instance provisioning failed: There was a problem completing your request.
         Please contact your operations team providing the following information:
         service: redis-acceptance,
         service-instance-guid: ae9e232c-0bd5-4684-af27-1b08b0c70089,
         broker-request-id: 63da3a35-24aa-4183-aec6-db8294506bac,
         task-id: 442,
         operation: create
Started: 2017-03-13T10:16:55Z
Updated: 2017-03-13T10:17:58Z
Use the information in the Message field to debug further. Provide this information to Pivotal Support when filing a ticket.
The `task-id` field maps to the BOSH task ID. For more information on a failed BOSH task, run `bosh task TASK-ID`.
The `broker-request-id` maps to the portion of the On-Demand Broker log containing the failed step. Access the broker log through your syslog aggregator, or access BOSH logs for the broker by typing `bosh logs broker 0`. If you have more than one broker instance, repeat this process for each instance.
Access Broker and Instance Logs and VMs
Before following the procedures below, log into the cf CLI and the BOSH CLI.
Access Broker Logs and VM(s)
You can access logs using Ops Manager by clicking on the Logs tab in the tile and downloading the broker logs.
To access logs using the BOSH CLI, do the following:
- Identify the on-demand broker (ODB) deployment by running the following command:

  bosh deployments

- View VMs in the deployment by running the following command:

  bosh -d DEPLOYMENT-NAME instances

- SSH onto the broker VM by running the following command:

  bosh -d DEPLOYMENT-NAME ssh

- Download the broker logs by running the following command:

  bosh -d DEPLOYMENT-NAME logs
The archive generated by BOSH or Ops Manager includes the following logs:
Log Name | Description |
---|---|
broker.log | Requests to the on-demand broker and the actions the broker performs while orchestrating the request (e.g. generating a manifest and calling BOSH). Start here when troubleshooting. |
broker_ctl.log | Control script logs for starting and stopping the on-demand broker. |
post-start.stderr.log | Errors that occur during post-start verification. |
post-start.stdout.log | Post-start verification. |
drain.stderr.log | Errors that occur while running the drain script. |
Access Service Instance Logs and VMs
- To target an individual service instance deployment, retrieve the GUID of your service instance with the following cf CLI command:

  cf service MY-SERVICE --guid

- To view VMs in the deployment, run the following command:

  bosh -d service-instance_GUID instances

- To SSH into a VM, run the following command:

  bosh -d service-instance_GUID ssh

- To download the instance logs, run the following command:

  bosh -d service-instance_GUID logs
Retrieve Admin and Read-Only Admin Credentials for a Service Instance
To retrieve the admin and read-only admin credentials for a service instance from BOSH CredHub, perform the following steps:
- Use the cf CLI to determine the GUID associated with the service instance for which you want to retrieve credentials by running the following command:

  cf service SERVICE-INSTANCE --guid

  Where `SERVICE-INSTANCE` is the name of the service instance.
  For example:

  $ cf service my-service-instance --guid
  12345678-90ab-cdef-1234-567890abcdef

  If you do not know the name of the service instance, you can list service instances in the space with `cf services`.
. - Perform the steps in Gather Credential and IP Address Information and SSH into Ops Manager of Advanced Troubleshooting with the BOSH CLI to SSH into the Ops Manager VM.
- From the Ops Manager VM, log in to your BOSH Director with the BOSH CLI. See Log in to the BOSH Director in Advanced Troubleshooting with the BOSH CLI.
- Follow the steps in Find the CredHub Credentials, and record the values for `identity` and `password`.
- Set the API target of the CredHub CLI to your BOSH CredHub server by running the following command:

  credhub api https://BOSH-DIRECTOR:8844 --ca-cert=/var/tempest/workspaces/default/root_ca_certificate

  Where `BOSH-DIRECTOR` is the IP address of the BOSH Director VM.
  For example:

  $ credhub api https://10.0.0.5:8844 --ca-cert=/var/tempest/workspaces/default/root_ca_certificate
- Log in to CredHub by running the following command:

  credhub login \
    --client-name=CREDHUB-CLIENT-NAME \
    --client-secret=CREDHUB-CLIENT-SECRET

  Where:
  - `CREDHUB-CLIENT-NAME` is the `identity` you recorded in Find the CredHub Credentials.
  - `CREDHUB-CLIENT-SECRET` is the `password` you recorded in Find the CredHub Credentials.

  For example:

  $ credhub login \
      --client-name=credhub \
      --client-secret=abcdefghijklm123456789
- Use the CredHub CLI to retrieve the credentials:
  - To retrieve the password for the admin user, run the following command:

    credhub get -n /p-bosh/service-instance_GUID/admin_password

    In the output, the password appears under `value`. For example:

    $ credhub get -n /p-bosh/service-instance_70d30bb6-7f30-441a-a87c-05a5e4afff26/admin_password
    id: d6e5bd10-3b60-4a1a-9e01-c76da688b847
    name: /p-bosh/service-instance_70d30bb6-7f30-441a-a87c-05a5e4afff26/admin_password
    type: password
    value: UMF2DXsqNPPlCNWMdVMcNv7RC3Wi10
    version_created_at: 2018-04-02T23:16:09Z

  - To retrieve the password for the read-only admin user, run the following command:

    credhub get -n /p-bosh/service-instance_GUID/read_only_admin_password

    In the output, the password appears under `value`.
Reinstall a Tile
To reinstall the MySQL for PCF tile, see Reinstalling MySQL for Pivotal Cloud Foundry version 2 and above in the Pivotal Support knowledge base.
View Resource Saturation and Scaling
To view usage statistics for any service, do the following:
- Run the following command:

  bosh -d DEPLOYMENT-NAME vms --vitals

- To view process-level information, run:

  bosh -d DEPLOYMENT-NAME instances --ps
Identify Service Instance Owner
If you want to identify which apps are using a specific service instance from the BOSH deployment name, do the following:
- Take the deployment name and strip the `service-instance_` prefix, leaving you with the GUID.
- Log in to CF as an admin.
- Obtain a list of all service bindings by running the following:

  cf curl /v2/service_instances/GUID/service_bindings

- The output from the above curl gives you a list of `resources`, with each item referencing a service binding, which contains the `APP-URL`. To find the name, org, and space for the app, run the following:
  - `cf curl APP-URL` and record the app name under `entity.name`.
  - `cf curl SPACE-URL` to obtain the space, using the `entity.space_url` from the above curl. Record the space name under `entity.name`.
  - `cf curl ORGANIZATION-URL` to obtain the org, using the `entity.organization_url` from the above curl. Record the organization name under `entity.name`.
Note: When running `cf curl`, ensure that you query all pages, because the responses are limited to a certain number of bindings per page. The default is 50. To find the next page, curl the value under `next_url`.
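As a convenience, the lookups above can be scripted. The sketch below assumes `jq` is installed and uses the v2 API fields shown above; it does not follow next_url pagination:

```
# List the names of all apps bound to a service instance GUID.
GUID=12345678-90ab-cdef-1234-567890abcdef   # replace with your instance GUID

cf curl "/v2/service_instances/${GUID}/service_bindings" \
  | jq -r '.resources[].entity.app_url' \
  | while read -r app_url; do
      cf curl "$app_url" | jq -r '.entity.name'
    done
```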
Monitor Quota Saturation and Service Instance Count
Quota saturation and total number of service instances are available through ODB metrics emitted to Loggregator. The metric names are shown below:
Metric Name | Description |
---|---|
on-demand-broker/SERVICE-NAME-MARKETPLACE/quota_remaining |
global quota remaining for all instances across all plans |
on-demand-broker/SERVICE-NAME-MARKETPLACE/PLAN-NAME/quota_remaining |
quota remaining for a particular plan |
on-demand-broker/SERVICE-NAME-MARKETPLACE/total_instances |
total instances created across all plans |
on-demand-broker/SERVICE-NAME-MARKETPLACE/PLAN-NAME/total_instances |
total instances created for a given plan |
Note: Quota metrics are not emitted if no quota has been set.
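To spot-check these metrics from a workstation, you can reuse the cf log-stream plugin mentioned in Missing Logs and Metrics above. This is only a sketch and assumes the plugin is installed:

```
# Stream platform metrics and filter for the on-demand broker quota metrics.
cf log-stream | grep -E "quota_remaining|total_instances"
```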
Techniques for Troubleshooting Highly Available Clusters
If your cluster is experiencing downtime or is in a degraded state, Pivotal recommends gathering information to diagnose the type of failure the cluster is experiencing with the following workflow:
- Consult solutions for common errors. See the highly available cluster entries in Troubleshooting Errors above.
- Use `mysql-diag` to view a summary of the network, disk, and replication state of each cluster node. Depending on the output from `mysql-diag`, you might recover your cluster with the following troubleshooting techniques:
  - To force a node to rejoin the cluster, see Force a Node to Rejoin a Highly Available Cluster Manually below.
  - To re-create a corrupted VM, see Re-create a Corrupted VM in a Highly Available Cluster below.
  - To check if replication is working, see Check Replication Status in a Highly Available Cluster below.
  For more information about `mysql-diag`, see Running mysql-diag.
Run
download-logs
against each node in your MySQL cluster, MySQL proxies, and jumpbox VM. You must rundownload-logs
before attempting recovery because any failures in the recovery procedure can result in logs being lost or made inaccessible.
For more information, see the download-logs section below.
Note: Pivotal recommends that you use the
-X
flag to get the complete set of available logs. However, if your cluster processes a high volume of transactions, the complete set might be too large and you can omit this flag to fetch the essential set of logs. -
If you are uncertain about the recovery steps to take, submit a ticket through
Pivotal Support. When you submit a ticket provide the following information:
- mysql-diag output: A summary of the network, disk, and replication state. The Running mysql-diag topic explains how to run mysql-diag.
- download-logs logs: Logs from your MySQL cluster, MySQL proxies, and jumpbox VM. The download-logs section below explains how to run download-logs.
- Deployment environment: The environment that MySQL for PCF is running in such as Pivotal Application Service (PAS) or a service tile.
- Version numbers: The versions of the installed Ops Manager, PAS, and MySQL for PCF
Warning: Do not attempt to resolve cluster issues by reconfiguring the cluster, such as changing the number of nodes or networks. Only follow the diagnosis steps in this document. If you are unsure how to proceed, contact Pivotal Support.
Force a Node to Rejoin a Highly Available Cluster Manually
If a detached node fails to rejoin the cluster after a configured grace period, you can manually force the node to rejoin the cluster. This procedure removes all the data on the node, forces the node to join the cluster, and creates a new copy of the cluster data on the node.
Warning: If you manually force a node to rejoin the cluster, data stored on the local node is lost. Do not force nodes to rejoin the cluster if you want to preserve unsynchronized data. Only do this procedure with the assistance of Pivotal Support.
Before following this procedure, try to bootstrap the cluster. For more information, see Bootstrapping.
To manually force a node to rejoin the cluster, do the following:
- To SSH into the node, follow the procedure in BOSH SSH.
- To become root, run the following command:

  sudo su

- To shut down the `mysqld` process on the node, run the following command:

  monit stop galera-init

- To remove the unsynchronized data on the node, run the following command:

  rm -rf /var/vcap/store/pxc-mysql

- To prepare the node before restarting, run the following command:

  /var/vcap/jobs/pxc-mysql/bin/pre-start

- To restart the `mysqld` process, run the following command:

  monit start galera-init
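After the node restarts, one way to confirm that it has rejoined is to check the Galera cluster size. This is a sketch rather than part of the official procedure; it assumes a mysql client is available and that you can authenticate, for example with the admin credentials retrieved from CredHub as described above. The value should match the number of nodes in your plan:

```
# From any node in the cluster, check how many nodes Galera currently sees.
mysql -u admin -p -e "SHOW STATUS LIKE 'wsrep_cluster_size';"
```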
Re-create a Corrupted VM in a Highly Available Cluster
To re-create a corrupted VM, do the following:
- To log in to the BOSH Director VM, do the following procedures:
- To gather the information needed to log in to the BOSH Director VM, see Gather Credential and IP Address Information.
- To log in to the Ops Manager VM, see Log in to the Ops Manager VM with SSH.
- To log in to the BOSH Director VM, see Log in to the BOSH Director VM.
- To identify and re-create the unresponsive node with `bosh cloudcheck`, do the procedure in BOSH Cloudcheck and run `Recreate VM using last known apply spec`.
  Warning: Recreating a node will clear its logs. Ensure the node is completely down before recreating it.
Warning: Only re-create one node. Do not re-create the entire cluster. If more than one node is down, contact Pivotal Support.
Check Replication Status in a Highly Available Cluster
If you see stale data in your cluster, you can check whether replication is functioning normally.
To check the replication status, do the following:
- To log in to the BOSH Director VM, do the following procedures:
- To gather the information needed to log in to the BOSH Director VM, see Gather Credential and IP Address Information.
- To log in to the Ops Manager VM, see Log in to the Ops Manager VM with SSH.
- To create a dummy database in the first node, run the following command:

  mysql -h FIRST-NODE-IP-ADDRESS -u YOUR-IDENTITY -p -e "create database verify_healthy;"

  Where:
  - `FIRST-NODE-IP-ADDRESS` is the IP address of the first node you recorded in step 1.
  - `YOUR-IDENTITY` is the value of `identity` that you recorded in step 1.
- To create a dummy table in the dummy database, run the following command:

  mysql -h FIRST-NODE-IP-ADDRESS -u YOUR-IDENTITY -p -D verify_healthy -e "create table dummy_table (id int not null primary key auto_increment, info text) engine='innodb';"

- To insert data into the dummy table, run the following command:

  mysql -h FIRST-NODE-IP-ADDRESS -u YOUR-IDENTITY -p -D verify_healthy -e "insert into dummy_table(info) values ('dummy data'),('more dummy data'),('even more dummy data');"

- To query the table and verify that the three rows of dummy data exist on the first node, run the following command:

  mysql -h FIRST-NODE-IP-ADDRESS -u YOUR-IDENTITY -p -D verify_healthy -e "select * from dummy_table;"

  When prompted for a password, provide the `password` value recorded in step 1. The above command returns output similar to the following:

  +----+----------------------+
  | id | info                 |
  +----+----------------------+
  |  4 | dummy data           |
  |  7 | more dummy data      |
  | 10 | even more dummy data |
  +----+----------------------+
- To verify that the other nodes contain the same dummy data, do the following for each of the remaining MySQL server IP addresses:
  - Query the dummy table by running:

    mysql -h NEXT-NODE-IP-ADDRESS -u YOUR-IDENTITY -p -D verify_healthy -e "select * from dummy_table;"

    When prompted for a password, provide the `password` value recorded in step 1.
  - Verify that the node contains the same three rows of dummy data as the other nodes. The above command returns output similar to the following:

    +----+----------------------+
    | id | info                 |
    +----+----------------------+
    |  4 | dummy data           |
    |  7 | more dummy data      |
    | 10 | even more dummy data |
    +----+----------------------+
- If each MySQL server instance does not return the same result, contact Pivotal Support before proceeding further or making any changes to your deployment.
- If each MySQL server instance returns the same result, you can safely proceed to scaling down your cluster to a single node.
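When you have finished verifying replication, you may want to remove the test database. A minimal cleanup sketch, using the same connection details as above:

```
# Drop the dummy database created for the replication check.
mysql -h FIRST-NODE-IP-ADDRESS -u YOUR-IDENTITY -p -e "drop database verify_healthy;"
```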
Tools for Troubleshooting
The troubleshooting techniques described above use the following tools.
download-logs
`download-logs` is a script that you can run from your Ops Manager VM to aggregate logs from your MySQL cluster nodes, MySQL proxies, and, with highly available clusters, the jumpbox VM.
Do the following to run the `download-logs` script:
- Download and unpack the `download-logs` script from MySQL for PCF on Pivotal Network.
- From Ops Manager, open your BOSH Director tile > Credentials tab.
- Click Bosh Commandline Credentials > Link to Credential. A short plaintext file opens.
- From the plaintext file, record the values listed for the following:
  - BOSH_CLIENT
  - BOSH_CLIENT_SECRET
  - BOSH_CA_CERT
  - BOSH_ENVIRONMENT
- From the BOSH CLI, run `bosh deployments` and record the name of the BOSH deployment that deployed MySQL for PCF.
- SSH into your Ops Manager VM. For information on how to do this, see Gather Credential and IP Address Information and SSH into Ops Manager.
- File transfer or copy-paste the `download-logs` script to a working directory on the Ops Manager VM.
- Set local environment variables to the same BOSH variable values that you recorded earlier, including `BOSH_DEPLOYMENT` for the deployment name. For example:

  $ BOSH_CLIENT=ops_manager \
    BOSH_CLIENT_SECRET=a123bc-E_4Ke3fb-gImbl3xw4a7meW0rY \
    BOSH_CA_CERT=/var/tempest/workspaces/default/root_ca_certificate \
    BOSH_ENVIRONMENT=10.0.0.5 \
    BOSH_DEPLOYMENT=pivotal-mysql-14c4

- Run the `download-logs` script, passing the output directory to the `-o` flag:

  ./download-logs -o .

  The script saves a compressed file of logs combined from all MySQL VMs. The filename has the form `TIMESTAMP-mysql-logs.tar.gz.gpg`.
mysql-diag
`mysql-diag` outputs the current status of a highly available (HA) MySQL cluster in PCF and suggests recovery actions if the cluster fails. For more information, see Running mysql-diag.
Knowledge Base (Community)
Find the answer to your question and browse product discussions and solutions by searching the Pivotal Knowledge Base.
File a Support Ticket
You can file a ticket with Pivotal Support.
Be sure to provide the error message from `cf service YOUR-SERVICE-INSTANCE`.
To expedite troubleshooting, provide your service broker logs and your service instance logs.
If your `cf service YOUR-SERVICE-INSTANCE` output includes a `task-id`, provide the BOSH task output.