Troubleshooting MySQL for Pivotal Platform
- Troubleshoot Errors
- Troubleshoot Components
- Techniques for Troubleshooting
  - Parse a Cloud Foundry (CF) Error Message
  - Access Broker and Instance Logs and VMs
  - Run Service Broker Errands to Manage Brokers and Instances
  - Detect Orphaned Service Instances
  - Retrieve Admin and Read-Only Admin Credentials for a Service Instance
  - Reinstall a Tile
  - View Resource Saturation and Scaling
  - Identify Apps using a Service Instance
  - Monitor Quota Saturation and Service Instance Count
- Techniques for Troubleshooting Highly Available Clusters
  - Force a Node to Rejoin a Highly Available Cluster Manually
  - Re-create a Corrupted VM in a Highly Available Cluster
  - Check Replication Status in a Highly Available Cluster
- Tools for Troubleshooting
- Knowledge Base (Community)
- File a Support Ticket
This topic provides operators with basic instructions for troubleshooting on-demand MySQL for Pivotal Cloud Foundry (PCF). For information about temporary MySQL for PCF service interruptions, see Service Interruptions.
Troubleshoot Errors
This section provides information on how to troubleshoot specific errors or error messages.
Common Services Errors
The following errors occur in multiple services:
- Failed Installation
- Cannot Create or Delete Service Instances
- Broker Request Timeouts
- Instance Does Not Exist
- Cannot Bind to or Unbind from Service Instances
- Cannot Connect to a Service Instance
- Upgrade All Service Instances Errand Fails
- Missing Logs and Metrics
Failed Installation | |
---|---|
Symptom | MySQL for PCF fails to install. |
Cause | Reasons for a failed installation include: |
Solution | To troubleshoot: |
Cannot Create or Delete Service Instances | |
---|---|
Symptom | If developers report errors such as: Instance provisioning failed: There was a problem completing your request. Please contact your operations team providing the following information: service: redis-acceptance, service-instance-guid: ae9e232c-0bd5-4684-af27-1b08b0c70089, broker-request-id: 63da3a35-24aa-4183-aec6-db8294506bac, task-id: 442, operation: create |
Cause | Reasons include: |
Solution | To troubleshoot: |
Broker Request Timeouts | |
---|---|
Symptom | If developers report errors such as: Server error, status code: 504, error code: 10001, message: The request to the service broker timed out: https://BROKER-URL/v2/service_instances/e34046d3-2379-40d0-a318-d54fc7a5b13f/service_bindings/aa635a3b-ef6d-41c3-a23f-55752f3f651b |
Cause | Cloud Foundry might not be connected to the service broker, or there might be a large number of queued tasks. |
Solution | To troubleshoot: |
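As a starting point, you can check whether the BOSH task queue is backed up and whether the broker is still registered with the Cloud Controller. A minimal sketch (bosh tasks lists tasks the Director is currently processing; cf service-brokers requires CF admin credentials):
$ bosh tasks            # tasks currently queued or processing on the BOSH Director
$ cf service-brokers    # confirms the MySQL broker is still registered with the Cloud Controller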
Instance Does Not Exist | |
---|---|
Symptom | If developers report errors such as: Server error, status code: 502, error code: 10001, message: Service broker error: instance does not exist |
Cause | The instance might have been deleted. |
Solution | To troubleshoot: If the BOSH deployment is not found, it has been deleted from BOSH. Contact Support for further assistance. |
Cannot Bind to or Unbind from Service Instances | |
---|---|
Symptom | If developers report errors such as: Server error, status code: 502, error code: 10001, message: Service broker error: There was a problem completing your request. Please contact your operations team providing the following information: service: example-service, service-instance-guid: 8d69de6c-88c6-4283-b8bc-1c46103714e2, broker-request-id: 15f4f87e-200a-4b1a-b76c-1c4b6597c2e1, operation: bind |
Cause | This might be due to authentication or network errors. |
Solution | To find out the exact issue with the binding process: |
Note: Service instances can also become temporarily inaccessible during upgrades and VM or network failures. See Service Interruptions for more information.
Upgrade All Service Instances Errand Fails | |
---|---|
Symptom | The upgrade-all-service-instances errand fails. |
Cause | There might be a problem with a particular instance. |
Solution | To troubleshoot: |
Missing Logs and Metrics | |
---|---|
Symptom | No logs are being emitted by the on-demand broker. |
Cause | Syslog might not be configured correctly, or you might have network access issues. |
Solution | To troubleshoot: |
Leader-Follower Service Instance Errors
This section provides solutions for the following errors:
- Unable to Determine Leader and Follower
- Both Leader and Follower Instances Are Writable
- Both Leader and Follower Instances Are Read-Only
Unable to Determine Leader and Follower | |
---|---|
Symptom | The configure-leader-follower errand fails because it cannot determine the VM roles. The errand exits with 1 and the errand logs contain the following: Unable to determine leader and follower based on transaction history. |
Cause | Something has happened to the instances, such as a failure or manual intervention. As a result, there is not enough information available to determine the correct state and topology without operator intervention. |
Solution | Use the inspect errand to determine which instance should be the leader. Then, using the orchestration errands and backup/restore, put the service instance into a safe topology and rerun the configure-leader-follower errand. This example shows one outcome that the inspect errand can return: |
Both Leader and Follower Instances Are Writable | |
---|---|
Symptom | The configure-leader-follower errand fails because both VMs are writable and the VMs might hold differing data. The errand exits with 1 and the errand logs contain the following: Both mysql instances are writable. Please ensure no divergent data and set one instance to read-only mode. |
Cause | MySQL for Pivotal Cloud Foundry tries to ensure that there is only one writable instance of the leader-follower pair at any given time. However, in certain situations, such as network partitions or manual intervention outside of the provided BOSH errands, it is possible for both instances to be writable. The service instances remain in this state until an operator resolves the issue, to ensure that the correct instance is promoted and to reduce the potential for data divergence. |
Solution | |
Both Leader and Follower Instances Are Read-Only | |
---|---|
Symptom | Developers report that apps cannot write to the database. In a leader-follower topology, the leader VM is writable and the follower VM is read-only. However, if both VMs are read-only, apps cannot write to the database. |
Cause | This problem happens if the leader VM fails and the BOSH Resurrector is enabled. When the leader is resurrected, it is set as read-only. |
Solution | |
Inoperable App and Database Errors
This section provides solutions for the following errors:
- Persistent Disk is Full
- Cannot Access Database Table
Persistent Disk is Full | |
---|---|
Symptom | Developers report that read, write, and Cloud Foundry Command-Line Interface (cf CLI) operations do not work, and that they cannot upgrade to a larger MySQL for Pivotal Cloud Foundry service plan to free up disk space. |
Cause | The persistent disk is full, which makes apps inoperable. When you use the BOSH CLI to target your deployment, you see that instances are at 100% persistent disk usage. |
Solution | To resolve this issue, do one of the following: Increase available disk space by deleting log files, and then upgrade to a larger MySQL for Pivotal Cloud Foundry service plan. You can also turn off binary logging before developers do large data uploads or if their databases have a high transaction volume. |
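If you suspect binary logs are filling the persistent disk, one way to check their size is from the MySQL prompt. This is a minimal sketch, not part of the documented procedure: it assumes you connect with the admin credentials described in Retrieve Admin and Read-Only Admin Credentials for a Service Instance below, and that IP-ADDRESS is the service instance IP. Consult the MySQL documentation before purging binary logs on a leader-follower or highly available instance.
$ mysql -h IP-ADDRESS -u admin -P 3306 -p -e "SHOW BINARY LOGS;"    # lists each binary log file and its size in bytes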
Cannot Access Database Table | |
---|---|
Symptom | When you query an existing table, you see an error similar to the following: ERROR 1146 (42S02): Table 'mysql.foobar' doesn't exist |
Cause | This error occurs if you created an uppercase table name and then enabled lowercase table names. You enable lowercase table names either by: |
Solution | To resolve this issue: |
Highly Available Cluster Errors
This section provides solutions for the following errors:
- Unresponsive Node in a Highly Available Cluster
- Many Replication Errors in Logs for Highly Available Clusters
Unresponsive Node in a Highly Available Cluster | |
---|---|
Symptom | A client connected to a MySQL for Pivotal Cloud Foundry cluster node reports the following error: WSREP has not yet prepared this node for application use. Some clients might instead return: unknown error |
Cause | If the client is connected to a MySQL for Pivotal Cloud Foundry cluster node and that node loses connection to the rest of the cluster, the node stops accepting writes. If the connection to this node is made through the proxy, the proxy automatically re-routes further connections to a different node. |
Solution | A node can become unresponsive for a number of reasons. For solutions, see the following: |
Many Replication Errors in Logs for Highly Available Clusters | |
---|---|
Symptom | You see many replication errors in the MySQL logs, like the following: 160318 9:25:16 [Warning] WSREP: RBR event 1 Query apply warning: 1, 16992456 160318 9:25:16 [Warning] WSREP: Ignoring error for TO isolated action: source: abcd1234-abcd-1234-abcd-1234abcd1234 version: 3 local: 0 state: APPLYING flags: 65 conn_id: 246804 trx_id: -1 seqnos (l: 865022, g: 16992456, s: 16992455, d: 16992455, ts: 2530660989030983) 160318 9:25:16 [ERROR] Slave SQL: Error 'Duplicate column name 'number'' on query. Default database: 'cf_0123456_1234_abcd_1234_abcd1234abcd'. Query: 'ALTER TABLE ...' |
Cause | This problem happens when there are errors in SQL statements. |
Solution | For solutions for replication errors in MySQL log files, see the table below. If the errors relate only to SQL statements such as ALTER TABLE, and are not accompanied by persistent disk or memory issues, you can ignore the replication errors. |
Troubleshoot Components
This section provides guidance on checking for and fixing issues in on-demand service components.
BOSH Problems
Large BOSH Queue
On-demand service brokers add tasks to the BOSH request queue, which can back up
and cause delay under heavy loads.
An app developer who requests a new MySQL for PCF instance sees
create in progress
in the Cloud Foundry Command Line Interface (cf CLI) until
BOSH processes the queued request.
Ops Manager currently deploys two BOSH workers to process its queue. Future versions of Ops Manager will let users configure the number of BOSH workers.
Configuration
Service Instances in Failing State
The VM or Disk type that you configured in the plan page of the tile in Ops Manager might not be large enough for the MySQL for PCF service instance to start. See tile-specific guidance on resource requirements.
Authentication
UAA Changes
If you have rotated any UAA user credentials, you may see authentication issues in the service broker logs.
To resolve this, redeploy the MySQL for PCF tile in Ops Manager. This provides the broker with the latest configuration.
Note: You must ensure that any changes to UAA
credentials are reflected in the Ops Manager credentials
tab of the Pivotal Application Service tile.
Networking
Common issues with networking include:
Issue | Solution |
---|---|
Latency when connecting to the MySQL for PCF service instance to create or delete a binding. | Try again or improve network performance. |
Firewall rules are blocking connections from the MySQL for PCF service broker to the service instance. | Open the MySQL for PCF tile in Ops Manager and check the two networks configured in the Networks pane. Ensure that these networks allow access to each other. |
Firewall rules are blocking connections from the service network to the BOSH director network. | Ensure that service instances can access the Director so that the BOSH agents can report in. |
Apps cannot access the service network. | Configure Cloud Foundry application security groups to allow runtime access to the service network. |
Problems accessing BOSH’s UAA or the BOSH Director. | Follow network troubleshooting and check that the BOSH Director is online. |
Validate Service Broker Connectivity to Service Instances
To validate connectivity, do the following:
-
View the BOSH deployment name for your service broker by running:
bosh deployments
-
SSH into the MySQL for PCF service broker by running:
bosh -d DEPLOYMENT-NAME ssh
-
If no BOSH task-id appears in the error message, look in the broker log using the broker-request-id from the task.
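Once you are on the broker VM, you can probe basic TCP connectivity to a service instance. A minimal sketch, assuming SERVICE-INSTANCE-IP is an IP reported by bosh -d service-instance_GUID instances and that the MySQL server listens on the default port 3306:
$ timeout 3 bash -c 'cat < /dev/null > /dev/tcp/SERVICE-INSTANCE-IP/3306' && echo "port reachable" || echo "port unreachable"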
Validate App Access to Service Instance
Use cf ssh to access the app container, then try connecting to the MySQL for PCF service instance using the binding included in the VCAP_SERVICES environment variable.
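For example, a minimal check from a developer workstation, assuming MY-APP is bound to the instance and the binding appears under p.mysql in VCAP_SERVICES:
$ cf env MY-APP | grep -A 8 '"p.mysql"'    # shows the binding credentials, including hostname and port
$ cf ssh MY-APP
$ timeout 3 bash -c 'cat < /dev/null > /dev/tcp/HOSTNAME/3306' && echo "reachable"    # run inside the app container, substituting the hostname from the binding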
Quotas
Plan Quota Issues
If developers report errors such as:
Message: Service broker error: The quota for this service plan has been exceeded. Please contact your Operator for help.
- Check your current plan quota.
- Increase the plan quota:
  - Log in to Ops Manager.
  - Reconfigure the quota on the plan page.
  - Deploy the tile.
- Find who is using the plan quota and take the appropriate action (see the example below).
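For example, a sketch that uses the Cloud Controller v2 API to list the instances that count against a particular plan. PLAN-GUID is a placeholder for illustration: find it by matching the plan name in the output of the first command.
$ cf curl /v2/service_plans                                  # find the entry whose entity.name matches your plan and note its metadata.guid
$ cf curl /v2/service_plans/PLAN-GUID/service_instances      # lists the service instances created from that plan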
Global Quota Issues
If developers report errors such as:
Message: Service broker error: The quota for this service has been exceeded. Please contact your Operator for help.
- Check your current global quota.
- Increase the global quota:
  - Log in to Ops Manager.
  - Reconfigure the quota on the on-demand settings page.
  - Deploy the tile.
- Find out who is using the quota and take the appropriate action.
Failing Jobs and Unhealthy Instances
To determine whether there is an issue with the MySQL for PCF deployment:
-
Inspect the VMs by running:
bosh -d service-instance_GUID vms --vitals
-
For additional information, run:
bosh -d service-instance_GUID instances --ps --vitals
If the VM is failing, follow the service-specific information.
Any unadvised corrective actions (such as running BOSH restart
on
a VM) can cause issues in the service instance.
A failing process or failing VM might come back automatically after a temporary service outage. See VM Process Failure and VM Failure.
AZ or Region Failure
Failures at the IaaS level, such as Availability Zone (AZ) or region failures, can interrupt service and require manual restoration. See AZ Failure and Region Failure.
Techniques for Troubleshooting
This section gives instructions for interacting with the on-demand service broker and on-demand service instance BOSH deployments, and for performing general maintenance and housekeeping tasks.
Parse a Cloud Foundry (CF) Error Message
Failed operations (create, update, bind, unbind, delete) result in an error message.
You can retrieve the error message later by running the cf CLI command cf service INSTANCE-NAME.
$ cf service myservice

Service instance: myservice
Service: super-db
Bound apps:
Tags:
Plan: dedicated-vm
Description: Dedicated Instance
Documentation url:
Dashboard:

Last Operation
Status: create failed
Message: Instance provisioning failed: There was a problem completing your request. Please contact your operations team providing the following information: service: redis-acceptance, service-instance-guid: ae9e232c-0bd5-4684-af27-1b08b0c70089, broker-request-id: 63da3a35-24aa-4183-aec6-db8294506bac, task-id: 442, operation: create
Started: 2017-03-13T10:16:55Z
Updated: 2017-03-13T10:17:58Z
Use the information in the Message
field to debug further.
Provide this information to Support when filing a ticket.
The task-id
field maps to the BOSH task ID.
For more information about a failed BOSH task, run bosh task TASK-ID.
The broker-request-id
maps to the portion of the On-Demand Broker log
containing the failed step.
Access the broker log through your syslog aggregator, or access BOSH logs for
the broker by typing bosh logs broker 0
.
If you have more than one broker instance, repeat this process for each instance.
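For example, given the task-id and broker-request-id from the message above, a sketch of follow-up commands. The broker log file name is taken from the table in Access Broker and Instance Logs and VMs below; its exact path on the broker VM can vary.
$ bosh task 442                 # shows the result of the failed BOSH task
$ bosh task 442 --debug         # shows the full debug log for the task
$ grep 63da3a35-24aa-4183-aec6-db8294506bac broker.stdout.log    # on the broker VM, finds the failed step by broker-request-id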
Access Broker and Instance Logs and VMs
Before following the procedures below, log in to the cf CLI and the BOSH CLI.
Access Broker Logs and VMs
You can access logs using Ops Manager by clicking on the Logs tab in the tile and downloading the broker logs.
To access logs using the BOSH CLI, do the following:
-
Identify the on-demand broker (ODB) deployment by running the following command:
bosh deployments
-
View VMs in the deployment by running the following command:
bosh -d DEPLOYMENT-NAME instances
-
SSH onto the VM by running the following command:
bosh -d DEPLOYMENT-NAME ssh
-
Download the broker logs by running the following command:
bosh -d DEPLOYMENT-NAME logs
The archive generated by BOSH includes the following logs:
Log Name | Description |
---|---|
broker.stdout.log | Requests to the on-demand broker and the actions the broker performs while orchestrating the request (e.g. generating a manifest and calling BOSH). Start here when troubleshooting. |
bpm.log | Control script logs for starting and stopping the on-demand broker. |
post-start.stderr.log | Errors that occur during post-start verification. |
post-start.stdout.log | Post-start verification. |
drain.stderr.log | Errors that occur while running the drain script. |
Access Service Instance Logs and VMs
-
To target an individual service instance deployment, retrieve the GUID of your service instance with the following cf CLI command:
cf service MY-SERVICE --guid
-
To view VMs in the deployment, run the following command:
bosh -d service-instance_GUID instances
-
To SSH into a VM, run the following command:
bosh -d service-instance_GUID ssh
-
To download the instance logs, run the following command:
bosh -d service-instance_GUID logs
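These steps can be combined. For example, a one-line sketch, assuming a bash shell and that MY-SERVICE is the service instance name:
$ bosh -d service-instance_$(cf service MY-SERVICE --guid) logs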
Run Service Broker Errands to Manage Brokers and Instances
From the BOSH CLI, you can run service broker errands that manage the service brokers and perform mass operations on the service instances that the brokers created. These service broker errands include:
-
register-broker
registers a broker with the Cloud Controller and lists it in the Marketplace. -
deregister-broker
deregisters a broker with the Cloud Controller and removes it from the Marketplace. -
upgrade-all-service-instances
upgrades existing instances of a service to its latest installed version. -
delete-all-service-instances
deletes all instances of the service. -
orphan-deployments
detects “orphan” instances that are running on BOSH but not registered with the Cloud Controller.
To run an errand, run the following command:
bosh -d DEPLOYMENT-NAME run-errand ERRAND-NAME
For example:
bosh -d my-deployment run-errand deregister-broker
Register Broker
The register-broker
errand does the following:
- Registers the service broker with Cloud Controller.
- Enables service access for any plans that are enabled on the tile.
- Disables service access for any plans that are disabled on the tile.
- Does nothing for any plans that are set to manual on the tile.
You should run this errand whenever the broker is re-deployed with new catalog metadata to update the Marketplace.
Plans with disabled service access are only visible to admin Cloud Foundry users. Non-admin Cloud Foundry users, including Org Managers and Space Managers, cannot see these plans.
Deregister Broker
This errand deregisters a broker from Cloud Foundry.
The errand does the following:
- Deletes the service broker from Cloud Controller
- Fails if there are any service instances, with or without bindings
Use the Delete All Service Instances errand to delete any existing service instances.
To run the errand, run the following command:
bosh -d DEPLOYMENT-NAME run-errand deregister-broker
Upgrade All Service Instances
The upgrade-all-service-instances
errand does the following:
- Collects all of the service instances that the on-demand broker has registered.
- Issues an upgrade command and deploys a new manifest to the on-demand broker for each service instance.
- Adds to a retry list any instances that have ongoing BOSH tasks at the time of upgrade.
- Retries any instances in the retry list until all instances are upgraded.
When you make changes to the plan configuration, the errand upgrades all the MySQL for PCF service instances to the latest version of the plan.
If any instance fails to upgrade, the errand fails immediately. This prevents systemic problems from spreading to the rest of your service instances.
Delete All Service Instances
This errand uses the Cloud Controller API to delete all instances of your broker’s service offering in every Cloud Foundry org and space. It only deletes instances the Cloud Controller knows about. It does not delete orphan BOSH deployments.
Note: Orphan BOSH deployments do not correspond to a known service instance.
While rare, orphan deployments can occur. Use the orphan-deployments
errand to identify them.
The delete-all-service-instances
errand does the following:
- Unbinds all apps from the service instances.
-
Deletes all service instances sequentially. Each service instance deletion includes:
- Running any pre-delete errands
- Deleting the BOSH deployment of the service instance
- Removing any ODB-managed secrets from BOSH CredHub
- Checking for instance deletion failure, which results in the errand failing immediately
- Determines whether any instances have been created while the errand was running. If new instances are detected, the errand returns an error. In this case, VMware recommends running the errand again.
Warning: Use extreme caution when running this errand. You should only use it when you want to totally destroy all of the on-demand service instances in an environment.
To run the errand, run the following command:
bosh -d DEPLOYMENT-NAME run-errand delete-all-service-instances
Detect Orphaned Service Instances
A service instance is defined as “orphaned” when the BOSH deployment for the instance is still running, but the service is no longer registered in Cloud Foundry.
The orphan-deployments
errand collates a list of service deployments that have
no matching service instances in Cloud Foundry and return the list to the operator.
It is then up to the operator to remove the orphaned BOSH deployments.
To run the errand, run the following command:
bosh -d DEPLOYMENT-NAME run-errand orphan-deployments
If orphan deployments exist, the errand script does the following:
- Exits with exit code 10
- Outputs a list of deployment names under a [stdout] header
- Provides a detailed error message under a [stderr] header
For example:
[stdout]
[{"deployment_name":"service-instance_80e3c5a7-80be-49f0-8512-44840f3c4d1b"}]
[stderr]
Orphan BOSH deployments detected with no corresponding service instance in Cloud Foundry. Before deleting any deployment it is recommended to verify the service instance no longer exists in Cloud Foundry and any data is safe to delete.
Errand 'orphan-deployments' completed with error (exit code 10)
These details will also be available through the BOSH /tasks/
API endpoint for use in scripting:
$ curl 'https://bosh-user:bosh-password@bosh-url:25555/tasks/task-id/output?type=result' | jq .
{
"exit_code": 10,
"stdout": "[{\"deployment_name\":\"service-instance_80e3c5a7-80be-49f0-8512-44840f3c4d1b\"}]\n",
"stderr": "Orphan BOSH deployments detected with no corresponding service instance in Cloud Foundry. Before deleting any deployment it is recommended to verify the service instance no longer exists in Cloud Foundry and any data is safe to delete.\n",
"logs": {
"blobstore_id": "d830c4bf-8086-4bc2-8c1d-54d3a3c6d88d"
}
}
If no orphan deployments exist, the errand script does the following:
- Exits with exit code 0
- Outputs an empty list of deployments under [stdout]
- Outputs None under [stderr]
For example:
[stdout]
[]
[stderr]
None
Errand 'orphan-deployments' completed successfully (exit code 0)
If the errand encounters an error while running, the errand script does the following:
- Exits with exit code 1
- Outputs nothing under [stdout]
- Outputs any error messages under [stderr]
To clean up orphaned instances, run the following command for each orphaned deployment:
WARNING: Running this command may leave IaaS resources in an unusable state.
bosh -d service-instance_SERVICE-INSTANCE-GUID delete-deployment
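If you have already verified that none of the orphaned deployments correspond to service instances you still need, the following sketch feeds the errand result from the /tasks/ API shown above into the cleanup command. It assumes jq is installed and the BOSH CLI is already logged in; the URL placeholders match the earlier curl example.
$ curl -s 'https://bosh-user:bosh-password@bosh-url:25555/tasks/task-id/output?type=result' \
    | jq -r '.stdout | fromjson | .[].deployment_name' \
    | while read -r deployment; do
        bosh -n -d "$deployment" delete-deployment    # -n skips the interactive confirmation
      done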
Retrieve Admin and Read-Only Admin Credentials for a Service Instance
To retrieve the admin credentials for a service instance from BOSH CredHub:
-
Use the cf CLI to determine the GUID associated with the service instance
for which you want to retrieve credentials by running:
cf service SERVICE-INSTANCE-NAME --guid
For example:
$ cf service my-service-instance --guid
12345678-90ab-cdef-1234-567890abcdef
If you do not know the name of the service instance, you can list service instances in the space with cf services.
. - Follow the steps in Gather Credential and IP Address Information and Log In to the Ops Manager VM with SSH of Advanced Troubleshooting with the BOSH CLI to SSH into the Ops Manager VM.
- From the Ops Manager VM, log in to your BOSH Director with the BOSH CLI. See Authenticate with the BOSH Director VM in Advanced Troubleshooting with the BOSH CLI.
-
Find the values for BOSH_CLIENT and BOSH_CLIENT_SECRET:
- In the Ops Manager Installation Dashboard, click the BOSH Director tile.
- Click the Credentials tab.
- In the BOSH Director section, click the link to the BOSH Commandline Credentials .
- Record the values for BOSH_CLIENT and BOSH_CLIENT_SECRET.
-
Set the API target of the CredHub CLI to your BOSH CredHub server by running:
credhub api https://BOSH-DIRECTOR-IP:8844 \ --ca-cert=/var/tempest/workspaces/default/root_ca_certificate
Where BOSH-DIRECTOR-IP is the IP address of the BOSH Director VM.
For example:
$ credhub api https://10.0.0.5:8844 \ --ca-cert=/var/tempest/workspaces/default/root_ca_certificate
-
Log in to CredHub by running:
credhub login \ --client-name=BOSH-CLIENT \ --client-secret=BOSH-CLIENT-SECRET
For example:
$ credhub login \ --client-name=credhub \ --client-secret=abcdefghijklm123456789
-
Use the CredHub CLI to retrieve the credentials by doing one of the following:
-
Retrieve the password for the admin user by running:
credhub get -n /p-bosh/service-instance_GUID/admin_password
In the output, the password appears under value. Record the password.
For example:
$ credhub get \ -n /p-bosh/service-instance_70d30bb6-7f30-441a-a87c-05a5e4afff26/admin_password
id: d6e5bd10-3b60-4a1a-9e01-c76da688b847
name: /p-bosh/service-instance_70d30bb6-7f30-441a-a87c-05a5e4afff26/admin_password
type: password
value: UMF2DXsqNPPlCNWMdVMcNv7RC3Wi10
version_created_at: 2018-04-02T23:16:09Z
-
Retrieve the password for the read-only admin user by running:
credhub get -n /p-bosh/service-instance_GUID/read_only_admin_password
In the output, the password appears under value. Record the password.
- Record the IP of your service instance. See Connect Using an IP Address.
-
Connect to your database by doing one of the following:
- Connect using a management tool. See Using Management Tools for MySQL for PCF.
- Connect directly from your workstation using the MySQL client by
running:
mysql -h IP-ADDRESS -u admin -P 3306 -p
When prompted for a password, enter the password you recorded.
Reinstall a Tile
To reinstall the MySQL for PCF tile, see Reinstalling MySQL for Pivotal Cloud Foundry version 2 and above in the Pivotal Support knowledge base.
View Resource Saturation and Scaling
To view usage statistics for any service, do the following:
-
Run the following command:
bosh -d DEPLOYMENT-NAME vms --vitals
-
To view process-level information, run:
bosh -d DEPLOYMENT-NAME instances --ps
Identify Apps using a Service Instance
To identify which apps are using a specific service instance from the name of the BOSH deployment:
- Take the deployment name and strip the
service-instance_
prefix, leaving you with the GUID. - Log in to CF as an admin.
-
Obtain a list of all service bindings by running the following:
cf curl /v2/service_instances/GUID/service_bindings
-
The output from the above curl gives you a list of resources, with each item referencing a service binding that contains the APP-URL. To find the name, org, and space for the app, run the following:
- cf curl APP-URL and record the app name under entity.name.
- cf curl SPACE-URL to obtain the space, using the entity.space_url from the above curl. Record the space name under entity.name.
- cf curl ORGANIZATION-URL to obtain the org, using the entity.organization_url from the above curl. Record the organization name under entity.name.
Note: When running cf curl
ensure that you query
all pages, because the responses are limited to a certain number of bindings per page.
The default is 50.
To find the next page, curl the value under next_url.
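A sketch that chains these lookups together, assuming jq is installed, the results fit on a single page, and you are logged in as an admin:
$ GUID=$(cf service MY-SERVICE --guid)
$ cf curl /v2/service_instances/$GUID/service_bindings \
    | jq -r '.resources[].entity.app_url' \
    | while read -r app_url; do
        cf curl "$app_url" | jq -r '.entity.name'    # prints the name of each bound app
      done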
Monitor Quota Saturation and Service Instance Count
Quota saturation and total number of service instances are available through ODB metrics emitted to Loggregator. The metric names are shown below:
Metric Name | Description |
---|---|
on-demand-broker/SERVICE-NAME-MARKETPLACE/quota_remaining | Global quota remaining for all instances across all plans |
on-demand-broker/SERVICE-NAME-MARKETPLACE/PLAN-NAME/quota_remaining | Quota remaining for a particular plan |
on-demand-broker/SERVICE-NAME-MARKETPLACE/total_instances | Total instances created across all plans |
on-demand-broker/SERVICE-NAME-MARKETPLACE/PLAN-NAME/total_instances | Total instances created for a particular plan |
Note: Quota metrics are not emitted if no quota has been set.
Techniques for Troubleshooting Highly Available Clusters
If your cluster is experiencing downtime or is in a degraded state, Pivotal recommends the following workflow for gathering information and diagnosing the type of failure:
- Consult solutions for common errors. See Highly Available Cluster Errors above.
- Use mysql-diag to view a summary of the network, disk, and replication state of each cluster node. Depending on the output from mysql-diag, you might recover your cluster with the following troubleshooting techniques:
  - To force a node to rejoin the cluster, see Force a Node to Rejoin a Highly Available Cluster Manually below.
  - To re-create a corrupted VM, see Re-create a Corrupted VM in a Highly Available Cluster below.
  - To check if replication is working, see Check Replication Status in a Highly Available Cluster below.
  For more information about mysql-diag, see Running mysql-diag.
Run
download-logs
against each node in your MySQL for PCF cluster, proxies, and jumpbox VM. You must rundownload-logs
before attempting recovery because any failures in the recovery procedure can result in logs being lost or made inaccessible.
For more information, see the download-logs section below.
Note: Pivotal recommends that you use the
-X
flag to get the complete set of available logs. However, if your cluster processes a high volume of transactions, the complete set might be too large and you can omit this flag to fetch the essential set of logs. -
If you are uncertain about the recovery steps to take, submit a ticket through
Pivotal Support. When you submit a ticket, provide the following information:
- mysql-diag output: A summary of the network, disk, and replication state. The Running mysql-diag topic explains how to run mysql-diag.
- download-logs logs: Logs from your MySQL for PCF cluster, proxies, and jumpbox VM. The download-logs section below explains how to run download-logs.
- Deployment environment: The environment that MySQL for PCF is running in such as Pivotal Application Service or a service tile.
- Version numbers: The versions of the installed Ops Manager, PAS, and MySQL for PCF.
Warning: Do not attempt to resolve cluster issues by reconfiguring the cluster, such as changing the number of nodes or networks. Only follow the diagnosis steps in this document. If you are unsure how to proceed, contact Pivotal Support.
Force a Node to Rejoin a Highly Available Cluster Manually
If a detached node fails to rejoin the cluster after a configured grace period, you can manually force the node to rejoin the cluster. This procedure removes all the data on the node, forces the node to join the cluster, and creates a new copy of the cluster data on the node.
Warning: If you manually force a node to rejoin the cluster, data stored on the local node is lost. Do not force nodes to rejoin the cluster if you want to preserve unsynchronized data. Only do this procedure with the assistance of Pivotal Support.
Before following this procedure, try to bootstrap the cluster. For more information, see
Bootstrapping.
To manually force a node to rejoin the cluster, do the following:
- SSH into the node by following the procedure in BOSH SSH.
-
Become root by running:
sudo su
-
Shut down the
mysqld
process on the node by running:monit stop galera-init
-
Remove the unsynchronized data on the node by running:
rm -rf /var/vcap/store/pxc-mysql
-
Prepare the node before restarting by running:
/var/vcap/jobs/pxc-mysql/bin/pre-start
- Restart the
mysqld
process by running:monit start galera-init
Re-create a Corrupted VM in a Highly Available Cluster
To re-create a corrupted VM:
- Log in to the BOSH Director VM by doing the following procedures:
- Gather the information needed to log in to the BOSH Director VM by doing the procedure in Gather Credential and IP Address Information.
- Log in to the Ops Manager VM by doing the procedure in Log in to the Ops Manager VM with SSH.
- Log in to the BOSH Director VM by doing the procedure in Log in to the BOSH Director VM.
-
Identify and re-create the unresponsive node with bosh cloudcheck by doing the procedure in BOSH Cloudcheck and running Recreate VM using last known apply spec.
Warning: Recreating a node will clear its logs. Ensure the node is completely down before recreating it.
Warning: Only re-create one node. Do not re-create the entire cluster. If more than one node is down, contact Pivotal Support.
Check Replication Status in a Highly Available Cluster
If you see stale data in your cluster, you can check whether replication is functioning normally.
To check the replication status, do the following:
- Log in to the BOSH Director VM by doing the following:
- Gather the information needed to log in to the BOSH Director VM by doing the procedure in Gather Credential and IP Address Information.
- Log in to the Ops Manager VM by doing the procedure in Log in to the Ops Manager VM with SSH.
- Log in to the BOSH Director VM by doing the procedure in Log in to the BOSH Director VM.
-
Create a dummy database in the first node by running:
mysql -h FIRST-NODE-IP-ADDRESS \ -u YOUR-IDENTITY \ -p -e "create database verify_healthy;"
Where:-
FIRST-NODE-IP-ADDRESS
is the IP address of the first node you recorded in step 1. -
YOUR-IDENTITY
is the value ofidentity
that you recorded in step 1.
-
Create a dummy table in the dummy database by running:
mysql -h FIRST-NODE-IP-ADDRESS \ -u YOUR-IDENTITY \ -p -D verify_healthy \ -e "create table dummy_table (id int not null primary key auto_increment, info text) \ engine='innodb';"
-
Insert data into the dummy table by running:
mysql -h FIRST-NODE-IP-ADDRESS \ -u YOUR-IDENTITY \ -p -D verify_healthy \ -e "insert into dummy_table(info) values ('dummy data'),('more dummy data'),('even more dummy data');"
-
Query the table and verify that the three rows of dummy data exist on the first node by running:
mysql -h FIRST-NODE-IP-ADDRESS \ -u YOUR-IDENTITY \ -p -D verify_healthy \ -e "select * from dummy_table;"
When prompted for a password, provide thepassword
value recorded in step 1.
The above command returns output similar to the following:+----+----------------------+ | id | info | +----+----------------------+ | 4 | dummy data | | 7 | more dummy data | | 10 | even more dummy data | +----+----------------------+
-
Verify that the other nodes contain the same dummy data
by doing the following for each of the remaining MySQL server IP addresses:
- Query the dummy table by running:
mysql -h NEXT-NODE-IP-ADDRESS \ -u YOUR-IDENTITY \ -p -D verify_healthy \ -e "select * from dummy_table;"
When prompted for a password, provide thepassword
value recorded in step 1. -
Verify that the node contains the same three rows of dummy data as the other nodes
by running:
mysql -h NEXT-NODE-IP-ADDRESS \ -u YOUR-IDENTITY \ -p -D verify_healthy \ -e "select * from dummy_table;"
When prompted for a password, provide the password
value recorded in step 1. - Verify that the above command returns output similar to the following:
+----+----------------------+ | id | info | +----+----------------------+ | 4 | dummy data | | 7 | more dummy data | | 10 | even more dummy data | +----+----------------------+
- Query the dummy table by running :
-
If each MySQL server instance does not return the same result, contact
Pivotal Support
before proceeding further or making any changes to your deployment.
If each MySQL server instance returns the same result, then you can safely proceed to scaling down your cluster to a single node.
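After you confirm that every node returns the same result, you can drop the dummy database. For example, using the same credentials as above:
$ mysql -h FIRST-NODE-IP-ADDRESS -u YOUR-IDENTITY -p -e "drop database verify_healthy;"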
Tools for Troubleshooting
The troubleshooting techniques described above use the following tools.
download-logs
download-logs
is a script that you can run from your Ops Manager VM to aggregate
logs from your MySQL for PCF cluster nodes, proxies, and, with highly available clusters,
the jumpbox VM.
To use the download-logs
script:
Download and unzip the
download-logs
script from MySQL for PCF on Pivotal Network.From the Ops Manager Installation Dashboard, navigate to BOSH Director > Credentials.
Click Link to Credential for the Bosh Commandline Credentials.
From the plaintext file that opens, record the values for the following:
BOSH_CLIENT
BOSH_CLIENT_SECRET
BOSH_CA_CERT
BOSH_ENVIRONMENT
From the BOSH CLI, view the name of the BOSH deployment for MySQL for PCF by running:
bosh deployments
Record the name of the BOSH deployment.
SSH into your Ops Manager VM by doing the procedures in Gather Credential and IP Address Information and SSH into Ops Manager.
File transfer or copy-paste the
download-logs
script to a working directory on the Ops Manager VM.Set local environment variables to the same BOSH variable values that you recorded earlier, including
BOSH_DEPLOYMENT
for the deployment name.
For example:$ BOSH_CLIENT=ops_manager \ BOSH_CLIENT_SECRET=a123bc-E_4Ke3fb-gImbl3xw4a7meW0rY BOSH_CA_CERT=/var/tempest/workspaces/default/root_ca_certificate \ BOSH_ENVIRONMENT=10.0.0.5 \ BOSH_DEPLOYMENT=pivotal-mysql-14c4
Run the
download-logs
script by running:./download-logs -o .
The script saves a compressed file of logs combined from all MySQL for PCF VMs. The filename has the form
TIMESTAMP-mysql-logs.tar.gz.gpg
.
mysql-diag
The mysql-diag tool outputs the current status of a highly available (HA) MySQL for PCF cluster
and suggests recovery actions if the cluster fails.
For more information, see Running mysql-diag.
Knowledge Base (Community)
Find the answer to your question and browse product discussions and solutions by searching the VMware Tanzu Knowledge Base.
File a Support Ticket
You can file a ticket with Support.
Be sure to provide the error message from cf service YOUR-SERVICE-INSTANCE
.
To expedite troubleshooting, provide your service broker logs and your service instance logs.
If your cf service YOUR-SERVICE-INSTANCE
output includes a
task-id
, provide the BOSH task output.