Bootstrapping
Warning: MySQL for PCF v1.10 is no longer supported because it has reached the End of General Support (EOGS) phase as defined by the Support Lifecycle Policy. To stay up to date with the latest software and security updates, upgrade to a supported version.
This topic describes how to bootstrap your MySQL cluster in the event of a cluster failure.
Note: The examples in these instructions reflect a three-node MySQL for Pivotal Cloud Foundry (PCF) deployment. The process to bootstrap a two-node plus an arbitrator is identical, but the output will not match the examples.
When to Bootstrap
To determine whether you need to bootstrap your cluster, you must check whether the cluster has lost quorum. Bootstrapping is only required when the cluster has lost quorum. See Check Cluster State for more information about checking the state of your cluster.
Quorum is lost when fewer than half of the nodes can communicate with each other for longer than the configured grace period. In Galera terminology, if a node can communicate with the rest of the cluster, its database is in a good state, and it reports itself as synced.
If quorum has not been lost, individual unhealthy nodes automatically rejoin the cluster once repaired, which means the error is resolved, the node is restarted, or connectivity is restored.
To check whether your cluster has lost quorum, look for the following symptoms:
- All nodes appear “Unhealthy” on the proxy dashboard.
- All responsive nodes report the value of wsrep_cluster_status as non-Primary in the MySQL client:

```
mysql> SHOW STATUS LIKE 'wsrep_cluster_status';
+----------------------+-------------+
| Variable_name        | Value       |
+----------------------+-------------+
| wsrep_cluster_status | non-Primary |
+----------------------+-------------+
```
- All unresponsive nodes respond with ERROR 1047 when given most statement types in the MySQL client:

```
mysql> select * from mysql.user;
ERROR 1047 (08S01) at line 1: WSREP has not yet prepared node for application use
```
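The decision rule above can be sketched as a small shell helper. This is an illustrative function only, not a tool shipped with MySQL for PCF; feed it the Value column returned by SHOW STATUS LIKE 'wsrep_cluster_status' on each responsive node:

```shell
# classify_wsrep_status: map a node's reported wsrep_cluster_status value to
# a quorum verdict. Hypothetical helper illustrating the rule above; it is
# not part of MySQL for PCF.
classify_wsrep_status() {
  case "$1" in
    Primary)                  echo "quorum-intact" ;;
    non-Primary|Disconnected) echo "quorum-lost" ;;
    *)                        echo "unknown" ;;
  esac
}
```

If every responsive node classifies as quorum-lost, the cluster has lost quorum and bootstrapping is required.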
Prerequisites for Bootstrapping
Before running the bootstrapping procedures below, you must SSH into the Ops Manager VM and log in to the BOSH Director. For more information, see Prepare to Use the BOSH CLI.
Note: The version of the BOSH CLI you use depends on your version of Ops Manager. For Ops Manager v1.11 or later, use the BOSH CLI v2. For Ops Manager v1.10, use the BOSH CLI v1.
Bootstrap with the BOSH Errand
MySQL for PCF includes a BOSH errand to automate the process of bootstrapping. You must still manually initiate the bootstrap process, but using the errand reduces the number of manual steps necessary to complete the process.
The errand automates the manual bootstrapping procedure in the Bootstrap Manually section below. It finds the node with the highest transaction sequence number, asks it to start up by itself (that is, in bootstrap mode), and then asks the remaining nodes to join the cluster.
To bootstrap with the errand, follow the steps for one of the following scenarios:
- Scenario 1: Virtual Machines Running, Cluster Disrupted
- Scenario 2: Virtual Machines Terminated or Lost
Scenario 1: Virtual Machines Running, Cluster Disrupted
In this scenario, the nodes are up and running, but the cluster has been disrupted.
To bootstrap in this scenario, follow these steps:
To determine whether the cluster has been disrupted, use the BOSH CLI to list the jobs and check whether they are failing:
- Ops Manager v1.11 or later:
bosh2 -e YOUR-ENV instances
- Ops Manager v1.10:
bosh vms
The output resembles the following:

```
Instance                                                             Process State  AZ             IPs
backup-prepare/c635410e-917d-46aa-b054-86d222b6d1c0                  running        us-central1-b  10.0.4.47
bootstrap/a31af4ff-e1df-4ff1-a781-abc3c6320ed4                       -              us-central1-b  -
broker-registrar/1a93e53d-af7c-4308-85d4-3b2b80d504e4                -              us-central1-b  10.0.4.58
cf-mysql-broker/137d52b8-a1b0-41f3-847f-c44f51f87728                 running        us-central1-c  10.0.4.57
cf-mysql-broker/28b463b1-cc12-42bf-b34b-82ca7c417c41                 running        us-central1-b  10.0.4.56
deregister-and-purge-instances/4cb93432-4d90-4f1d-8152-d0c238fa5aab  -              us-central1-b  -
monitoring/f7117dcb-1c22-495e-a99e-cf2add90dea9                      running        us-central1-b  10.0.4.48
mysql/220fe72a-9026-4e2e-9fe3-1f5c0b6bf09b                           failing        us-central1-b  10.0.4.44
mysql/28a210ac-cb98-4ab4-9672-9f4c661c57b8                           failing        us-central1-f  10.0.4.46
mysql/c1639373-26a2-44ce-85db-c9fe5a42964b                           failing        us-central1-c  10.0.4.45
proxy/87c5683d-12f5-426c-b925-62521529f64a                           running        us-central1-b  10.0.4.60
proxy/b0115ccd-7973-42d3-b6de-edb5ae53c63e                           running        us-central1-c  10.0.4.61
rejoin-unsafe/8ce9370a-e86b-4638-bf76-e103f858413f                   -              us-central1-b  -
smoke-tests/e026aaef-efd9-4644-8d14-0811cb1ba733                     -              us-central1-b  10.0.4.59
```
If the jobs are failing, do the following:
If you are using Ops Manager v1.10, run the following command to select the correct deployment:
bosh deployment PATH-TO-DEPLOYMENT-MANIFEST
Run the bootstrap errand:
- Ops Manager v1.11 or later:
bosh2 -e YOUR-ENV -d YOUR-DEP run-errand bootstrap
- Ops Manager v1.10:
bosh run errand bootstrap
The command returns many lines of output, eventually followed by:

```
Bootstrap errand completed
[stderr]
+ echo 'Started bootstrap errand ...'
+ JOB_DIR=/var/vcap/jobs/bootstrap
+ CONFIG_PATH=/var/vcap/jobs/bootstrap/config/config.yml
+ /var/vcap/packages/bootstrap/bin/cf-mysql-bootstrap -configPath=/var/vcap/jobs/bootstrap/config/config.yml
+ echo 'Bootstrap errand completed'
+ exit 0
Errand `bootstrap' completed successfully (exit code 0)
```
If the bootstrap errand does not work immediately, wait and try it again a few minutes later.
Scenario 2: Virtual Machines Terminated or Lost
In severe circumstances, such as a power failure, it is possible to lose all your VMs. You must recreate them before you can begin recovering the cluster.
To bootstrap in this scenario, follow the steps in the sections below.
Determine State of VMs
To determine the state of your VMs, run one of the following commands depending on your Ops Manager version:
Ops Manager v1.11 or later:
bosh2 -e YOUR-ENV instances
Ops Manager v1.10:
bosh vms
The output resembles the output in the previous section.
If the VMs are terminated or lost, the process state for the mysql jobs is shown as -.
Recover Terminated or Lost VMs
To recover terminated or lost VMs, do the procedures in the following sections:
Recreate the Missing VMs
The procedure in this section uses BOSH to recreate the VMs, install software on them, and try to start the jobs.
The jobs fail because a MySQL VM cannot start when there is no active cluster for it to join. Therefore, you must instruct BOSH to ignore the failing state of each VM so that the software can be deployed to all VMs.
Choose one of the following procedures depending on your Ops Manager version:
Ops Manager v1.11 or Later
- Log in to the BOSH Director.
If BOSH resurrection is enabled, disable it by running the following command:
bosh2 -e MY-ENV update-resurrection off
Download the current manifest by running the following command:
bosh2 -e MY-ENV -d MY-DEP manifest > /tmp/manifest.yml
Redeploy by running the following command:
bosh2 -e MY-ENV -d MY-DEP deploy /tmp/manifest.yml
The deploy fails to start the first MySQL VM.
Instruct BOSH to ignore each MySQL VM, providing its INSTANCE_GUID. Run the following command:
bosh2 -e MY-ENV -d MY-DEP ignore mysql/INSTANCE_GUID
Repeat steps 4 and 5 until all instances have attempted to start.
If you disabled BOSH resurrection in step 2, re-enable it by running the following command:
bosh2 -e MY-ENV update-resurrection on
Ops Manager v1.10
- Log in to the BOSH Director.
If BOSH resurrection is enabled, disable it by running the following command:
bosh vm resurrection off
Target the correct deployment by running the following command:
bosh deployment PATH-TO-DEPLOYMENT-MANIFEST
Redeploy so that BOSH attempts to start one instance. Run the following command:
bosh deploy
The deploy fails to start the first MySQL VM.
Instruct BOSH to ignore each MySQL VM, providing its INSTANCE_GUID. Run the following command:
bosh ignore instance mysql/INSTANCE_GUID
Repeat steps 4 and 5 until all instances have attempted to start.
If you disabled BOSH resurrection in step 2, re-enable it by running the following command:
bosh vm resurrection on
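The redeploy-and-ignore loop above can be partly scripted. The helper below is an illustrative sketch, not a Pivotal tool: it filters the text output of bosh2 -e MY-ENV instances (or bosh vms) down to the failing mysql instances, assuming the whitespace-separated column layout shown in the sample output earlier:

```shell
# failing_mysql_instances: read `bosh2 instances` / `bosh vms` text output on
# stdin and print each mysql/GUID whose process state is "failing".
# Sketch only; assumes the column layout shown in the sample output above.
failing_mysql_instances() {
  awk '$1 ~ /^mysql\// && $2 == "failing" { print $1 }'
}
```

You could then, for example, run: for i in $(bosh2 -e MY-ENV instances | failing_mysql_instances); do bosh2 -e MY-ENV -d MY-DEP ignore "$i"; done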
Run the Bootstrap Errand
All instances now have a failing process state, but they have the MySQL code installed on them. In this state, the bootstrap errand can recover the cluster.
Run one of the following commands depending on your Ops Manager version:
- Ops Manager v1.11 or later:
bosh2 -e MY-ENV -d MY-DEP run-errand bootstrap
- Ops Manager v1.10:
bosh run errand bootstrap
Validate that the errand completes successfully. Even if some instances still appear as failing, proceed to the next step.
Restore the BOSH Configuration
WARNING: You must run all of the steps. If you do not unignore all ignored instances, they are not updated in future deploys.
To restore your BOSH configuration to its previous state, unignore each instance that was previously ignored:
For each ignored instance, run one of the following commands depending on your Ops Manager version:
- Ops Manager v1.11 or later:
bosh2 -e MY-ENV -d MY-DEP unignore mysql/INSTANCE_GUID
- Ops Manager v1.10:
bosh unignore instance mysql/INSTANCE_GUID
Redeploy:
- Ops Manager v1.11 or later:
bosh2 -e MY-ENV -d MY-DEP deploy
- Ops Manager v1.10:
bosh deploy
Validate that all mysql instances are in a running state.
Bootstrap Manually
If the bootstrap errand is not able to automatically recover the cluster, you might need to do the steps manually.
WARNING: The following procedures are prone to user-error and can result in lost data if followed incorrectly. Follow the procedure in Bootstrap with the BOSH Errand above first, and only resort to the manual process if the errand fails to repair the cluster.
Do the procedures in the sections below to manually bootstrap your cluster.
Shut Down MariaDB
Do the following for each node in the cluster:
SSH into the node:
- Ops Manager v1.11 or later: See the BOSH CLI v2 instructions for SSHing into BOSH-deployed VMs.
- Ops Manager v1.10: See the BOSH CLI v1 instructions for SSHing into BOSH-deployed VMs.
Shut down the mariadb process on the node by running the following command:
monit stop mariadb_ctrl
Re-bootstrapping the cluster does not succeed unless you shut down the mariadb process on all nodes in the cluster.
Choose Node to Bootstrap
To choose the node to bootstrap, you must find the node with the highest transaction sequence number.
Do the following for each node in the cluster:
To SSH into the node, run one of the following commands:
- Ops Manager v1.11 or later: See the BOSH CLI v2 instructions for SSHing into BOSH-deployed VMs.
- Ops Manager v1.10: See the BOSH CLI v1 instructions for SSHing into BOSH-deployed VMs.
To view the sequence number for a node, run the following command:
/var/vcap/jobs/mysql/bin/get-sequence-number
When prompted, confirm that you want to stop MySQL. For example:

```
$ /var/vcap/jobs/mysql/bin/get-sequence-number
This script stops mysql. Are you sure? (y/n): y
{"sequence_number":421,"instance_id":"012abcde-f34g-567h-ijk8-9123l4567891"}
```
Record the value of sequence_number.
After determining the sequence_number for all nodes in your cluster, identify the node with the highest sequence_number. If all nodes have the same sequence_number, you can choose any node as the new bootstrap node.
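If you collect the get-sequence-number JSON line from each node, a short helper can pick the winner for you. This is an illustrative sketch that assumes exactly the JSON shape shown in the example output above:

```shell
# highest_sequence_instance: read one get-sequence-number JSON line per node
# on stdin and print the instance_id with the highest sequence_number.
# Sketch only; assumes the {"sequence_number":N,"instance_id":"..."} shape.
highest_sequence_instance() {
  sed -E 's/.*"sequence_number":([0-9]+).*"instance_id":"([^"]+)".*/\1 \2/' |
    sort -rn | head -n 1 | awk '{ print $2 }'
}
```

The printed instance_id identifies the node to bootstrap in the next section.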
Bootstrap the First Node
After determining the node with the highest sequence_number, do the following to bootstrap the node:
Note: Only run these bootstrap commands on the node with the highest sequence_number. Otherwise, the node with the highest sequence_number is unable to join the new cluster unless its data is abandoned; its mariadb process exits with an error. For more information about intentionally abandoning data, see Architecture.
On the new bootstrap node, update the state file and restart the mariadb process by running the following commands:
echo -n "NEEDS_BOOTSTRAP" > /var/vcap/store/mysql/state.txt
monit start mariadb_ctrl
It can take up to ten minutes for monit to start the mariadb process. To check whether the mariadb process has started successfully, run the following command:
watch monit summary
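Instead of watching monit summary interactively, you can poll it with a timeout. The function below is a sketch; the command to run is passed as a parameter (use "monit summary" on a real node), so the polling logic is not tied to a live monit:

```shell
# wait_for_mariadb: poll a `monit summary`-style command until it reports
# mariadb_ctrl as running, or give up after the timeout (in seconds).
# Sketch only; on a node, call: wait_for_mariadb "monit summary" 600 10
wait_for_mariadb() {
  local cmd=$1 timeout=${2:-600} interval=${3:-5} elapsed=0
  while [ "$elapsed" -lt "$timeout" ]; do
    if $cmd | grep -q "mariadb_ctrl.*running"; then
      return 0
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
  return 1
}
```

The 600-second default matches the up-to-ten-minutes startup time noted above.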
Restart Remaining Nodes
After the bootstrapped node is running, start the mariadb process on the remaining nodes with monit. SSH into each remaining node and run the following command:
monit start mariadb_ctrl
If the node is prevented from starting by the Interruptor, do the manual procedure to force the node to rejoin the cluster, documented in Pivotal Knowledge Base.
WARNING: Forcing a node to rejoin the cluster is a destructive procedure. Only do the procedure with the assistance of Pivotal Support.
If the monit start command fails, it might be because the node with the highest sequence_number is mysql/0. In this case, do the following:
From the Ops Manager VM, use the BOSH CLI to make BOSH ignore updating mysql/0:
- Ops Manager v1.11 or later:
bosh2 -e MY-ENV -d MY-DEP ignore mysql/0
- Ops Manager v1.10:
bosh ignore instance mysql/0
Navigate to Ops Manager in a browser, log in, and click Apply Changes.
When the deploy finishes, run the following command from the Ops Manager VM:
Ops Manager v1.11 or later:
bosh2 -e MY-ENV -d MY-DEP unignore mysql/0
Ops Manager v1.10:
bosh unignore instance mysql/0
Verify that the new nodes have successfully joined the cluster. SSH into the bootstrap node and run the following command to output the total number of nodes in the cluster:
mysql> SHOW STATUS LIKE 'wsrep_cluster_size';
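To check the result non-interactively, you can parse the MySQL client's table output. The helper below is a sketch that assumes the standard "| name | value |" table layout produced by the client:

```shell
# cluster_size: extract the wsrep_cluster_size value from the MySQL client's
# table output on stdin. Sketch only; assumes the usual "| name | value |"
# table layout, e.g. piped from:
#   mysql -e "SHOW STATUS LIKE 'wsrep_cluster_size'"
cluster_size() {
  awk -F'|' '/wsrep_cluster_size/ { gsub(/ /, "", $3); print $3 }'
}
```

For a healthy three-node deployment, the extracted value should be 3.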