
Bootstrapping

This topic describes how to bootstrap your MySQL cluster in the event of a cluster failure.

Note: The examples in these instructions reflect a three-node MySQL for Pivotal Cloud Foundry (PCF) deployment. The process for a two-node cluster plus an arbitrator is identical, but the output will not match the examples.

When to Bootstrap

To determine whether you need to bootstrap your cluster, you must check whether the cluster has lost quorum. Bootstrapping is only required when the cluster has lost quorum. See Check Cluster State for more information about checking the state of your cluster.

Quorum is lost when fewer than half of the nodes can communicate with each other for longer than the configured grace period. In Galera terminology, if a node can communicate with the rest of the cluster, its database is in a good state, and it reports itself as synced.
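
To see how a node reports its own state, you can query the Galera status variable wsrep_local_state_comment from the MySQL client on that node. A node that can communicate with the rest of the cluster and is in a good state reports Synced; the output below is illustrative:

    mysql> SHOW STATUS LIKE 'wsrep_local_state_comment';
    +---------------------------+--------+
    | Variable_name             | Value  |
    +---------------------------+--------+
    | wsrep_local_state_comment | Synced |
    +---------------------------+--------+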

If quorum has not been lost, individual unhealthy nodes automatically rejoin the cluster once repaired, which means the error is resolved, the node is restarted, or connectivity is restored.

To check whether your cluster has lost quorum, look for the following symptoms:

  • All nodes appear “Unhealthy” on the proxy dashboard. For example, the dashboard reports 3 out of 3 nodes as unhealthy.
  • All responsive nodes report the value of wsrep_cluster_status as non-Primary in the MySQL client.

    mysql> SHOW STATUS LIKE 'wsrep_cluster_status';
    +----------------------+-------------+
    | Variable_name        | Value       |
    +----------------------+-------------+
    | wsrep_cluster_status | non-Primary |
    +----------------------+-------------+
    
  • All unresponsive nodes (nodes that accept connections but cannot serve queries) respond with ERROR 1047 when you use most statement types in the MySQL client:

    mysql> select * from mysql.user;
    ERROR 1047 (08S01) at line 1: WSREP has not yet prepared node for application use
    

Prerequisites for Bootstrapping

Before running the bootstrapping procedures below, you must SSH into the Ops Manager VM and log in to the BOSH Director. For more information, see Prepare to Use the BOSH CLI.

Note: The version of the BOSH CLI you use depends on your version of Ops Manager. For Ops Manager v1.11 or later, use the BOSH CLI v2. For Ops Manager v1.10, use the BOSH CLI v1.

Bootstrap with the BOSH Errand

MySQL for PCF includes a BOSH errand to automate the process of bootstrapping. You must still manually initiate the bootstrap process, but using the errand reduces the number of manual steps necessary to complete the process.

The errand automates the manual bootstrapping procedure described in the Bootstrap Manually section below. It finds the node with the highest transaction sequence number, starts that node by itself (in bootstrap mode), and then instructs the remaining nodes to join the cluster.
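
If you want to confirm that the errand is available in your deployment before running it, you can list the deployment's errands with the BOSH CLI. This is an optional check, shown here using the BOSH CLI v2 syntax:

    bosh2 -e YOUR-ENV -d YOUR-DEP errands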

To bootstrap with the errand, follow the steps for one of the following scenarios:

Scenario 1: Virtual Machines Running, Cluster Disrupted

In this scenario, the nodes are up and running, but the cluster has been disrupted.

To bootstrap in this scenario, follow these steps:

  1. To determine whether the cluster has been disrupted, use the BOSH CLI to list the jobs and see if they are failing:

    • Ops Manager v1.11 or later:
    bosh2 -e YOUR-ENV instances
    
    • Ops Manager v1.10:
    bosh vms
    

    The output resembles the following:

    Instance                                                             Process State  AZ             IPs
    backup-prepare/c635410e-917d-46aa-b054-86d222b6d1c0                  running        us-central1-b  10.0.4.47
    bootstrap/a31af4ff-e1df-4ff1-a781-abc3c6320ed4                       -              us-central1-b  -
    broker-registrar/1a93e53d-af7c-4308-85d4-3b2b80d504e4                -              us-central1-b  10.0.4.58
    cf-mysql-broker/137d52b8-a1b0-41f3-847f-c44f51f87728                 running        us-central1-c  10.0.4.57
    cf-mysql-broker/28b463b1-cc12-42bf-b34b-82ca7c417c41                 running        us-central1-b  10.0.4.56
    deregister-and-purge-instances/4cb93432-4d90-4f1d-8152-d0c238fa5aab  -              us-central1-b  -
    monitoring/f7117dcb-1c22-495e-a99e-cf2add90dea9                      running        us-central1-b  10.0.4.48
    mysql/220fe72a-9026-4e2e-9fe3-1f5c0b6bf09b                           failing        us-central1-b  10.0.4.44
    mysql/28a210ac-cb98-4ab4-9672-9f4c661c57b8                           failing        us-central1-f  10.0.4.46
    mysql/c1639373-26a2-44ce-85db-c9fe5a42964b                           failing        us-central1-c  10.0.4.45
    proxy/87c5683d-12f5-426c-b925-62521529f64a                           running        us-central1-b  10.0.4.60
    proxy/b0115ccd-7973-42d3-b6de-edb5ae53c63e                           running        us-central1-c  10.0.4.61
    rejoin-unsafe/8ce9370a-e86b-4638-bf76-e103f858413f                   -              us-central1-b  -
    smoke-tests/e026aaef-efd9-4644-8d14-0811cb1ba733                     -              us-central1-b  10.0.4.59
    

  2. If the jobs are failing, do the following:

    1. If you are using Ops Manager v1.10, run the following command to select the correct deployment:

      bosh deployment PATH-TO-DEPLOYMENT-MANIFEST
      
    2. Run the bootstrap errand:

      • Ops Manager v1.11 or later:
      bosh2 -e YOUR-ENV -d YOUR-DEP run-errand bootstrap
      
      • Ops Manager v1.10:
      bosh run errand bootstrap
      

      The command returns many lines of output, eventually followed by:

      Bootstrap errand completed
      [stderr]
      + echo 'Started bootstrap errand ...'
      + JOB_DIR=/var/vcap/jobs/bootstrap
      + CONFIG_PATH=/var/vcap/jobs/bootstrap/config/config.yml
      + /var/vcap/packages/bootstrap/bin/cf-mysql-bootstrap -configPath=/var/vcap/jobs/bootstrap/config/config.yml
      + echo 'Bootstrap errand completed'
      + exit 0
      Errand `bootstrap' completed successfully (exit code 0)
      

      If the bootstrap errand does not work immediately, wait and try it again a few minutes later.

Scenario 2: Virtual Machines Terminated or Lost

In severe circumstances, such as a power failure, it is possible to lose all your VMs. You must recreate them before you can begin recovering the cluster.

To bootstrap in this scenario, follow the steps in the sections below.

Determine State of VMs

To determine the state of your VMs, run one of the following commands depending on your Ops Manager version:

  • Ops Manager v1.11 or later:

    bosh2 -e YOUR-ENV instances
    
  • Ops Manager v1.10:

    bosh vms
    

The output resembles the output in the previous section. If the VM is terminated or lost, the process state for the mysql jobs is shown as -.
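
For example, a terminated or lost mysql VM appears similar to the following line, with its process state shown as -. The GUID, AZ, and IP columns are illustrative, reused from the sample output in the previous section:

    mysql/220fe72a-9026-4e2e-9fe3-1f5c0b6bf09b                           -              us-central1-b  -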

Recover Terminated or Lost VMs

To recover terminated or lost VMs, do the procedures in the following sections:

  1. Recreate the Missing VMs
  2. Run the Bootstrap Errand
  3. Restore the BOSH Configuration

Recreate the Missing VMs

The procedure in this section uses BOSH to recreate the VMs, install software on them, and try to start the jobs.

The jobs fail because a MySQL node does not start when there is no active cluster for it to join. Therefore, you must instruct BOSH to ignore the failing state of each VM so that the software can be deployed to all VMs.
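
For example, with the BOSH CLI v2 you can mark a failing MySQL instance as ignored by passing the GUID shown in the instances output. The GUID below is illustrative, reused from the sample output earlier in this topic:

    bosh2 -e MY-ENV -d MY-DEP instances
    bosh2 -e MY-ENV -d MY-DEP ignore mysql/220fe72a-9026-4e2e-9fe3-1f5c0b6bf09b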

Choose one of the following procedures depending on your Ops Manager version:

Ops Manager v1.11 or Later
  1. Log in to the BOSH Director.
  2. If BOSH resurrection is enabled, disable it by running the following command:

    bosh2 -e MY-ENV update-resurrection off
    
  3. Download the current manifest by running the following command:

    bosh2 -e MY-ENV -d MY-DEP manifest > /tmp/manifest.yml
    
  4. Redeploy by running the following command:

    bosh2 -e MY-ENV -d MY-DEP deploy /tmp/manifest.yml
    

    The deploy fails to start the first MySQL VM.

  5. Instruct BOSH to ignore each MySQL VM, providing its INSTANCE_GUID, by running the following command:

    bosh2 -e MY-ENV -d MY-DEP ignore mysql/INSTANCE_GUID
    
  6. Repeat steps 4 and 5 until all instances have attempted to start.

  7. If you disabled BOSH resurrection in step 2, re-enable it by running the following command:

    bosh2 -e MY-ENV update-resurrection on
    
Ops Manager v1.10
  1. Log in to the BOSH Director.
  2. If BOSH resurrection is enabled, disable it by running the following command:

    bosh vm resurrection off
    
  3. Target the correct deployment by running the following command:

    bosh deployment PATH-TO-DEPLOYMENT-MANIFEST
    
  4. Redeploy so that BOSH attempts to start one instance. Run the following command:

    bosh deploy
    

    The deploy fails to start the first MySQL VM.

  5. Instruct BOSH to ignore each MySQL VM, providing its INSTANCE_GUID, by running the following command:

    bosh ignore instance mysql/INSTANCE_GUID
    
  6. Repeat steps 4 and 5 until all instances have attempted to start.

  7. If you disabled BOSH resurrection in step 2, re-enable it by running the following command:

    bosh vm resurrection on
    

Run the Bootstrap Errand

All instances now have a failing process state, but also have the MySQL code installed on them. In this state, the bootstrap process recovers the cluster.

  1. Run one of the following commands depending on your Ops Manager version:

    • Ops Manager v1.11 or later:
    bosh2 -e MY-ENV -d MY-DEP run-errand bootstrap
    
    • Ops Manager v1.10:
    bosh run errand bootstrap
    
  2. Validate that the errand completes successfully. Even if some instances still appear as failing, proceed to the next step.

Restore the BOSH Configuration

WARNING: You must run all of the steps. If you do not unignore all ignored instances, they are not updated in future deploys.

The following procedure restores your BOSH configuration to its previous state by unignoring each instance that was previously ignored:

  1. For each ignored instance, run one of the following commands depending on your Ops Manager version:

    • Ops Manager v1.11 or later:
    bosh2 -e MY-ENV -d MY-DEP unignore mysql/INSTANCE_GUID
    
    • Ops Manager v1.10:
    bosh unignore instance mysql/INSTANCE_GUID
    
  2. Redeploy:

    • Ops Manager v1.11 or later:
    bosh2 -e MY-ENV -d MY-DEP deploy
    
    • Ops Manager v1.10:
    bosh deploy
    
  3. Validate that all mysql instances are in a running state, as shown in the example below.
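
    For example, you can list the instances again with the BOSH CLI v2 and confirm that each mysql row reports running in the Process State column. The GUIDs, availability zones, and IPs below are illustrative, reused from the sample output earlier in this topic; other rows are omitted:

      bosh2 -e MY-ENV -d MY-DEP instances

      mysql/220fe72a-9026-4e2e-9fe3-1f5c0b6bf09b  running  us-central1-b  10.0.4.44
      mysql/28a210ac-cb98-4ab4-9672-9f4c661c57b8  running  us-central1-f  10.0.4.46
      mysql/c1639373-26a2-44ce-85db-c9fe5a42964b  running  us-central1-c  10.0.4.45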

Bootstrap Manually

If the bootstrap errand is not able to automatically recover the cluster, you might need to do the steps manually.

WARNING: The following procedures are prone to user-error and can result in lost data if followed incorrectly. Follow the procedure in Bootstrap with the BOSH Errand above first, and only resort to the manual process if the errand fails to repair the cluster.

Do the procedures in the sections below to manually bootstrap your cluster.

Shut Down MariaDB

Do the following for each node in the cluster:

  1. SSH into the node.

  2. Shut down the mariadb process on the node. Run the following command:

    monit stop mariadb_ctrl
    

Re-bootstrapping the cluster is not successful unless you shut down the mariadb process on all nodes in the cluster.
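
To confirm that the process is stopped on a node before moving on, you can check monit. A stopped process typically appears as not monitored; the exact output depends on your monit version:

    monit summary
    ...
    Process 'mariadb_ctrl'              not monitored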

Choose Node to Bootstrap

To choose the node to bootstrap, you must find the node with the highest transaction sequence number.

Do the following for each node in the cluster:

  1. SSH into the node.

  2. To view the sequence number for a node, run the following command:

    /var/vcap/jobs/mysql/bin/get-sequence-number
    

    When prompted, confirm that you want to stop MySQL.

    For example:

     
      $ /var/vcap/jobs/mysql/bin/get-sequence-number
      This script stops mysql. Are you sure? (y/n): y
    
      {"sequence_number":421,"instance_id":"012abcde-f34g-567h-ijk8-9123l4567891"}
    
  3. Record the value of sequence_number.

After determining the sequence_number for all nodes in your cluster, identify the node with the highest sequence_number. If all nodes have the same sequence_number, you can choose any node as the new bootstrap node.

Bootstrap the First Node

After determining the node with the highest sequence_number, do the following to bootstrap the node:

Note: Only run these bootstrap commands on the node with the highest sequence_number. Otherwise, the node with the highest sequence_number cannot join the new cluster unless its data is abandoned, and its mariadb process exits with an error. For more information about intentionally abandoning data, see Architecture.

  1. On the new bootstrap node, update the state file and restart the mariadb process. Run the following commands:

    echo -n "NEEDS_BOOTSTRAP" > /var/vcap/store/mysql/state.txt
    monit start mariadb_ctrl
  2. It can take up to ten minutes for monit to start the mariadb process. To check if the mariadb process has started successfully, run the following command:

    watch monit summary
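
    When the node has started successfully, the mariadb_ctrl entry in the monit summary output reports running. The line below is illustrative; formatting varies by monit version:

      Process 'mariadb_ctrl'              running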

Restart Remaining Nodes

  1. After the bootstrapped node is running, start the mariadb process on the remaining nodes with monit. On each of the remaining nodes, run the following command:

    monit start mariadb_ctrl

    If the Interruptor prevents the node from starting, do the manual procedure to force the node to rejoin the cluster, documented in the Pivotal Knowledge Base.

    WARNING: Forcing a node to rejoin the cluster is a destructive procedure. Only do the procedure with the assistance of Pivotal Support.

  2. If the monit start command fails, it might be because the node with the highest sequence_number is mysql/0. In this case, do the following:

    1. From the Ops Manager VM, use the BOSH CLI to instruct BOSH to ignore mysql/0 so that it is not updated:

      • Ops Manager v1.11 or later:

        bosh2 -e MY-ENV -d MY-DEP ignore mysql/0
        
      • Ops Manager v1.10:

        bosh ignore instance mysql/0
        
    2. Navigate to Ops Manager in a browser, log in, and click Apply Changes.

    3. When the deploy finishes, run the following command from the Ops Manager VM:

      • Ops Manager v1.11 or later:

        bosh2 -e MY-ENV -d MY-DEP unignore mysql/0
        
      • Ops Manager v1.10:

        bosh unignore instance mysql/0
        
  3. Verify that the new nodes have successfully joined the cluster. SSH into the bootstrap node and run the following command to output the total number of nodes in the cluster:

    mysql> SHOW STATUS LIKE 'wsrep_cluster_size';
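
    For a healthy three-node deployment, the output reports a cluster size of 3; the table below is illustrative:

      +--------------------+-------+
      | Variable_name      | Value |
      +--------------------+-------+
      | wsrep_cluster_size | 3     |
      +--------------------+-------+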