Triggering a Leader-Follower Failover

Note: In v2.9 and later, MySQL for VMware Tanzu is named VMware Tanzu SQL with MySQL for VMs.


This topic describes how to trigger a failover of apps from the leader to the follower.

Overview

You might want to trigger a failover in the following scenarios:

  • You want to take the leader VM down to do planned maintenance.
  • The performance of the leader VM degrades.
  • The leader VM fails unexpectedly.
  • The AZ where the leader VM is located goes offline unexpectedly.

You can also use the following metrics to determine if you need to trigger a failover:

  • /p.mysql/available: This metric monitors whether the MySQL server is currently available. For more information, see Server Availability.

  • /p.mysql/follower/seconds_behind_master: This metric monitors how far behind the follower is in applying writes from the leader. For more information, see Leader-Follower Metrics.

  • /p.mysql/follower/seconds_since_leader_heartbeat: This metric monitors the number of seconds that elapse between the leader heartbeat and the replication of the heartbeat in the follower. For more information, see Leader-Follower Metrics.

For information about errands used to trigger failover, see configure-leader-follower, make-leader, and make-read-only.

To trigger a failover:

  1. Retrieve Information
  2. Promote the Follower
  3. Clean Up Former Leader VM (Optional)
  4. Configure the New Follower
  5. Unbind and Rebind the App

Retrieve Information

To retrieve the information necessary for stopping the leader and promoting the follower:

  1. Log in to your deployment by running:

    cf login -a API-URL
    

    When prompted, enter your credentials.

  2. Target the org and space where the leader-follower service instance is located by running:

    cf target -o DESTINATION-ORG -s DESTINATION-SPACE
    
  3. Record the GUID of the service instance by running:

    cf service SERVICE-INSTANCE-NAME --guid
    

    Where SERVICE-INSTANCE-NAME is the name of the leader-follower service instance.

    For example:

    $ cf service my-lf-instance --guid
    82ddc607-710a-404e-b1b8-a7e3ea7ec063
    
    If you do not know the name of the service instance, you can list service instances in the space with cf services.

  4. Follow the procedure at Gather Credential and IP Address Information and SSH into Ops Manager to SSH into the Ops Manager VM.

  5. From the Ops Manager VM, log in to your BOSH Director with the BOSH CLI. See Log in to the BOSH Director.

  6. Use the BOSH CLI to run the inspect errand by running:

    bosh -d service-instance_GUID run-errand inspect
    

    Where GUID is the GUID of the leader-follower service instance you recorded.

    For example:

    $ bosh -d service-instance_82ddc607-710a-404e-b1b8-a7e3ea7ec063 \
        run-errand inspect

  7. Review the output for the leader-follower MySQL VMs and identify the instance marked Role: leader.

    For example:

    Instance   mysql/ca0ed8b5-7590-4cde-bba8-7ca2935f2bd0
    Exit Code  0
    Stdout     2018/04/03 18:08:46 Started executing command: inspect
               2018/04/03 18:08:46
               IP Address: 10.0.8.11
               Role: leader
               Read Only: false
               Replication Configured: false
               Replication Mode: async
               Has Data: true
               GTID Executed: 82ddc607-710a-404e-b1b8-a7e3ea7ec063:1-18
               2018/04/03 18:08:46 Successfully executed command: inspect
    Stderr     -

    Instance   mysql/37e4b6bc-2ed6-4bd2-84d1-e59a91f5e7f8
    Exit Code  0
    Stdout     2018/04/03 18:08:46 Started executing command: inspect
               2018/04/03 18:08:46
               IP Address: 10.0.8.10
               Role: follower
               Read Only: true
               Replication Configured: true
               Replication Mode: async
               Has Data: true
               GTID Executed: 82ddc607-710a-404e-b1b8-a7e3ea7ec063:1-18
               2018/04/03 18:08:46 Successfully executed command: inspect

  8. Record the index of the instance marked Role: leader. In the example output above, the index of the leader VM is ca0ed8b5-7590-4cde-bba8-7ca2935f2bd0.

  9. Record the index of the other instance, which is the follower VM. In the example output above, the index of the follower VM is 37e4b6bc-2ed6-4bd2-84d1-e59a91f5e7f8.

  10. If you still have access to the AZ where the leader VM is located, determine whether the leader VM is in the AZ you want to take offline by running:

    bosh -d service-instance_GUID instances
    

    For example:

    $ bosh -d service-instance_82ddc607-710a-404e-b1b8-a7e3ea7ec063 \
        instances
    Deployment 'service-instance_82ddc607-710a-404e-b1b8-a7e3ea7ec063'

    Instance                                    Process State  AZ             IPs
    mysql/ca0ed8b5-7590-4cde-bba8-7ca2935f2bd0  failing        us-central1-f  10.0.8.11
    mysql/37e4b6bc-2ed6-4bd2-84d1-e59a91f5e7f8  running        us-central1-a  10.0.8.10

    2 instances

    Note: The leader VM might not display its status as failing if you are performing planned maintenance.

    Examine the output to determine if the leader VM is in the AZ you want to take offline.
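If you prefer to capture these values in a shell session on the Ops Manager VM rather than copying them by hand, the following POSIX shell sketch may help. It is not part of Tanzu SQL for VMs: the function names lf_deployment_name and lf_instance_for_role are hypothetical, and the parser assumes that the inspect errand output has the layout shown in the example above, where each VM's block starts with an Instance line and later contains a Role: line. The GUID check rejects malformed values, such as a GUID with a dropped character, before they reach a bosh -d flag.

```shell
# Hypothetical helpers for collecting failover inputs. Not part of the product.

# Print the BOSH deployment name for a service instance GUID.
# Fails if the argument is not in the 8-4-4-4-12 lowercase hex GUID format.
lf_deployment_name() {
  if ! printf '%s\n' "$1" |
      grep -Eq '^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$'; then
    echo "not a valid service instance GUID: $1" >&2
    return 1
  fi
  printf 'service-instance_%s\n' "$1"
}

# Read `bosh run-errand inspect` output on stdin and print the index of every
# instance whose Role matches $1 ("leader" or "follower").
# ASSUMPTION: each instance block starts with a line "Instance   mysql/INDEX"
# and contains a line "Role: ROLE", as in the example output in this topic.
lf_instance_for_role() {
  awk -v role="$1" '
    $1 == "Instance" { instance = $2 }
    $1 == "Role:" && $2 == role {
      sub(/^mysql\//, "", instance)   # keep only INDEX, as used in mysql/INDEX
      print instance
    }
  '
}
```

For example, after logging in to the BOSH Director:

    DEPLOYMENT=$(lf_deployment_name 82ddc607-710a-404e-b1b8-a7e3ea7ec063)
    bosh -d "$DEPLOYMENT" run-errand inspect | lf_instance_for_role leader

Compare the printed index against the full output before acting on it, because the exact layout of errand output may differ between BOSH CLI versions.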

Promote the Follower

To stop the leader VM and promote the follower VM to the new leader:

  1. Set the leader VM to read-only, which stops any more data from being written to it, by running:

    bosh -d service-instance_GUID \
      run-errand make-read-only \
      --instance=mysql/INDEX
    

    Where:

    • GUID: This is the GUID of the leader-follower service instance retrieved above.
    • INDEX: This is the index of the leader VM retrieved above.

    For example:

    $ bosh -d service-instance_82ddc607-710a-404e-b1b8-a7e3ea7ec063 \
        run-errand make-read-only \
        --instance=mysql/ca0ed8b5-7590-4cde-bba8-7ca2935f2bd0

  2. If you still have access to the AZ where the leader VM is located, stop the leader VM by running:

    bosh -d service-instance_GUID stop mysql/INDEX
    

    Use the index of the leader VM retrieved above.

    For example:

    $ bosh -d service-instance_82ddc607-710a-404e-b1b8-a7e3ea7ec063 \
        stop mysql/ca0ed8b5-7590-4cde-bba8-7ca2935f2bd0

  3. Set the follower VM as writable by running:

    bosh -d service-instance_GUID run-errand make-leader --instance=mysql/INDEX
    

    Use the index of the follower VM retrieved above.

    For example:

    $ bosh -d service-instance_82ddc607-710a-404e-b1b8-a7e3ea7ec063 \
      run-errand make-leader \
      --instance=mysql/37e4b6bc-2ed6-4bd2-84d1-e59a91f5e7f8

    If this command returns an error, re-run it until the follower VM has completed applying the transactions.
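Step 3 may need to be repeated while the follower finishes applying transactions. The small retry wrapper sketched below repeats a command with a pause between attempts and gives up after a fixed number of tries. The function name retry and the attempt count and delay in the usage example are hypothetical choices, not values recommended by VMware; if make-leader keeps failing, read its error output rather than retrying indefinitely.

```shell
# Hypothetical helper: run a command until it succeeds or attempts run out.
# Usage: retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
retry() {
  max_attempts=$1
  delay=$2
  shift 2
  attempt=1
  while ! "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $attempt attempts: $*" >&2
      return 1
    fi
    echo "attempt $attempt failed; retrying in ${delay}s" >&2
    attempt=$((attempt + 1))
    sleep "$delay"
  done
  return 0
}
```

For example:

    retry 10 30 bosh -d service-instance_82ddc607-710a-404e-b1b8-a7e3ea7ec063 \
      run-errand make-leader \
      --instance=mysql/37e4b6bc-2ed6-4bd2-84d1-e59a91f5e7f8

Bounding the number of attempts means that a persistent failure, for example one not caused by replication lag, is reported instead of hidden by an endless loop.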

At this point, a single instance is working but leader-follower replication has not yet been restored. To fail your app over to a single instance instead of restoring leader-follower, skip to Unbind and Rebind the App below.

If you are triggering a failover in response to the AZ of the leader VM going offline, you can fail your app over to a single instance by following the procedure in Unbind and Rebind the App below. However, to restore leader-follower, you must regain access to the AZ where your leader VM is located before following the procedure in Clean Up Former Leader VM (Optional) and Configure the New Follower below.

Clean Up Former Leader VM (Optional)

If you are triggering a failover in response to a failing leader VM, to clean up the former leader VM:

  1. Disable resurrection by running the following command. This setting applies to the whole BOSH Director, not only to this deployment.

    bosh update-resurrection off
    
  2. Retrieve the CID of the failing former leader VM by running:

    bosh -d service-instance_GUID instances \
      --details \
      --failing \
      --column="VM CID" \
      --json
    

    For example:

    $ bosh -d service-instance_82ddc607-710a-404e-b1b8-a7e3ea7ec063 instances \
        --details \
        --failing \
        --column="VM CID" \
        --json

  3. Retrieve the disk CID of the failing former leader VM by running:

    bosh -d service-instance_GUID instances \
      --details \
      --failing \
      --column="Disk CIDs" \
      --json
    

    For example:

    $ bosh -d service-instance_82ddc607-710a-404e-b1b8-a7e3ea7ec063 instances \
      --details \
      --failing \
      --column="Disk CIDs" \
      --json

  4. Delete the failing former leader VM by running:

    bosh -d service-instance_GUID delete-vm VM-CID
    

    Where:

    • GUID: This is the GUID of the leader-follower service instance retrieved above.
    • VM-CID: This is the CID of the failing former leader VM retrieved above.

    For example:

    $ bosh -d service-instance_82ddc607-710a-404e-b1b8-a7e3ea7ec063 \
        delete-vm i-1db9ede6

  5. Orphan the disk of the failing former leader VM by running:

    bosh -d service-instance_GUID orphan-disk DISK-CID
    

    Where:

    • GUID: This is the GUID of the leader-follower service instance retrieved above.
    • DISK-CID: This is the disk CID of the failing former leader VM retrieved above.

    For example:

    $ bosh -d service-instance_82ddc607-710a-404e-b1b8-a7e3ea7ec063 \
        orphan-disk b-1db9ede6

    Orphaning a disk rather than deleting it preserves the disk for possible recovery. After performing recovery operations, you can reattach the disk to a VM. By default, BOSH deletes orphaned disks after five days.
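The CIDs from steps 2 and 3 can be extracted from the --json output instead of being read by eye. The sketch below assumes, rather than guarantees, that the BOSH CLI JSON output lists each table row as an object whose keys are the column names in lowercase with spaces replaced by underscores (for example, vm_cid and disk_cids); check this against the output of your own CLI version first. The function name bosh_json_field is hypothetical, and a JSON-aware tool such as jq is a more robust choice if it is available on the Ops Manager VM.

```shell
# Hypothetical helper: print every value of the string field named $1 found in
# `bosh ... --json` output read from stdin, one value per line.
# ASSUMPTION: fields appear as "key": "value" pairs, e.g. "vm_cid": "i-1db9ede6".
# This is a plain text match, not a JSON parser.
bosh_json_field() {
  grep -o "\"$1\": *\"[^\"]*\"" | sed 's/.*: *"\([^"]*\)"$/\1/'
}
```

For example:

    DEPLOYMENT=service-instance_82ddc607-710a-404e-b1b8-a7e3ea7ec063
    VM_CID=$(bosh -d "$DEPLOYMENT" instances --details --failing \
      --column="VM CID" --json | bosh_json_field vm_cid)
    DISK_CID=$(bosh -d "$DEPLOYMENT" instances --details --failing \
      --column="Disk CIDs" --json | bosh_json_field disk_cids)

If more than one instance is failing, these variables hold more than one value. Confirm that each value belongs to the former leader VM before running delete-vm or orphan-disk.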

Configure the New Follower

To restart the former leader VM and configure it as the new follower:

  1. Re-create the former leader VM by running:

    bosh -d service-instance_GUID \
      recreate \
      mysql/INDEX
    

    Where:

    • GUID: This is the GUID of the leader-follower service instance retrieved above.
    • INDEX: This is the index of the former leader VM that you are re-creating.

    For example:

    $ bosh -d service-instance_82ddc607-710a-404e-b1b8-a7e3ea7ec063 \
      recreate \
      mysql/ca0ed8b5-7590-4cde-bba8-7ca2935f2bd0

  2. Set the former leader VM as a follower, using the same values as above, by running:

    bosh -d service-instance_GUID \
      run-errand configure-leader-follower \
      --instance=mysql/INDEX
    

    For example:

    $ bosh -d service-instance_82ddc607-710a-404e-b1b8-a7e3ea7ec063 \
      run-errand configure-leader-follower \
      --instance=mysql/ca0ed8b5-7590-4cde-bba8-7ca2935f2bd0

  3. Use the BOSH CLI to run the inspect errand, using the same value as above, by running:

    bosh -d service-instance_GUID \
      run-errand inspect
    

    For example:

    $ bosh -d service-instance_82ddc607-710a-404e-b1b8-a7e3ea7ec063 \
      run-errand inspect

    If the output displays one instance marked Role: leader and another instance marked Role: follower, leader-follower replication and high availability have resumed. The deployment should be in its original, working state. You can turn resurrection back on if desired.
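The check in step 3 can also be scripted. The sketch below, with the hypothetical function name lf_roles_restored, reads inspect output on stdin and succeeds only if it reports exactly one leader and exactly one follower. It assumes the Role: lines look like those in the inspect example earlier in this topic.

```shell
# Hypothetical check: exit status 0 only if the `bosh run-errand inspect`
# output on stdin contains exactly one "Role: leader" line and exactly one
# "Role: follower" line.
lf_roles_restored() {
  awk '
    $1 == "Role:" { count[$2]++ }
    END { exit !(count["leader"] == 1 && count["follower"] == 1) }
  '
}
```

For example, to turn resurrection back on only after the check passes, if you disabled it in Clean Up Former Leader VM (Optional):

    if bosh -d service-instance_82ddc607-710a-404e-b1b8-a7e3ea7ec063 \
        run-errand inspect | lf_roles_restored; then
      bosh update-resurrection on
    fi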

Unbind and Rebind the App

To fail their apps over to the new leader VM, your developers must unbind and rebind their apps to the leader-follower service instance.

Note: If you have BOSH DNS enabled in Ops Manager, you do not need to unbind and rebind your app to a leader-follower service instance to fail over the app. The operator enables BOSH DNS in BOSH Director > BOSH DNS Config.

Warning: If a developer rebinds an app to the Tanzu SQL for VMs service after unbinding, they must also rebind any existing custom schemas to the app. When you rebind an app, stored code, programs, and triggers break. For more information about binding custom schemas, see Use Custom Schemas.

To unbind and rebind your app:

  1. Unbind the app from the leader-follower service instance by running:

    cf unbind-service APP-NAME SERVICE-INSTANCE-NAME
    

    Where:

    • APP-NAME: This is the name of the app bound to the leader-follower service instance.
    • SERVICE-INSTANCE-NAME: This is the name of the leader-follower service instance.

  2. Rebind the app to the leader-follower service instance by running:

    cf bind-service APP-NAME SERVICE-INSTANCE-NAME
    
  3. Restage the app by running:

    cf restage APP-NAME
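When several apps are bound to the same leader-follower service instance, the three commands above must be run for each app. The sketch below loops over app names; the function name lf_rebind_apps and the CF environment variable are hypothetical conveniences, not part of the cf CLI. Setting CF=echo prints the commands instead of running them, so you can review them first. The loop stops at the first failing command, and it does not rebind custom schemas, so the warning above about custom schemas still applies.

```shell
# Hypothetical helper: unbind, rebind, and restage each app named after the
# service instance. Set CF=echo to print the commands instead of running them.
# Usage: lf_rebind_apps SERVICE-INSTANCE-NAME APP-NAME [APP-NAME...]
lf_rebind_apps() {
  service_instance=$1
  shift
  for app in "$@"; do
    ${CF:-cf} unbind-service "$app" "$service_instance" || return 1
    ${CF:-cf} bind-service "$app" "$service_instance" || return 1
    ${CF:-cf} restage "$app" || return 1
  done
  return 0
}
```

For example, with hypothetical app names app-a and app-b:

    CF=echo lf_rebind_apps my-lf-instance app-a app-b   # dry run: print only
    lf_rebind_apps my-lf-instance app-a app-b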