Triggering a Leader-Follower Failover
Warning: MySQL for VMware Tanzu v2.8 is no longer supported because it has reached the End of General Support (EOGS) phase as defined by the Support Lifecycle Policy. To stay up to date with the latest software and security updates, upgrade to a supported version.
This topic describes how to trigger a failover of apps from the leader to the follower.
Overview
You might want to trigger a failover in the following scenarios:
- You want to take the leader VM down to do planned maintenance.
- The performance of the leader VM degrades.
- The leader VM fails unexpectedly.
- The availability zone (AZ) where the leader VM is located goes offline unexpectedly.
You can also use the following metrics to determine if you need to trigger a failover:
- /p.mysql/available: This metric monitors whether the MySQL server is currently available. For more information, see Server Availability.
- /p.mysql/follower/seconds_behind_master: This metric monitors how far behind the follower is in applying writes from the leader. For more information, see Leader-Follower Metrics.
- /p.mysql/follower/seconds_since_leader_heartbeat: This metric monitors the number of seconds that elapse between the leader heartbeat and the replication of the heartbeat in the follower. For more information, see Leader-Follower Metrics.
For information about errands used to trigger failover, see configure-leader-follower, make-leader, and make-read-only.
To trigger a failover:
- Retrieve Information
- Promote the Follower
- Clean Up Former Leader VM (Optional)
- Configure the New Follower
- Unbind and Rebind the App
Retrieve Information
To retrieve the information necessary for stopping the leader and promoting the follower:
Log in to your deployment by running:
cf login -a API-URL
When prompted, enter your credentials.
Target the org and space where the leader-follower service instance is located by running:
cf target -o DESTINATION-ORG -s DESTINATION-SPACE
Record the GUID of the service instance by running:
cf service SERVICE-INSTANCE-NAME --guid
Where SERVICE-INSTANCE-NAME is the name of the leader-follower service instance.
For example:
$ cf service my-lf-instance --guid
82ddc607-710a-404e-b1b8-a7e3ea7ec063
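Every BOSH command later in this procedure needs this GUID, so it can help to capture it in a shell variable instead of copying it by hand. The following is a minimal Bash sketch. The function name lf_guid is hypothetical and not part of the cf CLI. It assumes that `cf service NAME --guid` prints only the GUID, as in the example above, and rejects anything that does not look like a GUID. The value still has to be carried over to the Ops Manager VM, where the BOSH commands run.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the cf CLI): print the GUID of a
# service instance in the currently targeted org and space, or fail.
lf_guid() {
  local name="${1:-}" out
  if [ -z "$name" ]; then
    echo "usage: lf_guid SERVICE-INSTANCE-NAME" >&2
    return 2
  fi
  # Capture the cf output; a non-zero exit usually means the lookup failed.
  if ! out="$(cf service "$name" --guid)"; then
    echo "cf could not look up service instance '$name'" >&2
    return 1
  fi
  # Drop surrounding whitespace and the trailing newline.
  out="$(printf '%s' "$out" | tr -d '[:space:]')"
  # Accept only a lowercase, hyphenated GUID such as
  # 82ddc607-710a-404e-b1b8-a7e3ea7ec063.
  if [[ ! "$out" =~ ^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$ ]]; then
    echo "unexpected output from cf: $out" >&2
    return 1
  fi
  printf '%s\n' "$out"
}
```

For example, `GUID="$(lf_guid my-lf-instance)" && echo "$GUID"` prints the GUID only if the lookup succeeded.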
If you do not know the name of the service instance, you can list the service instances in the space by running cf services.
Follow the procedure in Gather Credential and IP Address Information and SSH into Ops Manager to SSH into the Ops Manager VM.
From the Ops Manager VM, log in to your BOSH Director with the BOSH CLI. See Log in to the BOSH Director.
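All of the following BOSH commands target the deployment named service-instance_GUID. The sketch below shows one way to build that name once in your Ops Manager VM session and reuse it. It assumes the BOSH CLI is installed as `bosh` and that you are already logged in. The names LF_GUID, lf_deployment, and lf_bosh are hypothetical. Checking the GUID format before calling BOSH catches copy errors such as a dropped character.

```bash
#!/usr/bin/env bash
# Hypothetical convenience wrapper: run a BOSH CLI command against the
# on-demand deployment of one leader-follower service instance.
# Assumes the BOSH CLI is on PATH as `bosh` and you are already logged in.
LF_GUID="${LF_GUID:-}"

# Print service-instance_GUID for the given GUID (or $LF_GUID).
lf_deployment() {
  local guid="${1:-$LF_GUID}"
  if [[ ! "$guid" =~ ^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$ ]]; then
    echo "not a valid service instance GUID: '$guid'" >&2
    return 1
  fi
  printf 'service-instance_%s\n' "$guid"
}

# Run `bosh -d service-instance_$LF_GUID ARGS...`.
lf_bosh() {
  local dep
  dep="$(lf_deployment)" || return 1
  bosh -d "$dep" "$@"
}
```

For example, after `LF_GUID=82ddc607-710a-404e-b1b8-a7e3ea7ec063`, running `lf_bosh run-errand inspect` is equivalent to the inspect command in the next step.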
Use the BOSH CLI to run the inspect errand by running:
bosh -d service-instance_GUID run-errand inspect
Where GUID is the GUID of the leader-follower service instance you recorded.
For example:
$ bosh -d service-instance_82ddc607-710a-404e-b1b8-a7e3ea7ec063 \
    run-errand inspect
Examine the output about the leader-follower MySQL VMs and identify the instance marked Role: leader.
For example output:
Instance   mysql/ca0ed8b5-7590-4cde-bba8-7ca2935f2bd0
Exit Code  0
Stdout     2018/04/03 18:08:46 Started executing command: inspect
           2018/04/03 18:08:46 IP Address: 10.0.8.11
           Role: leader
           Read Only: false
           Replication Configured: false
           Replication Mode: async
           Has Data: true
           GTID Executed: 82ddc607-710a-404e-b1b8-a7e3ea7ec063:1-18
           2018/04/03 18:08:46 Successfully executed command: inspect
Stderr     -

Instance   mysql/37e4b6bc-2ed6-4bd2-84d1-e59a91f5e7f8
Exit Code  0
Stdout     2018/04/03 18:08:46 Started executing command: inspect
           2018/04/03 18:08:46 IP Address: 10.0.8.10
           Role: follower
           Read Only: true
           Replication Configured: true
           Replication Mode: async
           Has Data: true
           GTID Executed: 82ddc607-710a-404e-b1b8-a7e3ea7ec063:1-18
           2018/04/03 18:08:46 Successfully executed command: inspect
Record the index of the instance marked Role: leader. In the above example output, the index of the leader VM is ca0ed8b5-7590-4cde-bba8-7ca2935f2bd0.
Record the index of the other instance, which is the follower VM. In the above example output, the index of the follower VM is 37e4b6bc-2ed6-4bd2-84d1-e59a91f5e7f8.
If you still have access to the AZ where the leader VM is located, determine if the leader VM is in the AZ you want to take offline by running:
bosh -d service-instance_GUID instances
For example:
$ bosh -d service-instance_82ddc607-710a-404e-b1b8-a7e3ea7ec063 \
    instances
Deployment 'service-instance_82ddc607-710a-404e-b1b8-a7e3ea7ec063'

Instance                                    Process State  AZ             IPs
mysql/ca0ed8b5-7590-4cde-bba8-7ca2935f2bd0  failing        us-central1-f  10.0.8.11
mysql/37e4b6bc-2ed6-4bd2-84d1-e59a91f5e7f8  running        us-central1-a  10.0.8.10

2 instances
Examine the output to determine if the leader VM is in the AZ you want to take offline.
Note: The leader VM might not display its status as failing if you are performing planned maintenance.
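The leader index, follower index, and leader AZ recorded in the steps above can also be extracted from the command output with awk. The following is a minimal sketch; the function names are hypothetical, and the parsing assumes the output layouts shown in the examples above, which might differ between BOSH CLI versions. For anything beyond a quick check, the `--json` output of `bosh instances` is a sturdier source.

```bash
#!/usr/bin/env bash
# Hypothetical helpers that read BOSH CLI output on stdin.
# They assume the layouts shown above; check them against your own output.

# Print the index of the instance whose inspect output contains "Role: ROLE".
# Usage: bosh -d service-instance_GUID run-errand inspect | lf_index_for_role leader
lf_index_for_role() {
  awk -v role="${1:-}" '
    # Remember the index from lines such as: Instance  mysql/INDEX
    $1 == "Instance" {
      for (i = 2; i <= NF; i++)
        if ($i ~ /^mysql\//) idx = substr($i, 7)
    }
    # Print the remembered index when the role matches.
    {
      for (i = 1; i < NF; i++)
        if ($i == "Role:" && $(i + 1) == role) print idx
    }
  '
}

# Print the AZ column for one instance from `bosh instances` output.
# Assumes a one-word Process State, such as failing or running.
# Usage: bosh -d service-instance_GUID instances | lf_az_for_index INDEX
lf_az_for_index() {
  awk -v inst="mysql/${1:-}" '$1 == inst { print $3 }'
}
```

For example, `LEADER="$(bosh -d service-instance_GUID run-errand inspect | lf_index_for_role leader)"` stores the leader index for the steps that follow.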
Promote the Follower
To stop the leader VM and promote the follower VM to the new leader:
Prevent any further writes to the leader VM by setting it to read-only. Run:
bosh -d service-instance_GUID \
    run-errand make-read-only \
    --instance=mysql/INDEX
Where:
GUID: This is the GUID of the leader-follower service instance retrieved above.
INDEX: This is the index of the leader VM retrieved above.
For example:
$ bosh -d service-instance_82ddc607-710a-404e-b1b8-a7e3ea7ec063 \
    run-errand make-read-only \
    --instance=mysql/ca0ed8b5-7590-4cde-bba8-7ca2935f2bd0
If you still have access to the AZ where the leader VM is located, stop the leader VM by running:
bosh -d service-instance_GUID stop mysql/INDEX
Use the index of the leader VM retrieved above.
For example:
$ bosh -d service-instance_82ddc607-710a-404e-b1b8-a7e3ea7ec063 \
    stop mysql/ca0ed8b5-7590-4cde-bba8-7ca2935f2bd0
Set the follower VM as writable by running:
bosh -d service-instance_GUID run-errand make-leader --instance=mysql/INDEX
Use the index of the follower VM retrieved above.
For example:
$ bosh -d service-instance_82ddc607-710a-404e-b1b8-a7e3ea7ec063 \
    run-errand make-leader \
    --instance=mysql/37e4b6bc-2ed6-4bd2-84d1-e59a91f5e7f8
If this command returns an error, re-run it until the follower VM has completed applying the transactions.
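The promotion steps above can be chained in one function, with the "re-run make-leader until it succeeds" instruction expressed as a bounded retry loop. This is a minimal sketch, not a supported tool. The function name and the LF_STOP_LEADER, LF_MAX_ATTEMPTS, and LF_RETRY_SECS settings are hypothetical, and the default of 10 attempts 30 seconds apart is an arbitrary choice. Some BOSH commands, such as stop, may ask for confirmation; the sketch leaves any such prompt in place. It deliberately aborts if make-read-only fails rather than continue with the old leader possibly still writable.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the "Promote the Follower" steps above.
# Assumes you are logged in to the BOSH Director and have recorded the
# GUID and the leader and follower indexes.
# Set LF_STOP_LEADER=false if you cannot reach the leader's AZ.
lf_promote_follower() {
  local guid="$1" leader="$2" follower="$3"
  local max_attempts="${LF_MAX_ATTEMPTS:-10}" wait_secs="${LF_RETRY_SECS:-30}"
  local dep="service-instance_${guid}" attempt=1

  # 1. Stop writes to the leader; abort if this fails.
  bosh -d "$dep" run-errand make-read-only --instance="mysql/${leader}" || return 1

  # 2. Stop the leader VM if its AZ is still reachable.
  if [ "${LF_STOP_LEADER:-true}" = "true" ]; then
    bosh -d "$dep" stop "mysql/${leader}" || return 1
  fi

  # 3. Make the follower writable. make-leader returns an error until the
  #    follower has applied the transactions, so retry a bounded number of times.
  until bosh -d "$dep" run-errand make-leader --instance="mysql/${follower}"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "make-leader still failing after $attempt attempts" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$wait_secs"
  done
}
```

For example: `lf_promote_follower 82ddc607-710a-404e-b1b8-a7e3ea7ec063 ca0ed8b5-7590-4cde-bba8-7ca2935f2bd0 37e4b6bc-2ed6-4bd2-84d1-e59a91f5e7f8`.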
At this point, a single instance is working but leader-follower replication has not yet been restored. To fail your app over to a single instance instead of restoring leader-follower, skip to Unbind and Rebind the App below.
If you are triggering a failover in response to the AZ of the leader VM going offline, you can fail your app over to a single instance by following the procedure in Unbind and Rebind the App below. However, to restore leader-follower, you must regain access to the AZ where your leader VM is located before following the procedure in Clean Up Former Leader VM (Optional) and Configure the New Follower below.
Clean Up Former Leader VM (Optional)
If you are triggering a failover in response to a failing leader VM, to clean up the former leader VM:
Disable resurrection by running:
bosh update-resurrection off
This command is not scoped to a deployment: it turns off resurrection for the whole BOSH Director until you turn it back on.
Retrieve the VM CID of the failing former leader VM by running:
bosh -d service-instance_GUID instances \
    --details \
    --failing \
    --column="VM CID" \
    --json
For example:
$ bosh -d service-instance_82ddc607-710a-404e-b1b8-a7e3ea7ec063 instances \
    --details \
    --failing \
    --column="VM CID" \
    --json
Retrieve the disk CID of the failing former leader VM by running:
bosh -d service-instance_GUID instances \
    --details \
    --failing \
    --column="Disk CIDs" \
    --json
For example:
$ bosh -d service-instance_82ddc607-710a-404e-b1b8-a7e3ea7ec063 instances \
    --details \
    --failing \
    --column="Disk CIDs" \
    --json
Delete the failing former leader VM by running:
bosh -d service-instance_GUID delete-vm VM-CID
Where:
GUID: This is the GUID of the leader-follower service instance retrieved above.
VM-CID: This is the VM CID of the failing former leader VM retrieved above.
For example:
$ bosh -d service-instance_82ddc607-710a-404e-b1b8-a7e3ea7ec063 \
    delete-vm i-1db9ede6
Orphan the disk of the failing former leader VM by running:
bosh -d service-instance_GUID orphan-disk DISK-CID
Where:
GUID: This is the GUID of the leader-follower service instance retrieved above.
DISK-CID: This is the disk CID of the failing former leader VM retrieved above.
For example:
$ bosh -d service-instance_82ddc607-710a-404e-b1b8-a7e3ea7ec063 \
    orphan-disk b-1db9ede6
Orphaning a disk rather than deleting it preserves the disk for possible recovery. After performing recovery operations, you can reattach the disk to a VM. BOSH deletes orphaned disks after five days by default.
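Because deleting a VM is hard to undo, a small wrapper that prints the cleanup commands before running them can be a useful safety net. The sketch below takes the VM CID and disk CID you copied from the `bosh instances` output as arguments rather than parsing the JSON, so it does not assume a JSON tool such as jq is installed on the Ops Manager VM. The names lf_run, lf_cleanup_former_leader, and LF_DRY_RUN are hypothetical; by default it only prints the commands, and you set LF_DRY_RUN=false to run them.

```bash
#!/usr/bin/env bash
# Hypothetical dry-run helper: print the command, or run it when
# LF_DRY_RUN=false.
lf_run() {
  if [ "${LF_DRY_RUN:-true}" = "true" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Hypothetical sketch of the cleanup steps above, in the same order.
lf_cleanup_former_leader() {
  local guid="${1:-}" vm_cid="${2:-}" disk_cid="${3:-}"
  if [ -z "$guid" ] || [ -z "$vm_cid" ] || [ -z "$disk_cid" ]; then
    echo "usage: lf_cleanup_former_leader GUID VM-CID DISK-CID" >&2
    return 2
  fi
  local dep="service-instance_${guid}"
  lf_run bosh update-resurrection off || return 1
  # Delete the failing VM, then orphan (not delete) its persistent disk.
  lf_run bosh -d "$dep" delete-vm "$vm_cid" || return 1
  lf_run bosh -d "$dep" orphan-disk "$disk_cid"
}
```

For example, `lf_cleanup_former_leader 82ddc607-710a-404e-b1b8-a7e3ea7ec063 i-1db9ede6 b-1db9ede6` prints the three commands from the examples above; review them, then rerun with `LF_DRY_RUN=false`.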
Configure the New Follower
To restart the former leader VM and configure it as the new follower:
Re-create the former leader VM by running:
bosh -d service-instance_GUID \
    recreate \
    mysql/INDEX
Where:
GUID: This is the GUID of the leader-follower service instance retrieved above.
INDEX: This is the index of the former leader VM that you are re-creating.
For example:
$ bosh -d service-instance_82ddc607-710a-404e-b1b8-a7e3ea7ec063 \
    recreate \
    mysql/ca0ed8b5-7590-4cde-bba8-7ca2935f2bd0
Set the former leader VM as a follower, using the same values as above, by running:
bosh -d service-instance_GUID \
    run-errand configure-leader-follower \
    --instance=mysql/INDEX
For example:
$ bosh -d service-instance_82ddc607-710a-404e-b1b8-a7e3ea7ec063 \
    run-errand configure-leader-follower \
    --instance=mysql/ca0ed8b5-7590-4cde-bba8-7ca2935f2bd0
Use the BOSH CLI to run the inspect errand, using the same value as above, by running:
bosh -d service-instance_GUID \
    run-errand inspect
For example:
$ bosh -d service-instance_82ddc607-710a-404e-b1b8-a7e3ea7ec063 \
    run-errand inspect
If the output displays one instance marked Role: leader and another instance marked Role: follower, then leader-follower replication and high availability are resumed, and the deployment should be back in its original, working state. If you disabled resurrection, you can turn it back on by running bosh update-resurrection on.
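The re-create, configure, and verify steps above can be combined, with the final check automated by counting the Role: lines in the inspect output. This is a minimal sketch; the function names are hypothetical, and the check assumes the inspect output layout shown earlier on this page. It only confirms that exactly one leader and one follower are reported; it does not inspect replication lag.

```bash
#!/usr/bin/env bash
# Hypothetical check: read `run-errand inspect` output on stdin and succeed
# only if it reports exactly one leader and exactly one follower.
lf_replication_restored() {
  awk '
    { for (i = 1; i < NF; i++) if ($i == "Role:") roles[$(i + 1)]++ }
    END { exit !(roles["leader"] == 1 && roles["follower"] == 1) }
  '
}

# Hypothetical sketch of the "Configure the New Follower" steps above.
lf_configure_new_follower() {
  local guid="$1" index="$2"
  local dep="service-instance_${guid}"
  bosh -d "$dep" recreate "mysql/${index}" || return 1
  bosh -d "$dep" run-errand configure-leader-follower --instance="mysql/${index}" || return 1
  bosh -d "$dep" run-errand inspect | lf_replication_restored
}
```

For example, `lf_configure_new_follower 82ddc607-710a-404e-b1b8-a7e3ea7ec063 ca0ed8b5-7590-4cde-bba8-7ca2935f2bd0 && echo "leader-follower restored"`. The recreate command may ask for confirmation.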
Unbind and Rebind the App
To fail their apps over to the new leader VM, your developers must unbind and rebind their apps to the leader-follower service instance:
Note: If BOSH DNS is enabled in Ops Manager, you do not need to unbind and rebind your app to a leader-follower service instance to fail over the app. The operator enables BOSH DNS in BOSH Director > BOSH DNS Config.
Warning: If a developer rebinds an app to the MySQL for VMware Tanzu service after unbinding, they must also rebind any existing custom schemas to the app. When you rebind an app, stored code, programs, and triggers break. For more information about binding custom schemas, see Use Custom Schemas.
To unbind and rebind your app:
Unbind the app from the leader-follower service instance by running:
cf unbind-service APP-NAME SERVICE-INSTANCE-NAME
Where:
APP-NAME: This is the name of the app bound to the leader-follower service instance.
SERVICE-INSTANCE-NAME: This is the name of the leader-follower service instance.
Rebind the app to the leader-follower service instance by running:
cf bind-service APP-NAME SERVICE-INSTANCE-NAME
Restage the app by running:
cf restage APP-NAME
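When several apps are bound to the same instance, developers might wrap the three cf commands above in a function. This is a minimal sketch; the name lf_rebind_app is hypothetical. It binds without any parameters, so it does not handle custom schemas; rebind those separately as described in the warning above.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around the unbind, bind, and restage steps above.
# Binds without parameters; custom schemas must be rebound separately.
lf_rebind_app() {
  local app="${1:-}" si="${2:-}"
  if [ -z "$app" ] || [ -z "$si" ]; then
    echo "usage: lf_rebind_app APP-NAME SERVICE-INSTANCE-NAME" >&2
    return 2
  fi
  cf unbind-service "$app" "$si" || return 1
  cf bind-service "$app" "$si" || return 1
  cf restage "$app"
}
```

For example: `for app in app-a app-b; do lf_rebind_app "$app" my-lf-instance || break; done`, where app-a and app-b stand in for your own app names.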