
Using Leader-Follower Topology for Availability

This topic describes how to use the leader-follower topology for the MySQL service and how to trigger a failover of apps from the leader to the follower.

You can maintain availability of your MySQL databases by keeping two copies of your database, one on a leader (master) instance and another on a follower (slave) instance. If the leader instance fails due to hardware or software issues or during a period of planned maintenance, you can initiate a failover of apps to the follower instance. This helps minimize downtime for apps that bind to your MySQL databases.

Leader-Follower Setup and Failover Process

The following are the high-level steps for setting up and using a leader-follower topology, and for initiating a failover.

Leader-Follower Setup

  1. The operator configures a leader-follower plan. See Configure a Leader-Follower Service Plan below.

  2. Developers create an instance from the leader-follower plan. Doing this automatically deploys two MySQL VMs in a leader-follower topology. See Create a Service Instance.

  3. Developers bind apps to the leader in the leader-follower plan instance, using the leader’s IP address. Data is written to the leader and asynchronously replicated on the follower. See Bind a Service Instance to Your App.

Failover Process

  1. Promotion: If the leader instance fails or the operator wants to take the leader offline for maintenance, the operator promotes the follower instance to the new leader.

  2. Failover: After a promotion, the developer unbinds the app from the leader-follower service instance and then rebinds it in order to fail the app over to the new leader.

See Trigger a Failover below.

Configure a Leader-Follower Service Plan

You can configure up to five leader-follower service plans.

After you configure the leader-follower plan, it is available in the Marketplace. When a developer creates a leader-follower service instance, a leader VM is automatically deployed in one availability zone (AZ), and a follower VM is deployed in another AZ.
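For example, assuming the service offering appears in the Marketplace as p.mysql and the leader-follower plan is named db-leader-follower (both names are illustrative; use the names shown by cf marketplace), a developer could create an instance as follows:

    $ cf create-service p.mysql db-leader-follower my-lf-instance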

To configure a leader-follower service plan, do the following:

  1. Follow the steps for configuring an active service plan, and do the following specifically for the leader-follower plan:
    1. Click the tab for any of the five available service plans: Plan 1 to Plan 5.
    2. Select the Multi-node deployment checkbox.
    3. Select a MySQL VM Type and a MySQL Persistent Disk.
    4. Select two MySQL Availability Zones to use for the MySQL VMs.
    5. Click Save.
  2. In the Ops Manager Installation Dashboard, click Apply Changes.

Monitor Leader-Follower Instances

To determine whether a failover is needed, you must monitor your leader-follower instances.

When monitoring the leader-follower VMs, pay attention to the following metrics:

  • /p.mysql/available: This metric records whether the MySQL VMs are responding to requests. Values are either 1 (available) or 0 (not available).

  • /p.mysql/follower_seconds_since_leader_heartbeat: Whenever the leader VM emits a heartbeat, the heartbeat is written to the leader database and replicated to the follower. This metric measures the number of seconds that elapse between the leader writing the heartbeat and the follower replicating it, so you can determine how far the follower lags behind the leader. Normal values for this metric depend on your apps.

  • /p.mysql/follower_seconds_behind_master: The follower VM copies database writes from the leader VM, but takes time to apply them to its own database. This metric measures how far behind the follower VM is in applying these writes. For example, a follower VM may have copied writes from the leader VM that are timestamped up to 4:00pm, but it has only applied writes up to 1:00pm. Normal values for this metric depend on your apps.

For more information, see KPIs for MySQL Service Instances in Monitoring and KPIs.
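As an illustration only, the following sketch shows how these metrics might drive a failover decision. It assumes you already forward platform metrics to a monitoring system and that a hypothetical fetch_metric helper returns the latest integer value of a metric for this deployment; the lag threshold is illustrative, because normal values depend on your apps.

    #!/usr/bin/env bash
    # Sketch: flag conditions that may warrant a failover, using the
    # leader-follower metrics described above. fetch_metric is a
    # hypothetical helper that returns the latest value of a metric.
    LAG_THRESHOLD_SECONDS=600   # illustrative; acceptable lag depends on your apps

    available=$(fetch_metric "/p.mysql/available")
    lag=$(fetch_metric "/p.mysql/follower_seconds_since_leader_heartbeat")

    if [ "$available" -eq 0 ]; then
      echo "MySQL VM is not responding; investigate and consider a failover."
    elif [ "$lag" -gt "$LAG_THRESHOLD_SECONDS" ]; then
      echo "Follower is ${lag}s behind the leader heartbeat; replication is lagging."
    fi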

Errands Used in Leader-Follower Failover

This section describes the errands used in the Trigger a Failover procedures below. You can also use these errands to create custom failover scripts.

Operators use the BOSH CLI to run the errands on the deployment for a particular leader-follower service instance.
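In general, an errand is run against the service instance deployment in the following form, where GUID is the service instance GUID and INDEX is the index of the target MySQL VM, both retrieved as described in Trigger a Failover below. The inspect errand, shown later, runs without the --instance flag.

    bosh2 -d service-instance_GUID run-errand ERRAND-NAME --instance=mysql/INDEX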

The errands used in the failover procedures are:

  • configure-leader-follower:

    • Configures replication on the follower and ensures the leader is writable
    • Runs after every create or update of a leader-follower instance
    • Fails and alerts operators, via BOSH errand output, if the service instance is in a bad state
  • make-read-only:

    • Demotes a leader VM to a follower
    • Sets the VM to read-only, and guarantees that apps no longer write to this VM
    • Relays all in-flight transactions on this former leader VM to the follower VM, if it is accessible
  • make-leader:

    • Promotes a follower VM to a leader
    • Removes replication configuration from the VM, waits for all transactions to be applied to the VM, and sets the VM as writable
    • Fails if the original leader is still writable, to protect against data divergence
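For example, a custom failover script might chain the make-read-only and make-leader errands. The following is a minimal sketch only; the deployment GUID and instance indexes are placeholders that you retrieve as described in Trigger a Failover below.

    #!/usr/bin/env bash
    set -euo pipefail

    # Placeholders: retrieve these values as described in Trigger a Failover below.
    DEPLOYMENT=service-instance_GUID
    LEADER_INDEX=LEADER-INDEX
    FOLLOWER_INDEX=FOLLOWER-INDEX

    # Demote the current leader so that apps can no longer write to it.
    bosh2 -d "$DEPLOYMENT" run-errand make-read-only --instance="mysql/$LEADER_INDEX"

    # Promote the follower. This errand fails if the original leader is still writable.
    bosh2 -d "$DEPLOYMENT" run-errand make-leader --instance="mysql/$FOLLOWER_INDEX"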

Trigger a Failover

You might want to trigger a failover in the following scenarios:

  • You want to take the leader VM down to do planned maintenance.
  • The performance of the leader VM degrades.
  • The leader VM fails unexpectedly.
  • The AZ where the leader VM is located goes offline unexpectedly.

To trigger a failover, follow these procedures:

Retrieve Information

Perform the following steps to retrieve the information necessary for stopping the leader and promoting the follower:

  1. Target the Cloud Controller of your PCF deployment with the Cloud Foundry Command Line Interface (cf CLI). For example:
    $ cf api api.pcf.example.com
  2. Log in:
    cf login
  3. When prompted, enter the org and space where the leader-follower service instance is located.
  4. Retrieve the GUID of the service instance by running the following command:
    cf service SERVICE-INSTANCE --guid

    Where SERVICE-INSTANCE is the name of the leader-follower service instance.

    For example:
    $ cf service my-lf-instance --guid
    82ddc607-710a-404e-b1b8-a7e3ea7ec063
    
    If you do not know the name of the service instance, you can list service instances in the space with cf services.
  5. Perform the steps in the Gather Credential and IP Address Information and SSH into Ops Manager sections of Advanced Troubleshooting with the BOSH CLI to SSH into the Ops Manager VM.
  6. From the Ops Manager VM, log in to your BOSH Director with the BOSH CLI. See Log in to the BOSH Director in Advanced Troubleshooting with the BOSH CLI.
  7. Use the BOSH CLI to run the inspect errand with the following command:
    bosh2 -d service-instance_GUID run-errand inspect

    Where GUID is the GUID of the leader-follower service instance retrieved above.

    For example:
    $ bosh2 -d service-instance_82ddc607-710a-404e-b1b8-a7e3ea7ec063 \
      run-errand inspect
    
  8. Examine the output and locate the information about the leader-follower MySQL VMs:
    Instance   mysql/ca0ed8b5-7590-4cde-bba8-7ca2935f2bd0
    Exit Code  0
    Stdout     2018/04/03 18:08:46 Started executing command: inspect
            2018/04/03 18:08:46
            IP Address: 10.0.8.11
            Role: leader
            Read Only: false
            Replication Configured: false
            Replication Mode: async
            Has Data: true
            GTID Executed: 82ddc607-710a-404e-b1b8-a7e3ea7ec063:1-18
            2018/04/03 18:08:46 Successfully executed command: inspect
    Stderr     -
    
    Instance   mysql/37e4b6bc-2ed6-4bd2-84d1-e59a91f5e7f8
    Exit Code  0
    Stdout     2018/04/03 18:08:46 Started executing command: inspect
            2018/04/03 18:08:46
            IP Address: 10.0.8.10
            Role: follower
            Read Only: true
            Replication Configured: true
            Replication Mode: async
            Has Data: true
            GTID Executed: 82ddc607-710a-404e-b1b8-a7e3ea7ec063:1-18
            2018/04/03 18:08:46 Successfully executed command: inspect
    Stderr     -
  9. Identify the instance marked Role: leader. This is the leader VM. Record the index, which is the value for Instance after mysql/. In the above output, the index of the leader VM is ca0ed8b5-7590-4cde-bba8-7ca2935f2bd0.
  10. Record the index of the other instance, which is the follower VM. In the above output, the index of the follower VM is 37e4b6bc-2ed6-4bd2-84d1-e59a91f5e7f8.
  11. If you still have access to the AZ where the leader VM is located, use the BOSH CLI to determine if the leader VM is in the AZ you want to take offline. For example:
    $ bosh2 -d service-instance_82ddc607-710a-404e-b1b8-a7e3ea7ec063 \
    instances
    Deployment 'service-instance_82ddc607-710a-404e-b1b8-a7e3ea7ec063'

    Instance                                    Process State  AZ             IPs
    mysql/ca0ed8b5-7590-4cde-bba8-7ca2935f2bd0  failing        us-central1-f  10.0.8.11
    mysql/37e4b6bc-2ed6-4bd2-84d1-e59a91f5e7f8  running        us-central1-a  10.0.8.10

    2 instances

    Note: The leader VM might not display its status as failing if you are performing planned maintenance.

    Examine the output to determine if the leader VM is in the AZ you want to take offline.

Promote the Follower

Perform the following steps to stop the leader VM and promote the follower VM to the new leader:

  1. Stop any data from being written to the leader VM by setting it to read only. Run the following command:
    bosh2 -d service-instance_GUID \
    run-errand make-read-only \
    --instance=mysql/INDEX
    

    Where:
    • GUID: This is the GUID of the leader-follower service instance retrieved above.
    • INDEX: This is the index of the leader VM retrieved above.
    For example:
    $ bosh2 -d service-instance_82ddc607-710a-404e-b1b8-a7e3ea7ec063 \
      run-errand make-read-only \
      --instance=mysql/ca0ed8b5-7590-4cde-bba8-7ca2935f2bd0
    
  2. If you still have access to the AZ where the leader VM is located, stop the leader VM, using the same values as above. For example:
    $ bosh2 -d service-instance_82ddc607-710a-404e-b1b8-a7e3ea7ec063 \
      stop mysql/ca0ed8b5-7590-4cde-bba8-7ca2935f2bd0
  3. Set the follower VM as writable:
    • If you are using PCF v2.0 and above, run the make-leader errand, using the index of the follower VM retrieved above. For example:
      $ bosh2 -d service-instance_82ddc607-710a-404e-b1b8-a7e3ea7ec063 \
      run-errand make-leader \
      --instance=mysql/37e4b6bc-2ed6-4bd2-84d1-e59a91f5e7f8
      
      If this command returns an error, re-run it until the follower VM has completed applying the transactions.
    • If you are using PCF v1.12, use bosh2 ssh to run the errand, using the index of the follower VM retrieved above. For example:
      $ bosh2 -d service-instance_82ddc607-710a-404e-b1b8-a7e3ea7ec063 \
      ssh \
      mysql/37e4b6bc-2ed6-4bd2-84d1-e59a91f5e7f8 \
      -c "sudo /var/vcap/jobs/make-leader/bin/run"
      

At this point, a single instance is working but leader-follower replication has not yet been restored. If you want to fail your app over to a single instance instead of restoring leader-follower, skip to Unbind and Rebind the App.

If you are triggering a failover because the AZ of the leader VM has gone offline, you can fail your app over to a single instance by performing the steps in Unbind and Rebind the App. However, to restore the leader-follower topology, you must regain access to the AZ where the leader VM is located before performing the steps in Clean Up Former Leader VM (Optional) and Configure the New Follower.

Clean Up Former Leader VM (Optional)

If you are triggering a failover in response to a failing leader VM, perform these steps to clean up the former leader VM:

  1. Disable BOSH resurrection so that the Resurrector does not automatically recreate the failing former leader VM while you clean it up. For example:
    $ bosh2 update-resurrection off
  2. Use the BOSH CLI to retrieve the CID of the failing former leader VM. For example:
    $ bosh2 -d service-instance_82ddc607-710a-404e-b1b8-a7e3ea7ec063 instances \
      --details \
      --failing \
      --column="VM CID" \
      --json
    
  3. Use the BOSH CLI to retrieve the disk CID of the failing former leader VM. For example:
    $ bosh2 -d service-instance_82ddc607-710a-404e-b1b8-a7e3ea7ec063 instances \
      --details \
      --failing \
      --column="Disk CIDs" \
      --json
    
  4. Delete the failing former leader VM. Run the following command:
    bosh2 -d service-instance_GUID delete-vm VM-CID

    Where:
    • GUID: This is the GUID of the leader-follower service instance retrieved above.
    • VM-CID: This is the VM CID of the failing former leader VM retrieved above.
    For example:
    $ bosh2 -d service-instance_82ddc607-710a-404e-b1b8-a7e3ea7ec063 \
      delete-vm i-1db9ede6
    
  5. Orphan the disk of the failing former leader VM. Run the following command:
    bosh2 -d service-instance_GUID orphan-disk DISK-CID
    

    Where:
    • GUID: This is the GUID of the leader-follower service instance retrieved above.
    • DISK-CID: This is the disk CID of the failing former leader VM retrieved above.
    For example:
    $ bosh2 -d service-instance_82ddc607-710a-404e-b1b8-a7e3ea7ec063 \
      orphan-disk b-1db9ede6
    
    Orphaning a disk rather than deleting it preserves it for possible recovery. After performing recovery operations, you can reattach the disk to a VM. BOSH deletes orphaned disks after five days by default.
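    If you later need to recover data from the orphaned disk, you may be able to reattach it to an instance with the BOSH attach-disk command, provided your BOSH Director version supports it. As a sketch, using the example values above:
    $ bosh2 -d service-instance_82ddc607-710a-404e-b1b8-a7e3ea7ec063 \
      attach-disk mysql/ca0ed8b5-7590-4cde-bba8-7ca2935f2bd0 b-1db9ede6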

Configure the New Follower

Perform the following steps to restart the former leader VM and configure it as the new follower:

  1. Recreate the former leader VM. Run the following command:
    bosh2 -d service-instance_GUID \
    recreate \
    mysql/INDEX
    

    Where:
    • GUID: This is the GUID of the leader-follower service instance retrieved above.
    • INDEX: This is the index of the former leader VM that you are recreating.
    For example:
    $ bosh2 -d service-instance_82ddc607-710a-404e-b1b8-a7e3ea7ec063 \
     recreate \
     mysql/ca0ed8b5-7590-4cde-bba8-7ca2935f2bd0
  2. Use the configure-leader-follower errand to set the former leader VM as a follower, using the same values as above. For example:
    $ bosh2 -d service-instance_82ddc607-710a-404e-b1b8-a7e3ea7ec063 \
      run-errand configure-leader-follower \
      --instance=mysql/ca0ed8b5-7590-4cde-bba8-7ca2935f2bd0
  3. Use the BOSH CLI to run the inspect errand on the deployment, using the same value as above. For example:
    $ bosh2 -d service-instance_82ddc607-710a-404e-b1b8-a7e3ea7ec063 \
       run-errand inspect
    
    If the output displays one instance marked Role: leader and another instance marked Role: follower, then leader-follower replication and high availability are resumed. The deployment should be in its original, working state. You may turn resurrection back on if desired.
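    For example, to re-enable resurrection:
    $ bosh2 update-resurrection on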

Unbind and Rebind the App

To fail their apps over to the new leader VM, your developers must perform the following steps to unbind and rebind their apps to the leader-follower service instance:

  1. Unbind the app from the leader-follower service instance. Run the following command:
    cf unbind-service APP SERVICE-INSTANCE

    Where:
    • APP: This is the name of the app bound to the leader-follower service instance.
    • SERVICE-INSTANCE: This is the name of the leader-follower service instance.
    For example:
    $ cf unbind-service my-app my-lf-instance
    
  2. Rebind the app to the leader-follower service instance, using the same values as above. For example:
    $ cf bind-service my-app my-lf-instance
    
  3. Restage the app. For example:
    $ cf restage my-app
    