LATEST VERSION: 2.2

Using Leader-Follower Topology for Availability

You can maintain availability of your MySQL databases by keeping two copies of your database, one on a leader (master) instance and another on a follower (slave) instance. If the leader instance fails due to hardware or software issues, or during a period of planned maintenance, you can initiate a failover to the follower instance so that the apps bound to your MySQL databases experience minimal downtime.

Leader-Follower Setup and Failover Process

The following are the high-level steps for setting up and using the leader-follower topology:

  1. The operator configures a leader-follower plan.

  2. Developers create an instance from the leader-follower plan. Doing this automatically deploys two MySQL VMs in a leader-follower topology.

  3. Developers bind apps to the leader-follower service instance. The binding uses the leader’s IP address, so data is written to the leader and asynchronously replicated to the follower.

  4. If the leader VM fails, the operator manually triggers a failover so that the follower becomes the new leader.

  5. The operator unbinds the app from the leader-follower service instance and then rebinds it so that the app uses the new leader.

Configure a Leader-Follower Service Plan

You can configure up to five leader-follower service plans.

After you configure the leader-follower plan, it is available in the Marketplace. When a developer creates a leader-follower service instance, a leader VM is automatically deployed in one availability zone (AZ), and a follower VM is deployed in another AZ.

To configure a leader-follower service plan, do the following:

  1. Follow the steps for configuring an active service plan, and do the following specifically for the leader-follower plan:
    1. Click the tab for any of the five available service plans: Plan 1 to Plan 5.
    2. Select the Multi-node deployment checkbox.
    3. Select a MySQL VM Type and a MySQL Persistent Disk.
    4. Select two MySQL Availability Zones to use for the MySQL VMs.
    5. Click Save.
  2. In the Ops Manager Installation Dashboard, click Apply Changes.

Monitor Leader-Follower Instances

To decide whether a failover is needed, monitor your leader-follower instances.

When monitoring the leader-follower VMs, pay attention to the following metrics:

  • /p.mysql/available: This metric records whether the MySQL VMs are responding to requests. Values are either 1 (available) or 0 (not available).

  • /p.mysql/follower_seconds_since_leader_heartbeat: Whenever the leader VM emits a heartbeat, the heartbeat is written to the leader database and replicated to the follower. This metric measures the number of seconds between the leader heartbeat and the replication of that heartbeat to the follower database, which tells you how far the follower is behind the leader. Normal values for this metric depend on your apps.

  • /p.mysql/follower_seconds_behind_master: The follower VM copies database writes from the leader VM, but takes time to apply them to its own database. This metric measures how far behind the follower VM is in applying these writes. For example, a follower VM may have copied writes from the leader VM that are timestamped up to 4:00 PM but applied only the writes up to 1:00 PM, which puts it three hours behind. Normal values for this metric depend on your apps.

For more information, see KPIs for MySQL Service Instances in Monitoring and KPIs.
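The following is a minimal POSIX shell sketch of how the three metrics above might feed a failover decision. The function name assess_lf_health and the MAX-LAG threshold are hypothetical; choose a threshold that reflects what is normal for your apps. The sketch assumes you have already read the metric values from your monitoring system, that AVAILABLE is the leader VM's /p.mysql/available value, and that all values are whole numbers.

```shell
# Hypothetical helper: classify leader-follower health from the metrics above.
# Usage: assess_lf_health AVAILABLE HEARTBEAT_LAG_S BEHIND_MASTER_S MAX_LAG_S
#   AVAILABLE        leader's /p.mysql/available value (1 or 0)
#   HEARTBEAT_LAG_S  /p.mysql/follower_seconds_since_leader_heartbeat
#   BEHIND_MASTER_S  /p.mysql/follower_seconds_behind_master
#   MAX_LAG_S        app-specific threshold you choose (assumption, not a default)
# All values are assumed to be integers.
assess_lf_health() {
  available="$1" heartbeat_lag="$2" behind="$3" max_lag="$4"

  # A leader that is not responding is the main trigger for a failover.
  if [ "$available" -eq 0 ]; then
    echo "unavailable: consider triggering a failover"
    return 0
  fi

  # Replication is asynchronous, so a lagging follower may not yet hold
  # the most recent writes; investigate before failing over to it.
  if [ "$heartbeat_lag" -gt "$max_lag" ] || [ "$behind" -gt "$max_lag" ]; then
    echo "lagging: follower is behind the leader"
    return 0
  fi

  echo "healthy"
}
```

For example, assess_lf_health 1 2 3 10 prints healthy, because the leader is available and both lag values are below the 10-second threshold.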

Trigger a Failover

You might want to trigger a failover to the follower VM in the following scenarios:

  • The leader VM fails
  • The performance of the leader VM degrades
  • You want to take the leader VM down to do planned maintenance

There are two different scenarios for triggering a failover, each with its own procedure below:

  • Scenario 1: You no longer have access to the leader VM.
  • Scenario 2: You still have access to the leader VM.

Scenario 1: You No Longer Have Access to the Leader VM

If you cannot access the leader VM, do the following procedures to trigger a failover.

Retrieve Information

Perform the following steps to retrieve the information necessary for stopping the leader and promoting the follower:

  1. Target the Cloud Controller of your PCF deployment with the Cloud Foundry Command Line Interface (cf CLI). For example:
    $ cf api api.pcf.example.com
  2. Log in:
    $ cf login
  3. Enter the org and space where the leader-follower service instance is located.
  4. Retrieve the GUID of the service instance by running the following command:
    cf service SERVICE-INSTANCE --guid

    Where SERVICE-INSTANCE is the name of the leader-follower service instance.

    For example:
    $ cf service my-lf-instance --guid
    82ddc607-710a-404e-b1b8-a7e3ea7ec063
    
    If you do not know the name of the service instance, you can list service instances in the space with cf services.
  5. To SSH into the Ops Manager VM, perform the steps in the Gather Credential and IP Address Information and SSH into Ops Manager sections of Advanced Troubleshooting with the BOSH CLI.
  6. From the Ops Manager VM, log in to your BOSH Director with the BOSH CLI v2. See Log in to the BOSH Director in Advanced Troubleshooting with the BOSH CLI.
  7. Use the BOSH CLI v2 to run the inspect errand with the following command:
    bosh2 -d service-instance_GUID run-errand inspect

    Where GUID is the GUID of the leader-follower service instance retrieved above.

    For example:
    $ bosh2 -d service-instance_82ddc607-710a-404e-b1b8-a7e3ea7ec063 run-errand inspect
    
  8. Examine the output and locate the information about the leader-follower MySQL VMs:
    Instance   mysql/37e4b6bc-2ed6-4bd2-84d1-e59a91f5e7f8
    Exit Code  0
    Stdout     -
    Stderr     2017/12/13 22:26:56 Started executing command: inspect
             2017/12/13 22:26:56 Started GET https://127.0.0.1:8443/status
             2017/12/13 22:26:56
             Has Data: true
             Read Only: true
             GTID Executed: 524c4be4-d540-11e7-a1d2-42010a000805:1-85297
             Replication Configured: true
    Instance   mysql/ca0ed8b5-7590-4cde-bba8-7ca2935f2bd0
    Exit Code  0
    Stdout     -
    Stderr     2017/12/13 22:26:56 Started executing command: inspect
             2017/12/13 22:26:56 Started GET https://127.0.0.1:8443/status
             2017/12/13 22:26:57
             Has Data: true
             Read Only: false
             GTID Executed: 524c4be4-d540-11e7-a1d2-42010a000805:1-85298
             Replication Configured: false
    
  9. Identify the instance marked Read Only: false. This is the leader VM. Record the index, which is the value for Instance after mysql/. In the above output, the index of the leader VM is ca0ed8b5-7590-4cde-bba8-7ca2935f2bd0.
  10. Record the index of the other instance, which is the follower VM. In the above output, the index of the follower VM is 37e4b6bc-2ed6-4bd2-84d1-e59a91f5e7f8.
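Steps 8 to 10 can be scripted. The following POSIX shell sketch reads the inspect errand output on standard input and prints the role and index of each MySQL instance, treating the writable instance (Read Only: false) as the leader. The function name lf_roles is hypothetical, and the sketch assumes the output format shown above and that the CLI writes the errand results to standard output. If the leader VM is unreachable, its Read Only line may be missing, in which case only the follower is printed.

```shell
# Hypothetical helper: print "ROLE INDEX" for each MySQL instance found in
# `bosh2 -d service-instance_GUID run-errand inspect` output (read on stdin).
# Only the "Instance mysql/..." and "Read Only:" lines are used.
lf_roles() {
  awk '
    $1 == "Instance" && $2 ~ /^mysql\// {
      # Remember which instance the following status lines belong to.
      id = $2
      sub(/^mysql\//, "", id)
    }
    /Read Only:/ {
      # The leader is the writable instance (Read Only: false).
      role = ($NF == "false") ? "leader" : "follower"
      print role, id
    }
  '
}
```

For example, piping the sample output above into lf_roles prints:
    follower 37e4b6bc-2ed6-4bd2-84d1-e59a91f5e7f8
    leader ca0ed8b5-7590-4cde-bba8-7ca2935f2bd0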

Promote the Follower

Perform the following steps to stop the leader VM and promote the follower VM to the new leader:

  1. Stop the leader VM, using the same values as above. For example:
    $ bosh2 -d service-instance_82ddc607-710a-404e-b1b8-a7e3ea7ec063 stop mysql/ca0ed8b5-7590-4cde-bba8-7ca2935f2bd0
  2. Set the follower VM as writable, using the index of the follower VM retrieved above. For example:
    $ bosh2 -d service-instance_82ddc607-710a-404e-b1b8-a7e3ea7ec063 run-errand make-leader --instance=mysql/37e4b6bc-2ed6-4bd2-84d1-e59a91f5e7f8
    
    If this command returns an error, re-run it until the follower VM has completed applying the transactions.
  3. Unbind the app from the leader-follower service instance. Run the following command:
    cf unbind-service APP SERVICE-INSTANCE

    Where:
    • APP: This is the name of the app bound to the leader-follower service instance.
    • SERVICE-INSTANCE: This is the name of the leader-follower service instance.
    For example:
    $ cf unbind-service my-app my-lf-instance
    
  4. Rebind the app to the leader-follower service instance, using the same values as above. For example:
    $ cf bind-service my-app my-lf-instance
    
  5. Restage the app. For example:
    $ cf restage my-app
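Step 2 says to re-run make-leader until it succeeds. A small retry loop such as the following POSIX shell sketch can do this; retry is a hypothetical helper name, and the attempt count and delay are values you choose. According to the errand descriptions in Leader-Follower Errands, make-leader also fails while the original leader is still accessible, so repeated failures can mean the leader has not actually been stopped.

```shell
# Hypothetical helper: re-run a command until it succeeds or attempts run out.
# Usage: retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
retry() {
  max="$1"
  delay="$2"
  shift 2
  attempt=1
  until "$@"; do
    if [ "$attempt" -ge "$max" ]; then
      echo "giving up after $attempt attempts: $*" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    # Give the follower time to apply outstanding transactions.
    sleep "$delay"
  done
}
```

For example:
    retry 10 30 bosh2 -d service-instance_82ddc607-710a-404e-b1b8-a7e3ea7ec063 run-errand make-leader --instance=mysql/37e4b6bc-2ed6-4bd2-84d1-e59a91f5e7f8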
    

Scenario 2: You Still Have Access to the Leader VM

If you can access the leader VM, do the following procedures to trigger a failover.

Retrieve Information

Perform the following steps to retrieve the information necessary for stopping the leader and promoting the follower:

  1. Target the Cloud Controller of your PCF deployment with the Cloud Foundry Command Line Interface (cf CLI). For example:
    $ cf api api.pcf.example.com
  2. Log in:
    $ cf login
  3. Enter the org and space where the leader-follower service instance is located.
  4. Retrieve the GUID of the service instance by running the following command:
    cf service SERVICE-INSTANCE --guid

    Where SERVICE-INSTANCE is the name of the leader-follower service instance.

    For example:
    $ cf service my-lf-instance --guid
    82ddc607-710a-404e-b1b8-a7e3ea7ec063
    
    If you do not know the name of the service instance, you can list service instances in the space with cf services.
  5. To SSH into the Ops Manager VM, perform the steps in the Gather Credential and IP Address Information and SSH into Ops Manager sections of Advanced Troubleshooting with the BOSH CLI.
  6. From the Ops Manager VM, log in to your BOSH Director with the BOSH CLI v2. See Log in to the BOSH Director in Advanced Troubleshooting with the BOSH CLI.
  7. Use the BOSH CLI v2 to run the inspect errand with the following command:
    bosh2 -d service-instance_GUID run-errand inspect

    Where GUID is the GUID of the leader-follower service instance retrieved above.

    For example:
    $ bosh2 -d service-instance_82ddc607-710a-404e-b1b8-a7e3ea7ec063 run-errand inspect
    
  8. Examine the output and locate the information about the leader-follower MySQL VMs:
    Instance   mysql/37e4b6bc-2ed6-4bd2-84d1-e59a91f5e7f8
    Exit Code  0
    Stdout     -
    Stderr     2017/12/13 22:26:56 Started executing command: inspect
             2017/12/13 22:26:56 Started GET https://127.0.0.1:8443/status
             2017/12/13 22:26:56
             Has Data: true
             Read Only: true
             GTID Executed: 524c4be4-d540-11e7-a1d2-42010a000805:1-85297
             Replication Configured: true
    Instance   mysql/ca0ed8b5-7590-4cde-bba8-7ca2935f2bd0
    Exit Code  0
    Stdout     -
    Stderr     2017/12/13 22:26:56 Started executing command: inspect
             2017/12/13 22:26:56 Started GET https://127.0.0.1:8443/status
             2017/12/13 22:26:57
             Has Data: true
             Read Only: false
             GTID Executed: 524c4be4-d540-11e7-a1d2-42010a000805:1-85298
             Replication Configured: false
    
  9. Identify the instance marked Read Only: false. This is the leader VM. Record the index, which is the value for Instance after mysql/. In the above output, the index of the leader VM is ca0ed8b5-7590-4cde-bba8-7ca2935f2bd0.
  10. Record the index of the other instance, which is the follower VM. In the above output, the index of the follower VM is 37e4b6bc-2ed6-4bd2-84d1-e59a91f5e7f8.
  11. Use the BOSH CLI v2 to determine if the leader VM is in the AZ you want to take offline. For example:
    $ bosh2 -d service-instance_82ddc607-710a-404e-b1b8-a7e3ea7ec063 instances
    Examine the output to determine if the leader VM you identified in the previous step is in the AZ you want to take offline.
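To script step 11, the following POSIX shell sketch prints the AZ of one instance from the bosh2 instances table read on standard input. The helper name instance_az is hypothetical, and it assumes the default table layout with Instance, Process State, AZ, and IPs columns and a one-word Process State value such as running; check the column order in your own output before relying on it.

```shell
# Hypothetical helper: print the AZ of mysql/INDEX from the default
# `bosh2 -d DEPLOYMENT instances` table (read on stdin).
# Assumes columns: Instance, Process State (one word), AZ, IPs.
# Exits with status 1 if the instance is not in the table.
# Usage: instance_az INDEX
instance_az() {
  awk -v want="mysql/$1" '
    $1 == want { print $3; found = 1 }
    END { exit !found }
  '
}
```

For example, bosh2 -d service-instance_82ddc607-710a-404e-b1b8-a7e3ea7ec063 instances | instance_az ca0ed8b5-7590-4cde-bba8-7ca2935f2bd0 prints the leader VM's AZ, which you can compare with the AZ you plan to take offline.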

Promote the Follower

Perform the following steps to stop the leader VM and promote the follower VM to the new leader:

  1. Prevent data from being written to the leader VM by setting it to read-only. Run the following command:
    bosh2 -d service-instance_GUID run-errand make-read-only --instance=mysql/INDEX

    Where:
    • GUID: This is the GUID of the leader-follower service instance retrieved above.
    • INDEX: This is the index of the leader VM retrieved above.
    For example:
    $ bosh2 -d service-instance_82ddc607-710a-404e-b1b8-a7e3ea7ec063 run-errand make-read-only --instance=mysql/ca0ed8b5-7590-4cde-bba8-7ca2935f2bd0
    
  2. Stop the leader VM, using the same values as above. For example:
    $ bosh2 -d service-instance_82ddc607-710a-404e-b1b8-a7e3ea7ec063 stop mysql/ca0ed8b5-7590-4cde-bba8-7ca2935f2bd0
  3. Set the follower VM as writable, using the index of the follower VM retrieved above. For example:
    $ bosh2 -d service-instance_82ddc607-710a-404e-b1b8-a7e3ea7ec063 run-errand make-leader --instance=mysql/37e4b6bc-2ed6-4bd2-84d1-e59a91f5e7f8
    
    If this command returns an error, re-run it until the follower VM has completed applying the transactions.
  4. Unbind the app from the leader-follower service instance. Run the following command:
    cf unbind-service APP SERVICE-INSTANCE

    Where:
    • APP: This is the name of the app bound to the leader-follower service instance.
    • SERVICE-INSTANCE: This is the name of the leader-follower service instance.
    For example:
    $ cf unbind-service my-app my-lf-instance
    
  5. Rebind the app to the leader-follower service instance, using the same values as above. For example:
    $ cf bind-service my-app my-lf-instance
    
  6. Restage the app. For example:
    $ cf restage my-app
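The unbind, rebind, and restage steps are the same in both scenarios. The following POSIX shell sketch wraps them for one or more apps. The names rebind_apps, run, and the DRY_RUN switch are hypothetical; setting DRY_RUN=1 prints the cf commands instead of running them, so you can check them first.

```shell
# Hypothetical helper: run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Unbind, rebind, and restage each app so it uses the new leader.
# Usage: rebind_apps SERVICE-INSTANCE APP [APP...]
rebind_apps() {
  instance="$1"
  shift
  for app in "$@"; do
    run cf unbind-service "$app" "$instance" || return 1
    run cf bind-service "$app" "$instance" || return 1
    # Restage so the app reads the new binding credentials.
    run cf restage "$app" || return 1
  done
}
```

For example, DRY_RUN=1 rebind_apps my-lf-instance my-app prints the same three commands shown in steps 4 to 6 above.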
    

Clean Up Former Leader VM

If you still have access to the former leader VM, perform these steps to clean up that VM:

  1. Disable resurrection, specifying the same deployment as above. For example:
    $ bosh2 -d service-instance_82ddc607-710a-404e-b1b8-a7e3ea7ec063 update-resurrection off
  2. Use the BOSH CLI v2 to retrieve the CID of the failing former leader VM. For example:
    $ bosh2 -d service-instance_82ddc607-710a-404e-b1b8-a7e3ea7ec063 instances \
        --details \
        --failing \
        --column="VM CID" \
        --json
    
  3. Use the BOSH CLI v2 to retrieve the disk CID of the failing former leader VM. For example:
    $ bosh2 -d service-instance_82ddc607-710a-404e-b1b8-a7e3ea7ec063 instances \
        --details \
        --failing \
        --column="Disk CIDs" \
        --json
    
  4. Delete the failing former leader VM. Run the following command:
    bosh2 -d service-instance_GUID delete-vm VM-CID

    Where:
    • GUID: This is the GUID of the leader-follower service instance retrieved above.
    • VM-CID: This is the CID of the failing former leader VM retrieved above.
    For example:
    $ bosh2 -d service-instance_82ddc607-710a-404e-b1b8-a7e3ea7ec063 delete-vm i-1db9ede6
    
  5. Delete the persistent disk of the failing former leader VM by first orphaning it and then deleting it. Run the following commands:
    bosh2 -d service-instance_GUID orphan-disk DISK-CID
    bosh2 -d service-instance_GUID delete-disk DISK-CID

    Where:
    • GUID: This is the GUID of the leader-follower service instance retrieved above.
    • DISK-CID: This is the disk CID of the failing former leader VM retrieved above.
    For example:
    $ bosh2 -d service-instance_82ddc607-710a-404e-b1b8-a7e3ea7ec063 orphan-disk b-1db9ede6
    $ bosh2 -d service-instance_82ddc607-710a-404e-b1b8-a7e3ea7ec063 delete-disk b-1db9ede6
    
  6. Download the manifest of the leader-follower deployment. For example:
    $ bosh2 -d service-instance_82ddc607-710a-404e-b1b8-a7e3ea7ec063 manifest > mysql-manifest.yml
  7. Redeploy to create the missing VM and disk in a fresh state. For example:
    $ bosh2 -d service-instance_82ddc607-710a-404e-b1b8-a7e3ea7ec063 deploy mysql-manifest.yml
  8. Use the configure-leader-follower errand to set the former leader VM as a follower, using the same values as above. For example:
    $ bosh2 -d service-instance_82ddc607-710a-404e-b1b8-a7e3ea7ec063 run-errand configure-leader-follower --instance=mysql/ca0ed8b5-7590-4cde-bba8-7ca2935f2bd0

    After the errand completes, replication and high availability resume, and the deployment should be back in its original, working state. You can turn resurrection back on if desired.
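Because most of these clean-up steps are destructive, a script for them might print the commands for review rather than run them. The following POSIX shell sketch does that. The name cleanup_plan is hypothetical; it takes the values gathered in the steps above and orphans the disk before deleting it, on the assumption that BOSH delete-disk acts on orphaned disks.

```shell
# Hypothetical helper: print (never run) the clean-up commands above.
# Usage: cleanup_plan GUID VM-CID DISK-CID FORMER-LEADER-INDEX
cleanup_plan() {
  if [ "$#" -ne 4 ]; then
    echo "usage: cleanup_plan GUID VM-CID DISK-CID FORMER-LEADER-INDEX" >&2
    return 2
  fi
  dep="service-instance_$1"
  vm_cid="$2"
  disk_cid="$3"
  index="$4"
  # One command per line, in the order of steps 1 and 4 to 8.
  cat <<EOF
bosh2 -d $dep update-resurrection off
bosh2 -d $dep delete-vm $vm_cid
bosh2 -d $dep orphan-disk $disk_cid
bosh2 -d $dep delete-disk $disk_cid
bosh2 -d $dep manifest > mysql-manifest.yml
bosh2 -d $dep deploy mysql-manifest.yml
bosh2 -d $dep run-errand configure-leader-follower --instance=mysql/$index
EOF
}
```

For example, cleanup_plan 82ddc607-710a-404e-b1b8-a7e3ea7ec063 i-1db9ede6 b-1db9ede6 ca0ed8b5-7590-4cde-bba8-7ca2935f2bd0 prints the seven commands for the example deployment, which you can then run one at a time.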

Leader-Follower Errands

MySQL for PCF provides errands that operators can use to control the lifecycle of a leader-follower service instance without being an expert at MySQL. These errands are:

  • make-read-only:

    • Guarantees that apps do not write to a former leader VM
    • Sets the VM to read-only and ensures that, if the follower is accessible, all transactions have been relayed to the follower
  • make-leader:

    • Promotes a follower VM to a leader
    • Removes replication configuration from a follower VM, waits for all transactions to be applied to the VM, and sets the VM as writable
    • Fails if the original leader is still accessible to protect against data divergence
  • configure-leader-follower:

    • Configures replication on the follower and ensures the leader is writable
    • Runs after every create or update of a leader-follower instance
    • Fails and alerts operators, via BOSH errand output, if the service instance is in a bad state

Using the errands above, operators can create failover scripts.
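As a starting point for such a script, the following POSIX shell sketch prints the errand sequence for either failover scenario: make-read-only (only when the leader is accessible), then stop, then make-leader. The name failover_plan is hypothetical; it prints rather than runs the commands, and it does not cover rebinding apps or cleaning up the former leader.

```shell
# Hypothetical sketch: print the BOSH failover commands in order.
# Usage: failover_plan GUID LEADER-INDEX FOLLOWER-INDEX yes|no
#   yes = leader VM still accessible (Scenario 2), no = not accessible (Scenario 1)
failover_plan() {
  dep="service-instance_$1"
  leader="$2"
  follower="$3"
  case "$4" in
    yes)
      # Scenario 2: stop writes and let make-read-only relay transactions
      # to the follower before the leader is stopped.
      echo "bosh2 -d $dep run-errand make-read-only --instance=mysql/$leader"
      ;;
    no)
      # Scenario 1: the leader is unreachable, so it cannot be made read-only.
      ;;
    *)
      echo "last argument must be yes or no" >&2
      return 2
      ;;
  esac
  # Both scenarios: stop the leader, then promote the follower.
  # make-leader fails while the original leader is still accessible.
  echo "bosh2 -d $dep stop mysql/$leader"
  echo "bosh2 -d $dep run-errand make-leader --instance=mysql/$follower"
}
```

With the example values from this page, failover_plan prints three commands when the last argument is yes and two when it is no, matching the Promote the Follower steps of each scenario.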
