
Events that Interrupt Service

This topic explains events in the lifecycle of a MySQL for Pivotal Cloud Foundry (PCF) service instance that may cause temporary service interruptions.

Stemcell or Service Update

An operator updates a stemcell version or their version of MySQL for PCF.

  • Impact: Apps lose access to the MySQL service while PCF updates the service instance they are bound to. The service should resume within 10-15 minutes.
  • Required Actions: None. If the update deploys successfully, apps reconnect automatically.
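Whether an app reconnects without intervention typically depends on how the app or its database driver handles failed connections. The following is a minimal sketch of retrying with exponential backoff, not a description of how any particular driver behaves. The function name connect_with_backoff, the connect callable, and the default values are all hypothetical; the 900-second default only mirrors the upper end of the 10-15 minute window above.

```python
import time


def connect_with_backoff(connect, max_wait_seconds=900.0, initial_delay=1.0,
                         max_delay=60.0, sleep=time.sleep, clock=time.monotonic):
    """Call connect() until it succeeds or max_wait_seconds has elapsed.

    `connect` is a hypothetical zero-argument callable that returns a
    connection object or raises ConnectionError. Real drivers usually raise
    their own exception types, so adjust the except clause accordingly.
    """
    deadline = clock() + max_wait_seconds
    delay = initial_delay
    while True:
        try:
            return connect()
        except ConnectionError:
            remaining = deadline - clock()
            if remaining <= 0:
                # Out of time: surface the last connection error to the caller.
                raise
            # Wait, but never sleep past the deadline.
            sleep(min(delay, remaining))
            # Double the delay for the next attempt, capped at max_delay.
            delay = min(delay * 2, max_delay)
```

The sleep and clock parameters exist only so that the loop can be exercised in tests without real waiting. Many connection pools already provide equivalent retry behavior, in which case a helper like this is unnecessary.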

Plan Change

A developer moves their service instance to a different service plan, using cf update-service or Apps Manager.

  • Impact: Apps lose access to the MySQL service while PCF updates the service instance they are bound to. The service should resume within 10-15 minutes.
  • Required Actions: None. If the plan change deploys successfully, apps reconnect automatically.

VM Process Failure

A process, such as the MySQL server, crashes on the service instance VM.

  • Impact:
    • BOSH (monit) brings the process back automatically.
    • Depending on the process and what it was doing, the service may experience 60-120 seconds of downtime.
    • Until the process resumes, apps may be unable to use MySQL, metrics or logging may stop, and other features may be interrupted.
  • Required Actions: None. If the process resumes cleanly and without manual intervention, apps reconnect automatically.

VM Failure

A MySQL for PCF VM fails and goes offline due to either a virtualization problem or a host hardware problem.

  • Impact:
    • If the BOSH Resurrector is enabled (recommended), BOSH should detect the failure, recreate the VM, and reattach the same persistent disk and IP address.
    • Downtime largely depends on how quickly the Resurrector notices, usually 1-2 minutes, and how long it takes the IaaS to create a replacement VM.
    • If the Resurrector is not enabled, some IaaSes such as vSphere have similar resurrection or HA features.
    • Apps cannot connect to MySQL until the VM is recreated and the MySQL server process resumes.
    • Based on prior experience with BOSH Resurrector, typical downtime is 8-10 minutes.
  • Required Actions: None. After the VM comes back, app developers should not need to take any further action to continue operations.

AZ Failure

An Availability Zone (AZ) goes offline entirely or loses connectivity to other AZs (net split). This causes service interruption in multi-AZ PCF deployments where Diego has placed multiple instances of a MySQL-using app in different AZs.

  • Impact:
    • Some app instances may still be able to connect and continue operating.
    • App instances in the other AZs will not be able to connect.
    • Downtime: Unknown
  • Required Actions: Recovery of the connection between the app and the database should be automatic. Depending on the app, manual intervention may be required to check data consistency.

Region Failure

An entire region fails, bringing PCF platform components offline.

  • Impact:
    • The entire PCF platform needs to be brought back up manually.
    • Downtime: Unknown
  • Required Actions: Each service instance may need to be restored individually depending on the restored state of the platform.
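The downtime figures in this topic can be collected into a small lookup that an app team might use when choosing how long to keep retrying connections. This is only an illustrative sketch: the event keys, the function name retry_budget_seconds, and the safety_factor parameter are hypothetical, and the durations are copied from the sections above rather than measured.

```python
# Expected downtime per event, in seconds, as (lower, upper) bounds taken
# from this topic. None means the topic lists the downtime as "Unknown".
# The dictionary keys are hypothetical labels, not product identifiers.
EXPECTED_DOWNTIME_SECONDS = {
    "stemcell_or_service_update": (600, 900),  # 10-15 minutes
    "plan_change": (600, 900),                 # 10-15 minutes
    "vm_process_failure": (60, 120),           # 60-120 seconds
    "vm_failure": (480, 600),                  # typically 8-10 minutes with the Resurrector
    "az_failure": None,                        # Unknown
    "region_failure": None,                    # Unknown
}


def retry_budget_seconds(event, safety_factor=1.5):
    """Return a suggested retry window for an event, or None if unknown.

    The window is the upper downtime bound multiplied by safety_factor,
    an assumed margin that is not taken from the product documentation.
    """
    estimate = EXPECTED_DOWNTIME_SECONDS[event]
    if estimate is None:
        return None
    _lower, upper = estimate
    return upper * safety_factor
```

For example, retry_budget_seconds("vm_process_failure") returns 180.0, and retry_budget_seconds("az_failure") returns None because no downtime estimate is given for AZ failures.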
