LATEST VERSION: 1.9 - CHANGELOG
MySQL for PCF v1.9

Monitoring the MySQL Service

This document describes the metrics that are produced by MySQL for PCF, and everything you need to know about the Replication Canary. For information about the Interruptor, please see the Troubleshooting topic.

Metrics

MySQL emits a number of metrics that can be used to monitor the health and performance of the MySQL deployment. The PCF firehose exposes these metrics.

The metrics polling interval defaults to 30 seconds. This can be changed by navigating to the Advanced Options configuration pane and entering a new value in Metrics polling interval (min: 10).

Metrics Polling Interval

Third-party monitoring tools can consume MySQL for PCF’s metrics via a nozzle to monitor MySQL performance and health. For an example Datadog configuration that displays some of the significant metrics outlined below, see the CF Redis and MySQL example dashboards. Pivotal does not endorse or provide support for any third party solution.

MySQL-Specific Metrics

Data Source Description Metric Unit
/p-mysql/available Indicates if the local database server is available and responding. boolean
/p-mysql/innodb/buffer_pool_free The number of free pages in the InnoDB Buffer Pool. pages
/p-mysql/innodb/buffer_pool_total The total number of pages in the InnoDB Buffer Pool. pages
/p-mysql/innodb/buffer_pool_used The number of used pages in the InnoDB Buffer Pool. pages
/p-mysql/innodb/buffer_pool_utilization The utilization of the InnoDB Buffer Pool. fraction
/p-mysql/innodb/current_row_locks The number of current row locks. locks
/p-mysql/innodb/data_reads The rate of data reads. reads/second
/p-mysql/innodb/data_writes The rate of data writes. writes/second
/p-mysql/innodb/mutex_os_waits The rate of mutex OS waits. events/second
/p-mysql/innodb/mutex_spin_rounds The rate of mutex spin rounds. events/second
/p-mysql/innodb/mutex_spin_waits The rate of mutex spin waits. events/second
/p-mysql/innodb/os_log_fsyncs The rate of fsync writes to the log file. writes/second
/p-mysql/innodb/row_lock_time Time spent in acquiring row locks. milliseconds
/p-mysql/innodb/row_lock_waits The number of times per second a row lock had to be waited for. events/second
/p-mysql/net/connections The rate of connections to the server. connection/second
/p-mysql/net/max_connections The maximum number of connections that have been in use simultaneously since the server started. connections
/p-mysql/performance/com_delete The rate of delete statements. queries/second
/p-mysql/performance/com_delete_multi The rate of delete-multi statements. queries/second
/p-mysql/performance/com_insert The rate of insert statements. query/second
/p-mysql/performance/com_insert_select The rate of insert-select statements. queries/second
/p-mysql/performance/com_replace_select The rate of replace-select statements. queries/second
/p-mysql/performance/com_select The rate of select statements. queries/second
/p-mysql/performance/com_update The rate of update statements. queries/second
/p-mysql/performance/com_update_multi The rate of update-multi. queries/second
/p-mysql/performance/created_tmp_disk_tables The rate of internal on-disk temporary tables created by second by the server while executing statements. table/second
/p-mysql/performance/created_tmp_files The rate of temporary files created by second. files/second
/p-mysql/performance/created_tmp_tables The rate of internal temporary tables created by second by the server while executing statements. tables/second
/p-mysql/performance/kernel_time Percentage of CPU time spent in kernel space by MySQL. percent
/p-mysql/performance/key_cache_utilization The key cache utilization ratio. fraction
/p-mysql/performance/open_files The number of open files. files
/p-mysql/performance/open_tables The number of of tables that are open. tables
/p-mysql/performance/qcache_hits The rate of query cache hits. hits/second
/p-mysql/performance/questions The rate of statements executed by the server. queries/second
/p-mysql/performance/slow_queries The rate of slow queries. queries/second
/p-mysql/performance/table_locks_waited The total number of times that a request for a table lock could not be granted immediately and a wait was needed. number
/p-mysql/performance/threads_connected The number of currently open connections. connections
/p-mysql/performance/threads_running The number of threads that are not sleeping. threads
/p-mysql/performance/user_time Percentage of CPU time spent in user space by MySQL. percent
/p-mysql/performance/max_connections The maximum permitted number of simultaneous client connections. integer
/p-mysql/performance/open_files_limit The number of files that the operating system permits mysqld to open. integer
/p-mysql/performance/open_tables The number of tables that are open. integer
/p-mysql/performance/opened_tables The number of tables that have been opened. integer
/p-mysql/performance/opened_table_definitions The number of .frm files that have been cached. integer

Galera-Specific Metrics

Data Source Description Metric Unit
/p-mysql/galera/wsrep_ready Shows whether the node can accept queries. boolean
/p-mysql/galera/wsrep_cluster_size The current number of nodes in the Galera cluster. node
/p-mysql/galera/wsrep_cluster_status Shows the primary status of the cluster component that the node is in. State ID.
Values are Primary = 1, Non-primary = 0, Disconnected = -1 (See: https://mariadb.com/kb/en/mariadb/galera-cluster-status-variables/)
/p-mysql/galera/wsrep_local_recv_queue_avg Shows the average size of the local received queue since the last status query. float
/p-mysql/galera/wsrep_local_send_queue_avg Shows the average size of the local sent queue since the last status query. float
/p-mysql/galera/wsrep_local_index This node index in the cluster (base 0). int
/p-mysql/galera/wsrep_local_state This is the node’s local state int

For more information on monitoring PCF, see the Monitoring Pivotal Cloud Foundry topic.

Replication Canary

MySQL for Pivotal Cloud Foundry (PCF) is a clustered solution that uses replication to provide benefits such as quick failover and rolling upgrades. This is more complex than a single node system with no replication. MySQL for PCF includes a Replication Canary to help with the increased complexity. The Replication Canary is a long-running monitor that validates that replication is working within the MySQL cluster.

How it Works

The Replication Canary writes to a private dataset in the cluster, and attempts to read that data from each node. It pauses between writing and reading to ensure that the writesets have been committed across each node of the cluster. The private dataset does not use a significant amount of disk capacity.

When replication fails to work properly, the Canary detects that it cannot read the data from all nodes, and immediately takes two actions:

  • Emails a pre-configured address with a message that replication has failed. See the sample below.
  • Disables client access to the cluster.

Note: Malfunctioning replication exposes the cluster to the possibility of data loss. Because of this, both behaviors are enabled by default. It is critical that you contact Pivotal support immediately in the case of replication failure. Support will work with you to determine the nature of the cluster failure and provide guidance regarding a solution.

Sample Notification Email

If the Canary detects a replication failure, it immediately sends an email through the Elastic Runtime notification service. See the following example:

Subject: CF Notification: p-mysql Replication Canary, alert 417

This message was sent directly to your email address.

{alert-code 417}
This is an email to notify you that the MySQL service's replication canary has detected an unsafe cluster condition in which replication is not performing as expected across all nodes.

Cluster Access

Each time the Canary detects cluster replication failure, it instructs all proxies to disable connections to the database cluster. If the replication issue resolves, the Canary detects this and automatically restores client access to the cluster.

If you must restore access to the cluster regardless of the Replication Canary, contact Support.

Determine Proxy State

You can determine if the Canary disabled cluster access by using the Proxy API. See the following example:

ubuntu@ip-10-0-0-38:~$ curl -ku admin:PASSWORD_FROM_OPSMGR -X GET https://proxy-0-p-mysql.SYSTEM-DOMAIN/v0/cluster ; echo
{"currentBackendIndex":0,"trafficEnabled":false,"message":"Disabling cluster traffic","lastUpdated":"2016-07-27T05:16:29.197754077Z"}

Enable the Replication Canary

To enable the Replication Canary, follow the instructions below to configure both the Elastic Runtime tile and the MySQL for PCF tile.

Configure the Elastic Runtime Tile

Note: In a typical PCF deployment, these settings are already configured.

  1. In the SMTP Config section, enter a From Email that the Replication Canary can use to send notifications, along with the SMTP server configuration.
  2. In the Errands section, select the Notifications errand.

Configure the MySQL for PCF Tile

  1. In the Advanced Options section, select Enable replication canary. Enable Replication
  2. If you want to the Replication Canary to send email but not disable access at the proxy, select Notify only.

    Note: Pivotal recommends leaving this checkbox unselected due to the possibility of data loss from replication failure.

  3. You can override the Replication canary time period. The Replication canary time period sets how frequently the canary checks for replication failure, in seconds. This adds a small amount of load to the databases, but the canary reacts more quickly to replication failure. The default is 30 seconds.

    Canary time

  4. You can override the Replication canary read delay. The Replication canary read delay sets how long the canary waits to verify data is replicating across each MySQL node, in seconds. Clusters under heavy load experience some small replication lag as writesets are committed across the nodes. The Default is 20 seconds.

  5. Enter an Email address to receive monitoring notifications. Use a closely monitored email address account. The purpose of the Canary is to escalate replication failure as quickly as possible.

  6. In the Resource Config section, ensure the Monitoring job has one instance. Resource config

Disable the Replication Canary

If you do not need the Replication Canary, for instance if you use a single MySQL node, follow this procedure to disable both the job and the resource configuration.

  1. In the Advanced Options section of the MySQL for PCF tile, select Disable Replication Canary. Disable protection

  2. In the Resource Config pane, set the Monitoring job to zero instances. Monitoring zero

Create a pull request or raise an issue on the source for this page in GitHub