
Monitoring and KPIs for On-Demand MySQL for PCF

This topic explains how to monitor the health of the MySQL for Pivotal Cloud Foundry (PCF) service using the logs, metrics, and Key Performance Indicators (KPIs) generated by MySQL for PCF component VMs.

For general information about logging and metrics in PCF, see Logging and Metrics.

About Metrics

Metrics are regularly generated log messages that report measured component states. The metrics polling interval is 30 seconds for MySQL instances and 60 seconds for the service broker.

Metrics are long, single lines of text that follow the format:

origin:"p-mysql" eventType:ValueMetric timestamp:1496776477930669688 deployment:"service-instance_2b5a001f-2bf3-460c-aee6-fd2253f9fb0c" job:"mysql" index:"b09df494-b731-4d06-a4b0-c2985ceedf4c" ip:"10.0.8.4" valueMetric:<name:"/p-mysql/performance/open_files" value:24 unit:"file" >
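As an illustration of the format above, the following minimal Python sketch splits one metric line into its fields. The function name `parse_metric_line` and the regular expression are assumptions made for this example and are not part of MySQL for PCF; the sketch only handles lines shaped like the sample shown above.

```python
import re

# Matches key:value pairs; the value is either a quoted string or a bare token.
_FIELD = re.compile(r'(\w+):(?:"([^"]*)"|([^\s<>]+))')

# The sample line from this topic, used as input for the sketch.
SAMPLE = (
    'origin:"p-mysql" eventType:ValueMetric timestamp:1496776477930669688 '
    'deployment:"service-instance_2b5a001f-2bf3-460c-aee6-fd2253f9fb0c" '
    'job:"mysql" index:"b09df494-b731-4d06-a4b0-c2985ceedf4c" ip:"10.0.8.4" '
    'valueMetric:<name:"/p-mysql/performance/open_files" value:24 unit:"file" >'
)


def parse_metric_line(line):
    """Parse one ValueMetric log line into a flat dictionary (hypothetical helper)."""
    envelope, marker, payload = line.partition("valueMetric:<")
    if not marker:
        raise ValueError("line does not contain a valueMetric payload")
    fields = {}
    for key, quoted, bare in _FIELD.findall(envelope + " " + payload):
        fields[key] = quoted or bare
    # Keep the timestamp as an integer and the metric value as a float.
    fields["timestamp"] = int(fields["timestamp"])
    fields["value"] = float(fields["value"])
    return fields


metric = parse_metric_line(SAMPLE)
print(metric["name"], metric["value"], metric["unit"])
```

Flattening the envelope fields and the `valueMetric` payload into one dictionary keeps the sketch short; a production parser would more likely consume structured Firehose envelopes than log text.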

Key Performance Indicators

Key Performance Indicators (KPIs) for MySQL for PCF are metrics that operators find most useful for monitoring their MySQL service to ensure smooth operation. KPIs are high-signal-value metrics that can indicate emerging issues. KPIs can be raw component metrics or derived metrics generated by applying formulas to raw metrics.

Pivotal provides the following KPIs as general alerting and response guidance for typical MySQL for PCF installations. Pivotal recommends that operators continue to fine-tune the alert measures to their installation by observing historical trends. Pivotal also recommends that operators expand beyond this guidance and create new, installation-specific monitoring metrics, thresholds, and alerts based on learning from their own installations.

KPIs for MySQL Service Instances

This section lists the KPIs that are specific to MySQL for PCF instances.

For a list of general KPIs that apply to all instances, and not specifically to MySQL for PCF instances, see BOSH System Metrics.

For a list of all MySQL for PCF component metrics, see All MySQL Metrics.

Server Availability


/p-mysql/available

Description: Indicates whether the MySQL server is currently responding to requests, that is, whether the component is available.

Use: If the server does not emit heartbeats, it is offline.

Origin: Doppler/Firehose
Type: boolean
Frequency: 30 s
Recommended measurement: Average over last 5 minutes
Recommended alert thresholds:
Yellow warning: N/A
Red critical: < 1
Recommended response: Check the MySQL Server logs for errors. You can find the instance by targeting your MySQL deployment with BOSH and inspecting logs for the instance. For more information, see Failing Jobs and Unhealthy Instances.
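The availability check can be sketched in Python as follows. This is a minimal illustration, assuming the samples collected over the last 5 minutes are supplied as a list of 0 and 1 values; the function name `availability_status` is hypothetical.

```python
from statistics import mean


def availability_status(samples):
    """Classify /p-mysql/available samples from the last 5 minutes.

    samples: list of 0/1 values, one per 30-second emission (assumed input shape).
    """
    if not samples:
        # No heartbeats were emitted, so the server is treated as offline.
        return "red"
    # Red critical when the average falls below 1, per the threshold above.
    return "red" if mean(samples) < 1 else "ok"


print(availability_status([1] * 10))
print(availability_status([1, 1, 0, 1]))
```

Treating an empty window as critical follows the guidance that a server emitting no heartbeats is offline.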

Connections


/p-mysql/net/connections

Description: The rate of connections to the server, shown as connections per second.

Use: If the number of connections drastically changes or if apps are unable to connect, there might be a network or app issue.

Origin: Doppler/Firehose
Type: count
Frequency: 30 s
Recommended measurement: (Number of connections / Max connections) over last 1 minute
Recommended alert thresholds:
Yellow warning: > 80%
Red critical: > 90%
Recommended response: As usage approaches 100% of max connections, apps may intermittently be unable to connect to the database. The connections per second for a service instance vary based on the number of app instances and on app utilization. If a threshold is met or exceeded for an extended period of time, monitor app usage to ensure everything is behaving as expected.
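The ratio in the recommended measurement is a derived metric. The Python sketch below shows one way to compute it, under the assumption that the number of open connections comes from `/p-mysql/performance/threads_connected` and the limit from `/p-mysql/performance/max_connections`; this topic does not state which raw metrics to use, and both function names are hypothetical.

```python
def connection_utilization(open_connections, max_connections):
    """Return open connections as a percentage of the permitted maximum."""
    if max_connections <= 0:
        raise ValueError("max_connections must be positive")
    return 100.0 * open_connections / max_connections


def connection_status(percent, yellow=80.0, red=90.0):
    """Map a utilization percentage to the alert levels listed above."""
    if percent > red:
        return "red"
    if percent > yellow:
        return "yellow"
    return "ok"


# Example values are made up for illustration.
usage = connection_utilization(open_connections=128, max_connections=151)
print(round(usage, 1), connection_status(usage))
```

Keeping the thresholds as parameters makes it easier to tune them to an installation's historical trends.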

Questions


/p-mysql/performance/questions

Description: The rate of statements executed by the server, shown as queries per second.

Use: The server should always be processing some queries, even if only as part of the internal automation.

Origin: Doppler/Firehose
Type: count
Frequency: 30 s
Recommended measurement: Average over last 2 minutes
Recommended alert thresholds:
Yellow warning: 0 for 90 s
Red critical: 0 for 120 s
Recommended response: Investigate the MySQL server logs, such as the audit log, to understand why the query rate changed and determine appropriate action.
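The duration-based thresholds for this KPI can be approximated by counting consecutive zero samples. The Python sketch below assumes samples arrive every 30 seconds, oldest first, and that each zero sample stands for one full interval with no queries; the function names are hypothetical.

```python
def zero_duration_seconds(samples, interval_s=30):
    """Return how long the most recent samples have been continuously zero."""
    count = 0
    for value in reversed(samples):
        if value != 0:
            break
        count += 1
    return count * interval_s


def questions_status(samples, interval_s=30):
    """Apply the 90 s (yellow) and 120 s (red) zero-rate thresholds."""
    duration = zero_duration_seconds(samples, interval_s)
    if duration >= 120:
        return "red"
    if duration >= 90:
        return "yellow"
    return "ok"


print(questions_status([5.0, 0, 0, 0]))
```

Only the trailing run of zeros is counted, so an earlier quiet period that has since recovered does not raise an alert.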

Busy Time


/p-mysql/performance/busy_time

Description: Percentage of CPU time that MySQL spends on user activity (executing user code), as opposed to kernel activity (executing system functions).

Use: This closely reflects the amount of server activity dedicated to app queries.

Origin: Doppler/Firehose
Type: percentage
Frequency: 30 s
Recommended measurement: Average over last 2 minutes
Recommended alert thresholds:
Yellow warning: > 80%
Red critical: > 90%
Recommended response: If this metric meets or exceeds the recommended thresholds for extended periods of time, run `SHOW PROCESSLIST` to identify which queries are consuming the most CPU. Optionally, scale to a larger service plan with more CPU capacity.

BOSH System Metrics

All BOSH-deployed components generate the following system metrics; these system metrics also serve as KPIs for the MySQL for PCF service.

Persistent Disk


persistent.disk.percent

Description: Persistent disk being consumed by the MySQL service instance.

Use: If the persistent disk fills up, MySQL will be unable to process queries and recovery is difficult.

Origin: JMX Bridge or BOSH HM
Type: percent
Frequency: 60 s (default)
Recommended measurement: Average over last 10 minutes
Recommended alert thresholds:
Yellow warning: > 75%
Red critical: > 90%
Recommended response: Update the service instance to use a plan with a larger persistent disk. This process may take some time, as the data is copied from the original persistent disk to a new one.

RAM


system.mem.percent

Description: RAM being consumed by the MySQL service instance.

Use: MySQL increases its memory usage as the data set increases. This is normal, as much of that RAM is used to buffer IO. As long as there is enough remaining RAM for other processes on the instance, the MySQL server should be OK.

Origin: JMX Bridge or BOSH HM
Type: percentage
Frequency: 60 s (default)
Recommended measurement: Average over last 10 minutes
Recommended alert thresholds:
Yellow warning: > 95%
Red critical: > 99%
Recommended response: Update the service instance to a plan with more RAM.

CPU


system.cpu.percent

Description: CPU time being consumed by the MySQL service.

Use: A node that experiences excessive context switching or high CPU usage can become unresponsive. This also affects the ability of the node to report metrics.

Origin: JMX Bridge or BOSH HM
Type: percent
Frequency: 60 s (default)
Recommended measurement: Average over last 10 minutes
Recommended alert thresholds:
Yellow warning: > 80%
Red critical: > 90%
Recommended response: Determine what is consuming the CPU. If the load comes from normal processes, update the service instance to use a plan with larger CPU capacity.
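The three BOSH system KPIs share the same shape: a 10-minute average compared against a yellow and a red threshold. The following Python sketch expresses that as a lookup table. The threshold values are taken from this section; the `THRESHOLDS` table and the `evaluate` function are hypothetical names used only for illustration.

```python
from statistics import mean

# (yellow, red) thresholds in percent, as listed for each KPI above.
THRESHOLDS = {
    "persistent.disk.percent": (75.0, 90.0),
    "system.mem.percent": (95.0, 99.0),
    "system.cpu.percent": (80.0, 90.0),
}


def evaluate(metric, samples):
    """Average the samples from the last 10 minutes and classify the result."""
    yellow, red = THRESHOLDS[metric]
    average = mean(samples)
    if average > red:
        return "red"
    if average > yellow:
        return "yellow"
    return "ok"


# Sample values are made up for illustration.
print(evaluate("persistent.disk.percent", [70, 80, 90]))
```

Holding the thresholds in one table makes it straightforward to adjust them per installation, as recommended in Key Performance Indicators above.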

All MySQL Metrics

In addition to the above KPIs, the MySQL service emits the following metrics that can be used for monitoring and alerting.

Data Source | Description | Metric Unit
/p-mysql/available | Indicates if the local database server is available and responding. | boolean
/p-mysql/innodb/buffer_pool_free | The number of free pages in the InnoDB Buffer Pool. | pages
/p-mysql/innodb/buffer_pool_total | The total number of pages in the InnoDB Buffer Pool. | pages
/p-mysql/innodb/buffer_pool_used | The number of used pages in the InnoDB Buffer Pool. | pages
/p-mysql/innodb/buffer_pool_utilization | The utilization of the InnoDB Buffer Pool. | fraction
/p-mysql/innodb/current_row_locks | The number of current row locks. | locks
/p-mysql/innodb/data_reads | The rate of data reads. | reads/second
/p-mysql/innodb/data_writes | The rate of data writes. | writes/second
/p-mysql/innodb/mutex_os_waits | The rate of mutex OS waits. | events/second
/p-mysql/innodb/mutex_spin_rounds | The rate of mutex spin rounds. | events/second
/p-mysql/innodb/mutex_spin_waits | The rate of mutex spin waits. | events/second
/p-mysql/innodb/os_log_fsyncs | The rate of fsync writes to the log file. | writes/second
/p-mysql/innodb/row_lock_time | Time spent in acquiring row locks. | milliseconds
/p-mysql/innodb/row_lock_waits | The number of times per second a row lock had to be waited for. | events/second
/p-mysql/net/connections | The rate of connections to the server. | connections/second
/p-mysql/net/max_connections | The maximum number of connections that have been in use simultaneously since the server started. | connections
/p-mysql/performance/com_delete | The rate of delete statements. | queries/second
/p-mysql/performance/com_delete_multi | The rate of delete-multi statements. | queries/second
/p-mysql/performance/com_insert | The rate of insert statements. | queries/second
/p-mysql/performance/com_insert_select | The rate of insert-select statements. | queries/second
/p-mysql/performance/com_replace_select | The rate of replace-select statements. | queries/second
/p-mysql/performance/com_select | The rate of select statements. | queries/second
/p-mysql/performance/com_update | The rate of update statements. | queries/second
/p-mysql/performance/com_update_multi | The rate of update-multi statements. | queries/second
/p-mysql/performance/created_tmp_disk_tables | The rate at which the server creates internal on-disk temporary tables while executing statements. | tables/second
/p-mysql/performance/created_tmp_files | The rate at which temporary files are created. | files/second
/p-mysql/performance/created_tmp_tables | The rate at which the server creates internal temporary tables while executing statements. | tables/second
/p-mysql/performance/kernel_time | Percentage of CPU time spent in kernel space by MySQL. | percent
/p-mysql/performance/key_cache_utilization | The key cache utilization ratio. | fraction
/p-mysql/performance/open_files | The number of open files. | files
/p-mysql/performance/open_tables | The number of tables that are open. | tables
/p-mysql/performance/qcache_hits | The rate of query cache hits. | hits/second
/p-mysql/performance/questions | The rate of statements executed by the server. | queries/second
/p-mysql/performance/slow_queries | The rate of slow queries. | queries/second
/p-mysql/performance/table_locks_waited | The total number of times that a request for a table lock could not be granted immediately and a wait was needed. | number
/p-mysql/performance/threads_connected | The number of currently open connections. | connections
/p-mysql/performance/threads_running | The number of threads that are not sleeping. | threads
/p-mysql/performance/user_time | Percentage of CPU time spent in user space by MySQL. | percent
/p-mysql/performance/max_connections | The maximum permitted number of simultaneous client connections. | integer
/p-mysql/performance/open_files_limit | The number of files that the operating system permits mysqld to open. | integer
/p-mysql/performance/open_tables | The number of tables that are open. | integer
/p-mysql/performance/opened_tables | The number of tables that have been opened. | integer
/p-mysql/performance/opened_table_definitions | The number of .frm files that have been cached. | integer