Frequently Asked Questions
I see lots of replication errors in my logs! Is the cluster broken?
Unless the GRA files show a clear execution error (e.g., out of disk space) this is a normal behavior, and it’s nothing to worry about. We will be working on more advanced monitoring to detect the failure case, and alert Operators in the future.
Occasionally, you’ll see replication errors in the MySQL logs that will look something like this:
160318 9:25:16 [Warning] WSREP: RBR event 1 Query apply warning: 1, 16992456
160318 9:25:16 [Warning] WSREP: Ignoring error for TO isolated action: source: abcd1234-abcd-1234-abcd-1234abcd1234 version: 3 local: 0 state: APPLYING flags: 65 conn_id: 246804 trx_id: -1 seqnos (l: 865022, g: 16992456, s: 16992455, d: 16992455, ts: 2530660989030983)
160318 9:25:16 [ERROR] Slave SQL: Error 'Duplicate column name 'number'' on query. Default database: 'cf_0123456_1234_abcd_1234_abcd1234abcd'. Query: 'ALTER TABLE ...'
What this is saying is that someone (probably an app) issued an “ALTER TABLE” command that failed to apply to the current schema. More often than not, this is user error.
The node that receives the request processes it as any MySQL server will, if it fails, it just spits that failure back to the app, and the app needs to decide what to do next. That part is normal. HOWEVER, in a Galera cluster, all DDL is replicated, and all replication failures are logged. So in this case, the bad ALTER TABLE command will be run by both slave nodes, and if it fails, those slave nodes will log it as a “replication failure” since they can’t tell the difference.
It’s really hard to get a valid DDL to work on some nodes, yet fail on others. Usually those cases are limited to out of disk space or working memory. We haven’t duplicated that yet.
But I found a blog article that suggests that the schemata can get out of sync?
The key thing about this post is that he had to deliberately switch a node to RSU, which MySQL for Pivotal Cloud Foundry (PCF) never does except during SST. So this is a demonstration of what is possible, but does not explain how a customer may actually experience this in production.
What does the error,
blocked because of many connection errors mean?
There are times when MySQL will blacklist its own proxies:
OUT 07:44:02.070 [paasEnv=MYPASS orgName=MYORG spaceName=MYSPACE appName=dc-routing appId=0123456789] [http-nio-8080-exec-5] ERROR o.h.e.jdbc.spi.SqlExceptionHelper - Host '192.0.2.15' is blocked because of many connection errors; unblock with 'mysqladmin flush-hosts'
You can solve this by running the following on any of the MySQL job VMS:
This is an artifact of an automatic polling-protection feature built into MySQL and MariaDB. It is a historical feature intended to block Denial of Service attacks. It is usually triggered by a Load Balancer or System Monitoring software performing empty “port checks” against the MySQL proxies. This is why it is important to configure any Load Balancer to perform TCP checks against the proxy health-check port, default 1936. Repeated port checks against 3306 will cause an outage for all MySQL for PCF users.
- Note: This issue has been disabled as of MySQL for PCF v1.8.0-edge.4.