Frequently Asked Questions for Pre-Provisioned VMware Tanzu RabbitMQ [VMs]
Note: Pivotal Platform is now part of VMware Tanzu. In v1.19 and later, RabbitMQ for Pivotal Platform is named VMware Tanzu RabbitMQ [VMs].
This topic lists frequently asked questions that apply to the VMware Tanzu RabbitMQ [VMs] pre-provisioned service.
Ensure that all nodes in the cluster are healthy by checking the RabbitMQ Management UI or the health metrics exposed through the firehose.
You cannot rely solely on the BOSH instances output, because it reflects the state of the Erlang VM used by RabbitMQ and not the state of the RabbitMQ application.
Operators should use only BOSH commands to interact with the RabbitMQ app.
Some BOSH job lifecycle hooks fire only when rabbitmq-server is stopped through BOSH. You can also stop individual instances by running the stop command with a specific job index.
Take extra care after stopping the full cluster, because nodes must be started in the reverse of the order in which they were shut down.
This means that after running bosh stop rabbitmq-server, which stops rabbitmq-server/0, then rabbitmq-server/1, and then rabbitmq-server/2, you cannot run bosh start rabbitmq-server.
This is because bosh start rabbitmq-server tries to start rabbitmq-server/0 first (it always starts with job index 0), but this node cannot start first because it was not the last node to stop.
To start the nodes after running bosh stop rabbitmq-server, start each instance individually by specifying each job index. For example:
bosh start rabbitmq-server/2
bosh start rabbitmq-server/1
bosh start rabbitmq-server/0
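The reverse-order rule can be sketched as a small dry-run helper. This is only an illustration: the function name print_reverse_start and its node-count argument are hypothetical, it assumes the nodes were stopped in index order 0 upward, and it only prints the commands so that an operator can review them before running anything.

```shell
# Hypothetical helper: print the start commands for a cluster of N nodes,
# highest job index first. Assumes the nodes were stopped from index 0 upward.
# Nothing is executed; the commands are only echoed for review.
print_reverse_start() {
  node_count="$1"
  i=$((node_count - 1))
  while [ "$i" -ge 0 ]; do
    echo "bosh start rabbitmq-server/$i"
    i=$((i - 1))
  done
}

# Example for a three-node cluster.
print_reverse_start 3
```

Printing rather than executing keeps the sketch safe to try on any machine; the printed lines match the manual sequence shown above.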
Alternatively, you can stop a full cluster by stopping each node individually, from highest index to lowest, and then running bosh start rabbitmq-server. For example:
bosh stop rabbitmq-server/2
bosh stop rabbitmq-server/1
bosh stop rabbitmq-server/0
bosh start rabbitmq-server
Note: Do not use monit stop rabbitmq-server, because it bypasses BOSH and therefore does not fire the BOSH job lifecycle hooks.
BOSH starts the shutdown sequence from the bootstrap instance.
Tell the RabbitMQ app to shut down and then shut down the Erlang VM within which it is running. If this succeeds, run the following checks to ensure that the RabbitMQ app and Erlang VM have stopped:
- If /var/vcap/sys/run/rabbitmq-server/pid exists, check that the PID inside this file does not point to a running Erlang VM process. The Erlang PID is tracked and the RabbitMQ PID is not.
- Verify that rabbitmqctl does not return an Erlang VM PID.
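The first check can be sketched in shell as follows. This is a minimal sketch, not the drain script itself: the function name erlang_vm_stopped is hypothetical, and it only tests whether the recorded PID belongs to any running process, not specifically to an Erlang VM. Because kill -0 also fails when the caller lacks permission to signal the process, run it as a user that is allowed to signal the rabbitmq-server process.

```shell
# PID file path from the text above; override PID_FILE to test elsewhere.
PID_FILE="${PID_FILE:-/var/vcap/sys/run/rabbitmq-server/pid}"

# Hypothetical helper: succeeds (returns 0) when the PID file is absent or
# empty, or when the PID it records is not a running process.
erlang_vm_stopped() {
  pid_file="$1"
  [ -f "$pid_file" ] || return 0
  pid="$(cat "$pid_file")"
  [ -n "$pid" ] || return 0
  # kill -0 sends no signal; it only reports whether the process can be signalled.
  if kill -0 "$pid" 2>/dev/null; then
    return 1
  fi
  return 0
}

if erlang_vm_stopped "$PID_FILE"; then
  echo "No running process found for $PID_FILE"
else
  echo "The PID in $PID_FILE still points to a running process"
fi
```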
After this completes on the bootstrap instance, BOSH continues the same sequence on the next instance. All remaining rabbitmq-server instances stop one by one.
If the BOSH stop fails, you are likely to see the following error:
result: 1 of 1 drain scripts failed. Failed Jobs: rabbitmq-server.
The drain script logs to
/var/vcap/sys/log/rabbitmq-server/drain.log. If you
have a remote syslog configured, this appears as the
SSH into the failing rabbitmq-server instance and start the rabbitmq-server job by running monit start rabbitmq-server.
You cannot start the job through bosh start, because this always runs the drain script first and fails while the drain script is failing.
After the rabbitmq-server job is running (confirm this through monit status), run /var/vcap/jobs/rabbitmq-server/bin/drain. This tells you exactly why it is failing.
You can back up the state of a RabbitMQ cluster for both the on-demand and pre-provisioned services using the RabbitMQ Management API. Backups include virtual hosts, exchanges, queues, and users.
Back up Manually
Log in to the RabbitMQ Management UI as the admin user you created.
Select export definitions from the main page.
Back up and Restore with a Script
Use the API to run scripts with code similar to the following:
For the backup:
curl -u "$USERNAME:$PASSWORD" "http://$RABBIT_ADDRESS:15672/api/definitions" -o "$BACKUP_FOLDER/rabbit-backup.json"
For the restore:
curl -u "$USERNAME:$PASSWORD" "http://$RABBIT_ADDRESS:15672/api/definitions" -X POST -H "Content-Type: application/json" -d "@$BACKUP_FOLDER/rabbit-backup.json"
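Because curl writes the response body to the backup file even when the request fails, it can help to sanity-check the file before relying on it. The sketch below is a heuristic only: backup_looks_valid is a hypothetical helper, and it assumes that a failed request leaves either an empty file or a JSON body beginning with an "error" key. Adding curl's --fail option to the backup command is another way to surface HTTP errors as a non-zero exit status.

```shell
# Hypothetical helper: heuristic sanity check for a definitions backup file.
# Returns 0 when the file is non-empty and looks like a JSON object that
# does not begin with an "error" key; returns 1 otherwise.
backup_looks_valid() {
  file="$1"
  # Reject a missing or empty file.
  [ -s "$file" ] || return 1
  # Inspect only the first bytes; this does not fully validate the JSON.
  case "$(head -c 8 "$file")" in
    '{"error"') return 1 ;;
    '{'*) return 0 ;;
    *) return 1 ;;
  esac
}

BACKUP_FILE="${BACKUP_FOLDER:-.}/rabbit-backup.json"
if backup_looks_valid "$BACKUP_FILE"; then
  echo "Backup file $BACKUP_FILE passed the basic check"
else
  echo "Backup file $BACKUP_FILE is missing, empty, or looks like an error response"
fi
```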
Before doing any upgrade of RabbitMQ:
- In Ops Manager, check that the status of all of the instances is healthy.
- Log in to the RabbitMQ Management UI. For instructions about how to do so with or without OAuth enabled, see Using the RabbitMQ Management UI.
- Verify that no alarms were triggered and that all nodes are green.
- Check what each node has consumed to verify that the system is not close to hitting either the memory or disk alarm.
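To judge how close a node is to its memory alarm, one option is to compare the memory in use against the memory limit shown for that node in the Management UI. The sketch below assumes you have read both values in bytes; the helper name mem_used_percent, the 80 percent warning threshold, and the sample numbers are all hypothetical. The same arithmetic can be applied to other used-versus-limit pairs.

```shell
# Hypothetical helper: percentage of the memory limit already in use,
# rounded down. Both arguments are byte counts read from the Management UI.
mem_used_percent() {
  used_bytes="$1"
  limit_bytes="$2"
  if [ "$limit_bytes" -le 0 ]; then
    echo "limit must be greater than zero" >&2
    return 1
  fi
  echo $(( used_bytes * 100 / limit_bytes ))
}

# Illustrative values only, not taken from a real cluster.
WARN_PERCENT=80   # hypothetical threshold, choose your own
percent="$(mem_used_percent 2000000000 4000000000)"
if [ "$percent" -ge "$WARN_PERCENT" ]; then
  echo "memory use at ${percent}% of limit: investigate before upgrading"
else
  echo "memory use at ${percent}% of limit"
fi
```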
Find the answer to your question and browse product discussions and solutions by searching the VMware Tanzu Knowledge Base.
You can file a ticket with Support.
Be sure to provide the error message from cf service YOUR-SERVICE-INSTANCE.
To expedite troubleshooting, provide your service broker logs and your service instance logs.
If the cf service YOUR-SERVICE-INSTANCE output includes a task-id, provide the BOSH task output.