RabbitMQ for PCF v1.9.16

RabbitMQ for PCF Operations FAQs

This topic answers frequently asked questions (FAQs) about operating RabbitMQ for PCF.

About the BOSH CLI

The BOSH CLI is available in two major versions, v1 and v2. Pivotal recommends that you use the BOSH CLI v2 when possible.

This topic provides examples of using each version of the BOSH CLI. While all versions of the BOSH CLI work with RabbitMQ for PCF v1.9.x, your Ops Manager version may affect which version of the BOSH CLI you can use. Consult the table below to determine which version of the CLI is supported for your installation.

Ops Manager Version    BOSH CLI Version
1.10                   CLI v1
1.11                   CLI v1 or CLI v2 (Pivotal recommends CLI v2)
1.12 and later         CLI v2
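
The table above can be encoded in a small shell helper for scripts that must run against several installations. This is a minimal sketch, not part of the product: the function name bosh_cli_for_opsman is hypothetical, the binary names bosh and bosh2 are the ones used in the commands later in this topic, and choosing bosh2 for v1.11 simply follows the recommendation in the table.

```shell
# Hypothetical helper: print the BOSH CLI binary to use for a given
# Ops Manager version (for example 1.10, 1.11 or 1.12.4).
# The body runs in a subshell so its variables do not leak to the caller.
bosh_cli_for_opsman() (
  major=${1%%.*}     # text before the first dot
  rest=${1#*.}       # text after the first dot
  minor=${rest%%.*}  # second version component
  if [ "$major" -gt 1 ] || [ "$minor" -ge 11 ]; then
    echo bosh2       # v1.11 and later: CLI v2
  else
    echo bosh        # v1.10 and earlier: CLI v1
  fi
)

# Example use (not run here):
#   cli=$(bosh_cli_for_opsman 1.11)
#   "$cli" stop rabbitmq-server
```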

What should I check before deploying a new version of the tile?

Use the RabbitMQ Management UI, or the health metrics exposed through the Firehose, to confirm that all nodes in the cluster are healthy.

Do not rely on the output of bosh instances for this check. That output reflects the state of the Erlang VM used by RabbitMQ, not the RabbitMQ application.
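
The same information that the Management UI displays is available from the RabbitMQ Management API, which can be useful for scripting this check. The sketch below assumes the /api/nodes endpoint on port 15672 and the USERNAME, PASSWORD and RABBIT_ADDRESS variables used elsewhere in this topic. The function name all_nodes_running is hypothetical, and the grep-based matching is a deliberately crude stand-in for a real JSON parser.

```shell
# Hypothetical helper: reads the JSON returned by GET /api/nodes on stdin.
# Succeeds only if at least one "running" field is present and none is false.
all_nodes_running() (
  body=$(cat)
  # An empty or unexpected response must not be reported as healthy.
  printf '%s' "$body" | grep -q '"running"' || exit 1
  if printf '%s' "$body" | grep -Eq '"running"[[:space:]]*:[[:space:]]*false'; then
    exit 1
  fi
)

# Example use (not run here):
#   curl -sf -u "$USERNAME:$PASSWORD" "http://$RABBIT_ADDRESS:15672/api/nodes" \
#     | all_nodes_running && echo "all nodes report running"
```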

What is the correct way to stop and start RabbitMQ in PCF?

Operators should use only BOSH commands to interact with the RabbitMQ application. Examples for both versions of the CLI are given below.

  • For Ops Manager v1.10 or earlier:
    bosh stop rabbitmq-server
    bosh start rabbitmq-server
  • For Ops Manager v1.11 or later:
    bosh2 stop rabbitmq-server
    bosh2 start rabbitmq-server

BOSH job lifecycle hooks fire only when rabbitmq-server is stopped through BOSH. You can also stop individual instances by running one of the following commands:

  • For Ops Manager v1.10 or earlier: bosh stop JOB [index]
  • For Ops Manager v1.11 or later: bosh2 stop JOB [index]

Note: Do not use monit stop rabbitmq-server. This command does not call the drain scripts.
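
A stop followed by a start can be wrapped so that the start is attempted only when the stop succeeded. This is a minimal sketch: the function name bosh_restart_rabbitmq is hypothetical, and it assumes the CLI is already targeted at the correct director and deployment, as the commands above do.

```shell
# Hypothetical wrapper: restart rabbitmq-server through BOSH so that the
# job lifecycle hooks run. $1 is the CLI binary to use: bosh or bosh2.
bosh_restart_rabbitmq() (
  cli=${1:-bosh2}
  # Do not attempt a start if the stop did not succeed.
  "$cli" stop rabbitmq-server || {
    echo "stop failed; not starting rabbitmq-server" >&2
    exit 1
  }
  "$cli" start rabbitmq-server
)

# Example use (not run here):
#   bosh_restart_rabbitmq bosh2
```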

What happens when I stop the RabbitMQ server?

You can stop the RabbitMQ server with bosh stop.

BOSH starts the shutdown sequence from the bootstrap instance.

This command tells the RabbitMQ application to shut down and then shuts down the Erlang VM in which it is running. If this succeeds, run the following checks to confirm that the RabbitMQ application and Erlang VM have stopped:

  1. If /var/vcap/sys/run/rabbitmq-server/pid exists, check that the PID inside this file does not point to a running Erlang VM process. This check looks at the Erlang VM PID, not a RabbitMQ application PID.
  2. Check that rabbitmqctl does not return an Erlang VM PID.

Once this completes on the bootstrap instance, BOSH will continue the same sequence on the next instance. All remaining RabbitMQ servers stop sequentially after that.
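
The first check above can be approximated by hand on an instance. This is a minimal sketch, not the logic BOSH itself runs: the function name erlang_vm_stopped is hypothetical, and kill -0 only tests whether a process with that PID exists, not whether it is an Erlang VM. Run it as root or as the user that owns the process, because kill -0 also fails when permission is denied.

```shell
# Hypothetical helper: succeeds if no process exists for the PID stored in
# the pid file. The default path is the one named in the first check above.
erlang_vm_stopped() (
  pidfile=${1:-/var/vcap/sys/run/rabbitmq-server/pid}
  # No pid file means there is nothing left to check.
  [ -f "$pidfile" ] || exit 0
  pid=$(cat "$pidfile")
  [ -n "$pid" ] || exit 0
  # kill -0 sends no signal; it only reports whether the process exists.
  if kill -0 "$pid" 2>/dev/null; then
    exit 1
  fi
  exit 0
)

# Example use (not run here):
#   erlang_vm_stopped && echo "no process found for the recorded PID"
```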

What happens when the RabbitMQ server fails to stop?

If bosh stop fails, you will likely see an error saying that the drain script failed:

result: 1 of 1 drain scripts failed. Failed Jobs: rabbitmq-server.

What do I do when the RabbitMQ server fails to stop?

The drain script logs to /var/vcap/sys/log/rabbitmq-server/drain.log. If you have remote syslog configured, these entries appear under the program name rmq_server_drain.

  1. Use bosh ssh to log in to the failing rabbitmq-server instance, then start the rabbitmq-server job by running monit start rabbitmq-server.
  2. When the rabbitmq-server job is running (confirm this via monit status), run DEBUG=1 /var/vcap/jobs/rabbitmq-server/bin/drain. The debug output shows why the drain script is failing.

Note: You cannot start the job with bosh start, because that command always runs the drain script first and fails for as long as the drain script is failing.
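
Before rerunning the drain script, it can help to read the most recent entries in the drain log on the failing instance. The sketch below is a small convenience around tail; the function name recent_drain_log is hypothetical, and the default path is the log file named above.

```shell
# Hypothetical helper: print the last N lines (default 50) of the drain log.
# $1 = number of lines, $2 = log file (defaults to the path named above).
recent_drain_log() (
  lines=${1:-50}
  logfile=${2:-/var/vcap/sys/log/rabbitmq-server/drain.log}
  [ -r "$logfile" ] || { echo "cannot read $logfile" >&2; exit 1; }
  tail -n "$lines" "$logfile"
)

# Example use (not run here):
#   recent_drain_log 100
```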

How can I manually back up the state of the RabbitMQ cluster?

You can back up the state of a RabbitMQ cluster for both the on-demand and pre-provisioned services using the RabbitMQ Management API. Backups include vhosts, exchanges, queues and users.

Back up Manually

  1. Log in to the RabbitMQ Management UI as the admin user you created.

  2. Select export definitions from the main page.

Back up and Restore with a Script

Use the API to run scripts with code similar to the following:

  1. For the backup:

    curl -u "$USERNAME:$PASSWORD" "http://$RABBIT_ADDRESS:15672/api/definitions" \
      -o "$BACKUP_FOLDER/rabbit-backup.json"
  2. For the restore:

    curl -u "$USERNAME:$PASSWORD" "http://$RABBIT_ADDRESS:15672/api/definitions" \
      -X POST -H "Content-Type: application/json" -d "@$BACKUP_FOLDER/rabbit-backup.json"
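
For scheduled backups, the backup call can be wrapped so that each run writes a timestamped file and a failed request never leaves a partial file behind. This is a minimal sketch under stated assumptions: the function name backup_definitions and the timestamped file name are hypothetical, and USERNAME, PASSWORD, RABBIT_ADDRESS and BACKUP_FOLDER are expected to be set as in the commands above.

```shell
# Hypothetical wrapper around the backup call. Prints the path of the new
# backup file on success and returns non-zero on failure.
backup_definitions() (
  out="$BACKUP_FOLDER/rabbit-backup-$(date +%Y%m%d-%H%M%S).json"
  tmp="$out.partial"
  # --fail makes curl exit non-zero on HTTP errors such as 401, so an error
  # response is not kept as if it were a backup.
  if curl --fail --silent --show-error -u "$USERNAME:$PASSWORD" \
      "http://$RABBIT_ADDRESS:15672/api/definitions" -o "$tmp"; then
    mv "$tmp" "$out" && echo "$out"
  else
    rm -f "$tmp"
    exit 1
  fi
)

# Example use (not run here):
#   backup_file=$(backup_definitions) && echo "saved $backup_file"
```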

What pre-upgrade checks should I do?

Before doing any upgrade of RabbitMQ, Pivotal recommends checking the following:

  1. In Ops Manager, check that the status of all instances is healthy.
  2. Log in to the RabbitMQ Management UI and check that no alarms have been triggered and that all nodes are healthy, that is, they should display as green.
  3. Check that the system is not close to hitting either the memory or disk alarm. To do this, review the memory and disk consumed by each node in the RabbitMQ Management UI.
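
The alarm part of the second check can also be scripted against the Management API. The sketch below assumes the mem_alarm and disk_free_alarm fields that GET /api/nodes returns for each node. The function name no_alarms_raised is hypothetical, the grep matching is a crude substitute for a JSON parser, and the function only detects alarms that are already raised. It does not measure how close a node is to its limits, so the third check still needs a look at the per-node figures.

```shell
# Hypothetical helper: reads the JSON returned by GET /api/nodes on stdin.
# Succeeds only if alarm fields are present and none of them is true.
no_alarms_raised() (
  body=$(cat)
  # An empty or unexpected response must not be reported as alarm-free.
  printf '%s' "$body" | grep -q '"mem_alarm"' || exit 1
  if printf '%s' "$body" \
      | grep -Eq '"(mem_alarm|disk_free_alarm)"[[:space:]]*:[[:space:]]*true'; then
    exit 1
  fi
)

# Example use (not run here):
#   curl -sf -u "$USERNAME:$PASSWORD" "http://$RABBIT_ADDRESS:15672/api/nodes" \
#     | no_alarms_raised || echo "a memory or disk alarm is raised"
```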