Troubleshooting and FAQs for
On-Demand RabbitMQ for PCF
- How to Retrieve a Service Instance GUID
- Troubleshooting Errors
- Troubleshooting Components
- Techniques for Troubleshooting
- Parse a Cloud Foundry (CF) Error Message
- Access Broker and Instance Logs and VMs
- Run Service Broker Errands to Manage Brokers and Instances
- Get Admin Credentials for a Service Instance
- Reinstall a Tile
- View Resource Saturation and Scaling
- Identify a Service Instance Owner
- Monitor the Quota Saturation and Service Instance Count
- Drop and Restore AMQP(S) Traffic to a RabbitMQ Instance
- Frequently Asked Questions
- What should I check before deploying a new version of the tile?
- What is the correct way to stop and start RabbitMQ in PCF?
- What happens when I run bosh stop rabbitmq-server?
- What happens when bosh stop rabbitmq-server fails?
- What do I do when bosh stop rabbitmq-server fails?
- How can I manually back up the state of the RabbitMQ cluster?
- What pre-upgrade checks should I do?
- Knowledge Base (Community)
- File a Support Ticket
Warning: RabbitMQ for PCF v1.14 is no longer supported because it has reached the End of General Support (EOGS) phase as defined by the Support Lifecycle Policy. To stay up to date with the latest software and security updates, upgrade to a supported version.
This topic provides operators with basic troubleshooting techniques and FAQs for on-demand RabbitMQ for Pivotal Cloud Foundry (PCF).
How to Retrieve a Service Instance GUID
You need the GUID of your service instance to run some BOSH commands. To retrieve the GUID, run the command:
cf service SERVICE-INSTANCE-NAME --guid
If you do not know the name of the service instance, run `cf services` to list all service instances in the space. Service instance names appear in the name column.
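The GUID also determines the BOSH deployment name for the instance, which follows the `service-instance_GUID` convention used throughout this topic. A minimal sketch (the helper name is illustrative, not part of the tile):

```shell
#!/bin/sh
# Build the BOSH deployment name for an on-demand instance from its GUID.
# Convention used throughout this topic: "service-instance_" + GUID.
bosh_deployment_for() {
  guid="$1"
  printf 'service-instance_%s\n' "$guid"
}

# Live usage (requires a cf login and BOSH access):
#   GUID=$(cf service SERVICE-INSTANCE-NAME --guid)
#   bosh -d "$(bosh_deployment_for "$GUID")" vms
bosh_deployment_for ae9e232c-0bd5-4684-af27-1b08b0c70089
```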
Troubleshooting Errors
Start here if you are responding to a specific error or error messages.
Failed Installation
- Certificate issues: The on-demand broker (ODB) requires valid certificates. Ensure that your certificates are valid and generate new ones if necessary. To generate new certificates, contact Pivotal Support.
- Deploy fails: Deploys can fail for a variety of reasons. View the logs using Ops Manager to determine why the deploy is failing.
- Networking problems:
- Cloud Foundry cannot reach the RabbitMQ for PCF broker
- Cloud Foundry cannot reach the service instances
- The service network cannot access the BOSH director
- The register-broker errand fails.
- The smoke test errand fails.
- Resource sizing issues: These occur when the resource sizes selected for a given plan are less than RabbitMQ for PCF requires to function. Check your resource configuration in Ops Manager and ensure that the configuration matches that recommended by the service.
- Other service-specific issues.
Cannot Create or Delete Service Instances
If developers report errors such as:
Instance provisioning failed: There was a problem completing your request. Please contact your operations team providing the following information: service: redis-acceptance, service-instance-guid: ae9e232c-0bd5-4684-af27-1b08b0c70089, broker-request-id: 63da3a35-24aa-4183-aec6-db8294506bac, task-id: 442, operation: create
Follow these steps:
-
If the BOSH error shows a problem with the deployment manifest, open the manifest in a text editor to inspect it.
-
To continue troubleshooting, log in to BOSH and target the RabbitMQ for PCF service instance using the instructions on parsing a Cloud Foundry error message.
-
Retrieve the BOSH task ID from the error message and run the following command:
bosh task TASK-ID
-
If you need more information, access the broker logs and use the `broker-request-id` from the error message above to search them.
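That log search can be scripted; the sketch below filters a downloaded broker log for one `broker-request-id`. The log line format shown is illustrative, not the broker's exact format:

```shell
#!/bin/sh
# Filter a broker log for a single broker-request-id.
filter_by_request_id() {
  grep "broker-request-id: $1" "$2"
}

# Throwaway log file standing in for a downloaded broker.log:
cat > /tmp/broker.log <<'EOF'
2017/03/13 10:16:55 broker-request-id: 63da3a35-24aa-4183-aec6-db8294506bac operation: create task-id: 442
2017/03/13 10:20:01 broker-request-id: 11111111-2222-3333-4444-555555555555 operation: bind
EOF

filter_by_request_id 63da3a35-24aa-4183-aec6-db8294506bac /tmp/broker.log
```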
Broker Request Timeouts
If developers report errors such as:
Server error, status code: 504, error code: 10001, message: The request to the service broker timed out: https://BROKER-URL/v2/service_instances/e34046d3-2379-40d0-a318-d54fc7a5b13f/service_bindings/aa635a3b-ef6d-41c3-a23f-55752f3f651b
Follow these steps:
- Confirm that Cloud Foundry (CF) is connected to the service broker.
-
Check the BOSH queue size:
- Log into BOSH as an admin.
- Run
bosh tasks
.
- If there are a large number of queued tasks, the system may be under too much load. BOSH is configured with two workers and one status worker, which may not be sufficient for the level of load. Advise app developers to try again when the system is under less load.
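Queue depth can be gauged by counting queued rows in the `bosh tasks` output. The column layout below mimics BOSH CLI table output and may differ between CLI versions, so treat this as a sketch:

```shell
#!/bin/sh
# Count queued tasks in `bosh tasks` style output.
# Column layout is illustrative; verify against your BOSH CLI version.
count_queued() {
  grep -c 'queued'
}

# Live usage: bosh tasks | count_queued
cat <<'EOF' | count_queued
442  queued      create deployment
443  processing  create deployment
444  queued      delete deployment
EOF
```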
Cannot Bind to or Unbind from Service Instances
Instance Does Not Exist
If developers report errors such as:
Server error, status code: 502, error code: 10001, message: Service broker error: instance does not exist
Follow these steps:
-
Confirm that the RabbitMQ for PCF service instance exists in BOSH, and obtain its GUID by running the following cf CLI command:
cf service MY-INSTANCE --guid
-
Using the GUID obtained above, run the following BOSH CLI command:
bosh -d service-instance_GUID vms
If the BOSH deployment is not found, it has been deleted from BOSH. Contact Pivotal support for further assistance.
Other Errors
If developers report errors such as:
Server error, status code: 502, error code: 10001, message: Service broker error: There was a problem completing your request. Please contact your operations team providing the following information: service: example-service, service-instance-guid: 8d69de6c-88c6-4283-b8bc-1c46103714e2, broker-request-id: 15f4f87e-200a-4b1a-b76c-1c4b6597c2e1, operation: bind
To find out the exact issue with the binding process:
-
Search the logs for the `broker-request-id` string listed in the error message above.
-
Contact Pivotal support for further assistance if you are unable to resolve the problem.
Cannot Connect to a Service Instance
If developers report that their app cannot use service instances that they have successfully created and bound:
Ask the user to send application logs that show the connection error. If the error is originating from the service, then follow RabbitMQ for PCF-specific instructions. If the issue appears to be network-related, then:
-
Check that application security groups are configured correctly. Access should be configured for the service network that the tile is deployed to.
-
Ensure that the network the Pivotal Application Service (PAS) tile is deployed to has network access to the service network. You can find the network definition for this service network in the BOSH Director tile.
-
In Ops Manager, go to the service tile and view the service network configured in the Networks tab.
-
In Ops Manager, go to the PAS tile and view the network it is assigned to. Make sure that these networks can access each other.
Upgrade All Service Instances Failures
If the upgrade-all-service-instances errand fails, look at the errand output in the Ops Manager log.
If an instance fails to upgrade, debug and fix it before running the errand again to prevent any failure issues from spreading to other on-demand instances.
Once the Ops Manager log no longer lists the deployment as failing, re-run the errand to upgrade the rest of the instances.
Missing Logs and Metrics
If no logs are being emitted by the on-demand broker, check that your syslog forwarding address is correct in Ops Manager.
-
Ensure you have configured syslog for the tile.
-
Ensure that you have network connectivity between the networks that the tile is using and the syslog destination. If the destination is external, use the public IP VM extension feature available in your Ops Manager tile configuration settings.
-
Verify that the Firehose is emitting metrics:
-
Install the `cf nozzle` plugin. For instructions, see the firehose plugin GitHub repository.
-
To find logs from your service in the `cf nozzle` output, run the following:
cf nozzle -f ValueMetric | grep --line-buffered "on-demand-broker/MY-SERVICE"
-
If no metrics appear within five minutes, verify that the broker network has access to the Loggregator system on all required ports.
Contact Pivotal support if you are unable to resolve the issue.
Failed Deployment on Upgrade or after Apply Changes
If the deployment fails after editing the Assign AZs and Networks
pane of the RabbitMQ for PCF tile,
it might be due to a change to the IP addresses assigned to the RabbitMQ Server
job.
RabbitMQ for PCF requires that these IP addresses do not change once assigned.
If you change them, the deployment fails.
This includes changes made to your current installation or during an upgrade.
To diagnose and solve this issue, see Changing Network or IP Addresses Results in a Failed Deployment.
Troubleshooting Components
Guidance on checking for and fixing issues in on-demand service components.
BOSH Problems
Large BOSH Queue
On-demand service brokers add tasks to the BOSH request queue, which can back up
and cause delay under heavy loads.
An app developer who requests a new RabbitMQ for PCF service instance sees
create in progress
in the Cloud Foundry Command Line Interface (cf CLI) until
BOSH processes the queued request.
Ops Manager currently deploys two BOSH workers to process its queue. Future versions of Ops Manager will let users configure the number of BOSH workers.
Configuration
Service Instances in Failing State
You may have configured a VM or disk type on the tile plan page in Ops Manager that is too small for the RabbitMQ for PCF service instance to start. See tile-specific guidance on resource requirements.
Authentication
UAA Changes
If you have rotated any UAA user credentials then you may see authentication issues in the service broker logs.
To resolve this, redeploy the RabbitMQ for PCF tile in Ops Manager. This provides the broker with the latest configuration.
Note: You must ensure that any changes to UAA
credentials are reflected in the Ops Manager credentials
tab of the Pivotal Application Service (PAS) tile.
Networking
Common issues with networking include:
Issue | Solution |
---|---|
Latency when connecting to the RabbitMQ for PCF service instance to create or delete a binding. | Try again or improve network performance. |
Firewall rules are blocking connections from the RabbitMQ for PCF service broker to the service instance. | Open the RabbitMQ for PCF tile in Ops Manager and check the two networks configured in the Networks pane. Ensure that these networks allow access to each other. |
Firewall rules are blocking connections from the service network to the BOSH director network. | Ensure that service instances can access the Director so that the BOSH agents can report in. |
Apps cannot access the service network. | Configure Cloud Foundry application security groups to allow runtime access to the service network. |
Problems accessing BOSH’s UAA or the BOSH director. | Follow network troubleshooting and check that the BOSH director is online. |
Validate Service Broker Connectivity to Service Instances
To validate connectivity, do the following:
-
To SSH into the RabbitMQ for PCF service broker, run the following command:
bosh -d DEPLOYMENT-NAME ssh
-
If no BOSH task-id appears in the error message, look in the broker log, using the broker-request-id from the error message.
Validate App Access to Service Instance
Use `cf ssh` to access the app container, then try connecting to the RabbitMQ for PCF service instance using the binding included in the VCAP_SERVICES environment variable.
Quotas
Plan Quota Issues
If developers report errors such as:
Message: Service broker error: The quota for this service plan has been exceeded. Please contact your Operator for help.
- Check your current plan quota.
- Increase the plan quota:
  - Log into Ops Manager.
  - Reconfigure the quota on the plan page.
  - Deploy the tile.
- Find out who is using the plan quota and take the appropriate action.
Global Quota Issues
If developers report errors such as:
Message: Service broker error: The quota for this service has been exceeded. Please contact your Operator for help.
- Check your current global quota.
- Increase the global quota:
  - Log into Ops Manager.
  - Reconfigure the quota on the on-demand settings page.
  - Deploy the tile.
- Find out who is using the quota and take the appropriate action.
Failing Jobs and Unhealthy Instances
To determine whether there is an issue with the RabbitMQ for PCF service deployment, inspect the VMs. To do so, run the following command:
bosh -d service-instance_GUID vms --vitals
For additional information, run the following command:
bosh instances --ps --vitals
If the VM is failing, follow the service-specific information.
Any unadvised corrective actions (such as running `bosh restart` on a VM) can cause issues in the service instance.
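Unhealthy rows can be picked out of that output quickly by filtering on the process state column. The column positions below are illustrative and may differ between BOSH CLI versions:

```shell
#!/bin/sh
# Print instance name and state for rows that are not "running".
# Column positions are illustrative; check them against your CLI output.
failing_rows() {
  awk 'NR > 1 && $2 != "running" {print $1, $2}'
}

# Live usage: bosh -d service-instance_GUID instances --ps | failing_rows
cat <<'EOF' | failing_rows
Instance Process-State
rabbitmq-server/0ac9b1a2 running
rabbitmq-server/1bd0c2b3 failing
EOF
```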
Techniques for Troubleshooting
This section contains instructions on interacting with the on-demand service broker and on-demand service instance BOSH deployments, and on performing general maintenance and housekeeping tasks.
Parse a Cloud Foundry (CF) Error Message
Failed operations (create, update, bind, unbind, delete) result in an error message.
You can retrieve the error message later by running the cf CLI command cf service INSTANCE-NAME
.
$ cf service myservice

Service instance: myservice
Service: super-db
Bound apps:
Tags:
Plan: dedicated-vm
Description: Dedicated Instance
Documentation url:
Dashboard:

Last Operation
Status: create failed
Message: Instance provisioning failed: There was a problem completing your request. Please contact your operations team providing the following information: service: redis-acceptance, service-instance-guid: ae9e232c-0bd5-4684-af27-1b08b0c70089, broker-request-id: 63da3a35-24aa-4183-aec6-db8294506bac, task-id: 442, operation: create
Started: 2017-03-13T10:16:55Z
Updated: 2017-03-13T10:17:58Z
Use the information in the Message
field to debug further.
Provide this information to Pivotal Support when filing a ticket.
The task-id field maps to the BOSH task ID. For more information on a failed BOSH task, use the `bosh task TASK-ID` command.
The broker-request-id maps to the portion of the On-Demand Broker log containing the failed step. Access the broker log through your syslog aggregator, or access BOSH logs for the broker by typing `bosh logs broker 0`.
If you have more than one broker instance, repeat this process for each instance.
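The Message fields can be pulled out with `sed` so they can be fed straight to `bosh task` and a log search. A sketch using the sample message above:

```shell
#!/bin/sh
# Extract task-id and broker-request-id from a Last Operation message.
MSG='service: redis-acceptance, service-instance-guid: ae9e232c-0bd5-4684-af27-1b08b0c70089, broker-request-id: 63da3a35-24aa-4183-aec6-db8294506bac, task-id: 442, operation: create'

task_id=$(printf '%s\n' "$MSG" | sed -n 's/.*task-id: \([0-9]*\).*/\1/p')
request_id=$(printf '%s\n' "$MSG" | sed -n 's/.*broker-request-id: \([^,]*\).*/\1/p')

echo "task-id=$task_id"
echo "broker-request-id=$request_id"
# Live usage:
#   bosh task "$task_id"
```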
Access Broker and Instance Logs and VMs
Before following the procedures below, log into the cf CLI and the BOSH CLI.
Access Broker Logs and VMs
You can access logs using Ops Manager by clicking on the Logs tab in the tile and downloading the broker logs.
To access logs using the BOSH CLI, do the following:
-
Identify the on-demand broker (ODB) deployment by running the following command:
bosh deployments
-
View VMs in the deployment by running the following command:
bosh -d DEPLOYMENT-NAME instances
-
SSH onto the VM by running the following command:
bosh -d DEPLOYMENT-NAME ssh
-
Download the broker logs by running the following command:
bosh -d DEPLOYMENT-NAME logs
The archive generated by BOSH or Ops Manager includes the following logs:
Log Name | Description |
---|---|
broker.log | Requests to the on-demand broker and the actions the broker performs while orchestrating the request (e.g. generating a manifest and calling BOSH). Start here when troubleshooting. |
broker_ctl.log | Control script logs for starting and stopping the on-demand broker. |
post-start.stderr.log | Errors that occur during post-start verification. |
post-start.stdout.log | Post-start verification. |
drain.stderr.log | Errors that occur while running the drain script. |
Access Service Instance Logs and VMs
-
To target an individual service instance deployment, retrieve the GUID of your service instance with the following cf CLI command:
cf service MY-SERVICE --guid
-
To view VMs in the deployment, run the following command:
bosh -d DEPLOYMENT-NAME instances
-
To SSH into a VM, run the following command:
bosh -d service-instance_GUID ssh
-
To download the instance logs, run the following command:
bosh -d service-instance_GUID logs
Run Service Broker Errands to Manage Brokers and Instances
From the BOSH CLI, you can run service broker errands that manage the service brokers and perform mass operations on the service instances that the brokers created. These service broker errands include:
- `register-broker` registers a broker with the Cloud Controller and lists it in the Marketplace.
- `deregister-broker` deregisters a broker with the Cloud Controller and removes it from the Marketplace.
- `upgrade-all-service-instances` upgrades existing instances of a service to its latest installed version.
- `delete-all-service-instances` deletes all instances of the service.
- `orphan-deployments` detects “orphan” instances that are running on BOSH but not registered with the Cloud Controller.
To run an errand, run the following command:
bosh -d DEPLOYMENT-NAME run-errand ERRAND-NAME
For example:
bosh -d my-deployment run-errand deregister-broker
Register Broker
The register-broker
errand registers the broker with Cloud Foundry and enables
access to plans in the service catalog.
Run this errand whenever the broker is re-deployed with new catalog metadata to
update the Cloud Foundry catalog.
Plans with disabled service access are not visible to non-admin Cloud Foundry users, including Org Managers and Space Managers. Admin Cloud Foundry users can see all plans including those with disabled service access.
The errand does the following:
- Registers the service broker with Cloud Controller.
- Enables service access for any plans that have the radio button set to `enabled` on the tile plan page.
- Disables service access for any plans that have the radio button set to `disabled` on the tile plan page.
- Does nothing for any plans that have the radio button set to `manual`.
To run the errand, run the following command:
bosh -d DEPLOYMENT-NAME run-errand register-broker
Deregister Broker
This errand deregisters a broker from Cloud Foundry.
The errand does the following:
- Deletes the service broker from Cloud Controller
- Fails if there are any service instances, with or without bindings
Use the Delete All Service Instances errand to delete any existing service instances.
To run the errand, run the following command:
bosh -d DEPLOYMENT-NAME run-errand deregister-broker
Upgrade All Service Instances
If you have made changes to the plan definition or uploaded a new tile into Ops Manager, you might want to upgrade all the RabbitMQ for PCF service instances to the latest software or plan definition.
The upgrade-all-service-instances
errand does the following:
- Collects all of the service instances the on-demand broker has registered
- For each instance, the errand does the following serially:
- Issues an upgrade command to the on-demand broker
- Regenerates the service instance manifest based on its latest configuration from the tile
- Deploys the new manifest for the service instance
- Waits for this operation to complete, then proceeds to the next instance
- Adds to a retry list any instances that have ongoing BOSH tasks at the time of upgrade
- Retries any instances in the retry list until all are upgraded
If any instance fails to upgrade, the errand fails immediately. This prevents systemic problems from spreading to the rest of your service instances.
To run the errand, do one of the following:
- Select the errand through the Ops Manager UI and have it run when you click Apply Changes.
-
Run the following command.
bosh -d DEPLOYMENT-NAME run-errand upgrade-all-service-instances
Delete All Service Instances
This errand uses the Cloud Controller API to delete all instances of your broker’s service offering in every Cloud Foundry org and space. It only deletes instances the Cloud Controller knows about. It does not delete orphan BOSH deployments.
Note: Orphan BOSH deployments do not correspond to a known service instance.
While rare, orphan deployments can occur. Use the orphan-deployments
errand to identify them.
The delete-all-service-instances
errand does the following:
- Unbinds all apps from the service instances.
-
Deletes all service instances sequentially. Each service instance deletion includes:
- Running any pre-delete errands
- Deleting the BOSH deployment of the service instance
- Removing any ODB-managed secrets from BOSH CredHub
- Checking for instance deletion failure, which results in the errand failing immediately
- Determines whether any instances have been created while the errand was running. If new instances are detected, the errand returns an error. In this case, Pivotal recommends running the errand again.
Warning: Use extreme caution when running this errand. You should only use it when you want to totally destroy all of the on-demand service instances in an environment.
To run the errand, run the following command:
bosh -d DEPLOYMENT-NAME run-errand delete-all-service-instances
Detect Orphaned Service Instances
A service instance is defined as “orphaned” when the BOSH deployment for the instance is still running, but the service is no longer registered in Cloud Foundry.
The orphan-deployments
errand collates a list of service deployments that have
no matching service instances in Cloud Foundry and returns the list to the operator.
It is then up to the operator to remove the orphaned BOSH deployments.
To run the errand, run the following command:
bosh -d DEPLOYMENT-NAME run-errand orphan-deployments
If orphan deployments exist, the errand script does the following:
- Exits with exit code 10
- Outputs a list of deployment names under a `[stdout]` header
- Provides a detailed error message under a `[stderr]` header
For example:
[stdout]
[{"deployment_name":"service-instance_80e3c5a7-80be-49f0-8512-44840f3c4d1b"}]
[stderr]
Orphan BOSH deployments detected with no corresponding service instance in Cloud Foundry. Before deleting any deployment it is recommended to verify the service instance no longer exists in Cloud Foundry and any data is safe to delete.
Errand 'orphan-deployments' completed with error (exit code 10)
These details are also available through the BOSH `/tasks/` API endpoint for use in scripting:
$ curl 'https://bosh-user:bosh-password@bosh-url:25555/tasks/task-id/output?type=result' | jq .
{
"exit_code": 10,
"stdout": "[{\"deployment_name\":\"service-instance_80e3c5a7-80be-49f0-8512-44840f3c4d1b\"}]\n",
"stderr": "Orphan BOSH deployments detected with no corresponding service instance in Cloud Foundry. Before deleting any deployment it is recommended to verify the service instance no longer exists in Cloud Foundry and any data is safe to delete.\n",
"logs": {
"blobstore_id": "d830c4bf-8086-4bc2-8c1d-54d3a3c6d88d"
}
}
If no orphan deployments exist, the errand script does the following:
- Exits with exit code 0
- Outputs an empty list of deployments to stdout
- Outputs None to stderr
[stdout]
[]
[stderr]
None
Errand 'orphan-deployments' completed successfully (exit code 0)
If the errand encounters an error while running, the errand script does the following:
- Exits with exit code 1
- Outputs nothing to stdout
- Outputs any error messages to stderr
To clean up orphaned instances, run the following command on each instance:
WARNING: Running this command may leave IaaS resources in an unusable state.
bosh delete-deployment service-instance_SERVICE-INSTANCE-GUID
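The errand's `[stdout]` JSON can be turned into per-deployment cleanup commands. The sketch below prints the commands rather than executing them, so an operator can verify that each instance really is orphaned first; the crude string handling assumes the single-line list format shown above:

```shell
#!/bin/sh
# Turn orphan-deployments errand stdout into delete-deployment commands.
# Prints rather than executes, so each deployment can be verified first.
orphan_delete_cmds() {
  grep -o '"deployment_name":"[^"]*"' |
    cut -d'"' -f4 |
    sed 's/^/bosh delete-deployment /'
}

echo '[{"deployment_name":"service-instance_80e3c5a7-80be-49f0-8512-44840f3c4d1b"}]' \
  | orphan_delete_cmds
```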
Get Admin Credentials for a Service Instance
- Identify the service deployment by GUID.
- Log in to BOSH.
- Open the manifest in a text editor.
- Look in the manifest for the credentials.
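The BOSH CLI can print a deployment's manifest with `bosh -d service-instance_GUID manifest`; once saved, a simple grep narrows it to credential-looking keys. The key names in the sample below are illustrative assumptions, not the tile's real property schema:

```shell
#!/bin/sh
# Search a saved manifest for credential-looking keys.
# The key names in the sample are illustrative, not the tile's real schema.
find_credentials() {
  grep -E 'username|password' "$1"
}

# Live usage: bosh -d service-instance_GUID manifest > /tmp/manifest.yml
cat > /tmp/manifest-snippet.yml <<'EOF'
properties:
  rabbitmq-server:
    administrators:
      management:
        username: admin
        password: example-password
EOF

find_credentials /tmp/manifest-snippet.yml
```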
Reinstall a Tile
To reinstall a tile in the same environment where it was previously uninstalled:
- Ensure that the previous tile was correctly uninstalled as follows:
-
Log in as an admin by running:
cf login
-
Confirm that the Marketplace does not list
RabbitMQ for PCF by running:
cf m
-
Log in to BOSH as an admin by running:
bosh log-in
-
Display your BOSH deployments to confirm that the output does not
show the RabbitMQ for PCF deployment by running:
bosh deployments
- Run the “delete-all-service-instances” errand to delete every instance of the service.
- Run the “deregister-broker” errand to delete the service broker.
-
Delete the service broker BOSH deployment by running:
bosh delete-deployment BROKER-DEPLOYMENT-NAME
- Reinstall the tile.
View Resource Saturation and Scaling
To view usage statistics for any service, do the following:
-
Run the following command:
bosh -d DEPLOYMENT-NAME vms --vitals
-
To view process-level information, run:
bosh -d DEPLOYMENT-NAME instances --ps
Identify a Service Instance Owner
If you want to identify which apps are using a specific service instance from the BOSH deployments name, do the following:
- Take the deployment name and strip the `service-instance_` prefix, leaving you with the GUID.
- Log in to CF as an admin.
- Obtain a list of all service bindings by running the following:
cf curl /v2/service_instances/GUID/service_bindings
- The output from the above curl gives you a list of `resources`, with each item referencing a service binding that contains the `APP-URL`. To find the name, org, and space for the app, run the following:
  - Run `cf curl APP-URL` and record the app name under `entity.name`.
  - Run `cf curl SPACE-URL` to obtain the space, using the `entity.space_url` from the above curl. Record the space name under `entity.name`.
  - Run `cf curl ORGANIZATION-URL` to obtain the org, using the `entity.organization_url` from the above curl. Record the organization name under `entity.name`.
Note: When running `cf curl`, ensure that you query all pages, because the responses are limited to a certain number of bindings per page. The default is 50. To find the next page, curl the value under `next_url`.
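Following next_url can be scripted without jq; the sed parsing below is deliberately crude and assumes the compact Cloud Controller v2 JSON layout, so treat it as an illustration rather than a robust JSON parser:

```shell
#!/bin/sh
# Print the next_url value from a cf curl v2 response, or nothing when
# it is null/absent. Crude sed parsing; swap in jq if you have it.
next_url_of() {
  sed -n 's/.*"next_url": *"\([^"]*\)".*/\1/p'
}

# Live usage:
#   url=/v2/service_instances/GUID/service_bindings
#   while [ -n "$url" ]; do
#     page=$(cf curl "$url")
#     printf '%s\n' "$page"              # collect bindings from each page
#     url=$(printf '%s\n' "$page" | next_url_of)
#   done
echo '{"next_url": "/v2/service_instances/GUID/service_bindings?page=2"}' | next_url_of
echo '{"next_url": null}' | next_url_of
```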
Monitor the Quota Saturation and Service Instance Count
Quota saturation and total number of service instances are available through ODB metrics emitted to Loggregator. The metric names are shown below:
Metric Name | Description |
---|---|
on-demand-broker/SERVICE-NAME-MARKETPLACE/quota_remaining | Global quota remaining for all instances across all plans |
on-demand-broker/SERVICE-NAME-MARKETPLACE/PLAN-NAME/quota_remaining | Quota remaining for a particular plan |
on-demand-broker/SERVICE-NAME-MARKETPLACE/total_instances | Total instances created across all plans |
on-demand-broker/SERVICE-NAME-MARKETPLACE/PLAN-NAME/total_instances | Total instances created for a given plan |
Note: Quota metrics are not emitted if no quota has been set.
Drop and Restore AMQP(S) Traffic to a RabbitMQ Instance
While debugging a RabbitMQ instance, you can prevent apps from
sending and receiving messages, for example, to decrease the server load.
You can use the `drop-amqp-traffic` and `restore-amqp-traffic` scripts, which run the necessary `iptables` commands to achieve this.
To stop and then restore traffic to a RabbitMQ instance, do the following:
- To stop all AMQP(S) traffic to a RabbitMQ instance, enter the following command:
bosh -d service-instance_GUID ssh rabbitmq-server "echo y | sudo /var/vcap/packages/rabbitmq-admin/bin/drop-amqp-traffic"
- After performing the troubleshooting steps, restore the traffic.
To do this, enter the following command:
bosh -d service-instance_GUID ssh rabbitmq-server "echo y | sudo /var/vcap/packages/rabbitmq-admin/bin/restore-amqp-traffic"
Alternatively, you can run these scripts on individual nodes:
- Run `bosh ssh` to connect to a rabbitmq-server instance.
- Run `sudo -s` to gain root privileges.
- Execute `drop-amqp-traffic` to drop all AMQP(S) traffic to this instance, or `restore-amqp-traffic` to start accepting traffic again.
Frequently Asked Questions
What should I check before deploying a new version of the tile?
Ensure that all nodes in the cluster are healthy from the RabbitMQ Management UI,
or health metrics exposed through Firehose.
You cannot rely solely on the `bosh instances` output, because it reflects the state of the Erlang VM used by RabbitMQ and not the RabbitMQ app.
What is the correct way to stop and start RabbitMQ in PCF?
Only BOSH commands should be used by the operator to interact with the RabbitMQ app.
For example:
bosh stop rabbitmq-server
and bosh start rabbitmq-server
.
There are BOSH job lifecycle hooks which are only fired when rabbitmq-server is
stopped through BOSH.
You can also stop individual instances by running the stop command and specifying
JOB [index]
.
Note: Do not use monit stop rabbitmq-server
as this does not call the drain scripts.
What happens when I run bosh stop rabbitmq-server?
BOSH starts the shutdown sequence from the bootstrap instance.
The sequence tells the RabbitMQ app to shut down and then shuts down the Erlang VM within which it runs. If this succeeds, the following checks run to ensure that the RabbitMQ app and Erlang VM have stopped:
- If `/var/vcap/sys/run/rabbitmq-server/pid` exists, check that the PID inside this file does not point to a running Erlang VM process. Notice that this tracks the Erlang PID and not the RabbitMQ PID.
- Check that `rabbitmqctl` does not return an Erlang VM PID.
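The PID-file check can be reproduced by hand. The sketch below exercises it against a scratch pidfile; on a real node the file is `/var/vcap/sys/run/rabbitmq-server/pid` and holds the Erlang VM PID:

```shell
#!/bin/sh
# Returns success (0) when the PID in the pidfile no longer points to a
# running process, mirroring the first shutdown check described above.
erlang_vm_stopped() {
  pidfile="$1"
  [ -f "$pidfile" ] || return 0        # no pidfile: nothing left to track
  pid=$(cat "$pidfile")
  ! kill -0 "$pid" 2>/dev/null         # true when no such process exists
}

echo 99999999 > /tmp/fake-erlang.pid   # a PID that cannot be running
if erlang_vm_stopped /tmp/fake-erlang.pid; then
  echo "Erlang VM stopped"
else
  echo "Erlang VM still running"
fi
```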
What happens when bosh stop rabbitmq-server fails?
If the BOSH stop
fails, you will likely get an error saying that the drain
script failed with:
result: 1 of 1 drain scripts failed. Failed Jobs: rabbitmq-server.
What do I do when bosh stop rabbitmq-server fails?
The drain script logs to /var/vcap/sys/log/rabbitmq-server/drain.log
. If you
have a remote syslog configured, this appears as the rmq_server_drain
program.
First, `bosh ssh` into the failing rabbitmq-server instance and start the rabbitmq-server job by running `monit start rabbitmq-server`. You cannot start the job with `bosh start`, because that always runs the drain script first, which fails again.
Once the rabbitmq-server job is running (confirm this with `monit status`), run `DEBUG=1 /var/vcap/jobs/rabbitmq-server/bin/drain`. This tells you exactly why it is failing.
How can I manually back up the state of the RabbitMQ cluster?
It is possible to back up the state of a RabbitMQ cluster for both the on-demand and pre-provisioned services using the RabbitMQ Management API. Backups include virtual hosts, exchanges, queues, and users.
Back up Manually
- Log in to the RabbitMQ Management UI as the admin user you created.
- Select export definitions from the main page.
Back up and Restore with a Script
Use the API to run scripts with code similar to the following:
-
For the backup:
curl -u "$USERNAME:$PASSWORD" "http://$RABBIT_ADDRESS:15672/api/definitions" -o "$BACKUP_FOLDER/rabbit-backup.json"
-
For the restore:
curl -u "$USERNAME:$PASSWORD" "http://$RABBIT_ADDRESS:15672/api/definitions" -X POST -H "Content-Type: application/json" -d "@$BACKUP_FOLDER/rabbit-backup.json"
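Before POSTing a backup file back to the API, it is worth sanity-checking it. The helper name and the "vhosts" key check below are illustrative assumptions about the definitions export format; adjust them for your RabbitMQ version:

```shell
#!/bin/sh
# Reject empty or obviously truncated definition exports before restoring.
# The "vhosts" key check is an assumption about the export format.
backup_looks_valid() {
  [ -s "$1" ] && grep -q '"vhosts"' "$1"
}

cat > /tmp/rabbit-backup.json <<'EOF'
{"rabbit_version":"3.7.0","vhosts":[{"name":"/"}],"users":[],"queues":[]}
EOF

if backup_looks_valid /tmp/rabbit-backup.json; then
  echo "backup ok"
  # curl -u "$USERNAME:$PASSWORD" "http://$RABBIT_ADDRESS:15672/api/definitions" \
  #   -X POST -H "Content-Type: application/json" -d "@/tmp/rabbit-backup.json"
fi
```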
What pre-upgrade checks should I do?
Before doing any upgrade of RabbitMQ, Pivotal recommends checking the following:
- In Ops Manager check that the status of all of the instances is healthy.
- Log into the RabbitMQ Management UI and check that no alarms have been triggered and that all nodes display as green, showing they are healthy.
- Check that the system is not close to hitting either the memory or disk alarm. Do this by looking at what has been consumed by each node in the RabbitMQ Management UI.
Knowledge Base (Community)
Find the answer to your question and browse product discussions and solutions by searching the Pivotal Knowledge Base.
File a Support Ticket
You can file a ticket with Pivotal Support.
Be sure to provide the error message from cf service YOUR-SERVICE-INSTANCE
.
To expedite troubleshooting, provide your service broker logs and your service instance logs.
If your cf service YOUR-SERVICE-INSTANCE
output includes a
task-id
, provide the BOSH task output.