Solace PubSub+ Troubleshooting

This topic describes how operators and developers can troubleshoot instances of Solace PubSub+ services.

A Solace PubSub+ Service Instance is backed by a Message VPN on one or more Solace PubSub+ Software Message Brokers, depending on the chosen plan.

You can discover the Solace PubSub+ Software Message Brokers backing a service instance from the Solace PubSub+ credentials, which become available when you bind an app to the instance or create a service key for the instance.
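As a minimal sketch of the service-key route, assuming the cf CLI is already logged in and targeting the space that holds the instance, the helper below creates a service key and prints its credentials. The function name and the default key name `troubleshooting-key` are illustrative and not part of the product. The fields in the returned credentials depend on the installed service version.

```shell
# Sketch: print the credentials for a service instance through a service key.
# show_solace_credentials and the default key name are hypothetical.
show_solace_credentials() {
  instance="$1"
  key="${2:-troubleshooting-key}"
  # Create the key, then print the credentials JSON it exposes.
  cf create-service-key "$instance" "$key" &&
    cf service-key "$instance" "$key"
}

# Example: show_solace_credentials my-large-instance
```

When the key is no longer needed, it can be removed with `cf delete-service-key`.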

Operator Troubleshooting

Operators have access to the logs of all the components of Solace PubSub+ for PCF:

  • Service Broker Logs
  • Service Broker Agent Logs
  • Solace PubSub+ message broker Logs

An installation of Solace PubSub+ for PCF may be configured to use System Logging as a means of gathering all logs.

Accessing Logs

The following locations on the Solace PubSub+ BOSH VMs hold logs that are useful for troubleshooting:

  • All VM job logs
    • /var/vcap/sys/log
  • All Solace PubSub+ message broker Logs
    • /var/vcap/store/containers/pubsub/volumes/jail/logs
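When a problem has just occurred, it can help to narrow these directories down to the files that changed recently. The following is a sketch only: `recent_logs` is a hypothetical helper, it assumes the log files end in `.log`, and it relies on the `-mmin` option of GNU or BSD `find`.

```shell
# Sketch: list log files under a directory that changed in the last N minutes.
# recent_logs is a hypothetical helper; -mmin needs GNU or BSD find.
recent_logs() {
  dir="$1"
  minutes="${2:-60}"
  find "$dir" -type f -name '*.log' -mmin "-$minutes" | sort
}

# Example, run on the VM: recent_logs /var/vcap/sys/log 30
```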

To access service broker logs, do the following:

  1. Set your API endpoint to the Cloud Controller of your deployment.

    $ cf api api.YOUR-SYSTEM-DOMAIN
    Setting api endpoint to api.YOUR-SYSTEM-DOMAIN...
    OK
    API endpoint:  https://api.YOUR-SYSTEM-DOMAIN (API version: 2.82.0)
    Not logged in. Use 'cf login' to log in.
    

  2. Log in to your deployment and select an org and a space.

    $ cf login
    API endpoint: https://api.YOUR-SYSTEM-DOMAIN
    Email> user@example.com
    Password>
    

  3. Target the solace org and solace-broker space.

    $ cf target -o solace -s solace-broker
    api endpoint:   https://api.YOUR-SYSTEM-DOMAIN
    api version:    2.82.0
    user:           admin
    org:            solace
    space:          solace-broker
    

  4. Discover the Solace PubSub+ Service Broker App name.

    $ cf apps
    Getting apps in org solace / space solace-broker as user@example.com...
    OK

    name                  requested state   instances   memory   disk   urls
    solace-broker-1.2.0   started           1/1         1G       512M   solace-broker.YOUR-SYSTEM-DOMAIN

    OK

    Note: Take note of the application name, in this case solace-broker-1.2.0.

  5. Watch the logs of the Solace PubSub+ Service Broker and save a copy to a file.

    $ cf logs solace-broker-1.2.0 | tee saved_solace-broker-1.2.0.txt
    Retrieving logs for app solace-broker-1.2.0 in org solace / space solace-broker as user@example.com...
    ....
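Once output has been saved with `tee`, the file can be searched for the exception class that appears in the service broker errors in this topic. The helper below is a sketch, and `find_broker_exceptions` is a hypothetical name. To dump recent log lines instead of streaming, `cf logs solace-broker-1.2.0 --recent` can be used as the source.

```shell
# Sketch: show lines from a saved log file that mention the exception class
# seen in the service broker errors in this topic, with line numbers.
find_broker_exceptions() {
  logfile="$1"
  grep -n 'SolaceServiceException' "$logfile"
}

# Example: find_broker_exceptions saved_solace-broker-1.2.0.txt
```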
    

Additional Diagnostics

In addition to regular logging, tools on the Solace PubSub+ message brokers can gather diagnostic logs on demand.

For example, having identified a problem with a given Solace service instance, the operator can access the backing Solace PubSub+ message brokers to examine logs and gather diagnostics.

You can identify the backing Solace PubSub+ message brokers for a given service instance from the IP addresses used in the bindings or service keys, or from the IP addresses mentioned in the service broker logs or in the error message when an exception occurs.

The operator should look for the VM in the Solace PubSub+ BOSH deployment with the matching IP address.
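One way to do the matching is to filter a VM listing by the IP address. The sketch below assumes BOSH CLI v2 syntax for the listing. `find_vm_by_ip` is a hypothetical helper and the deployment name is a placeholder.

```shell
# Sketch: keep only the lines of a VM listing that contain the given IP.
# find_vm_by_ip is a hypothetical helper; it reads the listing on stdin.
find_vm_by_ip() {
  ip="$1"
  # -F treats the IP as a literal string; -w avoids matching 10.244.0.3
  # inside a longer address such as 10.244.0.30.
  grep -F -w -- "$ip"
}

# Example (BOSH CLI v2 syntax, deployment name is a placeholder):
#   bosh -d YOUR-SOLACE-DEPLOYMENT vms | find_vm_by_ip 10.244.0.3
```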

  1. Get additional diagnostics from the “EnterpriseLarge/0” VM.

    $ bosh ssh EnterpriseLarge/0
    $ sudo -i
    # /var/vcap/jobs/broker_agent/bin/gather_diagnostics.sh

    You can then find the gathered logs under /var/vcap/store/diag.

Upgrade Issues

The following issues may arise while upgrading a tile:

Error
Action Failed get_task: result: 1 of 1 drain scripts failed. Failed Jobs: containers.
Explanation
A pre-condition was not met: a high availability (HA) group was not in a healthy state at the start of the upgrade. This check is done at the shutdown of each VM in an HA group.
Possible Action
Ensure that the HA group is healthy. To determine the underlying cause, log in to the failing VM and look at the log file located at /var/vcap/sys/log/containers/drain.stdout.log. Once the HA group is healthy, you can retry the upgrade.


Error
Action Failed get_task: result: 1 of 1 post-start scripts failed. Failed Jobs: containers.
Explanation
The upgrade was stopped after a failure was detected, to ensure that services remain available.
Possible Action
Log in to the failing VM and look at the log file located at /var/vcap/sys/log/containers/post-start.stdout.log to determine the underlying cause. Once the issue is fixed, you can retry the upgrade.
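For both upgrade errors, the first step is reading the end of the relevant script log on the failing VM. A minimal sketch, assuming the log paths given above: `show_script_log` and the `LOG_ROOT` override are hypothetical names introduced for illustration.

```shell
# Sketch: print the tail of a BOSH lifecycle script log for the containers job.
# show_script_log is a hypothetical helper. LOG_ROOT defaults to the log
# directory named in this topic and can be overridden.
show_script_log() {
  script="$1"                       # drain or post-start
  lines="${2:-50}"
  tail -n "$lines" "${LOG_ROOT:-/var/vcap/sys/log}/containers/$script.stdout.log"
}

# Example, run on the failing VM: show_script_log drain
```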

Developer Troubleshooting

A developer using Solace PubSub+ for PCF may encounter errors when using the Cloud Foundry Command Line Interface (cf CLI) to perform basic operations on a Solace PubSub+ for PCF service instance.

Most errors fall into one of these categories:

  • Reaching limits (plan limits, inventory fully used)
  • Communication problems between the service broker and the VM inventory it manages
  • Unexpected health state, such as a degraded high availability setup

While this list is not complete, it provides representative samples with explanations and possible resolutions. Some of the resolutions require operator intervention.

Deployment limits reached for a given plan
Operation
cf create-service solace-pubsub enterprise-large
Error
Server error, status code: 502, error code: 10001, message: Service broker error: com.solace.cloudfoundry.servicebroker.exception.SolaceServiceException: No matching Solace Message VPNs available.
Explanation
The service broker cannot find any Solace Message VPNs in its inventory for the requested service plan.
Possible Action(s)
The operator needs to increase the number of allocated Solace PubSub+ message brokers that support the given plan, in this case enterprise-large.


Invalid Parameter
Operation
cf update-service my-large-instance -c '{"some_option": "some_value" }'
Error
Server error, status code: 502, error code: 10001, message: Service broker error: Unrecognized parameter key some_option
Explanation
The service broker does not recognize the parameter name.
Possible Action(s)
Use the correct parameters. See Service Specific Parameters.


Invalid Parameter: The required feature is not enabled (TCP Routes)
Operation
cf update-service my-large-instance -c '{ "mqtt_tcp_route_enabled" : "false" }'
Error
Server error, status code: 502, error code: 10001, message: Service broker error: The parameter mqtt_tcp_route_enabled is invalid given the current configuration. It requires [ TCP Routes Enabled ]
Explanation
As indicated in the error, the given parameter is only valid when TCP Routes is enabled.
Possible Action(s)
See Configuring TCP Routes.


Invalid Parameter: The required feature is not enabled (LDAP)
Operation
cf update-service my-large-instance -c '{ "ldapGroupAdminReadOnly" : "cn=username1,ou=groups,dc=solace,dc=com" }'
Error
Server error, status code: 502, error code: 10001, message: Service broker error: The parameter ldapGroupAdminReadOnly is invalid given the current configuration. It requires [ LDAP Enabled, Management Access set to LDAP ]
Explanation
As indicated in the error, the given parameter is only valid when LDAP is enabled and Management Access is set to LDAP.
Possible Action(s)
See Configuring LDAP and Management Access to use LDAP.


Communication failure: The Solace PubSub+ message broker is not reachable.
Operation
cf delete-service -f my-large-instance
Error
cf service my-large-instance

Service instance: my-large-instance
Service: solace-pubsub
Bound apps:
Tags:
Plan: enterprise-large
Description: Solace PubSub+ message broker for real-time, multi-protocol data distribution
Documentation url: http://docs.solace.com
Dashboard: https://enterprise-large-0.YOUR-SYSTEM.DOMAIN/#/msg-vpns/djAwNQ==?token=YWJj.eyJhY2Nlc3NfdG9rZW4iOiJ2MDA1LWlaksjdlasdjas09dasdlkansdlakslZmRmOWFlNjM3ZGYwMDcifQ%3D%3D.eHl6

Last Operation
Status: delete failed
Message: com.solace.cloudfoundry.servicebroker.exception.SolaceServiceException: Unable to delete Service, the associated Message VPN v001 on 10.244.0.3 is not currently available
Started: 2020-01-00T00:00:00Z
Updated: 2020-01-00T00:00:00Z
Explanation
The service broker cannot delete this instance because the Message VPN is flagged as unavailable. This happens when the backing Solace PubSub+ message broker is not reachable.
Possible Action(s)
The operator should examine the Solace deployment and determine why VM 10.244.0.3 is not available.

The service delete operation can be reattempted once the VM health is restored.
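Before retrying, it can be useful to confirm the last-operation status that `cf service` reports. The sketch below extracts it from output piped on stdin. `last_operation_status` is a hypothetical helper, and it assumes a `Status:` line like the one in the output shown earlier, which can differ between cf CLI versions.

```shell
# Sketch: print the last-operation status from `cf service` output on stdin.
# last_operation_status is a hypothetical helper; it assumes a "Status:" line
# like the one in this topic, which can differ between cf CLI versions.
last_operation_status() {
  sed -n 's/^[[:space:]]*[Ss]tatus:[[:space:]]*//p'
}

# Example: cf service my-large-instance | last_operation_status
```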


HA service degradation.
Operation
cf bind-service my-app my-ha-instance
Error
MessageRouterException: Primary VMR 10.233.0.3 HA Group Status is degraded v001 for messageVpn v001
Explanation
The service broker always rejects an operation for an HA service when the HA status is degraded. A variety of reasons may be given in the message. Other operations on the same Message Router will fail as well.
Possible Action(s)
The operator should examine the Solace deployment and determine why the HA group of the VM named in the error message (10.233.0.3 in this example) is degraded.

The cf operation can be reattempted once the VM health is restored and the HA status is no longer degraded.


Orphaned Resource Policy
Operation
cf unbind-service my-app my-service-instance
Error
Server error, status code: 502, error code: 10001, message: Service instance test-shared: Service broker error: Operation canceled. Orphaned Endpoints Policy was violated.
Endpoints owned by v002.cu000001 must first be deleted:
        Queues: [ someQ ]
        Topics Endpoints: [ someTPE ]
Explanation
The service broker rejects unbinding when the client-username that was used by the application owns Endpoints, such as Queues and Topic Endpoints, and the current Orphaned Resource Policy is Abort.
Possible Action(s)
Delete the Endpoints before unbinding. Alternatively, you can adjust the Service Orphaned Resource Policy.

