Solace Messaging Troubleshooting

This topic describes how operators and developers can troubleshoot instances of Solace Messaging services.

A Solace Messaging service instance is backed by a Message VPN on one or more Solace Virtual Message Routers (VMRs), depending on the chosen plan.

You can discover the VMR(s) backing a service instance from the Solace Messaging credentials, which become available when you bind an app to the instance or create a service key for it.
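As a rough sketch of that discovery step, the helper below pulls the distinct IPv4 addresses out of whatever credentials text it is given. The function name `extract_ips` and the instance and key names in the usage comment are hypothetical. It assumes the credentials refer to the VMRs by IP address, which may not hold for every deployment.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each distinct IPv4 address found on stdin.
# Assumes the credentials refer to the backing VMRs by IP address.
extract_ips() {
  grep -oE '([0-9]{1,3}\.){3}[0-9]{1,3}' | sort -u
}

# Usage (instance and key names are placeholders):
#   cf create-service-key my-large-instance my-key
#   cf service-key my-large-instance my-key | extract_ips
```

The same filter can be applied to the output of `cf env APP-NAME` for a bound app.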

Operator Troubleshooting

The operator will have access to the logs of all the components of Solace Messaging for PCF:

  • Service Broker Logs
  • VMR Logs
  • VMR Agent Logs

An installation of Solace Messaging for PCF may be configured to use System Logging as a means of gathering all logs.

Accessing Logs

The following locations in Solace Messaging BOSH VMs will hold logs of interest that can help in troubleshooting issues:

  • All VM job logs
    • /var/vcap/sys/log
  • All VMR Logs
    • /var/vcap/store/prepare_vmr/volumes/jail/logs

To access service broker logs, do the following:

  1. Set your API endpoint to the Cloud Controller of your deployment.
    $ cf api api.YOUR-SYSTEM-DOMAIN
    Setting api endpoint to api.YOUR-SYSTEM-DOMAIN...
    OK
    API endpoint:  https://api.YOUR-SYSTEM-DOMAIN (API version: 2.82.0)
    Not logged in. Use 'cf login' to log in.
    
  2. Log in to your deployment and select an org and a space.
    $ cf login
    API endpoint: https://api.YOUR-SYSTEM-DOMAIN
    Email> user@example.com
    Password>
    
  3. Target the solace org and solace-messaging space.
    $ cf target -o solace -s solace-messaging
    api endpoint:   https://api.local.pcfdev.io
    api version:    2.82.0
    user:           admin
    org:            solace
    space:          solace-messaging
    
  4. Discover the Solace Messaging Service Broker App name.

    $ cf apps
    Getting apps in org solace / space solace-messaging as user@example.com...
    OK

    name                     requested state   instances   memory   disk   urls
    solace-messaging-1.2.0   started           1/1         1G       512M   solace-messaging.YOUR-SYSTEM-DOMAIN

    OK

    Note: Record the application name. In this example, it is ‘solace-messaging-1.2.0’.

  5. Watch the logs of the Solace Messaging Service Broker and save a copy to a file.

    $ cf logs solace-messaging-1.2.0 | tee saved_solace-messaging-1.2.0.txt
    Retrieving logs for app solace-messaging-1.2.0 in org solace /space solace-messaging as user@example.com...
    ....
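    When the log stream is noisy, a small filter can narrow it down. The following is a minimal sketch; the function name `filter_broker_errors` is hypothetical, and the two exception class names are the ones that appear in the error examples in this topic, so other failures will not match.

```shell
#!/usr/bin/env bash
# Hypothetical helper: keep only log lines that mention the exception
# classes shown in this topic's error examples.
filter_broker_errors() {
  grep -E 'SolaceServiceException|MessageRouterException'
}

# Usage (app name comes from the `cf apps` step):
#   cf logs solace-messaging-1.2.0 --recent | filter_broker_errors
```

    The `--recent` flag dumps the buffered logs and exits instead of streaming.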
    

Additional Diagnostics

In addition to regular logging, VMR tools can help gather diagnostic logs on demand.

For example, having identified a problem with a given Solace service instance, the operator can access the backing VMR to examine logs and gather diagnostics.

You can identify the backing VMR for a service instance from the IP addresses in its bindings or service keys, or from the service broker logs or the error message shown when an exception occurs.

The operator should look for the VM with the matching IP address in the Solace Messaging BOSH deployment.
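A minimal sketch of that lookup follows, assuming the VM listing is piped in from `bosh vms`. The function name `find_vm_by_ip` is hypothetical, and depending on your BOSH CLI version you may need to name the deployment explicitly.

```shell
#!/usr/bin/env bash
# Hypothetical helper: from a VM listing on stdin, print the rows that
# contain exactly the given IP (10.244.0.3 must not match 10.244.0.30).
find_vm_by_ip() {
  local ip="$1"
  # -F treats the dots literally; -w rejects matches embedded in longer numbers.
  grep -F -w -- "$ip"
}

# Usage:
#   bosh vms | find_vm_by_ip 10.244.0.3
```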

  1. Get additional diagnostics from the “Large-VMR/0” VM.
    $ bosh ssh Large-VMR/0
    # sudo -i
    # /var/vcap/jobs/vmr_agent/bin/gather_diagnostics.sh
    
    Then you can look at the gathered logs under /var/vcap/store/prepare_vmr/volumes/jail/logs
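    To see which files the script just produced, something like the following sketch may help. The function name `recent_logs` and the 10-minute default are assumptions, and it relies on the `-mmin` test that GNU and BSD `find` provide.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list files under a directory that were modified
# within the last N minutes (default 10).
recent_logs() {
  local dir="$1" minutes="${2:-10}"
  find "$dir" -type f -mmin "-${minutes}"
}

# Usage on the VMR VM:
#   recent_logs /var/vcap/store/prepare_vmr/volumes/jail/logs 15
```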

Upgrade Issues

The following issues may arise while upgrading a tile:

Error
Action Failed get_task: result: 1 of 1 drain scripts failed. Failed Jobs: containers.
Explanation
A pre-condition was not met. A high availability (HA) group was not in a healthy state at the start of the upgrade. This check is done at the shutdown of each VM in an HA group.
Possible Action
Ensure the HA group is healthy. Log in to the failing VM and check the log file at /var/vcap/sys/log/containers/drain.stdout.log to determine the underlying cause. Once the HA group is healthy, retry the upgrade.


Error
Action Failed get_task: result: 1 of 1 post-start scripts failed. Failed Jobs: containers.
Explanation
The upgrade was stopped because a failure was detected. Stopping ensures that services remain available.
Possible Action
Log in to the failing VM and check the log file at /var/vcap/sys/log/containers/post-start.stdout.log to determine the underlying cause. Once the issue is fixed, retry the upgrade.
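
For either upgrade error, a quick way to inspect the relevant log on the failing VM can be sketched as follows. The function name `show_script_logs` and the 50-line default are hypothetical; the file names are the ones given above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the tail of the drain and post-start logs
# named in the two upgrade errors, skipping any file that is absent.
show_script_logs() {
  local dir="${1:-/var/vcap/sys/log/containers}" lines="${2:-50}" name
  for name in drain.stdout.log post-start.stdout.log; do
    if [ -f "$dir/$name" ]; then
      echo "== $dir/$name =="
      tail -n "$lines" "$dir/$name"
    fi
  done
}

# Usage on the failing VM (after `sudo -i`):
#   show_script_logs
```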

Developer Troubleshooting

A developer using Solace Messaging for PCF may encounter errors when using Cloud Foundry Command-Line Interface (cf CLI) to perform basic operations on a Solace Messaging for PCF service instance.

In general, most errors fall into these categories:

  • Reaching limits (plan limits, inventory fully used)
  • Communication problems between the service broker and the VM inventory it manages
  • Unexpected health state, such as a degraded high availability setup

While this list is not complete, it provides representative samples with explanations and possible resolutions. Some of the resolutions require operator intervention.

Deployment limits reached for a given plan
Operation
cf create-service solace-messaging large
Error
Server error, status code: 502, error code: 10001, message: Service broker error: com.solace.cloudfoundry.servicebroker.exception.SolaceServiceException: No matching Solace Message VPNs available.
Explanation
The service broker does not find any Solace Message VPNs in its inventory for the requested service plan.
Possible Action(s)
The operator needs to increase the number of allocated VMRs that support the given plan, in this case a Large-VMR.


Invalid Parameter
Operation
cf update-service my-large-instance -c '{"some_option": "some_value" }'
Error
Server error, status code: 502, error code: 10001, message: Service broker error: Unrecognized parameter key some_option
Explanation
The service broker does not recognize the parameter name.
Possible Action(s)
Use the correct parameters. Please see Service Specific Parameters.


Invalid Parameter: The required feature is not enabled (TCP Routes)
Operation
cf update-service my-large-instance -c '{ "mqtt_tcp_route_enabled" : "false" }'
Error
Server error, status code: 502, error code: 10001, message: Service broker error: The parameter mqtt_tcp_route_enabled is invalid given the current configuration. It requires [ TCP Routes Enabled ]
Explanation
As indicated in the error, the given parameter is only valid when TCP Routes is enabled.
Possible Action(s)
See Configuring TCP Routes.


Invalid Parameter: The required feature is not enabled (LDAP)
Operation
cf update-service my-large-instance -c '{ "ldapGroupAdminReadOnly" : "cn=username1,ou=groups,dc=solace,dc=com" }'
Error
Server error, status code: 502, error code: 10001, message: Service broker error: The parameter ldapGroupAdminReadOnly is invalid given the current configuration. It requires [ LDAP Enabled, Management Access set to LDAP ]
Explanation
As indicated in the error, the given parameter is only valid when LDAP is enabled and Management Access is set to LDAP.
Possible Action(s)
See Configuring LDAP and Management Access to use LDAP.


Communication failure: The VMR is not reachable.
Operation
cf delete-service -f my-large-instance
Error
cf service my-large-instance

Service instance: my-large-instance
Service: solace-messaging
Bound apps:
Tags:
Plan: large
Description: Solace Messaging Enterprise Edition for real-time, multi-protocol data distribution
Documentation url: http://docs.solace.com
Dashboard: https://large-vmr-0.YOUR-SYSTEM.DOMAIN/#/msg-vpns/djAwNQ==?token=YWJj.eyJhY2Nlc3NfdG9rZW4iOiJ2MDA1LWlaksjdlasdjas09dasdlkansdlakslZmRmOWFlNjM3ZGYwMDcifQ%3D%3D.eHl6

Last Operation
Status: delete failed
Message: com.solace.cloudfoundry.servicebroker.exception.SolaceServiceException: Unable to delete Service, the associated Message VPN v001 on 10.244.0.3 is not currently available
Started: 2020-01-00T00:00:00Z
Updated: 2020-01-00T00:00:00Z
Explanation
The service broker cannot delete this instance because the Message VPN is flagged as unavailable. This happens when the backing VMR is not reachable.
Possible Action(s)
The operator should examine the Solace deployment and see why VM 10.244.0.3 is not available.

The service delete operation can be reattempted once the VM health is restored.
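
To check where a failed delete stands before and after retrying, the last-operation status can be pulled out of the `cf service` output. This is a sketch only: the function name `last_operation_status` is hypothetical, and it assumes a `Status:` line like the one in the sample output above. The exact layout may differ between cf CLI versions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the last-operation status from `cf service`
# output on stdin. Matches the label case-insensitively and ignores
# leading spaces, since layouts vary between cf CLI versions.
last_operation_status() {
  awk 'tolower($1) == "status:" { $1 = ""; sub(/^ +/, ""); print; exit }'
}

# Usage:
#   cf service my-large-instance | last_operation_status
#   cf delete-service -f my-large-instance   # retry once the VM is healthy
```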


HA service degradation.
Operation
cf bind-service my-app my-ha-instance
Error
MessageRouterException: Primary VMR 10.233.0.3 HA Group Status is degraded v001 for messageVpn v001
Explanation
The service broker will always reject an operation for an HA service when the HA status is degraded. A variety of reasons may be given in the message. Other operations on the same Message Router will fail as well.
Possible Action(s)
The operator should examine the Solace deployment and see why the HA group of the VM named in the error (10.233.0.3 in this example) is degraded.

The cf operation can be reattempted once the VM health is restored and the HA status is no longer degraded.


Orphaned Resource Policy
Operation
cf unbind-service my-app my-service-instance
Error
Server error, status code: 502, error code: 10001, message: Service instance test-shared: Service broker error: Operation canceled. Orphaned Endpoints Policy was violated.
Endpoints owned by v002.cu000001 must first be deleted:
        Queues: [ someQ ]
        Topics Endpoints: [ someTPE ]
Explanation
The service broker rejects unbinding when the client-username that was used by the application owns endpoints such as Queues and Topic Endpoints.
Possible Action(s)
Delete the endpoints before unbinding. Alternatively, as the management user, change the ownership of the endpoints.

