Troubleshooting Splunk Firehose Nozzle for PCF

This topic describes how to troubleshoot Splunk Firehose Nozzle for Pivotal Cloud Foundry (PCF).

Troubleshooting Walkthrough

1. I can’t find my data!

Are you searching for events and not finding them, or looking at a dashboard and seeing “No result found”? Check the Splunk Nozzle app logs.

To view the logs of the nozzle running on PCF, do the following:

  1. Log in as an admin via the CLI.
  2. Target the org created by the tile.
    cf target -o SPLUNK-NOZZLE-ORG
  3. View the recent Splunk Nozzle app logs (the version number installed by the tile will vary).
    cf logs --recent splunk-firehose-nozzle-1.0.0
  4. Alternatively, you can stream the app logs as they’re emitted.
    cf logs splunk-firehose-nozzle-1.0.0
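
If the log output from the steps above is saved to a file, a short script can help pick out the error entries. The following is a minimal sketch, not part of the nozzle itself. It assumes each nozzle entry is a JSON object that follows a `cf logs` prefix on the same line, and that errors appear under `data.error`, as in the searches later in this topic. The sample lines are made up for illustration.

```python
import json


def parse_nozzle_line(line):
    """Return the JSON object embedded in one `cf logs` line, or None."""
    start = line.find("{")
    if start == -1:
        return None
    try:
        payload = json.loads(line[start:])
    except json.JSONDecodeError:
        return None
    return payload if isinstance(payload, dict) else None


def error_entries(lines):
    """Keep only the entries whose `data` object carries an `error` field."""
    entries = []
    for line in lines:
        payload = parse_nozzle_line(line)
        if payload is None:
            continue
        data = payload.get("data")
        if isinstance(data, dict) and "error" in data:
            entries.append(payload)
    return entries


# Hypothetical sample lines; real nozzle output will differ.
sample = [
    '2017-01-01T00:00:00.00+0000 [APP/PROC/WEB/0] OUT '
    '{"timestamp":"1","message":"started","data":{}}',
    '2017-01-01T00:00:01.00+0000 [APP/PROC/WEB/0] OUT '
    '{"timestamp":"2","message":"failed","data":{"error":"example failure"}}',
    "Retrieving logs for app ...",
]

for entry in error_entries(sample):
    print(entry["message"], "->", entry["data"]["error"])
```

To use it on real output, redirect the `cf logs --recent` command to a file and pass the file's lines to `error_entries`.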

Here are a few common errors and possible resolutions:

{"timestamp":"

This error usually occurs when SSL is enabled on the Splunk HEC endpoint. Confirm that you’re using ‘https’ in the Splunk HEC URL.
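
As an illustration of the scheme check described above, the following sketch inspects a HEC URL before it is entered in the tile. The function name `hec_url_problems` and the example URLs are hypothetical. The check only looks at the URL text and does not contact Splunk.

```python
from urllib.parse import urlsplit


def hec_url_problems(url, ssl_enabled=True):
    """Return a list of human-readable problems found in a HEC URL."""
    problems = []
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        problems.append("URL must start with http:// or https://")
    elif ssl_enabled and parts.scheme != "https":
        problems.append("SSL is enabled on HEC, so the URL should use https")
    if not parts.hostname:
        problems.append("URL has no host name")
    return problems


# Example hosts are placeholders, not real endpoints.
for candidate in ("https://splunk.example.com:8088", "http://localhost:8088"):
    print(candidate, hec_url_problems(candidate))
```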

{"timestamp":"

This usually means the index value specified in the configuration doesn’t exist on the Splunk host. Confirm that you’re using the correct Splunk index value.

{"timestamp":"

This can occur when the Splunk HEC Token value is invalid. Confirm that you’re using a valid token.

{"timestamp":"

This usually means that there was no valid SSL certificate found. Confirm that you’re using a valid SSL certificate for the Splunk server, or set ‘Skip SSL Validation’ to true under Splunk settings.

Note: Disabling SSL validation is not recommended for production environments.

{"timestamp":"

This error can occur when the Splunk server is offline or when the Splunk HEC URL is not valid. Confirm that the Splunk server is running and that you’re using a valid URL.
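
One way to check the HEC URL, token, and index together, independently of the nozzle, is to send a single test event to HEC by hand. The sketch below only builds the request. It assumes the standard HEC event endpoint `/services/collector/event` and the `Authorization: Splunk <token>` header. The host, token, and index values are placeholders, and `build_hec_test_request` is a hypothetical helper name.

```python
import json
import urllib.request


def build_hec_test_request(hec_url, token, index):
    """Build (but do not send) a POST request carrying one test event."""
    endpoint = hec_url.rstrip("/") + "/services/collector/event"
    body = json.dumps(
        {"event": "nozzle connectivity test", "index": index}
    ).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={
            "Authorization": "Splunk " + token,
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder values; substitute the ones configured in the tile.
request = build_hec_test_request(
    "https://splunk.example.com:8088",
    "00000000-0000-0000-0000-000000000000",
    "main",
)
print(request.get_method(), request.full_url)

# To actually send the event, uncomment the next line:
# urllib.request.urlopen(request, timeout=10)
```

When the request is sent, `urllib.error.HTTPError` indicates that Splunk answered with an error status, for example for a rejected token or index, while `urllib.error.URLError` indicates that the server could not be reached or that the certificate was rejected.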

{"timestamp":"

This error can occur when the credentials provided for the CF environment are invalid. Confirm that the API User and API Password are correct and that the user has access to the CF environment.

{"timestamp":"

This means that no valid SSL certificate was found. To remediate this error, provide a valid SSL certificate for Cloud Foundry or set ‘Skip SSL Validation’ to true under Cloud Foundry Settings.

Note: Disabling SSL validation is not recommended for production environments.

The following troubleshooting tips assume you have access to Splunk to run basic searches against the _internal index and the user-specified index for Firehose events.

2. Ensure Splunk Nozzle is forwarding events from the Firehose:

Search app logs of the Nozzle to confirm correct behavior:

sourcetype="cf:splunknozzle"

A correct setup logs a start message that includes the Nozzle’s configuration parameters as a JSON object, for example:

  data: {
     add-app-info: true
     api-endpoint: https://api.endpoint.com
     app-cache-ttl: 0
     app-limits: 0
     batch-size: 1000
     boltdb-path: cache.db
     branch: null
     buildos: null
     commit: null
     debug:  false
     extra-fields:
     flush-interval: 5000000000
     hec-workers: 8
     ignore-missing-apps: true
     job-host:
     job-index: -1
     job-name: splunk-nozzle
     keep-alive: 25000000000
     missing-app-cache-ttl:  0
     queue-size: 10000
     retries: 2
     skip-ssl: true
     splunk-host: http://localhost:8088
     splunk-index: atomic
     splunk-version: 6.6
     subscription-id: splunk-firehose
     trace-logging: true
     version:
     wanted-events: ValueMetric,CounterEvent,Error,LogMessage,HttpStartStop,ContainerMetric
  }
  ip: 10.0.0.0
  log_level: 1
  logger_source: splunk-nozzle-logger
  message: splunk-nozzle-logger.Running splunk-firehose-nozzle with following configuration variables
  origin: splunk_nozzle
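
Some values in the start message are easier to sanity-check after a small conversion. The sketch below assumes that `flush-interval` and `keep-alive` are logged in nanoseconds and that `wanted-events` is a comma-separated list. Both are assumptions based on the sample above, not confirmed behavior. The values are copied from the sample.

```python
NANOSECONDS_PER_SECOND = 1_000_000_000


def nanos_to_seconds(value):
    """Convert a duration in nanoseconds (assumed unit) to seconds."""
    return value / NANOSECONDS_PER_SECOND


def split_events(wanted_events):
    """Split the comma-separated wanted-events value into a list of names."""
    return [name.strip() for name in wanted_events.split(",") if name.strip()]


# Values taken from the sample start message above.
config = {
    "flush-interval": 5000000000,
    "keep-alive": 25000000000,
    "wanted-events": "ValueMetric,CounterEvent,Error,LogMessage,"
                     "HttpStartStop,ContainerMetric",
}

print("flush-interval (s):", nanos_to_seconds(config["flush-interval"]))
print("keep-alive (s):", nanos_to_seconds(config["keep-alive"]))
print("wanted events:", split_events(config["wanted-events"]))
```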

Search app logs of the Nozzle for any errors:

sourcetype="cf:splunknozzle" data.error=*

Errors are logged with the corresponding message and stack trace.

3. Check for dropped events due to HTTP Event Collector availability:

Because the Splunk Firehose Nozzle sends data to Splunk over HTTPS using the HTTP Event Collector (HEC), it is susceptible to network issues anywhere along the path between the nozzle and Splunk. Run the following search to check whether Splunk has indexed any events that indicate issues with the HEC endpoint.

  sourcetype="cf:splunknozzle" "dropping events"

4. Check for data loss inside the Splunk Firehose Nozzle:

If “Event Tracing” is enabled, extra metadata will be attached to events. This allows searches to calculate the percentage of data loss inside the Splunk Firehose Nozzle, if applicable.

Each instance of the Splunk Firehose Nozzle will run with a randomly generated UUID. The query below will display the message success rate for each UUID.

index=main | stats count as total_events , max(nozzle-event-counter) as max_number by uuid | eval success_percentage=(total_events/max_number) * 100 | table uuid success_percentage
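
The arithmetic behind the query can be illustrated outside Splunk. The sketch below reproduces the same calculation on a few made-up events. It assumes that each nozzle instance numbers its events starting at 1, so the highest counter seen for a UUID approximates the number of events that instance emitted. The field names `uuid` and `nozzle-event-counter` come from the query. The event data is hypothetical.

```python
from collections import defaultdict


def success_percentage_by_uuid(events):
    """Return {uuid: percentage of emitted events that were indexed}."""
    totals = defaultdict(int)       # events actually indexed, per UUID
    max_counter = defaultdict(int)  # highest counter value seen, per UUID
    for event in events:
        uuid = event["uuid"]
        totals[uuid] += 1
        max_counter[uuid] = max(max_counter[uuid], event["nozzle-event-counter"])
    return {
        uuid: totals[uuid] * 100 / max_counter[uuid]
        for uuid in totals
        if max_counter[uuid] > 0
    }


# Hypothetical events: instance "a" is missing event 3, instance "b" lost nothing.
events = [{"uuid": "a", "nozzle-event-counter": n} for n in (1, 2, 4, 5)]
events += [{"uuid": "b", "nozzle-event-counter": n} for n in (1, 2)]

for uuid, percentage in sorted(success_percentage_by_uuid(events).items()):
    print(uuid, percentage)
```

A result below 100 for a UUID suggests that some events emitted by that nozzle instance never reached the index.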